I have to say I'm a little bit confused about the way FORTH works
under the hood now. I am implementing my own Forth and had a
completely different idea of how it was to be implemented, but after
reading Leo Brodie's "Starting FORTH" online, I see the way it is
implemented is very different from what I had envisioned.
Could somebody explain to me the benefits of having run-time and
compile-time behaviors? Is it useful at all to have compile-time
behavior? I implemented BEGIN as pure run-time, no compile-time, whereas
in Leo Brodie's Starting FORTH chapter 11, BEGIN is pure compile-time
and no run-time.
I simply don't get how a compile-time flow control instruction can be
useful at all. How does it even work and do its job as a flow control
instruction in the first place if it has no run-time behavior?
My guess is I'm probably missing something big about the way words are
processed using the stack.
In chapter 11, BEGIN is defined as such
: BEGIN HERE ; IMMEDIATE
Personally, this is how I thought : worked; I thought that ':' was
"scanning" and compiling (writing down the addresses of each word
into a new dictionary entry) all the words after itself in the input
stream, stopping at the "marker" ';' byte and compiling an EXIT last in
the dictionary entry. I see this doesn't seem to be how it is done.
From what I see of how it is done, : basically switches to COMPILE
mode and EACH word executes its own COMPILE-TIME code. Would I be
correct? How is this useful?
"Mind boggles"
I included here my inner interpreter ( 12 lines of assembly code
spread in 3 pretty easy to understand pieces ) and a few assembly words
written using AT&T syntax (GAS assembler under Linux); and also some
higher words like 2DUP so you can see what's going on. The inner
interpreter basically reads the address of each word constituting a
word and executes their respective cfa in turn. For higher level words,
EXECUTE_ASM is executed, which pushes the current next word to execute
on the return stack and makes the current word pointer point to the
first word to execute in the list of words of the new higher level
word it has to execute. Whenever assembly wrappers are encountered,
their assembly code is executed immediately and control is passed on
to the following word in the list through NEXT. EXIT, as it should,
returns control to the next word in the list of the last higher level
word executed, by popping the return stack.
# %ebp is the parameters stack
# %edi is the return stack ( for nesting )
# %esi is the next word to execute
#-------------------- Inner Interpreter
.macro NEXT
movl (%esi), %edx
addl $4, %esi
jmp *8(%edx)
.endm
# ( -- )
EXECUTE_ASM:
addl $4, %edi # Advance to the next 32-bit cell of the return stack
movl %esi, (%edi) # Save the current list pointer on the return stack
movl %esi, %ebx
subl $4, %ebx # Undo NEXT's increment to recover the current cell
movl (%ebx), %edx # Fetch the address of the word to execute
addl $12, %edx # Skip the 12-byte header; we want the first word in its list
movl %edx, %esi # This is our new next word to execute
NEXT # Execute that next word
# ( -- )
EXIT_ASM:
movl (%edi), %esi
subl $4, %edi
NEXT
#-------------------- END
# ( n -- ) ( PSTACK OP )
DROP_ASM:
subl $4, %ebp
NEXT
# ( n1 n2 -- n1 n2 n1 ) ( PSTACK OP )
OVER_ASM:
movl -4(%ebp), %edx # Fetch n1, one cell below top of the parameter stack
addl $4, %ebp
movl %edx, (%ebp) # Push the copy; %ebp is the parameter stack, not %edi
NEXT
# ( -- a ) ( FLOW CONTROL OP )
BEGIN_ASM:
movl %esi, %edx
subl $4, %edx
movl %edx, 4(%ebp)
addl $4, %ebp
NEXT
#-------------------- "Compiled" FORTH primitives
# Each reference in a compiled word is 32 bits long.
# The header format + body, which can contain as many entries as desired,
# must end with the EXIT word in the compiled version for higher level words.
#
# |Previous link|Name pointer|Code pointer| + |Body|EXIT|
#
# For assembly word wrappers you have |Previous link|Name pointer|Assembly code pointer|
# ( -- ) EXIT_ASM Wrapper ( This is part of the interpreter )
EXIT:
.long 0x00000000 # Previous link
.long EXIT_NAME
.long EXIT_ASM
# ( n1 n2 -- n1 n2 n1 ) ( PSTACK OP ) OVER_ASM Wrapper
OVER:
.long 0x00000000
.long OVER_NAME
.long OVER_ASM
# ( n1 n2 -- n1 n2 n1 n2 ) ( PSTACK OP )
TWO_DUP:
.long 0x00000000
.long TWO_DUP_NAME
.long EXECUTE_ASM
.long OVER, OVER, EXIT
# ( n -- ) ( R: -- n ) ( RSTACK OP ) GREATER_R_ASM Wrapper
GREATER_R:
.long 0x00000000
.long GREATER_R_NAME
.long GREATER_R_ASM
# ( -- a ) ( FLOW CONTROL OP ) BEGIN_ASM Wrapper
BEGIN:
.long 0x00000000
.long BEGIN_NAME
.long BEGIN_ASM
# ( a -- ) ( R: -- a ) ( FLOW CONTROL OP )
AGAIN:
.long 0x00000000
.long AGAIN_NAME
.long EXECUTE_ASM
.long GREATER_R, EXIT
Any help would be very much appreciated.
Crypto
First off, let's start by stating that having compile-time execution of
code is a beautiful thing. Until you really start using a language that
has it (Forth, Lisp, etc) you won't realize this. For now, take it on
faith. Now, onto BEGIN.
* The stack is /always/ there. Even at compile time.
* BEGIN is nothing more than a marker. It doesn't do anything at
runtime.
I can see you scratching your head now. But bear with me. I'll assume
you know C, so look at this piece of code:
do { /* something */ } while(true);
What does "do" do at runtime? Nothing. It is simply a marker for
"while", which does the action: branch back to the marker ("do"). In a
C compiler, "do" will create a temporary label in the generated
assembly code, and the "while" will generate a branch statement back to
it.
In Forth, however, BEGIN does this without a temporary label. Instead,
all its information is stored on the stack. Let's consider it in
combination with AGAIN:
: BEGIN ( -- addr ) HERE ; IMMEDIATE
: AGAIN ( addr -- ) ['] (BRANCH) COMPILE, , ; IMMEDIATE
Now let's step through what happens when we hit these words inside a
definition:
: GO ( -- ) BEGIN AGAIN ;
As GO compiles, it sees BEGIN (an IMMEDIATE word) which executes and
places the current dictionary pointer (HERE) onto the stack. This is
our temporary label.
Then we see AGAIN (another IMMEDIATE word), which also executes. When
it executes, the first thing it does is compile the (BRANCH) word into
the current definition. It then compiles the address that is on the
stack (put there by BEGIN) into the current definition. This is like
the "while" in the C compiler that compiles the branch back to the
temporary label.
If you still aren't following, perhaps it may be that you don't
understand the definition of (BRANCH). Let's take a look at it:
\ assumes that ESI is the instruction pointer
CODE (BRANCH)
[ESI] ESI MOV,
END-CODE
When (BRANCH) executes (at run time), it will get the address that is
at the current instruction pointer (the one put there by BEGIN and
compiled into the definition by AGAIN). It will load that into the
instruction pointer -- effectively performing a branch.
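To make the mechanism concrete, here is a minimal sketch of the scheme described above, written in Python rather than Forth (a toy model only: the token names, the list used as dictionary memory, and the step limit are all invented for illustration):

```python
memory = []                        # the dictionary body: tokens and inline operands

def here():                        # the dictionary pointer
    return len(memory)

def compile_token(tok):
    memory.append(tok)

# Compile time: BEGIN just pushes here(); AGAIN compiles BRANCH plus that address.
def BEGIN(stack):
    stack.append(here())           # the "temporary label"

def AGAIN(stack):
    compile_token("BRANCH")
    compile_token(stack.pop())     # inline branch target, taken from the stack

# Compile the equivalent of  : GO  BEGIN 1+ AGAIN ;  (an endless loop)
stack = []
BEGIN(stack)
compile_token("1+")
AGAIN(stack)                       # memory is now ["1+", "BRANCH", 0]

# Run time: BRANCH loads its target from the following cell, just like (BRANCH).
def run(step_limit):
    ip, tos = 0, 0
    for _ in range(step_limit):
        tok = memory[ip]
        ip += 1
        if tok == "1+":
            tos += 1
        elif tok == "BRANCH":
            ip = memory[ip]        # load inline target into the instruction pointer
    return tos
```

The key point is visible in the compiled list: BEGIN left no token behind at all; only AGAIN generated anything, namely BRANCH plus the address BEGIN had pushed at compile time.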
Hope this helps!
Jeff
Starting Forth is an interesting book, but don't take it too seriously as
the only way Forth can be implemented. It's actually pretty dated. Many
modern Forths use optimizing compilers to generate machine code. If you're
interested in learning Forth, I'd really recommend getting a modern Forth
system and developing skills in programming Forth. Then look at how that
system is implemented, and maybe then try your own if you think you can
improve on it.
> Could somebody explain to me the benefits of having run-time and
> compile-time behaviors? Is it useful at all to have compile-time
> behavior? I implemented BEGIN as pure run-time, no compile-time, whereas
> in Leo Brodie's Starting FORTH chapter 11, BEGIN is pure compile-time
> and no run-time??
>
> I simply don't get how a compile time flow control instruction can be
> useful at all. How does it even work and do its job as a flow control
> instruction in the first place if it has no run-time behavior????
Basically, BEGIN is executed at compile time to push on the stack an address
to which a later "branch back" instruction (such as UNTIL, REPEAT, or AGAIN)
can branch. Those words expect the destination address on the stack when
they execute (at compile time) and generate a branch to it (either a
relative or absolute branch, depending on the implementation). It's better
to do this at compile time because it avoids cluttering up the runtime
stack, and is also faster (and probably smaller).
> My guess is I'm probably missing something big about the way words are
> processed using the stack.
>
> In chapter 11, BEGIN is defined as such
>
> : BEGIN HERE ; IMMEDIATE
IMMEDIATE identifies words to be executed at compile time. All the
structure words, plus some others, are commonly IMMEDIATE.
> Personnally, This is how I thought : worked; I thought that ':' was
> "scanning" and compiling (writting down the addresses of each word
> into a new dictionary entry) all the words after itself in the input
> stream and stopping at the "marker" ';' byte compiling an EXIT last in
> the dictionary entry. I see this doesn't seem to be how it is done.
Well, that's one way of doing it, and there are some implementations that
work that way, but they still must execute IMMEDIATE words instead of
compiling them.
> From what I see of how it is done, : basically switches to COMPILE
> mode and EACH word executes its own COMPILE-TIME code. Would I be
> correct? How is this useful?
Not exactly. The text is being processed by a word commonly called
INTERPRET. It takes a word, looks it up in the dictionary, and in most
cases will either execute it or compile some reference to it depending on a
system variable called STATE. If STATE is non-zero, INTERPRET will compile
the reference, unless the word is IMMEDIATE, in which case it will execute
the word.
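As a rough illustration of the STATE/IMMEDIATE logic just described (a toy model in Python, not how any real system is written; the names `define` and `interpret` are invented):

```python
STATE = 0              # 0 = interpreting; non-zero = compiling
compiled = []          # the body of the definition being compiled
words = {}             # dictionary: name -> (immediate_flag, behavior)

def define(name, immediate, fn):
    words[name] = (immediate, fn)

def set_state(n):
    global STATE
    STATE = n

def interpret(token):
    immediate, fn = words[token]
    if STATE and not immediate:
        compiled.append(token)     # compile a reference to the word
    else:
        fn()                       # interpreting, or the word is IMMEDIATE

define("]", False, lambda: set_state(1))   # enter compile mode
define("[", True,  lambda: set_state(0))   # IMMEDIATE: runs even while compiling
define("DUP", False, lambda: None)         # stand-in for an ordinary word

for tok in ["]", "DUP", "["]:
    interpret(tok)
```

After feeding it `] DUP [`, the ordinary word DUP ends up compiled, while the IMMEDIATE word [ executed right away and switched STATE back to interpreting.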
> I included here my inner interpreter ( 12 lines of assembly code
> spread in 3 pretty easy to understand pieces) and a few assembly words
> written using AT&T syntax (GAS assembler under linux); and also some
> higher words like 2DUP so you see what's going on.
Not at all familiar with that syntax, but it looks like in most cases more
instructions than necessary.
> The inner
> interpreter basically reads the address of each word constituting a
> word and executes their respective cfa in turn. For higher level words
> EXECUTE_ASM is executed which results in a push of the current next
> word to execute on the return stack and making the current word
> pointer point to the first word to execute in the list of words of the
> new higher level word it has to execute. Whenever assembly wrappers
> are encountered, their assembly code is executed immediately and
> control is passed on to the following word in the list through NEXT.
> EXIT, as it should, returns control to the next word in the list of the
> last higher level word executed, by popping the return stack.
Implementations I'm familiar with are somewhat simpler, but there are many
ways to implement Forth. Since you're clearly interested in this topic, I'd
really recommend that you study some existing implementations.
Cheers,
Elizabeth
--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310-491-3356
5155 W. Rosecrans Ave. #1018 Fax: +1 310-978-9454
Hawthorne, CA 90250
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
==================================================
I'll take a pop at an explanation.
In Forth, decision structures (such as IF or BEGIN) are very different
from those in more conventional languages. In Forth the compiler is part of the
language, not a black box that you feed source code into to get an
executable file. And the advantage of THAT is you get to use the components
of the compiler for your own purposes.
So in a conventional compiler, all the operators and decision structures
are simply tokens. When the compiler encounters IF or ELSE it has to install
appropriate code, looking ahead to resolve the addresses. If the compiler
encounters --say-- a + sign, it has to decide what is being added (that is,
which of the defined types: integers, reals, complex, doubles, or concatenated
strings, etc.) and install the appropriate code. This is why you have to assign
types to variable and function names before you can compile.
In order to understand the decision structures you have to know how
the Forth compiler works. In its simplest form, Forth reads the input stream
and executes whatever Forth subroutines it encounters. You can see a flow
chart describing this at
http://Galileo.phys.Virginia.EDU/classes/551.jvn.fall01/primer.htm
Look at the link "The structure of Forth". In particular, look at what Forth
does if the input is a number or something undefined.
As you doubtless know, the compiler has two modes or states, "compile" and
"execute", and is normally in the latter. Certain words can switch the mode.
For example, ] switches to compile mode and [ to execute mode. Thus, in a
simple indirect-threaded Forth without error checking, the definition of
colon can be as simple as
: : CREATE ] ;
Basically this says "Take the next piece of text in the input stream and
make a new dictionary entry under that name. Then switch to compile mode."
In compile mode, when the compiler encounters pieces of text, it looks them
up in the dictionary and records their "execution tokens" (they could be
the addresses of some code). It continues with this until the input stream is
exhausted. So the body of a word definition is essentially a list of the
execution tokens of its parts. Now you may ask, how does Forth get out of
the compile state and return to execution mode? The answer is that certain
words have been marked IMMEDIATE (that's all the word IMMEDIATE does--it
marks the latest dictionary definition, perhaps by turning on some bit
in its name). When the compiler encounters an IMMEDIATE word it executes
it, even if it is currently in compiling mode.
Thus the terminal semicolon ; is IMMEDIATE and is executed, not compiled.
What does semicolon do? It installs code for handling the addressing
bookkeeping and then puts the system back into execute mode, usually with [ .
Thus, e.g., saying SEE ; gets you
: ;
;-hook ?struc POSTPONE EXIT reveal POSTPONE [ ; immediate compile-only ok
in GForth 0.49, and
: ; ?COMP ?:M FALSE TO ?:M ABORT" Methods must END in ;M !" ?CSP
REVEAL PARMS
IF COMPILE UNNESTP
ELSE COMPILE UNNEST
THEN [ 0 TO PARMS DO-;CHAIN ; IMMEDIATE ok
in Win32Forth 4.2 . In both Forths the word [ is prominent.
So a new Forth subroutine (word) is just a list of previously defined
words that get executed sequentially when the new word is executed, just
as if you had entered them from the console. Really incredibly simple.
My favorite example is
3 4 5 * + . 23 ok
: *+ * + ; ok
3 4 5 *+ . 23 ok
Having explained all this, now let me explain the decision structures.
In Forth the general rule is that things are executed rather than decided.
Thus IF is an IMMEDIATE subroutine (as is BEGIN) that is executed when a word
is being compiled. IF installs ?BRANCH (which gets executed whenever the word
containing it is executed) that branches to a specific address
whenever there is a 0 (logical FALSE) on the stack. If something else
is on the stack, ?BRANCH does nothing. Now how does ?BRANCH know where to
branch to? IF also places the current address HERE (in the body of the word
being defined) on the stack. That's all it does.
Now an IF has to be resolved by a THEN (some people prefer to call it ENDIF
or FI). What does THEN (also IMMEDIATE) do? It takes the address that IF left
on the stack, compares it with the current address (the current value of
HERE) and computes how many bytes forward to jump. This number is stored
in the appropriate place, back where IF put in ?BRANCH. How does THEN know
where to put that relative address? Obviously, a particular Forth will
know where the appropriate memory cell is, relative to the address that
IF put on the stack.
Basically BEGIN...UNTIL works the same way, except backwards. BEGIN just
puts the current address in the body of the new word on the stack. When
the compilation gets to UNTIL (also IMMEDIATE) it executes the latter.
UNTIL installs ?BRANCH (or something like it), and uses the address on the
stack to compute how far _backward_ to jump. It then stores this number in
the appropriate place. Once again there have been no decisions taking place,
only the execution of specific subroutines.
More complicated decision structures involving ELSE or WHILE...REPEAT are
just variations on the theme.
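A toy model (in Python, with invented token names; a real Forth stores the offset in a dictionary cell, not a Python list) of the IF ... THEN backpatching just described:

```python
memory = []    # the body of the definition under construction

def IF(cstack):
    memory.append("?BRANCH")
    cstack.append(len(memory))         # remember where the offset cell goes
    memory.append(None)                # placeholder, to be resolved by THEN

def THEN(cstack):
    orig = cstack.pop()
    memory[orig] = len(memory) - orig  # forward jump distance, in cells

# Compile the equivalent of  IF 1+ THEN 2*
cstack = []
IF(cstack)
memory.append("1+")
THEN(cstack)
memory.append("2*")                    # memory == ["?BRANCH", 2, "1+", "2*"]

def run(stack):
    ip = 0
    while ip < len(memory):
        tok = memory[ip]
        ip += 1
        if tok == "?BRANCH":
            offset = memory[ip]
            ip += 1
            if stack.pop() == 0:       # false: jump forward past the IF body
                ip += offset - 1
        elif tok == "1+":
            stack.append(stack.pop() + 1)
        elif tok == "2*":
            stack.append(stack.pop() * 2)
    return stack
```

Note that THEN is the only word that writes the offset; IF merely left ?BRANCH, a hole, and the hole's address on the stack.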
--
Julian V. Noble
Professor Emeritus of Physics
j...@lessspamformother.virginia.edu
^^^^^^^^^^^^^^^^^^
http://galileo.phys.virginia.edu/~jvn/
"For there was never yet philosopher that could endure the
toothache patiently."
-- Wm. Shakespeare, Much Ado about Nothing. Act v. Sc. 1.
FORTH is an amazing language in many ways, and I'm very happy to have
crossed its path :).
> First off, let's start by stating that having compile-time execution of
> code is a beautiful thing. Until you really start using a language that
> has it (Forth, Lisp, etc) you won't realize this. For now, take it on
> faith. Now, onto BEGIN.
*Blindly leaps forward*
> * The stack is /always/ there. Even at compile time.
> * BEGIN is nothing more than a marker. It doesn't do anything at
> runtime.
>
> I can see you scratching your head now. But bear with me. I'll assume
> you know C, so look at this piece of code:
>
> do { /* something */ } while(true);
>
> What does "do" do at runtime? Nothing. It is simply a marker for
> "while", which does the action: branch back to the marker ("do"). In a
> C compiler, "do" will create a temporary label in the generated
> assembly code, and the "while" will generate a branch statement back to
> it.
Ahhhhhhhhhhh, generated assembly code! There goes something I'm not
doing ;). If we are talking about code generation in FORTH too, then
this is not the way I was going about it. If we are talking about
assembly code generation, then I can envision compile-time behavior
without run-time behavior.
> In Forth, however, BEGIN does this without a temporary label. Instead,
> all its information is stored on the stack. Let's consider it in
> combination with AGAIN:
>
> : BEGIN ( -- addr ) HERE ; IMMEDIATE
This is also what I do for BEGIN, but at run time ;).
> : AGAIN ( addr -- ) ['] (BRANCH) COMPILE, , ; IMMEDIATE
>
> Now let's step through what happens when we hit these words inside a
> definition:
>
> : GO ( -- ) BEGIN AGAIN ;
>
> As GO compiles, it sees BEGIN (an IMMEDIATE word) which executes and
> places the current dictionary pointer (HERE) onto the stack. This is
> our temporary label.
>
> Then we see AGAIN (another IMMEDIATE word), which also executes. When
> it executes, the first thing it does is compile the (BRANCH) word into
> the current definition. It then compiles the address that is on the
> stack (put there by BEGIN) into the current definition. This is like
> the "while" in the C compiler that compiles the branch back to the
> temporary label.
>
> If you still aren't following, perhaps it may be that you don't
> understand the definition of (BRANCH). Let's take a look at it:
>
> \ assumes that ESI is the instruction pointer
> CODE (BRANCH)
> [ESI] ESI MOV,
> END-CODE
>
> When (BRANCH) executes (at run time), it will get the address that is
> at the current instruction pointer (the one put there by BEGIN and
> compiled into the definition by AGAIN). It will load that into the
> instruction pointer -- effectively performing a branch.
> Hope this helps!
Well if there is assembly code generation involved this all helped to
clarify the idea. In this case, it depends how the dictionary entries
are "compiled". My compilation doesn't involve assembly code
generation at all.
My compiling only involves building a list of dictionary word addresses to 'do
something with'. The cfa in the header of each entry,
( pointer to previous word, pointer to name of word, cfa )
tells the inner interpreter whether it is a higher level word or not. In
the case of assembly wrappers, as I call them, the assembly code is
directly executed and we go on to the next word. Look at the following
list of addresses:
# ( n1 n2 -- n1 n2 n1 ) ( PSTACK OP ) OVER_ASM Wrapper
OVER:
.long 0x00000000
.long OVER_NAME
.long OVER_ASM
# ( n1 n2 -- n1 n2 n1 n2 ) ( PSTACK OP )
TWO_DUP:
.long 0x00000000
.long TWO_DUP_NAME
.long EXECUTE_ASM
.long OVER, OVER, EXIT
The inner interpreter reads the cfa of, for example, 2DUP and executes
EXECUTE_ASM, which is assembly code. EXECUTE_ASM signals that this is a
higher level Forth word and requires nesting, so the next word in the
current list of words being executed is pushed on the return stack and
the address of 2DUP's first word in its list of words to execute
( OVER ) becomes the new next word to execute.
The inner interpreter then reads the cfa of OVER, which points to
the assembly code OVER_ASM; that is executed, giving control to the
next word in the list. We go on to the next word, OVER again; then
EXIT pops the return stack and we are back up one level.
I told myself, why not use this same process to basically duplicate
the current flow of addresses so I can return to it when an AGAIN is
encountered at run-time!
So my run-time BEGIN pushes the current list address on the stack,
effectively duplicating the current flow.
--------BEGIN------------AGAIN
|
---------------AGAIN
When AGAIN is met, EXIT_ASM is executed and the flow is restored to
the original BEGIN where it is duplicated again and so FORTH :). I
don't know if this kind of implementation may cause problems later but
it is what naturally came to me when trying to figure out BEGIN and
AGAIN. What do you think?
Crypto
Yes, I'm aware it's pretty old, but it still gives interesting insights
into the inner workings of the language.
> Many
> modern Forths use optimizing compilers to generate machine code. If you're
> interested in learning Forth, I'd really recommend getting a modern Forth
> system and developing skills in programming Forth.
This would not be my primary goal ( it's my secondary one ) since the
language itself is just yet another language. Its strength, I think,
lies in its simplicity 'under the hood', in its inner workings, as
opposed to other compilers/interpreters.
> Then look at how that
> system is implemented, and maybe then try your own if you think you can
> improve on it.
I don't really intend to improve on it; I mostly only want to
understand its inner workings.
Ah! Good news for me :).
> but they still must execute IMMEDIATE words instead of
> compiling them.
Not in my implementation ;).
> > From what I see of how it is done, : basically switches to COMPILE
> > mode and EACH word executes its own COMPILE-TIME code. Would I be
> > correct? How is this useful?
>
> Not exactly. The text is being processed by a word commonly called
> INTERPRET. It takes a word, looks it up in the dictionary, and in most
> cases will either execute it or compile some reference to it depending on a
> system variable called STATE. If STATE is non-zero, INTERPRET will compile
> the reference, unless the word is IMMEDIATE, in which case it will execute
> the word.
>
> > I included here my inner interpreter ( 12 lines of assembly code
> > spread in 3 pretty easy to understand pieces) and a few assembly words
> > written using AT&T syntax (GAS assembler under linux); and also some
> > higher words like 2DUP so you see what's going on.
>
> Not at all familiar with that syntax, but it looks like in most cases more
> instructions than necessary.
Probably more than needed but making it fit in my pocket is not my
objective right now.
> > The inner
> > interpreter basically reads the address of each word constituting a
> > word and executes their respective cfa in turn. For higher level words
> > EXECUTE_ASM is executed which results in a push of the current next
> > word to execute on the return stack and making the current word
> > pointer point to the first word to execute in the list of words of the
> > new higher level word it has to execute. Whenever assembly wrappers
> > are encountered, their assembly code is executed immediately and
> > control is passed on to the following word in the list through NEXT.
> > EXIT, as it should, returns control to the next word in the list of the
> > last higher level word executed, by popping the return stack.
>
> Implementations I'm familiar with are somewhat simpler, but there are many
> ways to implement Forth. Since you're clearly interested in this topic, I'd
> really recommend that you study some existing implementations.
I've looked up ColorForth, eforth and fig-forth already and quite a
few references around the net ;).
Crypto
Here:
http://pornin.nerim.net/forth/
you may find two files, named "bsf-eng.c" and "bsforth". It is a
pedagogical implementation of Forth that I wrote a few months ago in
order to teach Forth to myself (a very good way to learn a programming
language, in my opinion). It is not complete (some standard words are
missing), it is not fast, it might be wrong in some places, and it lacks
proper I/O code (there is some code for file access but nothing more). But
it is rather heavily commented and shows how things may work "under the
hood".
bsf-eng.c is the core engine; it is written in C and should work on just
about any general-purpose modern machine (e.g., Windows, MacOS and Unix).
bsforth contains definitions for most standard Forth words, using the
few ones defined natively within bsf-eng.c.
If other people want to look at it, comments are welcome. The whole thing
is in the public domain, which means that I don't care what anyone does
with the code as long as I don't get any blame for whatever reason.
--Thomas Pornin
It doesn't have to be assembly code, it can be the address of a word which
will be executed at run-time, laid down by the compile-time code.
> > In Forth, however, BEGIN does this without a temporary label. Instead,
> > all its information is stored on the stack. Let's consider it in
> > combination with AGAIN:
> >
> > : BEGIN ( -- addr ) HERE ; IMMEDIATE
>
> This is also what i do for begin, but at run time ;).
The problem with that is that your address is now on the run-time stack, on
top of whatever data was there. This makes the code inside the loop more
complicated (as well as being non-standard, since BEGIN isn't supposed to
have any effect on the runtime stack).
> > When (BRANCH) executes (at run time), it will get the address that is
> > at the current instruction pointer (the one put there by BEGIN and
> > compiled into the definition by AGAIN). It will load that into the
> > instruction pointer -- effectively performing a branch.
> > Hope this helps!
>
> Well if there is assembly code generation involved this all helped to
> clarify the idea. In this case, it depends how the dictionary entries
> are "compiled". My compilation doesn't involve assembly code
> generation at all.
As I said above, it doesn't have to be machine code, that's just the way a C
compiler (which he was using for his example) works. IMMEDIATE words like
IF, ELSE, WHILE, REPEAT, and AGAIN can compile the address of a definition
to be executed at run-time (which is how the old indirect-threaded
implementations as described in Brodie's book worked).
Ahhh, more good news and ideas!! And potential new words for me to
create to manipulate the inner interpreter as part of the language ;).
Or the outer interpreter, for that matter ( probably less useful ).
[snip]
Yes, I'm pretty much aware how compilers work in general, but thank you
for the info ;).
> In order to understand the decision structures you have to know how
> the Forth compiler works. In its simplest form, Forth reads the input stream
> and executes whatever Forth subroutines it encounters. You can see a flow
> chart describing this at
Hmmm, isn't that what is called the outer interpreter? I thought there
were three distinct entities: the inner interpreter, the outer interpreter
and the compiler. The inner interpreter executes "compiled" words by
reading the list of addresses constituting a word and executing each
one in turn; the outer interpreter parses the input stream for words
to execute, feeding them to the compiler in the case of a : ; pair or
feeding them to the inner interpreter in the case of any other word;
and the compiler converts words read from the input stream into a
list of addresses readable by the inner interpreter. Would this be
correct?
> http://Galileo.phys.Virginia.EDU/classes/551.jvn.fall01/primer.htm
>
> Look at the link "The structure of Forth". In particular, look at what Forth
> does if the input is a number or something undefined.
>
>
> As you doubtless know, the compiler has two modes or states, "compile" and
> "execute", and is normally in the latter. Certain words can switch the mode.
> For example, ] switches to compile mode and [ to execute mode.
Ah!! Didn't know those words :). Don't know if I'll need them in my
FORTH implementation, but it's good to know they exist.
> Thus, in a
> simple indirect-threaded Forth without error checking, the definition of
> colon can be as simple as
>
> : : CREATE ] ;
Wonderful! This is very helpful, thank you.
> Basically this says "Take the next piece of text in the input stream and
> make a new dictionary entry under that name. Then switch to compile mode.
>
> In compile mode, when the compiler encounters pieces of text, it looks them
> up in the dictionary and records their "execution tokens" (they could be
> the addresses of some code).
Yes, right now I'm "compiling" everything manually in GAS assembly
under Linux, so the dictionary isn't formed automatically yet. I'm
almost there though ;).
> It continues with this until the input stream is
> exhausted. So the body of a word definition is essentially a list of the
> execution tokens of its parts. Now you may ask, how does Forth get out of
> the compile state and return to execution mode? The answer is that certain
> words have been marked IMMEDIATE (that's all the word IMMEDIATE does--it
> marks the latest dictionary definition, perhaps by turning on some bit
> in its name). When the compiler encounters an IMMEDIATE word it executes
> it, even if it is currently in compiling mode.
>
> Thus the terminal semicolon ; is IMMEDIATE and is executed, not compiled.
> What does semicolon do? It installs code for handling the addressing book-
> keeping and then puts the system back into execute mode, usually with [ .
Ah, I think I see clearly now how it can be used. I still wonder, though:
is it really necessary to have compile time behavior?
[snip]
> Having explained all this, now let me explain the decision structures.
> In Forth the general rule is that things are executed rather than decided.
> Thus IF is an IMMEDIATE subroutine (as is BEGIN) that is executed when a word
> is being compiled. IF installs ?BRANCH (which gets executed whenever the word
> containing it is executed) that branches to a specific address
> whenever there is a 0 (logical FALSE) on the stack. If something else
> is on the stack, ?BRANCH does nothing. Now how does ?BRANCH know where to
> branch to? IF also places the current address HERE (in the body of the word
> being defined) on the stack. That's all it does.
Ahhhhhhhhhhh!!! Wonderful. So basically only 1 word is compiled into the
dictionary instead of 2 in the case of an IF THEN; IF is replaced by
?BRANCH. Yes, this makes a lot of sense, since my run-time
implementation of THEN does nothing at run-time; it is merely used as
a marker by IF ( a 32 bit marker, ouch ). Hmm, doesn't this involve
some form of machine code generation though? Or are we strictly
talking inner interpreter manipulation without any code generation? If
machine code generation is involved, I'm afraid my implementation of
FORTH has not yet reached that point. I'm working at a higher level
right now.
> Now an IF has to be resolved by a THEN (some people prefer to call it ENDIF
> or FI). What does THEN (also IMMEDIATE) do? It takes the address that IF left
> on the stack, compares it with the current address (the current value of
> HERE) and computes how many bytes forward to jump. This number is stored
> in the appropriate place, back where IF put in ?BRANCH. How does THEN know
> where to put that relative address? Obviously, a particular Forth will
> know where the appropriate memory cell is, relative to the address that
> IF put on the stack.
>
> Basically BEGIN...UNTIL works the same way, except backwards. BEGIN just
> puts the current address in the body of the new word on the stack. When
> the compilation gets to UNTIL (also IMMEDIATE) it executes the latter.
> UNTIL installs ?BRANCH (or something like it), and uses the address on the
> stack to compute how far _backward_ to jump. It then stores this number in
> the appropriate place. Once again there have been no decisions taking place,
> only the execution of specific subroutines.
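The backpatching just described can be sketched in Python. This is a simplified, hypothetical model: the list `mem` stands in for the dictionary body, and `c_if`/`c_then`/`c_begin`/`c_until` are invented names for the compile-time actions of IF, THEN, BEGIN, and UNTIL.

```python
# Simplified model of compile-time control-flow resolution.
# "mem" plays the role of the dictionary body being compiled;
# "cstack" holds the unresolved addresses the IMMEDIATE words
# leave behind. All names here are invented for this sketch.

mem = []      # compiled tokens (the dictionary body)
cstack = []   # compile-time stack of unresolved addresses

def c_if():
    mem.append("?BRANCH")    # IF installs ?BRANCH ...
    cstack.append(len(mem))  # ... and leaves HERE for THEN
    mem.append(None)         # placeholder target, patched later

def c_then():
    orig = cstack.pop()      # the address IF left behind
    mem[orig] = len(mem)     # backpatch: branch here on a false flag

def c_begin():
    cstack.append(len(mem))  # BEGIN just remembers HERE

def c_until():
    mem.append("?BRANCH")    # UNTIL installs ?BRANCH ...
    mem.append(cstack.pop()) # ... branching backward to BEGIN

# Compile the equivalent of:  IF A THEN  BEGIN B UNTIL
c_if()
mem.append("A")
c_then()
c_begin()
mem.append("B")
c_until()

print(mem)  # ['?BRANCH', 3, 'A', 'B', '?BRANCH', 3]
```

Note that no decision is made here: each word simply runs its compile-time routine, and the branch targets come out resolved in the compiled body.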
Gotcha ;).
Thanks a lot for the good info.
Crypto
[snip]
> Well if there is assembly code generation involved this all helped to
> clarify the idea. In this case, it depends how the dictionary entries
> are "compiled". My compilation doesn't involve assembly code
> generation at all.
There's no assembly code generation in Jeff's proposal either. In
fact, BEGIN doesn't generate any code at all (not even threaded-code
tokens) at compile time; it just pushes the value of HERE onto the
stack. AGAIN pulls that address off and then compiles "BRANCH"
"address pushed by BEGIN". This isn't assembly, it's just ordinary
threaded code. The BRANCH token takes its branching target from the
following token in the compiled word's dictionary body.
I find it hard to understand how you can make any of this work
by doing all the control-flow work at runtime. Doesn't it
mess up your data stack in totally unmanageable ways? Or do
you use a separate stack for runtime control-flow data?
The return stack?
> effectively duplicating the current flow.
>
> --------BEGIN------------AGAIN
> |
> ---------------AGAIN
>
> When AGAIN is met, EXIT_ASM is executed and the flow is restored to
> the original BEGIN where it is duplicated again and so FORTH :). I
> don't know if this kind of implementation may cause problems later but
> it is what naturally came to me when trying to figure out BEGIN and
> AGAIN. What do you think?
It's a bit unconventional (which is cool), and probably less efficient
than resolving control-flow at compile time, but if it works for you,
excellent.
It seems your code works, and you're really just bothered by the fact
that you did it a different way than most threaded Forths. The only
response I have to that is, if you've gone this far, you shouldn't
have any trouble figuring out how to do control-flow at compile time
:-) Resolving control constructs at compile time seemed like the
proper approach to me after reading "Starting Forth"; apparently you
just drew a different conclusion about implementation strategy.
-- Joe
--
"We sat and watched as this whole <-- (Died Pretty -- "Springenfall")
blue sky turned to black..."
... Re-defeat Bush in '04.
--
pub 1024D/BA496D2B 2004-05-14 Joseph A Knapka
Key fingerprint = 3BA2 FE72 3CBA D4C2 21E4 C9B4 3230 94D7 BA49 6D2B
If you really want to get my attention, send mail to
jknapka .at. kneuro .dot. net.
...
>>As you doubtless know, the compiler has two modes or states, "compile" and
>>"execute", and is normally in the latter. Certain words can switch the mode.
>>For example, ] switches to compile mode and [ to execute mode.
>
>
> Ah!! Didn't know those words :). Don't know if I'll need them in my
> FORTH implementation but it's good to know they exist.
They make source easier to read and avoid having to scratch your right
ear with your left hand. Consider this embellished stack displayer:
\ Stack dump
2VARIABLE X.R 7 0 X.R 2! \ Argument to .R and U.R and pointer
\ to one of them
: .S_WIDTH ( n -- ) X.R CELL+ ! ; \ Stores the width argument only.
: (.S) \ Prints the top 11 elements of the stack with each cell
\ occupying a fixed amount of screen.
DEPTH 0= IF CR ." Empty "
ELSE DEPTH DUP ." Depth (decimal): " .D CR 11 MIN DUP
0 DO DUP PICK X.R 2@ EXECUTE 1- LOOP DROP
THEN CR ;
: .S [ ' .R CFA ] LITERAL X.R ! (.S) ; \ Note the brackets.
\ [ Turn off the compiler.
\ ' Get the dictionary address of ...
\ .R ... the proper print routine.
\ CFA From that, compute the execution address.
\ ] Turn the compiler back on.
\ LITERAL Put the number on the stack into the code along with
\ code to put it back on the stack at run time.
\ (.S) Run the stack dump showing signed numbers.
\ (The other choice is unsigned.)
: U.S [ ' U.R CFA ] LITERAL X.R ! (.S) ;
You can also see why [ and ] were chosen. They make a "sidebar"
calculation inside a definition very clear.
Jerry
P.S. .D is defined so: : .D ( n -- ) BASE @ SWAP DECIMAL . BASE ! ;
--
... the worst possible design that just meets the specification - almost
a definition of practical engineering. .. Chris Bore
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Yeah, I think I get the general idea, but I'm not exactly sure I see
how this can be done in my implementation without adding more code
than there already is in a word containing BEGIN ... AGAIN. I need to
see more on how BRANCH compiles a different jump for different
BEGINs.
> I find it hard to understand how you can make any of this work
> by doing all the control-flow work at runtime. Doesn't it
> mess up your data stack in totally unmanageable ways? Or do
> you use a separate stack for runtime control-flow data?
Sorry, I wasn't very clear on that bit. I push the current list
address on the RETURN stack, just as you would nest any other higher
level Forth word. This duplicates the current flow, as I drew in the
previous post:
-----------BEGIN-----------------AGAIN
|
--------------------AGAIN
AGAIN simply pops back the return stack, returning to the original
flow at BEGIN where the flow is then duplicated again...and so forth.
Yes, sorry again if I wasn't very clear on that bit :).
> > effectively duplicating the current flow.
> >
> > --------BEGIN------------AGAIN
> > |
> > ---------------AGAIN
> >
> > When AGAIN is met, EXIT_ASM is executed and the flow is restored to
> > the original BEGIN where it is duplicated again and so FORTH :). I
> > don't know if this kind of implementation may cause problems later but
> > it is what naturally came to me when trying to figure out BEGIN and
> > AGAIN. What do you think?
>
> It's a bit unconventional (which is cool),
I love the way it turned out but I did want to understand how it is
usually done :).
> and probably less efficient
> than resolving control-flow at compile time,
Probably ;).
> but if it works for you,
> excellent.
>
> It seems your code works, and you're really just bothered by the fact
> that you did it a different way than most threaded Forths.
Not so much bothered as intrigued since I solved the problem a
completely different way; and so I was wondering how it worked in more
common forth implementations.
> The only
> response I have to that is, if you've gone this far, you shouldn't
> have any trouble figuring out how to do control-flow at compile time
> :-) Resolving control constructs at compile time seemed like the
> proper approach to me after reading "Starting Forth"; apparently you
> just drew a different conclusion about implementation strategy.
Personally, I prefer the run-time approach, but I still need to
understand quite a few little bits about the inner workings of the
FORTH approach and so I might change my mind later ;).
Crypto
Hmmm.
> > > In Forth, however, BEGIN does this without a temporary label. Instead,
> > > all its information is stored on the stack. Let's consider it in
> > > combination with AGAIN:
> > >
> > > : BEGIN ( -- addr ) HERE ; IMMEDIATE
> >
> > This is also what I do for BEGIN, but at run time ;).
>
> The problem with that is that your address is now on the run-time stack, on
> top of whatever data was there. This makes the code inside the loop more
> complicated (as well as being non-standard, since BEGIN isn't supposed to
> have any effect on the runtime stack).
Yes, as I corrected in another post (I'm sorry again if I wasn't clear
on that), I use the return stack to accomplish the flow duplication.
------------BEGIN------------------AGAIN
|
---------------------AGAIN
> > > When (BRANCH) executes (at run time), it will get the address that is
> > > at the current instruction pointer (the one put there by BEGIN and
> > > compiled into the definition by AGAIN). It will load that into the
> > > instruction pointer -- effectively performing a branch.
> > > Hope this helps!
> >
> > Well if there is assembly code generation involved this all helped to
> > clarify the idea. In this case, it depends how the dictionary entries
> > are "compiled". My compilation doesn't involve assembly code
> > generation at all.
>
> As I said above, it doesn't have to be machine code, that's just the way a C
> compiler (which he was using for his example) works. IMMEDIATE words like
> IF, ELSE, WHILE, REPEAT, and AGAIN can compile the address of a definition
> to be executed at run-time (which is how the old indirect-threaded
> implementations as described in Brodie's book worked).
Hmmm, I'll have to check more into this.
Crypto
...
> Personally, I prefer the run-time approach, but I still need to
> understand quite a few little bits about the inner workings of the
> FORTH approach and so I might change my mind later ;).
By putting off to run time what you could have done at compile time, you
not only complicate access to items on the stack (or require a separate
control stack), but you also do every time the program is run what the
compiler needs to do only once. That's an inefficiency many would try to
avoid.
Jerry
> > I find it hard to understand how you can make any of this work
> > by doing all the control-flow work at runtime. Doesn't it
> > mess up your data stack in totally unmanageable ways? Or do
> > you use a separate stack for runtime control-flow data?
>
> Sorry, I wasn't very clear on that bit, I push the current list
> address on the RETURN stack just as you would nest any other higher
> level forth word. This duplicated the current flow as I drew in the
> previous post
>
> -----------BEGIN-----------------AGAIN
> |
> --------------------AGAIN
>
> AGAIN simply pops back the return stack, returning to the original
> flow at BEGIN where the flow is then duplicated again...and so forth.
But that blocks access to the return stack, so you have a restriction on the
use of >R, R@, and R> around BEGIN loops just as you do around DO...LOOP
structures. Maybe you could live with that, but it's non-standard, not to
mention being unnecessary.
"Outer" and "inner" interpreters are obsolete terms; we gave up on them
because they're very un-descriptive. A better way of thinking about them is
a "text interpreter" which processes the input stream (keyboard, disk,
etc.), and for indirect-threaded implementations such as you're talking
about an "address interpreter" which processes addresses. Not all Forths
have an address interpreter.
There really isn't a separate "compiler", just a distinctive behavior of the
text interpreter when the STATE flag is non-zero (set by : and ] ), in which
case it compiles references to non-IMMEDIATE words. IMMEDIATE words are
always executed, and when STATE=0 (set by ; and [ ) everything is executed.
Of course I omitted to mention (in the interest of simplicity and avoiding
confusion) that IMMEDIATE words can be compiled rather than executed via
POSTPONE and [COMPILE].
That is, these words override the usual behavior and make the compiler
treat an IMMEDIATE word as though it were a non-IMMEDIATE one. Stonelock's
best bet is to consult the dpANS6 document, available as HTML lots of places
including
http://Galileo.phys.Virginia.EDU/classes/551.jvn.fall01/DPANS.HTM
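The STATE-driven text interpreter described above can be sketched in Python. This is a minimal, hypothetical model: the word set, the names, and the token-at-a-time driver are invented for illustration; a real Forth parses an input stream and has far more machinery.

```python
# Minimal model of the text interpreter's STATE behavior:
# compiling is just a mode of one interpreter, not a separate program.

STATE = [0]     # 0 = interpreting; nonzero = compiling
compiled = []   # the definition body being built

def make_word(fn, immediate=False):
    return {"fn": fn, "immediate": immediate}

words = {
    "]":   make_word(lambda: STATE.__setitem__(0, 1)),
    "[":   make_word(lambda: STATE.__setitem__(0, 0), immediate=True),
    "DUP": make_word(lambda: None),  # stand-in for an ordinary word
}

def interpret(name):
    w = words[name]
    if STATE[0] and not w["immediate"]:
        compiled.append(name)  # compile a reference to it
    else:
        w["fn"]()              # execute it, even mid-compilation

for token in "] DUP [ DUP".split():
    interpret(token)

print(compiled)  # ['DUP'] -- only the occurrence seen while compiling
```

The second DUP is executed rather than compiled because [ (being IMMEDIATE) ran during compilation and cleared STATE, which is exactly the "sidebar calculation" use of the brackets.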
>Could somebody explain to me the benefits of having run-time and
>compile-time behaviors? Is it useful at all to have compile-time
>behavior? I implemented BEGIN as pure run-time, no compile-time, whereas
>in Leo Brodie's Starting FORTH chapter 11, BEGIN is pure compile-time
>and no run-time??
The problem with BEGIN ... XXX loops pushing addresses at run-time
is that if you use the parameter or return stack you introduce
non-standard behaviour. Most systems that do this end up with a
run-time structure stack to avoid the problem.
The bigger issue is with IF ... ELSE ... THEN as IF doesn't yet
have a destination. The cheap solution is scan forward, but then
you hit the nested control structures problem, and performance
is lousy.
However, code density can be very good as you don't have to
lay destinations at compile time, everything happens at run time.
The temptation with run-time target address generation is to
become completely non-standard, as it solves so many issues,
e.g.
IFT X \ executes X if TOS=nz
IFET X Y \ has else action
BA X \ Begin ... again
BWR X Y \ begin while repeat, X=test, Y=action
BU X \ Begin until
Overall, the classical compile-time behaviour will probably lead
to the shortest implementation for a standard Forth. See the
EuroForth papers - there's a system from Moscow AFAIR that
behaves as you suggest.
Stephen
--
Stephen Pelc, steph...@INVALID.mpeltd.demon.co.uk
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeltd.demon.co.uk - free VFX Forth downloads
The restrictions are only from outside to inside or inside to outside.
It's no stranger than having local variables inside a while statement
in a language like C that cannot be used outside the scope of the
brackets. If something is important for me to keep around, I'll put it
on the stack, not the return stack.
> Maybe you could live with that, but it's non-standard, not to
> mention being unnecessary.
Probably non-standard and unnecessary...so? With all due respect, if
you are to comment and help me understand what you think is important
to notice, I would appreciate you being a tad more respectful and
courteous. That you know, and I don't, doesn't give you any special
rights. This is not the first time this happens, and I was already
reluctant to answer your previous postings. If this doesn't change,
next time I'll simply ignore your posts.
Crypto
...
> Probably non-standard and unnecessary...so? With all due respect, If
> you are to comment and help me understand what you think is important
> to notice, I would appreciate you being a tad more respectful and
> courteous. Because you know, and I don't, doesn't give you any special
> rights. This is not the first time this happens, and I was already
> reluctant to answer your previous postings. If this doesn't change,
> next time i'll simply ignore your posts.
>
> Crypto
Whoa! Where's the disrespect? However little weight you give to being
standard, it seems silly to give it up in order to obtain a disadvantage
in some other way. Your way provides slower execution, a limitation on
the code you can write and more complexity under the hood. If you want
everyone to agree with you all the time, just say so and they probably
will oblige. Otherwise, engage in polite dialog.
I think the same can be said about interpreted code in general as
opposed to compiled code.
Crypto
> Jerry Avins <j...@ieee.org> wrote in message
news:<4123fee7$0$21743$61fe...@news.rcn.com>...
>
>> Stonelock wrote:
>>
>> ...
>>
>>
>>> Personally, I prefer the run-time approach, but I still need to
>>> understand quite a few little bits about the inner workings of the
>>> FORTH approach and so I might change my mind later .
>>
>>
>> By putting off to run time what you could have done at compile time, you
>> not only complicate access to items on the stack (or require a separate
>> control stack), but you also do every time the program is run what the
>> compiler needs to do only once. That's an inefficiency many would try to
>> avoid.
>
>
>
> I think the same can be said about interpreted code in general as
> opposed to compiled code.
>
> Crypto
True, but it's a mistake to think of Forth as interpreted code. The text
interpreter gets instructions from the input stream (typically, the
keyboard) and executes them one at a time. If an instruction calls for
compilation, an addition is made to the dictionary. If an instruction
calls for any other action, that action is performed. If the instruction
is the name of a dictionary entry, the code at that entry (which is
already compiled) is executed. Neither the sequence of events nor the
baggage of most interpreted languages applies to Forth.
Hmmm, do you know of any case in which this non-conventional behavior
would be problematic?
> Most systems that do this end with a
> run-time structure stack to avoid the problem.
Nah, too many stacks now ;).
> The bigger issue is with IF ... ELSE ... THEN as IF doesn't yet
> have a destination. The cheap solution is scan forward, but then
> you hit the nested control structures problem,
Ahhh, yes I see now. Hadn't anticipated nested control structures. I
do use the scan forward approach right now. I'll have to change that
:).
> and performance
> is lousy.
Yeah probably, but for now this isn't of much concern to me. I mostly
want to understand the inner workings of the FORTH approach.
>
> However, code density can be very good as you don't have to
> lay destinations at compile time, everything happens at run time.
> The temptation with run-time target address generation is to
> become completely non-standard, as it solves so many issues,
Hmmm... run-time address generation could be interesting too. I'll
have to explore that avenue ;), but I think I see the usefulness of
compile-time behavior now. If I understand right, basically,
IF pushes its list address on the stack, writes down the BRANCH
address in the dictionary, and CELL ALLOTs for the branching
address to come. THEN writes its own list address to the IF
address + 1 and does nothing else. BRANCH does the decision taking
and branches or not.
For BEGIN, it pushes its list address on the stack and does nothing
else. When, say, AGAIN is encountered, it writes down the BRANCH
address and writes the BEGIN address into the spot reserved for the
branching address; | BRANCH | addr |.
And so BEGIN, AGAIN, WHILE, REPEAT, IF, ELSE, THEN all have to be
IMMEDIATE.
What about the decision taking in the case of a BEGIN ... AGAIN vs IF
... ELSE ... THEN; do we need a different BRANCH for IFs and for
loops like BEGIN and co.? (BRANCH) and BRANCH ?
>
> IFT X \ executes A if TOS=nz
> IFET X Y \ has else action
> BA X \ Begin ... again
> BWR X Y \ begin while repeat, X=test, Y=action
> BU X \ Begin until
>
> Overall, the classical compile-time behaviour will probably lead
> to the shortest implementation for a standard Forth. See the
> EuroForth papers - there's a system from Moscow AFAIR that
> behaves as you suggest.
Things are clearer now. Thanks a lot for all the info. I will also
check into the EuroForth papers when I get a little time ;).
Crypto
...
> Ahhh, yes I see now. Hadn't anticipated nested control structures. I
> do use the scan forward approach right now. I'll have to change that
> :).
>
> and performance
>
>>is lousy.
>
>
> Yeah probably, but for now this isn't of much concern to me. I mostly
> want to understand the inner workings of the FORTH approach.
Why not use your time to learn how experts do it instead of inventing a
system that almost works and will ultimately be a dead end? I think it's
best to learn what's common before creating variants without a rationale
for them. Variant recipes created by inexperienced cooks rarely make for
good eating, no matter how inspired they may seem at first. Consider
banana pineapple pizza.
>
>>However, code density can be very good as you don't have to
>>lay destinations at compile time, everything happens at run time.
>>The temptation with run-time target address generation is to
>>become completely non-standard, as it solves so many issues,
>
>
> Hmmm...Run time address generating could be interresting to. I'll have
> to explore the avenue ;), but I think I see the usefulness of compile
> time behavior now. If I understand right, basically,
>
> IF pushes its list address on the stack and will write down the BRANCH
> address in the dictionary
On my system, the dictionary has to be ROMable before the program is run.
Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
The disrespect is very simply throwing it in my face without really
explaining why: your approach is flawed, your approach is slow, you
have nervous legs, and this and that.
I'm not here to get blasted by madam. I asked a question. Some people
answered it and actually gave very good explanations to make me
understand the advantages of compile-time behavior, which I will most
probably switch to now that I see the advantages of the approach.
> However little weight you give to being
> standard,
0 weight.
> it seems silly to give it up in order to obtain a disadvantage
> in some other way.
Agreed, but it can be brought up in a nicer way, by very simply
giving some explanation as to why the approach is flawed. I'm sorry,
but plainly telling me my approach is flawed in a few different posts
doesn't make me understand much of anything.
> Your way provides slower execution, a limitation on
> the code you can write and more complexity under the hood.
I know that, I'm not entirely stupid; that's why I'm asking questions,
I'm trying to figure out a good way to do things. If she has any
constructive comments I'll be glad to hear them. Any comments
bordering on "I've seen better implementations" and the like can stay
where they are, as I can do very well without them.
> If you want
> everyone to agree with you all the time, just say so and they probably
> will oblige. Otherwise, engage in polite dialog.
If you look around at the other postings I made in this thread, you
will notice that I replied politely to everybody who actually took
the time to explain the comments they made.
Crypto
Agreed, but Forth is a strange beast in many ways, and so if we
consider an indirect threaded code implementation, there is an
intermediate form of "compilation", which could be considered to be a
form of code to be interpreted; but then again, not in the traditional
sense. I think that in this case the words "compilation" and
"interpretation" are not perfectly adequate if we use them in a
conventional way.
Crypto
It seems to me that you were overly touchy when you wrote your blast. I
quote in full the words you responded to here:
"But that blocks access to the return stack, so you have a restriction
on the
use of >R, R@, and R> around BEGIN loops just as you do around DO...LOOP
structures. Maybe you could live with that, but it's non-standard, not to
mention being unnecessary.
Cheers,
Elizabeth"
Where does any of that come close to "I've seen better implementations"?
It seems pretty constructive and explanatory to me.
>(Stephen Pelc) wrote in message news:<4124886a....@192.168.0.1>...
>> The problem with BEGIN ... XXX loops pushing addresses at run-time
>> is that if you use the parameter or return stack you introduce
>> non-standard behaviour.
>
>Hmmm, do you know of any case in which this non-conventional behavior
>would be problematic?
Whenever you pass parameters into a loop or structure.
> and performance is lousy.
>
>Yeah probably, but for now this isn't of much concern to me. I mostly
>want to understand the inner workings of the FORTH approach.
You can never have too much performance.
Please. Study a few existing and classical implementations
before you implement your own Forth. There are very good
reasons for the traditional implementations, even if cheap
fast 16/32 bit CPUs have led many of us to optimised
native code compilers.
>IF pushes its list address on the stack and will write down the BRANCH
>address in the dictionary and will CELL ALLOT for the branching
>address to come. THEN will write to the IF address + 1 its own list
>address and does nothing else. BRANCH does the decision taking and
>branches or not.
>
>For BEGIN, it pushes its list address on the stack and does nothing
>else, When say AGAIN is encountered, it will write down the BRANCH
>address and write the BEGIN address to the spot reserved for the
>branching address; | BRANCH | addr |.
Basically so. The required primitives are usually called BRANCH
and QBRANCH ( flag -- ) for unconditional and conditional
branches. They are all you need. They are followed by data
which indicates the branch target address. Now you can
choose between offsets of 8/16/24/32 bits or absolute
addresses.
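A sketch of the run-time side of those primitives in Python, hypothetical and simplified: absolute indices into the word body stand in for the branch-target data (rather than 8/16/24/32-bit offsets), and QBRANCH is modeled as branching when the popped flag is zero.

```python
# Simplified model of the address interpreter executing BRANCH and
# QBRANCH. Branch targets here are absolute indices into the word
# body; a real system might store offsets instead.

stack = []   # data stack
out = []     # records each pass through the loop body

def run(body, flags):
    stack.extend(flags)  # pre-load the flags that UNTIL will test
    ip = 0
    while ip < len(body):
        token = body[ip]
        ip += 1
        if token == "BRANCH":        # unconditional: take the target
            ip = body[ip]
        elif token == "QBRANCH":     # conditional ( flag -- )
            target = body[ip]
            ip += 1
            if stack.pop() == 0:     # false flag: branch
                ip = target
        elif token == "BODY":        # stand-in for the loop's work
            out.append("*")

# BEGIN <body> <flag> UNTIL compiles to:  BODY, QBRANCH, 0
# Flags pop as 0, 0, 1: loop while false, exit on true.
run(["BODY", "QBRANCH", 0], [1, 0, 0])
print(out)  # ['*', '*', '*']
```

At run time only BRANCH and QBRANCH ever act; the control-structure words themselves left nothing behind but these tokens and their targets.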
Jerry Avins wrote:
> On my system, the dictionary has to be ROMable before the program is run.
What do you do for a VARIABLE, then?
The executable part of a VARIABLE needs to be in ROM, including a pointer to
its cell of data space, which is in RAM. Thus the implementation of
VARIABLE ends up looking a lot more like a CONSTANT, although the behavior
is equivalent to an all-RAM VARIABLE.
I use a word called RAMVAR. It makes a CONSTANT, a pointer to RAM space
where the actual variable is located. That pointer is analogous to HERE,
and is bumped whenever a new RAMVAR is defined. That was simple enough
so I could think it up myself, but I can supply more detail if you want it.
Jerry
: RAMVAR RAMHERE @ CONSTANT RAMHERE CELL+ ;
Hmmm, unusual definition of CELL+. Don't you mean:
: RAMVAR ( -- ) RAMHERE @ DUP CONSTANT CELL+ RAMHERE ! ;
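The RAMVAR scheme can be sketched in Python as well. This is a toy model under stated assumptions: each defined variable is just a fixed address into a separate RAM area, the allocation pointer plays the role of RAMHERE, and all names are stand-ins for this illustration.

```python
# Toy model of RAMVAR: the dictionary entry is effectively a CONSTANT
# holding a fixed address in a separate RAM data space, and defining
# a variable bumps the RAMHERE-style allocation pointer.

CELL = 4
ram = bytearray(64)  # the RAM data space
ramhere = [0]        # allocation pointer into it (the RAMHERE analogue)

def ramvar():
    """Allocate one cell and return its fixed address (the CONSTANT)."""
    addr = ramhere[0]
    ramhere[0] += CELL
    return addr

x = ramvar()
y = ramvar()
ram[x:x + CELL] = (42).to_bytes(CELL, "little")    # X !  analogue
value = int.from_bytes(ram[x:x + CELL], "little")  # X @  analogue
print(x, y, value)  # 0 4 42
```

The dictionary side (the constant addresses) never changes after compilation, so it can live in ROM, while the cells themselves sit in RAM.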
> "Jerry Avins" <spam...@crayne.org> wrote in message
> news:412a37d4$0$21741$61fe...@news.rcn.com...
>
>>Judges1318 wrote:
>>
>>
>>>
>>>Jerry Avins wrote:
>>>
>>>
>>>>On my system, the dictionary has to be ROMable before the program is
>
> run.
>
>>>
>>>What do you do for a VARIABLE, then?
>>
>>I use a word called RAMVAR. It makes a CONSTANT, a pointer to RAM space
>>where the actual variable is located. That pointer is analogous to HERE,
>>and is bumped whenever a new RAMVAR is defined. That was simple enough
>>so I could think it up myself, but I supply more detail if you want it.
>>
>>Jerry
>>
>>: RAMVAR RAMHERE @ CONSTANT RAMHERE CELL+ ;
>
>
> Hmmm, unusual definition of CELL+. Don't you mean:
>
> : RAMVAR ( -- ) RAMHERE @ DUP CONSTANT CELL+ RAMHERE ! ;
>
> Cheers,
> Elizabeth
No. I meant my idiosyncratic CELL+! (effectively, on that system, 2+!,
modeled after 1+!). Thanks for the catch.
Jerry
> Sorry, I wasn't very clear on that bit, I push the current list
> address on the RETURN stack just as you would nest any other higher
> level forth word. This duplicated the current flow as I drew in the
> previous post
> -----------BEGIN-----------------AGAIN
> |
> --------------------AGAIN
> AGAIN simply pops back the return stack, returning to the original
> flow at BEGIN where the flow is then duplicated again...and so forth.
Respectfully, there are two problems with this.
First, you have blocked access to the return stack. That eliminates
the common pattern of
... >R ... BEGIN ... R@ ... AGAIN ( and ... RDROP ... if you have some
way to escape an endless loop)
Now, that does not mean "its impossible to work around this". Rather,
the point is that you are imposing a limit, and doing so when it is
not necessary.
Second, intrinsically, you have replaced a sequence of
... PUSH CURRENT DICTIONARY POSITION ONTO STACK
... USE DICTIONARY POSITION FROM STACK TO COMPILE A JUMP TO THAT
POSITION
once for
... PUSH CURRENT DICTIONARY POSITION ONTO RETURN STACK
... USE DICTIONARY POSITION FROM RETURN STACK TO PERFORM A JUMP TO
THAT POSITION
each and every time you do the loop. Now, on a real fast machine, when
you are compiling a program you are close to being I/O bound, so the
compile-time work is a much smaller overhead, while on a real slow
machine, it is not only important to precompute as much as possible,
but also likely to be substantially faster to do a compiled-in jump
than a jump based on a return stack entry.
In other words, early binding is normally more efficient than late
binding, so it is preferred when it is equally effective. In this
case, it is equally effective as late binding.
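The cost difference above can be made concrete with a toy Python model. It counts only the per-iteration control-flow bookkeeping each scheme implies; the counts are illustrative, not a benchmark, and the names are invented.

```python
# Toy comparison of early vs late binding of a BEGIN ... AGAIN loop.

def early_bound(n):
    # The compiler resolved the branch target once; each pass just
    # takes the precompiled jump.
    ops = 0
    for _ in range(n):
        ops += 1  # jump to the precompiled target
    return ops

def late_bound(n):
    # BEGIN pushes the loop address and AGAIN pops it on every pass.
    ops = 0
    rstack = []
    for _ in range(n):
        rstack.append("begin-addr")  # push the address ...
        rstack.pop()                 # ... pop it back ...
        ops += 3                     # ... plus the jump, each time
    return ops

print(early_bound(1000), late_bound(1000))  # 1000 3000
```

The compile-time scheme pays its address arithmetic once per definition; the run-time scheme pays stack traffic on every trip around the loop.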
On another point (that is, responding to another subthread), there
certainly is a sense in which a traditional indirect threaded
implementation is "interpreting" the "compiled" code. However,
exactly the same source can be compiled to execute as assembly
language in a subroutine threaded implementation or when executed on a
Forth processor. Therefore, you can equally well think of a
traditional indirect threaded implementation as EMULATING a processor
where the core Forth primitives are hardware instructions and a
routine address acts as a hardware jump-to-subroutine instruction.
Twenty years ago, when I was first starting to learn Forth, indirect
and direct threading may have been the norm. Nowadays, optimised
compilation into the hardware instructions of the processor seems to
be very common, especially for modern processors where many of the
core Forth primitives are only one or a few hardware instructions.
And a hot topic is processors that are stack machines themselves, so
that they support this approach very efficiently, with far less
hardware overhead than a Complex Instruction Set Computer or Reduced
Instruction Set Computer. Since this is even more reduced than
"reduced", an acronym sometimes used is MISC, for Minimal Instruction
Set Computers.
And, by the way, most of the people on this newsgroup are not being
rude if they provide answers that are quick and to the point. They
are trying to be helpful in providing the answers in the first place.
Those who have been on the newsgroup for more than a decade naturally
approach the newsgroup in a casual, conversational style, and tend to
dispense with the formalities that might be expected among strangers
in other settings.
I say "most" because there are, of course, the natural varieties of
personalities among those present. Some might in fact be engaging in
behaviour that you would view as rude. However, this certainly does
not apply to Ms. Rather. I can assure you, she was by no means
attempting to be rude. She was just trying to get to the crux of the
matter.