Newbie Question: dictionary organization?

Charles Turner

unread,

Jul 11, 2006, 8:38:09 AM7/11/06

to

HI all-

I'm writing a simple Forth for the PIC 18f2620 chip, now with the
inner and outer interpreters mostly complete, working on the
dictionary. I've got a rudimentary format that looks like this:

; ************
; * AND *
; ************

N_LAND:
dw 3
dw "AND"
dw LAND ; CFA
dw N_USLAS ; LFA: points to previous word's NFA
dw N_LAND ; NFA
LAND:
goto NEXT ; a NOP at this point...

As you can see, the format is heavily indebted to Albert van der
Horst's ciForth.

My question is that it seems ciForth's NFA points to the previous
word's CFA or PFA, not to its NFA. I'd assume on a dictionary search,
one would want to traverse the name fields quickly, and so pointing
directly to the previous NFA would be a good thing. But I'm sure there
has to be a good reason for this 'indirection" in ciForth.

This is compounded for me because I think the PIC 18f chips can do a
single level of indirection easily, but an additional level of
indirection is a lot more complicated. I've pursued a Direct Threaded
Forth as a result.

I'd be appreciative of any insights offered here!

Many thanks, Charles Turner

Jean-François Michaud

unread,

Jul 11, 2006, 9:53:59 AM7/11/06

to

Hello Charles,

I was looking at the dictionary entry and it seemed to me that some of
the values are unnecessary. I'm not familiar with the bulk of your
implementation, but why not opt for a simple format? Also, adding the
previous dictionary entry would help us see what you mean more clearly.
I personally think that the NFA is not needed. This seems to be
sufficient (Depending on your architecture; 16 bits, 32 bit, I suggest
you pad names so that each entry falls on a memory boundary. This will
accelerate memory fetches when the processor runs through the
dictionary looking for a correct entry). I will assume 32bit
architecture:

N_USLAS:
dw 5
dw "USLAS"
dw 0x0000
dw N_OR ; LFA: points to previous word's NFA
dw USLAS ; CFA
USLAS:

goto NEXT ; a NOP at this point...

N_LAND:
dw 3
dw "AND"

dw N_USLAS ; LFA: points to previous word's NFA

dw LAND ; CFA

LAND:
goto NEXT ; a NOP at this point...

Basically, the LFA points at the beginning of the previous word. If a
match isn't found you advance forward to the LFA, @, and go on to see
if the next entry in the dictionary is a match.

Regards
Jean-Francois Michaud

Mark W. Humphries

unread,

Jul 11, 2006, 10:16:17 AM7/11/06

to

I use the following macro to lay down headers in FASM:

macro HEADER name*,symbol*,vocabulary*,mask* {
align bytes_per_cell
lfa_#symbol dd vocabulary
cfpa_symbol dd cfa_#symbol
nfa_#symbol db (@f -name_#symbol -1) or mask
@@:
vocabulary = nfa_#symbol
last = lfa_#symbol
}

Cheers,
Mark

Charles Turner

unread,

Jul 11, 2006, 11:47:03 AM7/11/06

to

Hi Jean-Francois-

Thanks. Two entries look like this:

; *****************
; * UM/MOD *
; *****************

N_USLAS:
dw 6
dw "UM/MOD"
dw USLAS
dw N_USTAR
dw N_USLAS
USLAS:
goto NEXT

; ************
; * AND *
; ************

N_LAND:
dw 3
dw "AND"
dw LAND

dw N_USLAS
dw N_LAND
LAND:
goto NEXT

I see your point about the NFA, which could be calculated if needed.
Thanks also about the warning on alignment; I think I've got the data
statements correct for the PIC.

It's a Harvard architecture microcontroller, 8 bit data path, 20 bit PC
and 12 bit data addressing. Not the nicest chip for Forth, but I have a
few friends who program them, so that's a help.

Best and thanks! Charles

Charles Turner

unread,

Jul 11, 2006, 11:51:26 AM7/11/06

to

Mark W. Humphries wrote:
> I use the following macro to lay down headers in FASM:

Mark-

Thanks for this. You know, I had removed some macros I created because
the PIC simulator seems to jump to the macro declaration, instead of a
disassembly of what's inline in the word. I found it very
disconcerting.

But the dictionary may be a perfectly fine place for such behavior.
I'll let you know what happens.

Thanks again! Charles

Paul Smith

unread,

Jul 11, 2006, 1:49:16 PM7/11/06

to

Especially for microcontroller, I liked to use this header structure:
dw len+flags (misc flags, like precedence, if applicable, and name length)
dw name (aligned word name string)
dw body (pointing first address of word "body", may be cfa)

headers and bodies are separated. headers sit in consecutive address, therefore no link
to next header. getting to the next header is done by adding aligned name length to address
of current word. a new header is added before the last word, for searching through headers
in the order last-to-first. as result, headers are added from higher to lower addresses.

bodies do not link back to headers. there is no need to, and this facilitates moving headers
around, and/or using a different organization, for faster searching for example. in the few cases
that i need to find header from body, i simply search for the header with matching body address.
This is relatively slow, but hardly ever used - if so, mostly interactively, for debugging and such.
Being able to get rid of NFA and LFA is more than worth this minor slowdown, IMHO. Consider that
headers are relocatable, without requiring any relinking, or offset calculation. Just move them
around, and update contents of LAST or the like.
I never do assume specific header- or body structure for translating between header/body addresses.
In combination with a deferred FIND, this allows me do change the header layout or organization on
the fly, runtime. or, to get rid of headers completely, freeing the space in EEPROM or Flash, if
desired. Not seperating headers from bodies will make this an awkward operation, if possible at all
(or wasting lots of space by the gaps the headers leave you).

Jean-François Michaud

unread,

Jul 11, 2006, 5:09:18 PM7/11/06

to

Paul Smith wrote:
> Especially for microcontroller, I liked to use this header structure:
> dw len+flags (misc flags, like precedence, if applicable, and name length)
> dw name (aligned word name string)
> dw body (pointing first address of word "body", may be cfa)
>
> headers and bodies are separated. headers sit in consecutive address, therefore no link
> to next header. getting to the next header is done by adding aligned name length to address
> of current word. a new header is added before the last word, for searching through headers
> in the order last-to-first. as result, headers are added from higher to lower addresses.

This is quite the interresting idea. I had never thought of that, but
it makes alot of sense. I might reimplement my Forth to take advantage
of this feature. Shave off a few Ks of code ;-) and gain some
flexibility.

> bodies do not link back to headers. there is no need to, and this facilitates moving headers
> around, and/or using a different organization, for faster searching for example. in the few cases
> that i need to find header from body, i simply search for the header with matching body address.
> This is relatively slow, but hardly ever used - if so, mostly interactively, for debugging and such.
> Being able to get rid of NFA and LFA is more than worth this minor slowdown, IMHO. Consider that
> headers are relocatable, without requiring any relinking, or offset calculation. Just move them
> around, and update contents of LAST or the like.
> I never do assume specific header- or body structure for translating between header/body addresses.
> In combination with a deferred FIND, this allows me do change the header layout or organization on
> the fly, runtime. or, to get rid of headers completely, freeing the space in EEPROM or Flash, if
> desired. Not seperating headers from bodies will make this an awkward operation, if possible at all
> (or wasting lots of space by the gaps the headers leave you).

Thanks much for the idea.

Regards
Jean-Francois Michaud

Coos Haak

unread,

Jul 11, 2006, 6:18:10 PM7/11/06

to

Op 11 Jul 2006 14:09:18 -0700 schreef Jean-François Michaud:

You might have a look how eforth is implemented.
The dictionary (bodies) are allocate upwards (of course)
but the headers downwards from the end of memory.

--
Coos

CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html

Mark W. Humphries

unread,

Jul 11, 2006, 8:03:55 PM7/11/06

to

Some typos slipped in to my last post (I was manually copying it at
home from a paper listing), here's the correct version:

I use the following macro to lay down headers in FASM, it assumes heads
are separate from bodies (the cfpa points to the word's cfa).

When a word has non-standard interpretation or compilation behavior I
define the word twice using the bit mask in the header to find the
appropriate definition to compile or interpret. For example doublequote
has two definitions:

HEADER '"' , doublequote , forth , interpreter_bit
HEADER '"' , doublequote_directive , forth , compiler_bit or
immediate_bit

macro HEADER name*, symbol*, vocabulary*, mask* {

align bytes_per_cell
lfa_#symbol dd vocabulary

cfpa_#symbol dd cfa_#symbol

nfa_#symbol db (@f -name_#symbol -1) or mask

name_#symbol db name

Albert van der Horst

unread,

Jul 11, 2006, 8:31:50 PM7/11/06

to

In article <1152621489....@35g2000cwc.googlegroups.com>,

Charles Turner <vze2...@optonline.net> wrote:
>HI all-
>
>I'm writing a simple Forth for the PIC 18f2620 chip, now with the
>inner and outer interpreters mostly complete, working on the
>dictionary. I've got a rudimentary format that looks like this:
>
>; ************
>; * AND *
>; ************
>
>N_LAND:
> dw 3
> dw "AND"
> dw LAND ; CFA
> dw N_USLAS ; LFA: points to previous word's NFA
> dw N_LAND ; NFA
>LAND:
> goto NEXT ; a NOP at this point...
>
>As you can see, the format is heavily indebted to Albert van der
>Horst's ciForth.

Nice to hear that.

>
>My question is that it seems ciForth's NFA points to the previous
>word's CFA or PFA, not to its NFA. I'd assume on a dictionary search,
>one would want to traverse the name fields quickly, and so pointing
>directly to the previous NFA would be a good thing. But I'm sure there
>has to be a good reason for this 'indirection" in ciForth.

The design consideration in ciforth was a Forth with maximum
introspective capability and regularity.
So the name is a regular (ciforth) string laid out in memory
XX DC 3 ; Take a full cell for the count, may be 64 bits!
DB "AND"

Then the name field in the header contains a pointer to XX.
This costs two extra cells per header for the name alone.
I can imagine ways to get rid of both cells.

In ciforth a word is represented by something called DEA. This is the
only representation ever used, so a link field contains a DEA, an
execution token is a DEA, a wid is a DEA, and -- most importantly --
a high level word is a chain of DEA's etc. At offset 0 of the DEA
sits the CFA so you could say that the link field points to a CFA.
The fields are present in alphabetic order CFA DFA FFA LFA NFA XFA.

I'm afraid that this all is not very suitable for
low end processors. OTOH because ciforth is simple, it is
easy to adapt.

I think the best you can do is use the m4 macro machinery in
generating your Forth. Then after it is debugged, change the
macro's to generate the headers to your design.
If you adopt the dictionary search of ciforth there should be only one
high level word that is to be adapted to the new headers: MATCH.
After this all works, you can write the search routine in assembler
if more speed is needed.

>This is compounded for me because I think the PIC 18f chips can do a
>single level of indirection easily, but an additional level of
>indirection is a lot more complicated. I've pursued a Direct Threaded
>Forth as a result.

This again is best accomplished by adapting the macro's
that generate the headers, and the symbolic offsets in
the assembler.
For Direct Threaded Code CFA has to be moved to the last
position in a header. Inspect the assembler source
carefully for possible impacts.
This can be tested separately.

Then the various possibilities for the CFA have to be changed
simultaneously, for vocabularies, high level definitions, variable,
constants and user variables. This is a cold turkey transition, and
you are back to a debugging stage similar to first getting
your Forth to work. You may want to do additional tweaks later,
probably a DFA can be dispensed with now, a FFA need only be
a byte etc.

>I'd be appreciative of any insights offered here!

What has served me well, is using John Hayes test set,
plus the ciforth testset, after each change.

Streaming a testset through kermit and capturing the result,
requires some handshaking measures. You may contact me
for more help.

The most similar ciforth-implementation is for the 6809.
You can download it from the site below: m6809.html.
Throw away rom.asm and type
make rom.asm
to see how this words.

>Many thanks, Charles Turner
>

Groetjes Albert

--
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- like all pyramid schemes -- ultimately falters.
alb...@spenarnc.xs4all.nl http://home.hccnet.nl/a.w.m.van.der.horst

Charles Turner

unread,

Jul 12, 2006, 10:08:06 AM7/12/06

to

Thanks Paul, Coos and Albert for your kind help and insighful comments!

On a related note, I thought I should be able to find an ANS conforming
version of the word QUIT written using the ANS CORE wordset. Looked
through the gForth source and still haven't found what I'm looking for
(although the manual has some good stuff about the text interpreter).
Some others too, but no luck so far.

Can anyone pint me to a source?

Best and thanks! Charles

jacko

unread,

Jul 12, 2006, 10:08:12 AM7/12/06

to

hi

i'd go for

LFA LEN+FALGS "name" CFA PFA .......

but thats me.

cheers

jeth...@gmail.com

unread,

Jul 12, 2006, 10:45:44 AM7/12/06

to

Charles Turner wrote:

> I thought I should be able to find an ANS conforming
> version of the word QUIT written using the ANS CORE wordset. Looked
> through the gForth source and still haven't found what I'm looking for
> (although the manual has some good stuff about the text interpreter).
> Some others too, but no luck so far.

> Can anyone pint me to a source?

I don't think there's any way. QUIT does things that you can't do with
other standard words.

It doesn't clear the data stack so you can't properly do it with ABORT
. (Could you store the contents of the data stack in a buffer, do QUIT
, and then restore the contents? No, ABORT will quit the word you're
executing and you'd need to type in a separate word to restore the data
stack.)

QUIT empties the return stack. There are no standard words that tell
you how deep the return stack is at the moment. You might find some
trick to do repeated R> to empty the return stack one item at a time,
but how do you know when to stop?

QUIT empties the text input buffer and starts accepting input from the
keyboard. The wayi to do that varies from system to system. You could
write code that does it correctly on some ANS Forth systems, but
nothing that's truly portable.

If you really want to do this, you might write a complete Forth
interpreter/compiler in ANS Forth. You'd have your own return stack.
Rewrite >R R> R@ DO I J LOOP QUIT ABORT SOURCE and a collection of
other words. It might be hard to do all that with only core words,
maybe easier if you allow a few more like SLITERAL .

Of course an interpreter/compiler written in ANS Forth will not perform
nearly as efficiently as the one that comes with your Forth system. But
if you want to play with it, it would be easier to change around than
it would be to continually recompile your system, and easier than
making your interpreter do auto-brain-surgery.

Coos Haak

unread,

Jul 12, 2006, 12:49:30 PM7/12/06

to

Op 12 Jul 2006 07:45:44 -0700 schreef j2th...@cavtel.net:

If you have CATCH and THROW, the following could do:

: terminal-input
begin
cr query interpret ." ok"
again
;

: quit
reset-return-stack
begin
['] terminal-input catch ?dup
while
handle-error reset-data-stack
repeat
cr ." Unexpected abort" 1 halt
;

jeth...@gmail.com

unread,

Jul 12, 2006, 4:22:27 PM7/12/06

to

If you have CATCh THROW then couldn't you do

: QUIT -56 THROW ;

?

Except that it will probably mess up the data stack depth.

How often do you really need to keep the data stack intact after an
ABORT ? I'd think, only when you want to look at what was on the stack
when things went wrong. Maybe QUIT belongs in the Programming Tools
wordset and not in Core.

Coos Haak

unread,

Jul 12, 2006, 5:05:20 PM7/12/06

to

Op 12 Jul 2006 13:22:27 -0700 schreef j2th...@cavtel.net:

> If you have CATCh THROW then couldn't you do
>
>: QUIT -56 THROW ;
>
> ?

Perhaps, but which word would contain the CATCH ?
I've looked at some implementations, they all give an error
message, and the standard says in the description of QUIT
... Do not display a message ...
I suppose no-one has -56 THROW for QUIT
There are some Win32Forth, CHForth and perhaps VFX that have a
CATCH in QUIT like the one in my previous posting.

> Except that it will probably mess up the data stack depth.

It's cleared ;-)

> How often do you really need to keep the data stack intact after an
> ABORT ? I'd think, only when you want to look at what was on the stack
> when things went wrong. Maybe QUIT belongs in the Programming Tools
> wordset and not in Core.

Good idea.

jeth...@gmail.com

unread,

Jul 12, 2006, 5:37:30 PM7/12/06

to

Coos Haak wrote:
> Op 12 Jul 2006 13:22:27 -0700 schreef j2th...@cavtel.net:

> > If you have CATCh THROW then couldn't you do

> >: QUIT -56 THROW ;

> Perhaps, but which word would contain the CATCH ?

It would be something inside the implementation. It would see that it's
code -56 and it would then leave the data stack alone but clear the
return stack and reset SOURCE to use the text input buffer.

> I've looked at some implementations, they all give an error
> message, and the standard says in the description of QUIT
> ... Do not display a message ...
> I suppose no-one has -56 THROW for QUIT

Yes, it would need a special case just for QUIT . If an implementor
knows how to do that then they can implement -56 THROW for QUIT . And
then you can do it that way.

> > Except that it will probably mess up the data stack depth.

> It's cleared ;-)

It would have to be a special case. I doubt many implementors would
consider it worth doing.

Your example code has reset-return-stack and several other words that
are not ANS and would be hard to implement using only ANS words. QUIT
does things that vary with implementation, and it's hard to write
portable code that will do the right thing.

Charles Turner

unread,

Jul 18, 2006, 8:25:59 AM7/18/06

to

A belated thanks to everyone who has sent their knnowledge in my
direction; I was away for a few days and am now back at work on this...

Here's something I'm puzzling over in the ANS standard:

Section 2.2.3 talks about parsed text notation, and this turns up in a
word like WORD, where the stack note is:

( char "<chars>ccc<char>" -- c-addr )

It's not clear to me how WORD understands where the text to parse is
located. Is this left up to the implementation, or have I missed
something in my reading of the standard?

Best and thanks, Charles

jeth...@gmail.com

unread,

Jul 18, 2006, 10:00:03 AM7/18/06

to

Charles Turner wrote:

> It's not clear to me how WORD understands where the text to parse is
> located. Is this left up to the implementation, or have I missed
> something in my reading of the standard?

WORD parses text in a buffer. SOURCE says where the buffer is and how
long it is.

The system is set up so that what you type goes into a buffer that
SOURCE will point to. It's up to the system where to put that buffer,
and it might put different lines in different places.

You can choose to switch the input to a block of text that's 1024
characters long, using the LOAD command.

You can choose to switch the input to a text file, using the
INCLUDE-FILE or the INCLUDED commands.

You can choose to tell it to parse whatever string you like, using the
EVALUATE command.

A lot of details of how that works is up to the implementation. The >IN
state variable says how far has already been read in the current
buffer. The optional SOURCE-ID command may tell you how the buffer was
filled -- from keyboard, string, or a particular file. The BLK state
variable gives the number of the current 1K block being parsed, and
that number should be zero when it's keyboard, text-file, or string
instead.

So yes, the address of the input buffer is entirely up to the system
most of the time, but you can find out where with SOURCE . And you can
very temporarily assign it yourself with EVALUATE .

Jerry Avins

unread,

Jul 19, 2006, 3:26:33 PM7/19/06

to

But remember that SOURCE returns two numbers: the character count, and
the address of the buffer. That's why 'source type <cr>' produces
'source type ok' as output. SOURCE itself puts <addr> 9 on the stack.

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯