Build your own Forth for Microchip PIC (Episode 838): Threading

none Byron Jeff

unread,

Jun 23, 2007, 3:26:10 PM6/23/07

to

If you read my introductory episode (#837, and arbitrarily chosen number
that reflects that the search for Microchip PIC 16F forths have been
asked for a while) I think I need to implement NEXT, ENTER, and EXIT in
order to run Forth words under Frank Sergeant's 3 instruction Forth
microkernel. Threading is the core of this issue. So let's examine the
possibilities.

As I said in the last missive, the PIC is quirky. It rears its ugly
little head right off the bat in several ways:

1. Ram is so precious that there's no way to run code in it.

2. Program and data flash are both available, but reading them
programmatically is very costly. In addition only 14 bits is available
from each word of program memory.

3. The hardware stack is both limited in size (8 levels deep) and
visibility (as in invisible).

Confined in this environment we need to figure out a threading model to
implement. Brad Rodriguez outlines several techniques in his moving
forth article: http://www.zetetics.com/bj/papers/moving1.htm

Here's a brief recap:

1. Indirect threading (ITC): code is a list of addresses that point to
addresses of the target code to execute.

2. Direct threading (DTC): code is a list of addresses that points to code
that jumps/calls the target code to execute. It's about the same as ITC
but removes one level of indirection.

3. Subroutine threading (STC): code is a set of subroutine calls. Uses the
hardware stack to manage movement.

4. Token Threading (TTC): code is a token. The token is then used to
look up the address of the code to execute.

Let's evaluate the efficacy of each relative to the pic's constraints.

Ordered from worst to first:

worst: STC. The pic simply doesn't have the hardware stack to support
it.

ok: ITC. extracting address from the pic's program memory is costly.
Having to do it on two levels of indirection is even more so.

better: DTC. Halves the cost of fetching the address. Jump instruction
is one cell and executes quickly.

best: TTC. Facilitates fast fetching of tokens. PIC has support for fast
jump tables. Far more efficient than ITC or DTC.

It's interesting that Brad calls TTC the slowest of the bunch. But in
the case of a PIC where fetching program memory programmatically is
close to 20 instruction cycles, TTC runs rings around both DTC and ITC.

I already have a similar scheme implemented for fetching bytecodes in my
NPCI virtual machine interpreter. Fundamentally each byte is encoded
into a pic RETLW instruction. By using the PCLATCH/PCL it's possible to do a
computed goto to any instruction in the address space.

So it looks like TTC with jump tables is the winner.

BAJ

Elizabeth D Rather

unread,

Jun 23, 2007, 4:00:01 PM6/23/07

to

none Byron Jeff wrote:
> If you read my introductory episode (#837, and arbitrarily chosen number
> that reflects that the search for Microchip PIC 16F forths have been
> asked for a while) I think I need to implement NEXT, ENTER, and EXIT in
> order to run Forth words under Frank Sergeant's 3 instruction Forth
> microkernel. Threading is the core of this issue. So let's examine the
> possibilities.
>
> As I said in the last missive, the PIC is quirky. It rears its ugly
> little head right off the bat in several ways:
>
> 1. Ram is so precious that there's no way to run code in it.
>
> 2. Program and data flash are both available, but reading them
> programmatically is very costly. In addition only 14 bits is available
> from each word of program memory.
>
> 3. The hardware stack is both limited in size (8 levels deep) and
> visibility (as in invisible).
>
> Confined in this environment we need to figure out a threading model to
> implement. Brad Rodriguez outlines several techniques in his moving
> forth article: http://www.zetetics.com/bj/papers/moving1.htm
>
> Here's a brief recap:
>

...

>
> best: TTC. Facilitates fast fetching of tokens. PIC has support for fast
> jump tables. Far more efficient than ITC or DTC.
>
> It's interesting that Brad calls TTC the slowest of the bunch. But in
> the case of a PIC where fetching program memory programmatically is
> close to 20 instruction cycles, TTC runs rings around both DTC and ITC.

As you have seen, so much depends on the specific machine architecture.
We implemented a TTC Forth on some low-end AVR processors with very
limited code space, and it ran faster than a native-code 8051 at
comparable clock speed.

> I already have a similar scheme implemented for fetching bytecodes in my
> NPCI virtual machine interpreter. Fundamentally each byte is encoded
> into a pic RETLW instruction. By using the PCLATCH/PCL it's possible to do a
> computed goto to any instruction in the address space.
>
> So it looks like TTC with jump tables is the winner.

Indeed. Although your extremely limited RAM is still going to be a
headache. Tell us again why you love the PIC16? ;-)

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310-491-3356
5155 W. Rosecrans Ave. #1018 Fax: +1 310-978-9454
Hawthorne, CA 90250
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Bruce McFarling

unread,

Jun 23, 2007, 4:23:55 PM6/23/07

to

On Jun 23, 3:26 pm, byron@upstairs.(none) (Byron Jeff) wrote:
> Here's a brief recap:

> better: DTC. Halves the cost of fetching the address. Jump instruction

> is one cell and executes quickly.

> best: TTC. Facilitates fast fetching of tokens. PIC has support for fast
> jump tables. Far more efficient than ITC or DTC.

There's also bit threading, where a bit in the execution token
determines whether it is a primitive or a compiled word. ENTER is then
built into the token fetch process launched by NEXT when the next
token is a compiled word, while EXIT is one of the primitives.

When I started out on Forth on a C64, 16Kbytes (8Kwords with 16-bit
addressing) was ample for a full Forth interpreter/compiler including
dictionary, so with headerless code in the PIC, 13 bits available to
contain compiled word addresses should be sufficient.

A project that looks at the Minimal Forth question is MAF, which is
defined in Forth.

>From the UK Fig Forthwrite Hall of Fame
http://www.figuk.plus.com/fame.htm

"Chris Jakeman

The editor of Forthwrite has published two applications:
FoSM
...
MAF
MAF is a minimal ANS Forth. It is written as an educational tool
for anyone planning to implement a Forth and includes extensive
documentation. MAF builds itself from a small kernel of 48 words and
provides all the words from the Core word set together with 25 from
extension word sets which are used in the building process. MAF is
also portable in confining the key decisions, such as dictionary
mechanism, cell size etc., to the kernel, which is detachable. The
kernel supplied is implemented in ANS Forth (so that MAF is ready to
run) and intended to be replaced by an equivalent kernel in Assembler
or C. Download MAF"

ftp://ftp.taygeta.com/pub/Forth/Applications/ANS/maf1v02.zip

none Byron Jeff

unread,

Jun 23, 2007, 4:44:56 PM6/23/07

to

In article <137quu5...@news.supernews.com>,
Elizabeth D Rather <erath...@forth.com> wrote:
>none Byron Jeff wrote:
Snippage...

>> I already have a similar scheme implemented for fetching bytecodes in my
>> NPCI virtual machine interpreter. Fundamentally each byte is encoded
>> into a pic RETLW instruction. By using the PCLATCH/PCL it's possible to do a
>> computed goto to any instruction in the address space.
>>
>> So it looks like TTC with jump tables is the winner.
>
>Indeed. Although your extremely limited RAM is still going to be a
>headache. Tell us again why you love the PIC16? ;-)

I have a laundry list:

1. Familiarity. I've been working with this family of parts since the
early 90's. So while I'm well aware of the quirks and limitations, they
are quirks and limitations I understand.

2. Availability. There have been times when parts like AVRs have been
made of "unobtainium". Microchip has been real consistent in making sure
that parts are in fact available. Their sample program means that I have
a wide variety of parts on hand.

3. Prototypeability (is that a word?). 5V and DIP packaging goes a long
long way in getting rolling on a prototype. Literally throw one on a
breadboard and go.

4. Price/performance. Even after sampling prices are ultracheap.

But I keep their usage to fit their limitations. A one off project or
three where I need some smarts but not a ton of functionality. For
example I have interns this summer that are working on coding up the
guts to a self-built vending machine. It does require some standalone
smarts, but nothing too high grade.

As for RAM my first target is the 16F88. It has 368 bytes of RAM. even
at 64 cells each for both stacks that'll leave plenty for the rare
variable or three I'd need.

Previous projects I've gotten done include a sunrise/sunset outdoor
light controller that uses interpolated sunrise/sunset data to determine
when to turn on/off the lights and a homebrew thermostat.

Small, reliable, quickly developed projects are the goal.

BAJ

Bruce McFarling

unread,

Jun 23, 2007, 6:30:55 PM6/23/07

to

On Jun 23, 4:44 pm, byron@upstairs.(none) (Byron Jeff) wrote:
> As for RAM my first target is the 16F88. It has 368 bytes of RAM. even
> at 64 cells each for both stacks that'll leave plenty for the rare
> variable or three I'd need.

64 cells for a return stack is fairly deep, if you aren't using
recursive definitions. If you start to run out of wiggle room, you
might think of data/return at 32 cells and 16 cells.

none Byron Jeff

unread,

Jun 23, 2007, 6:32:03 PM6/23/07

to

In article <1182630235.2...@m36g2000hse.googlegroups.com>,

Bruce McFarling <agi...@netscape.net> wrote:
>On Jun 23, 3:26 pm, byron@upstairs.(none) (Byron Jeff) wrote:
>> Here's a brief recap:
>
>> better: DTC. Halves the cost of fetching the address. Jump instruction
>> is one cell and executes quickly.
>
>> best: TTC. Facilitates fast fetching of tokens. PIC has support for fast
>> jump tables. Far more efficient than ITC or DTC.
>
>There's also bit threading, where a bit in the execution token
>determines whether it is a primitive or a compiled word. ENTER is then
>built into the token fetch process launched by NEXT when the next
>token is a compiled word, while EXIT is one of the primitives.

Hmmm. I'll have to think on that one.

>When I started out on Forth on a C64, 16Kbytes (8Kwords with 16-bit
>addressing) was ample for a full Forth interpreter/compiler including
>dictionary, so with headerless code in the PIC, 13 bits available to
>contain compiled word addresses should be sufficient.

That's good to know.

>A project that looks at the Minimal Forth question is MAF, which is
>defined in Forth.

I downloaded it a couple of days ago during my research. I still didn't
get the feeling that it factored out the primatives as effectively as I
would like. What I'm really looking for is a specification for a set of
primative words without a implementation attached. Since all of these
forths are real, they necessarily bind to a particular architecture. I'm
kind of looking for the C++ equivalent of an abstract class that has the
specification without implmentation.

On a second read just now, the primatives make more sense. It's still a
bit conflicting to me because the primatives are in fact implemented in
Forth. Also I still don't get the feel of a minimally implemented
kernel. I guess Jim Lewis set kind of a standard in my mind with his
nanoforth site:

http://www.quirkle.com/misc/forth.htm

I know it's incomplete. I also know that a critical subset of words
haven't been implemented in it. But I'd be real happy to find something
that has a complete description in the same flavor as his page. I would
in fact prefer something that wasn't implemented because it actually
clutters the concept.

>
>>From the UK Fig Forthwrite Hall of Fame
>http://www.figuk.plus.com/fame.htm
>
>"Chris Jakeman
>
>The editor of Forthwrite has published two applications:
>FoSM
> ...
>MAF
> MAF is a minimal ANS Forth. It is written as an educational tool
>for anyone planning to implement a Forth and includes extensive
>documentation. MAF builds itself from a small kernel of 48 words and
>provides all the words from the Core word set together with 25 from
>extension word sets which are used in the building process. MAF is
>also portable in confining the key decisions, such as dictionary
>mechanism, cell size etc., to the kernel, which is detachable. The
>kernel supplied is implemented in ANS Forth (so that MAF is ready to
>run) and intended to be replaced by an equivalent kernel in Assembler
>or C. Download MAF"

48 words. Ouch! When bootstrapping the primitives are critical.

You know what maybe I should try a different approach. I'll start
another episode with it.

Thanks for the pointer.

BAJ

none Byron Jeff

unread,

Jun 23, 2007, 6:33:24 PM6/23/07

to

In article <1182637855.7...@q75g2000hsh.googlegroups.com>,

Even better. Thanks.

BAJ

Elizabeth D Rather

unread,

Jun 23, 2007, 10:49:28 PM6/23/07

to

none Byron Jeff wrote:
> In article <137quu5...@news.supernews.com>,
> Elizabeth D Rather <erath...@forth.com> wrote:

...

> But I keep their usage to fit their limitations. A one off project or
> three where I need some smarts but not a ton of functionality. For
> example I have interns this summer that are working on coding up the
> guts to a self-built vending machine. It does require some standalone
> smarts, but nothing too high grade.

Interesting. We've done some commercial work on vending machines (for a
company in Tennessee), including an MDB protocol implementation and
support for payment by cash or smart cards. 8051's are the most common
part used for this.

> As for RAM my first target is the 16F88. It has 368 bytes of RAM. even
> at 64 cells each for both stacks that'll leave plenty for the rare
> variable or three I'd need.

Well, I'd squeeze stacks down to 32 cells, allow a little for variables
and use the rest for testing definitions. Or use a part with more ram
during development and a smaller one for the production units.

Actually, I'd use a different MCU ;-)

> Previous projects I've gotten done include a sunrise/sunset outdoor
> light controller that uses interpolated sunrise/sunset data to determine
> when to turn on/off the lights and a homebrew thermostat.
>
> Small, reliable, quickly developed projects are the goal.

And a good goal, too, but I really don't see getting a really good
development environment on such a limited chip.

Bruce McFarling

unread,

Jun 24, 2007, 1:52:50 AM6/24/07

to

On Jun 23, 6:32 pm, byron@upstairs.(none) (Byron Jeff) wrote:
> 48 words. Ouch! When bootstrapping the primitives are critical.

They can be reduced. Some by simply omitting them, when they are
primitives because they cannot be synthesized from other words, but
you do not require them. If there is no dictionary and no interpreter
in the PIC, then that may increase the number of primitives that may
be dispensed with.

Others by synthesizing them from a simpler Forth model. For example:

>A A> ... from data stack to an address pointer, and from pointer to data stack.
A@ A! ... store cell on stack at address in A, and fetch cell on stack
at address in A
A+ ... increment value in A

and the Forth-94 words >R R> ... from data stack to return stack, and
conversely.

: DROP ( x -- ) >A ;
: DUP ( x -- x x ) >A A> A> ;
: OVER ( xa xb -- xa xb xa ) >R >A A> R> A> ;
: SWAP ( xa xb -- xb xa ) >R >A R> A> ;
: ROT ( xa xb xc -- xb xc xa ) >R >R >A R> R> A> ;
: @ ( a -- x ) >A A@ ;
: ! ( x a -- ) >A A! ;

\ two address units per cell
: 2@ ( a -- xl xh ) >A A@ >R A+ A+ A@ R> ;
: 2! ( xl xh a -- ) >A A! A+ A+ A! ;

\ one address unit per cell
: 2@ ( a -- xl xh ) >A A@ >R A+ A@ R> ;
: 2! ( xl xh a -- ) >A A! A+ A! ;

: 1+ ( x -- x+1 ) >A A+ A> ;
: 2+ ( x -- x+2) >A A+ A+ A> ;

: R@ ( -- x | R: x -- ) R> >A A> A> R> ;
: 2>R ( xl xh -- | R: -- xl xh ) >A >R A> >R ;
: 2R> ( -- xl xh | R: xl xh -- ) R> >A R> A> ;

etc. ... presumably in this context, after getting the boostrap up and
running, the more cumbersome definitions may be replaced.

Stephen Pelc

unread,

Jun 24, 2007, 8:00:33 AM6/24/07

to

On Sat, 23 Jun 2007 16:49:28 -1000, Elizabeth D Rather
<erath...@forth.com> wrote:

>Interesting. We've done some commercial work on vending machines (for a
>company in Tennessee), including an MDB protocol implementation and
>support for payment by cash or smart cards. 8051's are the most common
>part used for this.

...

>Actually, I'd use a different MCU ;-)

MPE did a vending machine project as a rescue job for a client.
The previous consultancy failed in 2.5 years. MPE delivered a
much more comprehensive embedded system and ... and ... in
under nine months.

The previous consultancy was using PICs.

>> Small, reliable, quickly developed projects are the goal.
>
>And a good goal, too, but I really don't see getting a really good
>development environment on such a limited chip.

If the volumes are less than (say) 10k per annum, the cost of
writing software vastly exceeds the cost of the MCU. A slightly
more capable and expensive MCU can save time and money.

Stephen

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

none Byron Jeff

unread,

Jun 24, 2007, 8:40:31 AM6/24/07

to

In article <1182664370.2...@w5g2000hsg.googlegroups.com>,

Bruce McFarling <agi...@netscape.net> wrote:
>On Jun 23, 6:32 pm, byron@upstairs.(none) (Byron Jeff) wrote:
>> 48 words. Ouch! When bootstrapping the primitives are critical.
>
>They can be reduced. Some by simply omitting them, when they are
>primitives because they cannot be synthesized from other words, but
>you do not require them. If there is no dictionary and no interpreter
>in the PIC, then that may increase the number of primitives that may
>be dispensed with.

Possibly. But you have to analyze the entire core tree to determine if a
word is in fact not needed.

>Others by synthesizing them from a simpler Forth model.

This is what I've been looking for. A factored primitive model that is
architecture independent along with a coding of core words from that
model.

For example:
---------

>A ... from data stack to an address pointer

A> ... from pointer to data stack.

A! ... store cell on stack at address in A

A@ ... fetch cell at address in A and write to stack

A+ ... increment value in A

>R ... from data stack to return stack, and
R> ... from return stack to data stack
-----------

Now that's an easily implementable model. Only one question:

1. From the description above for A! and the descriptions for ! and 2!
below the value to store is in A and the address is on the stack, right?
Something doesn't match up.

: DROP ( x -- ) >A ;
: DUP ( x -- x x ) >A A> A> ;
: OVER ( xa xb -- xa xb xa ) >R >A A> R> A> ;
: SWAP ( xa xb -- xb xa ) >R >A R> A> ;
: ROT ( xa xb xc -- xb xc xa ) >R >R >A R> R> A> ;
: @ ( a -- x ) >A A@ ;
: ! ( x a -- ) >A A! ;

\ two address units per cell
: 2@ ( a -- xl xh ) >A A@ >R A+ A+ A@ R> ;
: 2! ( xl xh a -- ) >A A! A+ A+ A! ;

\ one address unit per cell
: 2@ ( a -- xl xh ) >A A@ >R A+ A@ R> ;
: 2! ( xl xh a -- ) >A A! A+ A! ;

: 1+ ( x -- x+1 ) >A A+ A> ;
: 2+ ( x -- x+2) >A A+ A+ A> ;

: R@ ( -- x | R: x -- ) R> >A A> A> R> ;
: 2>R ( xl xh -- | R: -- xl xh ) >A >R A> >R ;
: 2R> ( -- xl xh | R: xl xh -- ) R> >A R> A> ;

>etc. ... presumably in this context, after getting the boostrap up and
>running, the more cumbersome definitions may be replaced.

Exactly.

Thanks.

BAJ

Bruce McFarling

unread,

Jun 24, 2007, 4:44:05 PM6/24/07

to

On Jun 24, 8:40 am, byron@upstairs.(none) (Byron Jeff) wrote:
> Possibly. But you have to analyze the entire core tree
> to determine if a word is in fact not needed.

That analysis would be possible with a text editor with a search
function.

Start at the back, go to the definition you want to omit in the colon
definition model that rests on the 48 primitives, do a text search for
the word, and it quickly becomes clear if it is used. Delete it if
not. Keep going up until the colon definitions are trimmed down.

Then start at the primitives, and do the same search ... through the
colon definition file, since the primitives in MAF are defined using
the underlying Forth-94 system.

Then reduce the primitives as below ... I don't think the primitives
should reduce below 16, but they could definitely get below 32.

> For example:

> >A ... from data stack to an address pointer

> A> ... from pointer to data stack.
> A! ... store cell on stack at address in A
> A@ ... fetch cell at address in A and write to stack
> A+ ... increment value in A
> >R ... from data stack to return stack, and
> R> ... from return stack to data stack

> Now that's an easily implementable model. Only one question:

> 1. From the description above for A! and the descriptions
> for ! and 2! below the value to store is in A and the address
> is on the stack, right?

No, the address is in A, and the value is on the stack. ! has
the address on top and the value to store underneath it.
That was off the top of my heads (I used some of those to
save space in trying to implement a STC for a 6502, part
of it is attractive) ... maybe the mneumonic is properly @A
and !A

> : @ ( a -- x ) >A @A ;
> : ! ( x a -- ) >A !A ;

> \ two address units per cell

> : 2@ ( a -- xl xh ) >A @A >R A+ A+ @A R> ;
> : 2! ( xl xh a -- ) >A !A A+ A+ !A ;

> \ one address unit per cell

> : 2@ ( a -- xl xh ) >A @A >R A+ @A R> ;
> : 2! ( xl xh a -- ) >A !A A+ !A ;

These are the ones that I can recall off the top of
my head, because given the size of 2@ and 2! with an
X-indexed data stack on the 65C02, "A" as a 16-bit
zero page location is attractive for cutting the
size of the kernel.

none Byron Jeff

unread,

Jun 24, 2007, 7:56:37 PM6/24/07

to

In article <1182717845....@p77g2000hsh.googlegroups.com>,

Bruce McFarling <agi...@netscape.net> wrote:
>On Jun 24, 8:40 am, byron@upstairs.(none) (Byron Jeff) wrote:
>> Possibly. But you have to analyze the entire core tree
>> to determine if a word is in fact not needed.
>
>That analysis would be possible with a text editor with a search
>function.
>
>Start at the back, go to the definition you want to omit in the colon
>definition model that rests on the 48 primitives, do a text search for
>the word, and it quickly becomes clear if it is used. Delete it if
>not. Keep going up until the colon definitions are trimmed down.
>
>Then start at the primitives, and do the same search ... through the
>colon definition file, since the primitives in MAF are defined using
>the underlying Forth-94 system.

OK. That makes sense.

>Then reduce the primitives as below ... I don't think the primitives
>should reduce below 16, but they could definitely get below 32.
>
>> For example:
>> >A ... from data stack to an address pointer
>
>> A> ... from pointer to data stack.
>> A! ... store cell on stack at address in A
>> A@ ... fetch cell at address in A and write to stack
>> A+ ... increment value in A
>> >R ... from data stack to return stack, and
>> R> ... from return stack to data stack
>
>> Now that's an easily implementable model. Only one question:
>
>> 1. From the description above for A! and the descriptions
>> for ! and 2! below the value to store is in A and the address
>> is on the stack, right?
>
>No, the address is in A, and the value is on the stack. ! has
>the address on top and the value to store underneath it.
>That was off the top of my heads (I used some of those to
>save space in trying to implement a STC for a 6502, part
>of it is attractive) ... maybe the mneumonic is properly @A
>and !A

You are of course correct. For some reason I imprinted in my mind that
the TOS was on the left. That's what I get for not actually reading the
description of the notation. Some kind soul pointed out to me on my
bytecode description post.

>
>> : @ ( a -- x ) >A @A ;
>> : ! ( x a -- ) >A !A ;
>
>> \ two address units per cell
>> : 2@ ( a -- xl xh ) >A @A >R A+ A+ @A R> ;
>> : 2! ( xl xh a -- ) >A !A A+ A+ !A ;
>
>> \ one address unit per cell
>> : 2@ ( a -- xl xh ) >A @A >R A+ @A R> ;
>> : 2! ( xl xh a -- ) >A !A A+ !A ;
>
>These are the ones that I can recall off the top of
>my head, because given the size of 2@ and 2! with an
>X-indexed data stack on the 65C02, "A" as a 16-bit
>zero page location is attractive for cutting the
>size of the kernel.

It's very helpful. Thanks. I'll think it through as I'm working through
the MAF definitions.

BAJ