Tiny semi-tethered Forth idea

Paul Rubin

Nov 11, 2017, 5:10:55 PM
Tell me what you think of this. The idea is to decrease the Forth
interpreter program space in a small embedded target: I'm thinking of
the 60 cent STM8 boards you can get on AliExpress, that have 8k of
program flash and 1k of ram. It's not a true tethered Forth with an
interpreter at each end, but a way of offloading the target dictionary
to a simple proxy on the host side, using a host protocol something like
Frank Sergeant's 3-instruction Forth. It would probably be done by
modifying the STM8 eForth to implement the protocol.

Basically, the target would run shrunken versions of the usual address
and text interpreters, with 16 bit cells. The address interpreter would
use ITC, except with 2 bits reserved for "this is an immediate word"
(eliminating a separate field) and "this is the last word of the
definition" (eliminating NEXT). That leaves 14 address bits which is
plenty for those small processors.
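
A minimal sketch in C of the cell layout described above. The post does not say which bits carry the flags, so the bit positions and every name here (cell_pack, CELL_LAST and so on) are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical layout of one 16-bit cell in a definition:
   bit 15 = word is immediate, bit 14 = last cell of the definition,
   bits 13..0 = address of the word to run. */
#define CELL_IMMEDIATE 0x8000u
#define CELL_LAST      0x4000u
#define CELL_ADDR_MASK 0x3FFFu

typedef uint16_t cell_t;

/* Build a cell from a 14-bit address and the two flags. */
cell_t cell_pack(uint16_t addr, int immediate, int last) {
    cell_t c = (cell_t)(addr & CELL_ADDR_MASK);
    if (immediate) c |= CELL_IMMEDIATE;
    if (last)      c |= CELL_LAST;
    return c;
}

uint16_t cell_addr(cell_t c)         { return c & CELL_ADDR_MASK; }
int      cell_is_immediate(cell_t c) { return (c & CELL_IMMEDIATE) != 0; }
int      cell_is_last(cell_t c)      { return (c & CELL_LAST) != 0; }
```

With this layout, a reference to the word at $0123 that also ends the definition packs to $4123, and 14 address bits reach 16k bytes.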

The text interpreter would do 2 things differently:

1) CREATE "name" would not actually copy "name" to the dictionary,
but would instead do something like advance HERE and run

    : host-create ( addr u -- )
       'C' emit DUP emit ( sends the low byte of u )
       TYPE ( sends the string to the host ) ;

2) instead of remembering and looking up names directly, the text
interpreter would read tokens like "0123-DUP" instead of DUP.
$0123 is the code address of the word and "-DUP" is basically a
comment for user convenience. Then the target would execute
the ITC code starting at $0123.

At the host side you'd run a simple program that remembers the
locations of all the Forth words and inserts the address of any word you
type before sending the word itself. So when you type "DUP" the host
would send "0123-DUP" to the target, and so on.
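
The two halves of that exchange could look roughly like this in C. It is only a sketch: host_dict, host_encode and target_decode are made-up names, the SWAP address is invented, and a real host would fill the table from the 'C' messages the target sends.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host-side table: word name -> code address on the target. */
struct entry { const char *name; unsigned addr; };

static const struct entry host_dict[] = {
    { "DUP",  0x0123 },   /* address taken from the example in the post */
    { "SWAP", 0x0130 },   /* made-up address */
};

/* Host: rewrite NAME as "hhhh-NAME". Returns 0 on success, -1 if unknown. */
int host_encode(const char *name, char *out, size_t outsz) {
    for (size_t i = 0; i < sizeof host_dict / sizeof host_dict[0]; i++) {
        if (strcmp(host_dict[i].name, name) == 0) {
            snprintf(out, outsz, "%04X-%s", host_dict[i].addr, name);
            return 0;
        }
    }
    return -1;
}

/* Target: the hex digits before '-' are the address; the rest is a comment. */
int target_decode(const char *token, unsigned *addr) {
    char *end;
    unsigned long v = strtoul(token, &end, 16);
    if (end == token || *end != '-' || v > 0xFFFFul) return -1;
    *addr = (unsigned)v;
    return 0;
}
```

The target side needs only a hex parser and no name lookup at all, which is where the flash saving would come from.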

Also, some functions like decimal conversion could be done on the host.
So running "123 ." would send ".7B" to the host, which would print 123.
The host could also convert user input "123" into "$7B" before sending
to the target, so the target would not have to decode decimal.
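
A sketch of the host doing both conversions, assuming 16-bit cells and the ".7B" / "$7B" message shapes from the example. The function names are hypothetical, and treating the printed value as signed is my assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* User typed a decimal number: rewrite it as "$hex" for the target.
   Returns 0 on success, -1 if the text is not a 16-bit number. */
int host_number_to_target(const char *text, char *out, size_t outsz) {
    char *end;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\0' || v < -32768 || v > 65535) return -1;
    snprintf(out, outsz, "$%X", (unsigned)(v & 0xFFFF));
    return 0;
}

/* Target sent ".hhhh": recover the signed 16-bit value the host should
   print in decimal. Returns 0 on success, -1 on a malformed message. */
int host_decode_dot(const char *msg, int *value) {
    char *end;
    unsigned long v;
    if (msg[0] != '.') return -1;
    v = strtoul(msg + 1, &end, 16);
    if (end == msg + 1 || *end != '\0' || v > 0xFFFFul) return -1;
    *value = (v & 0x8000ul) ? (int)v - 0x10000 : (int)v;
    return 0;
}
```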

Obviously it could get fancier in many ways, but the above seems like an
easy start.

Alex

Nov 11, 2017, 5:38:35 PM
On 11-Nov-17 22:10, Paul Rubin wrote:

>
> Basically, the target would run shrunken versions of the usual address
> and text interpreters, with 16 bit cells. The address interpreter would
> use ITC, except with 2 bits reserved for "this is an immediate word"

IMMEDIATE is a function of the word at compile time. Why would the
compiled runtime word need a bit for this?

What about a bit indicating a literal? That might be more useful in
terms of density; a 15 bit (or even smaller) literal covers quite a
useful range.
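
One way the literal bit could be decoded, sketched in C. Using the top bit as the flag and sign-extending the 15-bit payload are both assumptions; the names are hypothetical.

```c
#include <stdint.h>

/* Assumed encoding: top bit set = inline literal in the low 15 bits,
   top bit clear = address of a word to execute. */
#define CELL_LITERAL 0x8000u

int cell_is_literal(uint16_t cell) { return (cell & CELL_LITERAL) != 0; }

/* Sign-extend the 15-bit payload, giving literals in -16384..16383. */
int16_t cell_literal_value(uint16_t cell) {
    int v = cell & 0x7FFF;
    return (int16_t)(v >= 0x4000 ? v - 0x8000 : v);
}

/* Encode a literal; values outside -16384..16383 would need a full cell. */
uint16_t cell_make_literal(int16_t n) {
    return (uint16_t)(CELL_LITERAL | ((uint16_t)n & 0x7FFFu));
}
```

Each small constant then costs one cell instead of a LIT cell followed by a value cell.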

> (eliminating a separate field) and "this is the last word of the
> definition" (eliminating NEXT). That leaves 14 address bits which is
> plenty for those small processors.
>


--
Alex

Paul Rubin

Nov 11, 2017, 6:04:36 PM
Alex <al...@rivadpm.com> writes:
> IMMEDIATE is a function of the word at compile time. Why would the
> compiled runtime word need a bit for this?

Oh good point, the host side can remember whether the word is immediate.

> What about a bit indicating a literal? That might be more useful in
> terms of density; a 15 bit (or even smaller) literal covers quite a
> useful range.

That sounds like a good idea too. Thanks!

Would there also usually be a bit indicating that the word is CODE? Or
would there just be a special "flip to CODE" instruction that the ITC
interpreter recognizes? I'd expect that most of the usual Forth words
would be CODE.

rickman

Nov 12, 2017, 12:50:04 AM
Isn't that an existing part of any ITC implementation? Why would you need
to flag anything as being code? In ITC every word has a code field which is
executable code for the host processor. In a colon definition this code
simply invokes the code that processes the parameter list as pointers to
Forth definitions.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998

Paul Rubin

Nov 12, 2017, 12:56:16 AM
rickman <gnu...@gmail.com> writes:
> Why would you need to flag anything as being code? In ITC every word
> has a code field which is executable code for the host processor.

The idea is to get rid of the code field by using a bit in the address
word instead. That saves a byte or two of program memory for each
word in the system, and maybe simplifies the interpreters.

But, I got rid of the CODE bit entirely. I just compile all the CODE
words at the beginning of the program, so they're below a certain
program address (top_of_code). Then the address interpreter just
compares each address to top_of_code to decide whether to execute
as ITC or as CODE.
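
A toy model of that dispatch in C, to make the comparison concrete. Everything here is an assumption for illustration: "addresses" are small integers, CODE words are entries in a function table, the LAST flag sits in the top bit, and C recursion stands in for the return stack a real target would use.

```c
#include <stdint.h>

#define TOP_OF_CODE 3u        /* addresses 0..2 are CODE words */
#define LAST        0x8000u   /* assumed "last cell of the definition" flag */
#define ADDR_MASK   0x7FFFu

static int stack[16], sp;     /* data stack */

static void prim_one(void) { stack[sp++] = 1; }
static void prim_dup(void) { stack[sp] = stack[sp - 1]; sp++; }
static void prim_add(void) { sp--; stack[sp - 1] += stack[sp]; }

static void (*const prims[TOP_OF_CODE])(void) = { prim_one, prim_dup, prim_add };

/* Threaded code. Cells 0..2 are padding so threads start at TOP_OF_CODE. */
static const uint16_t mem[] = {
    0, 0, 0,
    /* 3: TWO  = ONE DUP + */  0, 1, 2 | LAST,
    /* 6: FOUR = TWO TWO + */  3, 3, 2 | LAST,
};

/* One comparison decides whether an address is CODE or a thread. */
void execute(uint16_t addr) {
    if (addr < TOP_OF_CODE) { prims[addr](); return; }
    for (;;) {
        uint16_t cell = mem[addr++];
        execute(cell & ADDR_MASK);
        if (cell & LAST) return;      /* the flag replaces a separate EXIT cell */
    }
}

/* Run one word on an empty stack and return the value left on top. */
int eval(uint16_t addr) { sp = 0; execute(addr); return stack[sp - 1]; }
```

Here eval(6) runs FOUR, which nests into TWO twice and then adds the results.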

Helmar Wodtke

Nov 12, 2017, 2:03:28 AM
You could implement a "jump bit". txfwia primitives are virtually at low memory addresses that tc cannot jump to. Original awk has no AND/OR/XOR in a fast native version, so I use negative values. For example SWAP (a primitive) is -3. The cell [3] can be @/!-ed, but you cannot "jump to" it. Positive values "call" threaded code, and primitives cannot be called... There is no loss of address space in the C version, as txfwia only addresses cells. I decided not to implement common literals as primitives (say 0..255), as it makes the implementation a bit more complex.

Regards,
-Helmar

Paul Rubin

Nov 12, 2017, 2:16:57 AM
Helmar Wodtke <hel...@gmail.com> writes:
> You could implement a "jump bit". txfwia primitives are virtually at
> low memory addresses where tc can not jump to. Original awk has no
> AND/OR/XOR in a fast native version, so I use negative values.

What is txfwia? How is Awk involved? Using a whole bit (half the
address space) to indicate CODE seems excessive: with LITERAL, LAST, and
CODE bits, that leaves just 13 address bits or 8k addresses. It would
be good to use this interpreter with 16k parts. Above 16k these hacks
aren't so useful. I imagine having maybe 3k of CODE in the system which
isn't all that much.

You make a good point, that using cell addresses rather than byte
addresses can rescue an address bit at the cost of a double-byte shift
for every instruction word. That's significant overhead but it
could be useful.

Helmar Wodtke

Nov 12, 2017, 2:35:22 AM
On Sunday, 12 November 2017 08:16:57 UTC+1, Paul Rubin wrote:
> Helmar Wodtke <hel...@gmail.com> writes:
> > You could implement a "jump bit". txfwia primitives are virtually at
> > low memory addresses where tc can not jump to. Original awk has no
> > AND/OR/XOR in a fast native version, so I use negative values.
>
> What is txfwia? How is Awk involved?

It's "Tiny eXperimental Forth Written In Awk" https://bitbucket.org/helmwo/txfwia/src
My "game" in the background is that it works with the original awk (restricting it to gawk would be much easier) and that it implements a lot of Forth200x. There is also a C version that I can use for debugging and that is much faster, but it reproduces a lot of the bugs from the awk version, so it's not a generally useful Forth for now.

> Using a whole bit (half the
> address space) to indicate CODE seems excessive: with LITERAL, LAST, and
> CODE bits, that leaves just 13 address bits or 8k addresses. It would
> be good to use this interpreter with 16k parts. Above 16k these hacks
> aren't so useful. I imagine having maybe 3k of CODE in the system which
> isn't all that much.

I mean "BRANCH" or "AHEAD" etc. For example, IF compiles a primitive that simply skips the next word in the threaded code if the condition is true. You can combine this primitive with any other primitive, a jump or a call. Basically:

(IF) if (tos != 0) { cp++; }; DROP

and IF compiles (IF) and then does AHEAD. Without reducing a jump to a cell-sized word in threaded code this would not be that compact or practical. If one also has (!IF) or similar, ?DUP would be (!IF) DUP.
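
The skip primitive can be tried out with a toy interpreter in C. This is not txfwia's code: the opcode numbering, the run() function and the OP_END terminator are assumptions made for the sketch. Only the (IF) case follows the one-liner above.

```c
/* Hypothetical opcodes for a toy interpreter. */
enum { OP_END, OP_IF, OP_SWAP };

/* Run an opcode list against a data stack; returns the new stack depth. */
int run(const int *cp, int *stack, int depth) {
    for (;;) {
        switch (*cp++) {
        case OP_END:
            return depth;
        case OP_IF:                       /* (IF): skip next cell if tos != 0 ... */
            if (stack[depth - 1] != 0) cp++;
            depth--;                      /* ... then DROP the flag */
            break;
        case OP_SWAP: {
            int t = stack[depth - 1];
            stack[depth - 1] = stack[depth - 2];
            stack[depth - 2] = t;
            break;
        }
        }
    }
}
```

Running the list (IF) SWAP on the stack 1 2 0 swaps the two numbers, while on 1 2 7 the SWAP is skipped, so the next cell executes only when the flag is zero.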


> You make a good point, that using cell addresses rather than byte
> addresses can rescue an address bit at the cost of a double-byte shift
> for every instruction word. That's significant overhead but it
> could be useful.

Maybe your processor supports [base + register*2] or a similar addressing scheme.

Regards,
-Helmar

Rod Pemberton

Nov 12, 2017, 2:50:25 AM
On Sat, 11 Nov 2017 21:56:15 -0800
Paul Rubin <no.e...@nospam.invalid> wrote:

> The idea is to get rid of the code field by using a bit in the address
> word instead. That saves a byte or two of program memory for each
> word in the system, and maybe simplifies the interpreters.
>

Some years ago, I recycled my Forth ITC interpreter into a CFA-less
interpreter for a non-Forth situation. Instead of a bit in the address
word, I simply used address value zero to represent code words (or
primitives). Zero is immediately followed by the address of the code
to call for code words (or primitives). I.e., zero is an escape
code followed by the CFA value. This would probably be more compact if
you used byte codes, but this still eliminates the CFA for all words.
For those that aren't following, this modification makes the interpreter
assume all Forth words are threaded code to be interpreted unless a zero
is found, which indicates a code word (or primitive). I.e., ENTER or
doCOL, doVAR, etc is removed from the start of compiled Forth words
(threaded code) and also primitives. Hence, there is no need for the
CFA field in the dictionary. After this modification, the interpreter
can be rewritten to function more like a normal byte code interpreter,
or an interpreter coded in C.
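
A rough C sketch of such a CFA-less interpreter, written from the description above rather than from the original code. The memory layout, the primitive numbers, reserving address 0, and the -1 "return to caller" sentinel are all assumptions.

```c
/* Hypothetical primitive numbers. */
enum { P_EXIT, P_FIVE, P_DUP, P_MUL };

static int ds[16], dsp;        /* data stack */
static int rs[16], rsp;        /* return stack */
static int ip;                 /* index of the next cell to interpret */

static const int mem[] = {
    0,            /*  0: reserved, so that no word has address 0 */
    0, P_EXIT,    /*  1: EXIT   (zero escape, then the code to call) */
    0, P_FIVE,    /*  3: FIVE   (pushes 5) */
    0, P_DUP,     /*  5: DUP */
    0, P_MUL,     /*  7: * */
    5, 7, 1,      /*  9: SQUARE = DUP * EXIT   (no code field in front) */
    3, 9, 1,      /* 12: MAIN   = FIVE SQUARE EXIT */
};

static void primitive(int n) {
    switch (n) {
    case P_EXIT: ip = rs[--rsp]; break;
    case P_FIVE: ds[dsp++] = 5; break;
    case P_DUP:  ds[dsp] = ds[dsp - 1]; dsp++; break;
    case P_MUL:  dsp--; ds[dsp - 1] *= ds[dsp]; break;
    }
}

/* Run the word at address w on an empty stack; return the top of stack. */
int run_word(int w) {
    dsp = 0;
    rsp = 0;
    ip = -1;                              /* sentinel: return to the caller */
    for (;;) {
        if (mem[w] == 0) {
            primitive(mem[w + 1]);        /* escape found: a code word */
        } else {
            rs[rsp++] = ip;               /* anything else is threaded: nest */
            ip = w;
        }
        if (ip < 0) return ds[dsp - 1];
        w = mem[ip++];
    }
}
```

The zero test is paid on every word, but colon definitions carry no code field and primitives pay one extra cell each.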

However, IIRC, there was a problem with implementing DOES> after this
modification... You may have the same problem. If so, you may have
difficulty building up Forth from primitives as words like CONSTANT
2CONSTANT SET VALUE DEFER may use DOES> . CONSTANT is probably the
only real problem.

> But, I got rid of the CODE bit entirely. I just compile all the CODE
> words at the beginning of the program, so they're below a certain
> program address (top_of_code). Then the address interpreter just
> compares each address to top_of_code to decide whether to execute
> as ITC or as CODE.

...


Rod Pemberton
--
Does the increase in pedestrian deaths correspond with the increase in
Millennials?

Paul Rubin

Nov 12, 2017, 4:42:09 AM
Rod Pemberton <EmailN...@voenflacbe.cpm> writes:
> Zero is immediately followed by the address of the code
> to call for code words (or primitives).

Interesting, but given the ITC approach, a zero cell before every CODE
address is likely to burn a lot of program space, especially given the
likely density of CODE words (all the stack primitives etc) in the user
program. Maybe byte code would make the whole thing smaller but then
you'd need an interpreter switching on the byte codes, instead of just
an address interpreter.

> However, IIRC, there was a problem with implementing DOES> after this
> modification... You may have the same problem. If so, you may have
> difficulty building up Forth from primitives as words like CONSTANT
> 2CONSTANT SET VALUE DEFER may use DOES> . CONSTANT is probably the
> only real problem.

Hmm, there might have to be special hair in the dictionary to make sure
TO won't overwrite a constant. It doesn't seem that bad though. I plan
to have <BUILDS DOES> instead of CREATE DOES>, so CREATE would just push
a literal, while <BUILDS would let DOES> start writing after the address
of the newly created symbol.

Paul Rubin

Nov 12, 2017, 4:43:11 AM
Helmar Wodtke <hel...@gmail.com> writes:

> and IF is compiling (IF) and doing AHEAD. Without reducing a jump to a
> cell-sized word in threaded code this would be not that compact and
> practical. If one has also (!IF) or similar, ?DUP would be (!IF) DUP.

Wouldn't IF just translate into BRANCH0 which would be a CODE word that
conditionally skips the next word?

Helmar Wodtke

Nov 12, 2017, 4:58:07 AM
I have only BRANCH (unconditional). It compiles a negative value that corrects the code pointer without pushing onto the return stack. (IF) works similarly to 0BRANCH: if there is a non-zero value in tos, it adjusts the code pointer by 1. If I compile (IF) + negative value, it does exactly what 0BRANCH would do, and it has exactly the same size. But I can also compile (IF) + SWAP, which is similar to "0= IF SWAP THEN". This is more powerful. As positive values are calls, it's also possible to implement a conditional call instead of a jump, for example

: FOO ... do something ... ;

(IF) FOO

would be like

0= IF FOO THEN

Regards,
-Helmar


Albert van der Horst

Nov 12, 2017, 11:02:56 AM
In article <87vaigb...@nightsong.com>,
No need to go ahead with premature optimisation. Most modern small
Forths, all singing and dancing, fit in 8K: CamelForth, eForth, 4e4th, etc.
So you can design yourself a nice simple Forth without complications
and it'll fit. Enjoy building a new Forth for the STM8!

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

a...@littlepinkcloud.invalid

Nov 12, 2017, 11:30:42 AM
Paul Rubin <no.e...@nospam.invalid> wrote:
> Tell me what you think of this. The idea is to decrease the Forth
> interpreter program space in a small embedded target: I'm thinking of
> the 60 cent STM8 boards you can get on AliExpress, that have 8k of
> program flash and 1k of ram. It's not a true tethered Forth with an
> interpreter at each end, but a way of offloading the target dictionary
> to a simple proxy on the host side, using a host protocol something like
> Frank Sergeant's 3-instruction Forth. It would probably be done by
> modifying the STM8 eForth to implement the protocol.

You're in danger of re-inventing chipFORTH! The idea there was that
all of the dictionary structures were on the host; the threaded and
executable code were on the target along with the runtime data. The
protocol had only to be able to read and write memory and tell the
target what to execute. Words like . could run entirely on the host,
with the stack copied back and forth as required. I suspect that
SwiftX from Forth, Inc does something similar today, but it's probably
more sophisticated and convenient to use.

Andrew.

Paul Rubin

Nov 12, 2017, 1:46:13 PM
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> No need to go ahead with premature optimisation. Most modern small
> Forths, all singing and dancing, fit in 8K: CamelForth, eForth, 4e4th, etc.
> So you can design yourself a nice simple Forth without complications
> and it'll fit. Enjoy building a new Forth for the STM8!

STM8 eForth is around 5k and it's very minimal, but yeah, maybe I should
get it working rather than mess with writing something new. I think
CamelForth/4e4th is bigger. 4e4th is around 8k on the MSP430 so would
likely not fit in an 8k STM8.

Paul Rubin

Nov 12, 2017, 1:55:20 PM
a...@littlepinkcloud.invalid writes:
> You're in danger of re-inventing chipFORTH! The idea there was that
> all of the dictionary structures were on the host; the threaded and
> executable code were on the target along with the runtime data. The
> protocol had only to be able to read and write memory and tell the
> target what to execute. Words like . could run entirely on the host,
> with the stack copied back and forth as required.

Interesting! Are there docs around? Yes, that's basically what I'm
doing: I got rid of the 1234-DUP stuff so all that happens now is that
dictionary CREATE and lookup are remote calls to the host, plus some
CODE words also use the host instead of running locally. The text
interpreter runs completely on the target and I don't copy the stack
around. Communication with the host is just a few 1-letter commands so
the host isn't a full Forth interpreter. I'll give some thought to
copying the whole stack.

Rod Pemberton

Nov 12, 2017, 2:44:14 PM
On Sun, 12 Nov 2017 01:42:05 -0800
Paul Rubin <no.e...@nospam.invalid> wrote:

> Rod Pemberton <EmailN...@voenflacbe.cpm> writes:
> > Zero is immediately followed by the address of the code
> > to call for code words (or primitives).
>
> Interesting, but given the ITC approach, a zero cell before every CODE
> address is likely to burn a lot of program space, especially given the
> likely density of CODE words (all the stack primitives etc) in the
> user program.

Well, if they're inlined, yes. If you use an address for the
primitive in compiled words, like any other Forth word, but then have
the primitive definition to be just a zero and a code address, it
shouldn't be so bad. Yes? I.e., you'd have maybe 30 to 60 some
primitives with an extra zero, since primitives weren't inlined.

The code address can be any size because the zero is an escape. E.g.,
you could use smaller addresses, say 16-bits, for ITC code, instead
of, say 32-bits, for the primitive code address, or you could use byte
codes for the compiled high-level Forth, etc.

It also lends itself well to C code, if you use byte codes, as Forths
in C are usually switch() based. Except for mine, I've not seen another
Forth in C that uses an actual ITC interpreter.

Cecil Bayona

Nov 12, 2017, 4:11:20 PM
Interesting stuff you are doing. A few years back I bought a couple of
boards (the Chinese W1209 board) with the STM8 CPU for a dollar and small
change. Each had three 7-segment LED displays and several push buttons; it
was a very nice board with eFORTH loaded up.

I thought at the time that a nice cross-compiler was what was needed, so
the limited space would be used by applications only, but nothing was
available.

The Chinese seem to like that chip, so it is used in all sorts of devices;
as a result, boards are available at very inexpensive prices.
--
Cecil - k5nwa

rickman

Nov 12, 2017, 4:59:13 PM
If I understand what you are saying, the definition of a code word will not
have a code field per-se. Colon definitions will contain a list of
addresses of the words it is defined of. In code definitions the first
address is zero in which case it will be followed by an address which is to
the code for this word. The address in any colon definition using this word
will be used the same as any other word, but the NEXT code will have to
check every colon address for the zero escape address, no? Is that simpler
and faster than using a CFA?

I'm not experienced in writing Forth interpreters. I was reading a bit on
this and there seemed to be a distinction between NEXT and ENTER. If so, I
suppose the zero check only needs to be done on the first word in a
definition to see if it is a colon or a code word.

Paul Rubin

Nov 12, 2017, 5:22:31 PM
TG9541 <thomas....@gmail.com> writes:
> got problems getting it working? Did you try one of the stock binaries
> from the GitHub Releases page?

I haven't made any attempt to get STM8 eForth working yet--the boards
are still in their little baggies and I don't have an STLink to program
them with. But I appreciate the advice and I'm sure I'll need it when
the time comes.

With sdcc -mstm8 it looks like my address interpreter is around 120
bytes of asm code, not bad. I'd like to keep the whole thing in the
under-3k range.

Your wiki page says ucsim can emulate the STM8 but the ucsim page itself
doesn't mention this: is STM8 supported? If not, I might try an AVR
emulator or something. Currently I'm just using my Linux laptop and
gcc for testing.

> By the way, I've had the idea to add a "bytecode" mode to STM8 eForth:
> an STC Forth with variable length opcodes for a very simple VM.

That sounds possibly worthwhile. I coded something like it a while
back, inspired by the GA144. How much space does the eForth dictionary
use on the STM8? I imagine getting rid of that and offloading more
interactive functions (maybe even the text interpreter) to the host
would make it smaller.

Does STM8 eForth have a multitasker?

Paul Rubin

Nov 12, 2017, 5:36:28 PM
Rod Pemberton <EmailN...@voenflacbe.cpm> writes:
> If you use an address for the primitive in compiled words, like any
> other Forth word, but then have the primitive definition to be just a
> zero and a code address, it shouldn't be so bad.

You mean there would be an ITC word that calls the CODE word? Hmm
I guess you could do that, but it's more space and more overhead.

> Yes? I.e., you'd have maybe 30 to 60 some primitives with an extra
> zero, since primitives weren't inlined.

Sometime I want to give some thought to STC with inlined primitives.

> It also lends itself well to C code, if you use byte codes, as Forths
> in C are usually switch() based. Except for mine, I've not seen another
> Forth in C that uses an actual ITC interpreter.

Gforth has both ITC and DTC. The thing I'm hacking right now (started
last night) is ITC but at the moment it's only for laughs. Don't take
it too seriously for now. The FFI is pretty simple: any C function
starting with "CODE_" is a CODE word. Example ("dup"):

void CODE_dup() { push (top()); }

If you want a special character in the name, use "_xx" where xx is
the hex code of the character you want. This is "r>":

void CODE_r_3E() { push(rpop()); }  /* rpop(): assumed counterpart of rpush() */

The build process scans the post-linker symbol table to get the
addresses of all those functions, unmangles the names, and puts them
with the addresses into the Forth dictionary. Invoking a primitive is
then just a C indirect function call. It should be possible to do
something similar for primitives written in assembler.
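
The unmangling step could be sketched like this. The function name, signature and error handling are hypothetical; only the "CODE_" prefix and the "_xx" hex escape come from the description above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Turn a C symbol such as "CODE_r_3E" into the Forth name "r>".
   Returns 0 on success, -1 if the symbol is not a CODE_ function or
   the output buffer is too small. */
int unmangle(const char *sym, char *out, size_t outsz) {
    size_t n = 0;
    if (strncmp(sym, "CODE_", 5) != 0) return -1;
    sym += 5;
    while (*sym != '\0') {
        char c = *sym++;
        /* "_xx" with two hex digits stands for the character with code xx. */
        if (c == '_' && isxdigit((unsigned char)sym[0])
                     && isxdigit((unsigned char)sym[1])) {
            char hex[3] = { sym[0], sym[1], '\0' };
            c = (char)strtol(hex, NULL, 16);
            sym += 2;
        }
        if (n + 1 >= outsz) return -1;
        out[n++] = c;
    }
    if (outsz == 0) return -1;
    out[n] = '\0';
    return 0;
}
```

One consequence of this reading of the rule: an underscore followed by two hex digits is always taken as an escape, so a literal underscore in front of hex digits would have to be spelled "_5F".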

Rod Pemberton

Nov 12, 2017, 8:35:25 PM
Correct. This is a change.

> Colon definitions will contain a
> list of addresses of the words it is defined of.

Correct. This is normal.

> In code definitions
> the first address is zero in which case it will be followed by an
> address which is to the code for this word.

Correct. This is a change.

> The address in any colon
> definition using this word will be used the same as any other word,

Correct. This is normal.

> but the NEXT code will have to check every colon address for the zero
> escape address, no?

Yes, it will have to check for zero constantly. This adds some
overhead, but that may be compensated for by other changes. See below.

> Is that simpler and faster than using a CFA?

FYI, I've only written one Forth interpreter, and I'm not a Forth
programmer. I coded mine in C.

Simpler? Yes and/or no.

Technically, this only requires a conditional, but NEXT can be combined
with other code, e.g., ENTER. I.e., the code can become more optimized,
especially if C is used. E.g., you might add a switch() to execute the
code for the primitives within the same routine, if byte code is being used.

Faster? ...

I'm not 100% sure, but I think it is faster, at least for C code.
It might be slower or the same for assembly, as not as much overhead is
present for Forth coded in assembly. This adds the overhead of a
conditional to NEXT, but it eliminates the C overhead of calling the
CFA, i.e., a C function, for every Forth word. This is probably
negligible for assembly instead of C. Assembly won't have the overhead
of parameter stack setup (prolog) and cleanup (epilog), nor calling
convention code needed to call a C function.

> I'm not experienced in writing Forth interpreters. I was reading a
> bit on this and there seemed to be a distinction between NEXT and
> ENTER. If so, I suppose the zero check only needs to be done on the
> first word in a definition to see if it is a colon or a code word.
>

FYI, this should all be qualified with AIUI and/or IIRC ...

QUIT is the code for the outer interpreter or text interpreter.
This parses text and looks up words in the dictionary to compile or
execute them.

NEXT is the code for the inner interpreter or address interpreter.
This interprets the address list for ITC (indirect threaded code) or
DTC (direct threaded code). There are also TTC and STC Forths. TTC
(token threaded code) is like the byte code interpreted BASICs of 8-bit
yesteryear. STC (subroutine threaded code) is basically normal
assembly code or a Forth compiler.

ENTER (a.k.a. doCOL) is the CFA (code field address) for colon
definitions. It switches interpretation from the word currently being
interpreted to the next word to be interpreted. It saves the
instruction pointer and adjusts it to the next word so that NEXT moves
to the next word to interpret. I.e., this is the equivalent of calling
another function from the current function in C, but for threaded code.
This could be thought of as a function prolog (or prologue).

EXIT (a.k.a. ;S ) is a word that stops the interpretation for the
current word, and returns to execution of the prior word. I.e., this
is equivalent to returning from a function in C, but for threaded code.
This could be thought of as a function epilog (or epilogue).

A primitive, or code word, or low-level word, is a word whose CFA is
something other than ENTER (or doCOL). I.e., a word which is not
interpreted for an ITC or DTC Forth. In other words, it's a routine
which does something other than interpretation (i.e., ENTER). In most
cases, that's some special low-level operation, e.g., doCON for
constant, doVAR for variable, the CFA for EXIT to return from
INTERPRET to QUIT, the CFA for BYE to exit Forth, or the specific
routines for primitives, like >R R> DUP SWAP DROP OVER ROT etc. If a
Forth is built up from primitives, there will typically be 30 to 60 of
them.

...

So, what happens for threaded code, is that NEXT starts interpretation
of the threaded code (i.e., address list or offset list). NEXT obtains
the CFA to execute from the code address, which is usually ENTER, but
may be other addresses for primitives etc. ENTER saves interpretation
for the prior word and starts interpretation for the current word. EXIT
stops interpretation for the current word, resets interpretation to the
prior word, and returns to NEXT. If the CFA of a word is something
other than ENTER, then the code for the word is executed, i.e., it is a
primitive.

The interpretation is moved from word to word repeatedly via
ENTER/EXIT. Nothing is actually executed in threaded code until a
primitive, or code word, or low-level word has a CFA other than ENTER.
Then, some work is done, e.g., by calling doCON, doVAR, primitive
for R> >R DUP DROP etc, instead of calling ENTER . Under normal
circumstances, the interpreter is calling ENTER and EXIT many times,
but the majority of Forth code in an ITC/DTC Forth which is built up
from primitives is threaded code. I.e., migrating interpretation from
word to word by calling ENTER repeatedly is "unnecessary" as most words
are interpreted, i.e., have a CFA of ENTER.
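
For comparison with what follows, the conventional scheme can be modelled in a few lines of C. This is a toy, not any real Forth: code fields hold small enum values where a real ITC system would hold machine addresses, and the word addresses, HALT and run_itc() are invented for the sketch.

```c
/* What a code field can hold in this toy model. */
enum { DO_ENTER, DO_EXIT, DO_THREE, DO_DUP, DO_ADD, DO_HALT };

static int ds[16], dsp;        /* data stack */
static int rs[16], rsp;        /* return stack */
static int ip, w, running;

/* Every word starts with a code field; colon words follow it with a thread. */
static const int mem[] = {
    DO_EXIT,                   /* 0: EXIT */
    DO_THREE,                  /* 1: THREE (pushes 3) */
    DO_DUP,                    /* 2: DUP */
    DO_ADD,                    /* 3: + */
    DO_HALT,                   /* 4: HALT */
    DO_ENTER, 2, 3, 0,         /* 5: DOUBLE = DUP + EXIT */
    DO_ENTER, 1, 5, 4,         /* 9: MAIN   = THREE DOUBLE HALT */
};

static void do_code(int code) {
    switch (code) {
    case DO_ENTER: rs[rsp++] = ip; ip = w + 1; break;   /* nest into the thread */
    case DO_EXIT:  ip = rs[--rsp]; break;               /* unnest */
    case DO_THREE: ds[dsp++] = 3; break;
    case DO_DUP:   ds[dsp] = ds[dsp - 1]; dsp++; break;
    case DO_ADD:   dsp--; ds[dsp - 1] += ds[dsp]; break;
    case DO_HALT:  running = 0; break;
    }
}

/* Execute the word at `start`, then loop on NEXT until HALT runs. */
int run_itc(int start) {
    dsp = 0; rsp = 0; ip = 0; running = 1;
    w = start;
    do_code(mem[w]);
    while (running) {
        w = mem[ip++];         /* NEXT: fetch the address of the next word ... */
        do_code(mem[w]);       /* ... and go through its code field */
    }
    return ds[dsp - 1];
}
```

Every colon definition spends one cell on DO_ENTER, which is the cost that the CFA-less variant removes.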

With a CFA-less interpreter, this repetition of calling ENTER is
eliminated. Without the CFA, the threaded code basically becomes a
linked list or binary tree, with the terminating leaves being
primitives and the nodes being the operations in each Forth word. The
interpreter is modified so that all Forth words are interpreted by
default, e.g., CFA is removed, so ENTER is merged into NEXT. All words
will be interpreted by default, unless something says to do something
other than interpret words, e.g., zero escape code. But, as I
mentioned before, removing the CFA may break DOES> code. I.e., you
could mess around with it for Forth, but it seems more useful to me for
creating your own interpreted language, which doesn't require all of
Forth's functionality.

rickman

Nov 12, 2017, 11:56:20 PM
This is all so complex it seems to me. I much prefer subroutine threaded
code where the instruction overhead is as close to zero as possible such as
in a stack processor design. In my CPU design the code size for a call was
three bits with the rest of the N bit opcode being the final N-3 bits of the
address. Addresses are extended by prepending literal instructions with a
single bit of opcode overhead adding N-1 bits to the literal. The call
saves the previous instruction pointer on the return stack while fetching
the first instruction of the new word for a single cycle of execution time
overhead. The same functionality is used in the interrupt response with the
addition of saving the processor status on the data stack for a 1 clock
cycle interrupt response time.

So in a 4 kB address space a call (which could be absolute or relative)
could be two 8 bit instructions, a literal instruction pushing 7 bits onto
the return stack and a call shifting 5 more bits to form a 12 bit address.
A 9 bit instruction word (useful in many FPGA architectures) extends that to
16 kB address space. If your call or jump can be coded in the more limited
bits of the call or jump instruction alone then there is no need to extend
it with a literal instruction.
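
The arithmetic can be checked with a short C sketch. It models only the payload widths given above (1 opcode bit for a literal, 3 for a call); the actual opcode values are not stated, and the function names are hypothetical.

```c
/* Address bits reachable by one literal prefix plus one call, for an
   n-bit instruction word (requires n >= 3). */
unsigned call_address_bits(unsigned n) {
    return (n - 1) + (n - 3);
}

/* Combine the payloads: the literal supplies the high bits and the call
   instruction supplies the final n-3 bits of the address. */
unsigned call_target(unsigned n, unsigned literal_payload, unsigned call_payload) {
    unsigned call_bits = n - 3;
    unsigned call_mask = (1u << call_bits) - 1u;
    return (literal_payload << call_bits) | (call_payload & call_mask);
}
```

For 8-bit instructions this gives 7 + 5 = 12 bits (4k), and for 9-bit instructions 8 + 6 = 14 bits (16k), matching the figures in the text.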

Paul Rubin

Nov 13, 2017, 12:51:19 AM
rickman <gnu...@gmail.com> writes:
> I much prefer subroutine threaded code where the instruction overhead
> is as close to zero as possible such as in a stack processor design.

If you've got a stack processor or can make one, STC is great. Same
goes if you're willing to write a native compiler for a conventional
cpu. Otherwise, ITC, DTC, and bytecode are all easy to implement and
easy to port from one CPU to another.

rickman

Nov 13, 2017, 1:06:28 AM
It doesn't seem that easy. It appears every one is a trade-off in speed,
complexity and memory size. The porting thing has been batted around here a
lot and no one has been able to show that it is as easy as claimed.

Paul Rubin

Nov 13, 2017, 1:27:39 AM
rickman <gnu...@gmail.com> writes:
> It doesn't seem that easy. It appears every one is a trade-off in
> speed, complexity and memory size. The porting thing has been batted
> around here a lot and no one has been able to show that it is as easy
> as claimed.

Well, it's easier than writing a compiler. Look at eforth for example.
Sure there are trade-offs but if you're not fussy about them, life
becomes simpler. Particularly, if your application only has to react at
human speeds, cpus today are fast enough that you don't have to worry
about the interpreter speed unless you're doing something computation
intensive.

Lars Brinkhoff

Nov 13, 2017, 2:49:25 AM
Paul Rubin wrote:
> Your wiki page says ucsim can emulate the STM8 but the ucsim page
> itself doesn't mention this: is STM8 supported?

I have been using the version of ucsim that's included with sdcc.
It certainly does support the STM8.

Paul Rubin

Nov 13, 2017, 3:03:54 AM
Lars Brinkhoff <lars...@nocrew.org> writes:
> I have been using the version of ucsim that's included with sdcc.
> It certainly does support the STM8.

Thanks-- I'm not near needing it yet, but this is good to know.

a...@littlepinkcloud.invalid

Nov 13, 2017, 4:48:43 AM
Paul Rubin <no.e...@nospam.invalid> wrote:
> a...@littlepinkcloud.invalid writes:
>> You're in danger of re-inventing chipFORTH! The idea there was that
>> all of the dictionary structures were on the host; the threaded and
>> executable code were on the target along with the runtime data. The
>> protocol had only to be able to read and write memory and tell the
>> target what to execute. Words like . could run entirely on the host,
>> with the stack copied back and forth as required.
>
> Interesting! Are there docs around?

I don't know. There were a few papers at the time.

> Yes, that's basically what I'm doing: I got rid of the 1234-DUP
> stuff so all that happens now is that dictionary CREATE and lookup
> are remote calls to the host, plus some CODE words also use the host
> instead of running locally.

chipFORTH was the other way around: CREATE didn't exist on the target
at all, which had no way of making definitions. The target had no
text interpreter. This meant that none of the target's memory was
consumed by the machinery needed to provide interaction. Well, except
for a little stub.
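
The "little stub" described above only needs three operations: read memory, write memory, and execute at an address. A minimal C sketch of such a stub follows. The command bytes, the big-endian framing and the names `stub_step` and `target_mem` are assumptions made up for illustration; the real chipFORTH protocol is not documented in this thread.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical command bytes, not taken from chipFORTH. */
enum { CMD_READ = 0x01, CMD_WRITE = 0x02, CMD_EXEC = 0x03 };

#define MEM_SIZE 1024
static uint8_t target_mem[MEM_SIZE];   /* stands in for target RAM */
static uint16_t last_exec;             /* last address we were asked to run */

/* Handle one command from in[0..n-1]; a read stores its result in *reply.
   Returns the number of input bytes consumed, or 0 if the command is
   incomplete, unknown, or addresses memory out of range. */
size_t stub_step(const uint8_t *in, size_t n, uint8_t *reply)
{
    if (n < 3)
        return 0;
    uint16_t addr = (uint16_t)((in[1] << 8) | in[2]);  /* big-endian address */
    switch (in[0]) {
    case CMD_READ:
        if (addr >= MEM_SIZE)
            return 0;
        *reply = target_mem[addr];
        return 3;
    case CMD_WRITE:
        if (n < 4 || addr >= MEM_SIZE)
            return 0;
        target_mem[addr] = in[3];
        return 4;
    case CMD_EXEC:
        last_exec = addr;   /* a real stub would jump to addr here */
        return 3;
    default:
        return 0;
    }
}
```

Everything else (dictionary, text interpreter, number formatting) would live on the host and be built out of these three commands.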

Andrew.

Albert van der Horst

unread,
Nov 13, 2017, 6:53:30 AM11/13/17
to
In article <87inefm...@nightsong.com>,
noforth is about 10K. But hey! We are talking about fully functional,
fully equipped, industrial grade practical compilers. Look at all the
application notes of noforth! You will not miss any of the tools or
features you expect of a modern Forth. You can start doing production
work right away.

So I maintain 4K should be plenty for a proof of principle, educational Forth.
You can concentrate on the basic things. Have a look at fig-Forth, it is
small and easy to understand.

An aspect that is glossed over is that a considerable part of any
compiler is understanding the instruction set of the target.
MSP430 is neat. OTOH Harvard structure and flash lead to extra complications.
Some simple processors are too register starved (6800, 6809 is fine)
or have restricted stacks (small PIC's) to be a pleasant target.
Small, cheap is not necessarily simple to use.

16 bit micro processors I've made a Forth for are the 6809, and the
renesas 16 bit. They are fit for a ciforth model, because they
use ram not flash for program code. I've never tackled the "program
in flash problem", but you probably have to.

Mark Wills

unread,
Nov 13, 2017, 8:06:56 AM11/13/17
to
With such a constricted target as Paul mentioned: 8K Flash, I can't
see the point in bothering to build an interactive Forth for it,
tethered or otherwise.

I think you'd be better simply building a cross compiler for it and
be done with it. The cross-compiler could be re-purposed for other
chips in the same family; you'd get more payoff from investing the
time in a cross-compiler IMO.

rickman

unread,
Nov 13, 2017, 9:23:28 AM11/13/17
to
Paul Rubin wrote on 11/13/2017 1:27 AM:
> rickman <gnu...@gmail.com> writes:
>> It doesn't seem that easy. It appears every one is a trade-off in
>> speed, complexity and memory size. The porting thing has been batted
>> around here a lot and no one has been able to show that it is as easy
>> as claimed.
>
> Well, it's easier than writing a compiler.

Not sure what you mean about writing a compiler? You are comparing porting
a Forth to writing a C compiler? Wouldn't the proper comparison be porting
Forth and porting a compiler?


> Look at eforth for example.

Ok, what about it?


> Sure there are trade-offs but if you're not fussy about them, life
> becomes simpler. Particularly, if your application only has to react at
> human speeds, cpus today are fast enough that you don't have to worry
> about the interpreter speed unless you're doing something computation
> intensive.

If you aren't fussy, no reason to not use existing tools.

I've often wondered how much time a CPU spends in the "inner interpreter" vs
time in productive code like code definitions.

a...@littlepinkcloud.invalid

unread,
Nov 13, 2017, 10:33:18 AM11/13/17
to
Mark Wills <markwi...@gmail.com> wrote:
>
> With such a constricted target as Paul mentioned: 8K Flash, I can't
> see the point in bothering to build an interactive Forth for it,
> tethered or otherwise.

Oh man, it was worth it thirty years ago and it's still worth it now.
Interaction is key to productivity, and especially so with Forth.

Andrew.

Anton Ertl

unread,
Nov 13, 2017, 11:22:26 AM11/13/17
to
rickman <gnu...@gmail.com> writes:
>I've often wondered how much time a CPU spends in the "inner interpreter" vs
>time in productive code like code definitions.

That's relatively easy to answer with Gforth:

If you run your program with

gforth-fast --no-dynamic

you get direct-threaded code. If you run it with

gforth-fast --dynamic --ss-number=0 --ss-states=0

you get the code of the primitives concatenated together without NEXT,
except for branching words; this reduces the NEXT's by about 70%, and
the rest is, in a way, necessary. It does update the IP once for
every primitive, though.

Let's see how that works for the small benchmarks:

On a Haswell Core i7-4790K:

> gforth-fast --no-dynamic onebench.fs; gforth-fast --dynamic --ss-number=0 --ss-states=0 onebench.fs
sieve bubble matrix fib fft
0.100 0.124 0.084 0.140 0.052
sieve bubble matrix fib fft
0.080 0.112 0.044 0.084 0.032

So on this CPU roughly 10%-48% of the time is spent in the NEXT's that are
easy to eliminate.

On an ARM Cortex A53:
sieve bubble matrix fib fft
0.660 0.720 0.530 0.850 0.480
sieve bubble matrix fib fft
0.410 0.470 0.330 0.540 0.300

On this CPU 35%-38% of the time is spent in NEXTs that are easy to
eliminate.
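
The percentages come from (before - after) / before for each benchmark column, where "before" is the --no-dynamic time and "after" is the time with the primitives concatenated. A minimal C sketch of that arithmetic follows; the function names are made up for illustration.

```c
#include <stdio.h>

/* Fraction of the direct-threaded run time that disappears once the
   easy-to-eliminate NEXTs are gone: (before - after) / before. */
double next_fraction(double threaded, double concatenated)
{
    return (threaded - concatenated) / threaded;
}

/* Print one percentage per benchmark for a pair of timing rows
   (at most the five benchmarks named below). */
void print_fractions(const char *cpu, const double *threaded,
                     const double *concatenated, int n)
{
    static const char *const names[] = {
        "sieve", "bubble", "matrix", "fib", "fft"
    };
    printf("%s:", cpu);
    for (int i = 0; i < n && i < 5; i++)
        printf(" %s %.0f%%", names[i],
               100.0 * next_fraction(threaded[i], concatenated[i]));
    printf("\n");
}
```

Fed the Haswell rows above, this gives between about 10% (bubble) and about 48% (matrix); the Cortex-A53 rows all land between about 35% and 38%.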

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2017: http://euro.theforth.net/

Albert van der Horst

unread,
Nov 13, 2017, 12:14:35 PM11/13/17
to
In article <2017Nov1...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>rickman <gnu...@gmail.com> writes:
>>I've often wondered how much time a CPU spends in the "inner interpreter" vs
>>time in productive code like code definitions.
>
>That's relatively easy to answer with Gforth:
>
>If you run your program with
>
>gforth-fast --no-dynamic
>
>you get direct-threaded code. If you run it with
>
>gforth-fast --dynamic --ss-number=0 --ss-states=0
>
>you get the code of the primitives concatenated together without NEXT,
>except for branching words; this reduces the NEXT's by about 70%, and
>the rest is, in a way, necessary. It does update the IP once for
>every primitive, though.

My idea of optimisation of speed: First inline, then concatenate
the primitives, then peephole.
However, while inlining I would at the same time replace at least the
simple conditionals like resulting from IF THEN BEGIN WHILE REPEAT by
relative jumps, possibly conditional.
If you can replace R> by a code sequence

MOV, X| T| AX'| BO| [BP] 0 B,
LEA, BP'| BO| [BP] 4 B,
PUSH|X, AX|

couldn't you replace
0BRANCH 14
by
jz 14 (14 corrected by "1 CELLS 1- " or some such)
followed by a few nops, because for sure the code will
become smaller.

I'm planning to try that one of these days, because it doesn't seem
too hard.
If I look at ciforth's results of inlining there is a lot
more to be earned than just getting rid of the next's and jump's though.

After inlining : doit PAD 300 TYPE ;
a partial sequence is " 300 ROT ROT 4 XOS "
The concatenated assembler code there is
PUSHI 300
PUSHI 1
POP DX
POP BX
POP AX
PSH BX
PSH AX
PSH DX
POP DX
POP BX
POP AX
PSH BX
PSH AX
PSH DX
PUSHI 4
POP AX _C{ Function number}
POP DX _C{ Third parameter, or dummy}
POP CX _C{ Second parameter, or dummy}
POP BX _C{ First parameter.}
INT 0x80 _C{ Generic call on LINUX }

(I'm sure Marcel Hendrix will cringe.)

>
>Let's see how that works for the small benchmarks:
>
>On a Haswell Core i7-4790K:
>
>> gforth-fast --no-dynamic onebench.fs; gforth-fast --dynamic --ss-number=0 --ss-states=0 onebench.fs
> sieve bubble matrix fib fft
> 0.100 0.124 0.084 0.140 0.052
> sieve bubble matrix fib fft
> 0.080 0.112 0.044 0.084 0.032
>
>So on this CPU roughly 10%-48% of the time is spent in the NEXT's that are
>easy to eliminate.
>
>On an ARM Cortex A53:
> sieve bubble matrix fib fft
> 0.660 0.720 0.530 0.850 0.480
> sieve bubble matrix fib fft
> 0.410 0.470 0.330 0.540 0.300
>
>On this CPU 35%-38% of the time is spent in NEXTs that are easy to
>eliminate.

Nice prospect!

>
>- anton

Anton Ertl

unread,
Nov 13, 2017, 12:56:48 PM11/13/17
to
Paul Rubin <no.e...@nospam.invalid> writes:
>The idea is to get rid of the code field by using a bit in the address
>word instead. That saves a byte or two of program memory for each
>word in the system, and maybe simplifies the interpreters.

It certainly complicates the interpreter, because you now have to
explicitly check for the bit instead of letting the code field do its
magic.

Also, note that you have other word types in addition to colon
definitions and primitives, and the code field is used for selecting
among them, too. E.g., at the moment Gforth has the following
built-in word types and corresponding code addresses:

defining word   code address
:               docol
constant        docon
variable        dovar            also CREATE without DOES>
user            douser
defer           dodefer
+field          dofield
value           dovalue
does>           dodoes (old)
abi-code        doabicode
;abi-code       do;abicode
?               doextra
does>           dodoesxt (new)

In Gforth we use primitive-centric code in colon definitions. E.g., a
call to a colon definition is compiled as primitive CALL followed by
the address of the threaded code of the colon definition.

Your approach appears to be similar in putting the information about
how to deal with the words explicitly in the threaded code, except
that you want to do it with one bit. Of course you need more bits,
but you have them. If you have only 8K flash, you can use 3 bits for
that information in a 16-bit system.
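
The bit budget works out because 8K of flash needs only 13 address bits (2^13 = 8192), which leaves the top 3 bits of a 16-bit cell for a word type. A minimal C sketch of such a tagged cell follows. The type names are hypothetical, and the address is assumed to be an offset from the start of flash rather than an absolute STM8 address.

```c
#include <stdint.h>

/* Hypothetical word types; 3 tag bits allow up to 8 of them. */
enum word_type { T_PRIM, T_COLON, T_CON, T_VAR, T_DEFER, T_DOES };

#define ADDR_BITS 13u                        /* 2^13 = 8192 = 8K */
#define ADDR_MASK ((1u << ADDR_BITS) - 1u)   /* 0x1FFF */

/* Pack a word type and a 13-bit flash offset into one 16-bit cell. */
static inline uint16_t make_cell(enum word_type t, uint16_t offset)
{
    return (uint16_t)(((unsigned)t << ADDR_BITS) | (offset & ADDR_MASK));
}

/* Recover the word type from the top 3 bits. */
static inline enum word_type cell_type(uint16_t cell)
{
    return (enum word_type)(cell >> ADDR_BITS);
}

/* Recover the 13-bit offset from the low bits. */
static inline uint16_t cell_offset(uint16_t cell)
{
    return (uint16_t)(cell & ADDR_MASK);
}
```

The cost is that the inner interpreter has to mask and dispatch on the tag for every cell, which is the extra work in NEXT mentioned above.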

Paul Rubin

unread,
Nov 13, 2017, 3:11:31 PM11/13/17
to
a...@littlepinkcloud.invalid writes:
> chipFORTH was the other way around: CREATE didn't exist on the target
> at all, which had no way of making definitions. The target had no
> text interpreter. This meant that none of the target's memory was
> consumed by the machinery needed to provide interaction. Well, except
> for a little stub.

Thanks! Yes I'm heading towards something like that now. The resident
text interpreter will know just three "words":

#xxxx - push hex number xxxx on the data stack. The hex number
can take an optional comment, like #1234-DUP, to make the
interaction between the host and target more human-readable.
X - execute (pop the stack and call the ITC or CODE at that address)
, - comma, deposit TOS at HERE and advance. This is not strictly
needed since it could be done with more noise using X.
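
The three "words" listed above are small enough to sketch in full. The C below is only an illustration under assumptions: tokens are separated by single spaces, memory is a small array indexed in cells, and `execute_at` is a stub that records the address instead of running ITC code. None of the names come from an actual implementation.

```c
#include <stdint.h>
#include <ctype.h>

#define STACK_DEPTH 16
#define MEM_CELLS   64

static uint16_t stack[STACK_DEPTH];
static int sp;                     /* number of items on the data stack */
static uint16_t mem[MEM_CELLS];    /* stands in for target memory */
static uint16_t here;              /* next free cell index in mem */
static uint16_t last_executed;     /* address most recently passed to X */

/* Stub: a real target would run the ITC or CODE at addr. */
static void execute_at(uint16_t addr) { last_executed = addr; }

/* Interpret one line of target text. Returns 0 on success, -1 on an
   unknown token, stack overflow or underflow, or full memory. */
int interpret(const char *s)
{
    while (*s) {
        while (*s == ' ')
            s++;
        if (!*s)
            break;
        if (*s == '#') {                       /* #xxxx[-comment] */
            uint16_t v = 0;
            int digits = 0;
            s++;
            while (isxdigit((unsigned char)*s)) {
                int c = toupper((unsigned char)*s++);
                v = (uint16_t)(v * 16 + (c <= '9' ? c - '0' : c - 'A' + 10));
                digits++;
            }
            if (digits == 0 || sp >= STACK_DEPTH)
                return -1;
            stack[sp++] = v;
            while (*s && *s != ' ')            /* skip the optional comment */
                s++;
        } else if ((*s == 'X' || *s == ',') && (s[1] == ' ' || s[1] == '\0')) {
            if (sp == 0)
                return -1;
            uint16_t tos = stack[--sp];
            if (*s == 'X') {
                execute_at(tos);               /* X: pop and execute */
            } else {
                if (here >= MEM_CELLS)
                    return -1;
                mem[here++] = tos;             /* , : store TOS at HERE */
            }
            s++;
        } else {
            return -1;
        }
    }
    return 0;
}
```

With this, a line such as `#1234-DUP X` pushes $1234 and then executes it, and `#00ff ,` deposits $00FF at HERE.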

There will still be CREATE on the target, but it will send a command to
the host to create a dictionary entry for the next input token and push
the xt onto the data stack with #xxxx.

There will (at least at first) be no way to create CODE words
interactively: the standard primitives and any you want to add are
compiled into the target build, and the host text interpreter's
dictionary is preloaded with their addresses.
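
On the host side, the preloaded dictionary reduces to a name-to-address table plus a translator that turns each typed token into target text. A minimal C sketch follows. The addresses are invented, and sending decimal numbers as `#xxxx` is an assumption based on the idea of doing number conversion on the host.

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct entry { const char *name; unsigned addr; };

/* Hypothetical addresses; a real host would load them from the target build. */
static const struct entry dict[] = {
    { "DUP", 0x0123 }, { "DROP", 0x012B }, { "+", 0x0140 },
};

/* Translate one host token into target text in out[].
   Returns the number of characters written, or -1 if the token is neither
   a known word nor a decimal number in 0..65535, or if out is too small. */
int translate(const char *tok, char *out, size_t outsz)
{
    int n;
    for (size_t i = 0; i < sizeof dict / sizeof dict[0]; i++) {
        if (strcmp(tok, dict[i].name) == 0) {
            /* known word: push its address (with a comment), then execute */
            n = snprintf(out, outsz, "#%04X-%s X", dict[i].addr, tok);
            return (n < 0 || (size_t)n >= outsz) ? -1 : n;
        }
    }
    char *end;
    long v = strtol(tok, &end, 10);
    if (*tok == '\0' || *end != '\0' || v < 0 || v > 0xFFFF)
        return -1;
    /* number: convert to hex on the host and just push it */
    n = snprintf(out, outsz, "#%04lX", (unsigned long)v);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

Typing `DUP` on the host would then send `#0123-DUP X` to the target, and typing `300` would send `#012C`.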

What was the target for chipFORTH? How much space did the address
interpreter etc. use?

How bad is the overhead for bytecode instead of ITC and is it likely to
matter, given the ability to use CODE when needed?

Paul Rubin

unread,
Nov 13, 2017, 3:15:25 PM11/13/17
to
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
> noforth is about 10K. But hey! We are talking about fully functional,
> fully equipped, industrial grade practical compilers. ...
> So I maintain 4K should be plenty for a proof of principle,
> educational Forth. You can concentrate on the basic things. Have a look
> at fig-Forth, it is small and easy to understand.

But, fig-Forth and noforth(?) are complete resident interactive
interpreters. I'm aiming to get rid of almost all of the text
interpreter on the target and just have an address interpreter plus a
small host communication function, with all the user interaction done on
the host. That should make it easier to provide "industrial" features
(whatever those are) while keeping the target image small. I'll look
for the noforth app notes.

Paul Rubin

unread,
Nov 13, 2017, 3:18:41 PM11/13/17
to
rickman <gnu...@gmail.com> writes:
> Not sure what you mean about writing a compiler? You are comparing
> porting a Forth to writing a C compiler? Wouldn't the proper
> comparison be porting Forth and porting a compiler?

No, comparing porting an ITC forth to writing an STC forth that has to
generate native machine code. Well, maybe not if the STC is only for
the lowest level primitives, but it's harder if it can inline.

>> Look at eforth for example.
> Ok, what about it?

It's very simple, has a few primitives defined in asm and uses ITC
for everything else.

>> Sure there are trade-offs but if you're not fussy about them, life
>> becomes simpler. ... cpus today are fast enough...
> If you aren't fussy, no reason to not use existing tools.

What existing tools?

Paul Rubin

unread,
Nov 13, 2017, 3:34:12 PM11/13/17
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>get rid of the code field by using a bit in the address word instead.
> It certainly complicates the interpreter, because you now have to
> explicitly check for the bit instead of letting the code field do its
> magic.

It doesn't complicate the implementation much (just check for that bit).
Of course it slows things down and I'm wondering if that is likely to
matter.

> Gforth has the following built-in word types and corresponding code
> addresses: [long list]

Interesting and I'll refer to that, but I hope to handle most of those
issues on the host. Am I missing something? Are there any that need
special treatment on the target? Some of them look like just
optimizations, like special variable and constant code instead of the
usual DOES> based implementations.

> Of course you need more bits, but you have them. If you have only 8K
> flash, you can use 3 bits for that information in a 16-bit system.

At the moment I have two bits, one indicating a literal and the other
indicating the last word of a definition, but both are optimizations and
maybe unnecessary. I distinguish between CODE and ITC by having the
address interpreter know where the top of CODE is so it can compare the
addresses. But I suppose using a CODE bit is better in some ways.
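
The scheme described above (a literal bit, a last-word bit, and an address comparison to tell primitives from threaded code) can be sketched as a small interpreter loop. This C version is a sketch under assumptions, not the actual implementation: memory is indexed in cells, literals are limited to 14 bits, primitives are numbered rather than addressed, and there are no stack bounds checks.

```c
#include <stdint.h>

#define LIT_BIT   0x8000u   /* cell holds a 14-bit literal */
#define LAST_BIT  0x4000u   /* cell is the last one of its definition */
#define ADDR_MASK 0x3FFFu
#define CODE_TOP  16u       /* values below this name primitives */

static uint16_t mem[256];                  /* threaded code, in cells */
static uint16_t ds[16]; static int dsp;    /* data stack */
static uint16_t rs[16]; static int rsp;    /* return stack */

enum { P_DUP, P_ADD, P_MUL };              /* hypothetical primitive numbers */

static void primitive(uint16_t n)
{
    switch (n) {
    case P_DUP: ds[dsp] = ds[dsp - 1]; dsp++; break;
    case P_ADD: dsp--; ds[dsp - 1] = (uint16_t)(ds[dsp - 1] + ds[dsp]); break;
    case P_MUL: dsp--; ds[dsp - 1] = (uint16_t)((unsigned)ds[dsp - 1] * ds[dsp]); break;
    }
}

/* Run the definition whose threaded code starts at cell index start. */
void run(uint16_t start)
{
    uint16_t ip = start;
    int base = rsp;                  /* return once we unwind to this depth */
    for (;;) {
        uint16_t cell = mem[ip++];
        int last = (cell & LAST_BIT) != 0;
        uint16_t arg = (uint16_t)(cell & ADDR_MASK);
        if (cell & LIT_BIT) {
            ds[dsp++] = arg;         /* literal: push it */
        } else if (arg < CODE_TOP) {
            primitive(arg);          /* below the top of CODE: a primitive */
        } else {
            if (!last)
                rs[rsp++] = ip;      /* ordinary call: remember where we were */
            ip = arg;                /* last cell: tail call, nothing pushed */
            continue;
        }
        if (last) {                  /* the last-word bit replaces EXIT */
            if (rsp == base)
                return;
            ip = rs[--rsp];
        }
    }
}
```

A side effect of the last-word bit is that a call in the final position of a definition becomes a jump, so it costs no return-stack space.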

Rod Pemberton

unread,
Nov 13, 2017, 4:30:28 PM11/13/17
to
Perhaps, a slightly older processor (circa 2009) and older gForth would
be more representative of the question asked? I.e., the values for
the Haswell seem to be very close together to me ...

FYI, I downloaded onebench.fs and fft-bench.fs, as gForth version 0.7.3
here didn't have fft-bench.fs.

From /proc/cpuinfo, my processor is 64-bit at 3.2GHz with 2 cores and
512KB cache.

# uname -p
AMD Phenom(tm) II X2 555 Processor

# gforth -v
gforth 0.7.3

# gforth-fast --no-dynamic onebench.fs
sieve bubble matrix fib fft
0.234 0.352 0.249 0.386 0.239

# gforth-fast --dynamic --ss-number=0 --ss-states=0 onebench.fs
sieve bubble matrix fib fft
0.122 0.168 0.086 0.179 0.081

Do you happen to know the processor clock for the A53 (or year)? I.e.,
it seems to be half the speed of a processor which seems to be at least
4 or 5 years older ...

m...@iae.nl

unread,
Nov 13, 2017, 4:57:42 PM11/13/17
to
On Monday, November 13, 2017 at 6:14:35 PM UTC+1, Albert van der Horst wrote:
> In article <2017Nov1...@mips.complang.tuwien.ac.at>,
[..]
Why? You know what you are doing, and you know better, in
several senses of the word.

This thread is starting to repeat ideas that were publicly
tossed around in 2006 - 2008, when I posted my
experiences with eForth64 and mxForth. I started with
Wil Baden's inliner idea and developed about 10
progressively more complicated variants of it. The
later ones clashed with processor pipelining and code
/ data cache issues and quite a few non-intuitive findings
were presented. Some of these ideas are still waiting
to flow into iForth.

-marcel

Rod Pemberton

unread,
Nov 13, 2017, 5:39:15 PM11/13/17
to
Of course, gforth-itc would be preferred for these tests, but that gives
the same results for both command lines.

Rod Pemberton

unread,
Nov 13, 2017, 6:02:38 PM11/13/17
to
On Mon, 13 Nov 2017 17:40:37 -0500
Clarification, not the same results as above, but both lines with
identical results, and somewhat slower.

Lars Brinkhoff

unread,
Nov 14, 2017, 2:44:35 AM11/14/17
to
Paul Rubin wrote:
> What was the target for chipFORTH?

Some are mentioned here.

http://www.computer-solutions.co.uk/chipdev/forth.htm

Lars Brinkhoff

unread,
Nov 14, 2017, 2:54:06 AM11/14/17
to
Paul Rubin wrote:
> comparing porting an ITC forth to writing an STC forth that has to
> generate native machine code. Well, maybe not if the STC is only for
> the lowest level primitives, but it's harder if it can inline.

I'd say a pure STC implementation is about as easy to write as ITC.

If you factor in optimizations, ITC can do inlining etc too.

> What existing tools?

Now that you mention it, I have made a cross compiler for the STM8.
Currently, it generates a target image in host memory, but it's set up
to be able to access a tethered target as well.

Paul Rubin

unread,
Nov 14, 2017, 3:52:20 AM11/14/17
to
Lars Brinkhoff <lars...@nocrew.org> writes:
>> What was the target for chipFORTH?
> Some are mentioned here.
> http://www.computer-solutions.co.uk/chipdev/forth.htm

Thanks! From its name it sounded like it was for one of Chuck's Forth
chips. It looks interesting. I don't see myself getting that fancy
with the thing I'm playing with.

Do you use the STM8 simulator? How well does it work? If something
works under it, will it probably work on a real STM8?

STM8 eForth actually looks very powerful and maybe I should just use it,
or even just use sdcc directly. But it has always bothered me that
there are no small-target FOSS tethered Forths out there. The ATTiny85
was another chip I was interested in, though less so now.

Lars Brinkhoff

unread,
Nov 14, 2017, 4:15:23 AM11/14/17
to
Paul Rubin wrote:
> Do you use the STM8 simulator? How well does it work? If something
> works under it, will it probably work on a real STM8?

Yes, it seems to work well. The first time I went from the simulator to
hardware, I only had to adjust for the size of the smaller RAM.
Everything worked right off the bat.

> But it has always bothered me that there are no small-target FOSS
> tethered Forths out there.

That's kind of the slot I wanted to fill with my cross compiler.

Paul Rubin

unread,
Nov 14, 2017, 4:17:46 AM11/14/17
to
Lars Brinkhoff <lars...@nocrew.org> writes:
> That's kind of the slot I wanted to fill with my cross compiler.

Oh cool, then I should just wait for you ;-).

Mostly my thing is intended for interactively poking around the hardware
though. Don't know if that's the same goal. I look at the amount of
stuff in the STM8 eForth and I'm nowhere near that ambitious.

Lars Brinkhoff

unread,
Nov 14, 2017, 4:35:22 AM11/14/17
to
Paul Rubin wrote:
> Lars Brinkhoff wrote:
>> That's kind of the slot I wanted to fill with my cross compiler.
> Oh cool, then I should just wait for you ;-).

Well, I did write "kind of". The thing that's missing is interaction
with a target device, and I don't think I'll get around to that in this
batch of hacking. (There are other urgent matters at hand, like getting
ITS to run in a simulated KA10.)

> Mostly my thing is intended for interactively poking around the
> hardware though. Don't know if that's the same goal. I look at the
> amount of stuff in the STM8 eForth and I'm nowhere near that
> ambitious.

My primary goal was getting Forth to run in the nearly lowest-end
microcontrollers. Thus, a target-resident text interpreter isn't
included. At least not yet; it could be a future optional feature.

You might want to try Thomas' stm8ef and see if it meets your
requirements.

a...@littlepinkcloud.invalid

unread,
Nov 14, 2017, 5:59:41 AM11/14/17
to
Paul Rubin <no.e...@nospam.invalid> wrote:
>
> What was the target for chipFORTH? How much space did the address
> interpreter etc. use?

The targets were typical embedded processors of the time: 8051s, etc.
It was direct-threaded or byte-token-threaded code.

> How bad is the overhead for bytecode instead of ITC and is it likely
> to matter, given the ability to use CODE when needed?

ITC wasn't worth doing. The ability to use CODE was pretty much
essential for any high-speed applications. The address interpreter
was maybe a dozen instructions.

It was quite practical to fit an application in a part with 4k of ROM;
256 bytes of memory were enough to support a main task and a couple of
small background tasks. The core primitives (i.e. the Forth words
belonging to the system rather than the application) could be cut down
to less than 1k, leaving the rest for the application. Fully
interactive development worked just fine.
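
For comparison with 16-bit threaded cells, a byte-token-threaded address interpreter fetches one byte per word and dispatches through a table. The C below is a minimal sketch of that general technique, not of chipFORTH itself; the token numbering and the one-byte signed literal are assumptions.

```c
#include <stdint.h>

static int16_t stack[16];
static int sp;                 /* number of items on the data stack */
static const uint8_t *ip;      /* points at the next token */
static int running;

static void op_halt(void) { running = 0; }
static void op_lit(void)  { stack[sp++] = (int8_t)*ip++; }  /* inline 1-byte literal */
static void op_add(void)  { sp--; stack[sp - 1] = (int16_t)(stack[sp - 1] + stack[sp]); }
static void op_dup(void)  { stack[sp] = stack[sp - 1]; sp++; }

/* Hypothetical token numbers; each word costs one byte in the program. */
enum { TOK_HALT, TOK_LIT, TOK_ADD, TOK_DUP, TOK_COUNT };

static void (*const table[TOK_COUNT])(void) = {
    op_halt, op_lit, op_add, op_dup
};

/* Run a token stream until TOK_HALT. Returns 0 on a normal halt,
   -1 if an unknown token is fetched. */
int run_tokens(const uint8_t *code)
{
    ip = code;
    sp = 0;
    running = 1;
    while (running) {
        uint8_t t = *ip++;
        if (t >= TOK_COUNT)
            return -1;
        table[t]();            /* the whole address interpreter: fetch, index, call */
    }
    return 0;
}
```

The trade is one table lookup per word in exchange for halving the size of each reference compared with a 16-bit address.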

Andrew.

Anton Ertl

unread,
Nov 14, 2017, 7:52:22 AM11/14/17
to
Paul Rubin <no.e...@nospam.invalid> writes:
>> Gforth has the following built-in word types and corresponding code
>> addresses: [long list]
>
>Interesting and I'll refer to that, but I hope to handle most of those
>issues on the host. Am I missing something?

I think you are missing EXECUTE. It requires you to put the
information about how to execute a word in one cell. In Gforth we
continue to have a code field in order to support EXECUTE; threaded
code mostly does not need the code field. If you put the word type in
the top three bits of the cell, you may be able to get rid of the code
field, except for DOES>-defined words, and, by extension, CREATEd
words.

For generating threaded code, you can use primitive-centric code to
avoid putting the word type in the code field or in the address. For
the word types built into Gforth, the code is something like:

defining word   code address     threaded code
:               docol            call <body>
constant        docon            lit <body @>
variable        dovar            lit <body>
user            douser           useraddr <body @>
defer           dodefer          lit <body> perform
+field          dofield          lit <body @> +
value           dovalue          lit <body> @
does>           dodoes (old)     lit <body> call <doescode>
abi-code        doabicode        abi-call <body>
;abi-code       do;abicode       ;abi-code-exec <xt>
?               doextra
does>           dodoesxt (new)   does-xt <xt>

Of course, all these extra primitives cost space, of which you have
little. So it may be better to have the word type in the code field,
and use classical threaded code instead of primitive-centric code, or
alternatively to put the word type in the top 3 bits of the cell, and
have a more complex NEXT that deals with that.

Note that the first Forth implementations had similar memory
constraints and used indirect threaded code.
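
To make the primitive-centric idea concrete: the compiler looks at the word type once, at compile time, and emits the matching primitive sequence from the table above, so the threaded code no longer needs a code field to say how each word runs. The C below is a rough sketch covering six of the rows. It is not Gforth's code; the token values and the `struct word` layout are invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>

enum word_type { W_COLON, W_CONSTANT, W_VARIABLE, W_DEFER, W_FIELD, W_VALUE };

/* Hypothetical primitive tokens. */
enum { PR_CALL = 1, PR_LIT, PR_PERFORM, PR_PLUS, PR_FETCH };

struct word {
    enum word_type type;
    uint16_t body;        /* address of the word's body */
    uint16_t body_value;  /* contents of the first body cell, i.e. <body @> */
};

/* Append the primitive-centric threaded code for one word to out[].
   Returns the number of cells written (at most 3). */
size_t compile_word(const struct word *w, uint16_t *out)
{
    size_t n = 0;
    switch (w->type) {
    case W_COLON:      /* call <body> */
        out[n++] = PR_CALL; out[n++] = w->body; break;
    case W_CONSTANT:   /* lit <body @> */
        out[n++] = PR_LIT; out[n++] = w->body_value; break;
    case W_VARIABLE:   /* lit <body> */
        out[n++] = PR_LIT; out[n++] = w->body; break;
    case W_DEFER:      /* lit <body> perform */
        out[n++] = PR_LIT; out[n++] = w->body; out[n++] = PR_PERFORM; break;
    case W_FIELD:      /* lit <body @> + */
        out[n++] = PR_LIT; out[n++] = w->body_value; out[n++] = PR_PLUS; break;
    case W_VALUE:      /* lit <body> @ */
        out[n++] = PR_LIT; out[n++] = w->body; out[n++] = PR_FETCH; break;
    }
    return n;
}
```

In a host-resident compiler this switch would sit on the host, which costs the target nothing beyond the handful of extra primitives.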

Anton Ertl

unread,
Nov 14, 2017, 8:45:14 AM11/14/17
to
Rod Pemberton <EmailN...@voenflacbe.cpm> writes:
>On Mon, 13 Nov 2017 15:49:57 GMT
>an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>
>> rickman <gnu...@gmail.com> writes:
>
>> >I've often wondered how much time a CPU spends in the "inner
>> >interpreter" vs time in productive code like code definitions.
...
>Perhaps, a slightly older processor (circa 2009) and older gForth would
>be more representative of the question asked?

I would not know why that should be, but anyway, here's gforth-0.7.0
on a Xeon X3460 (Nehalem/Lynnfield):

[a4:~/a5/xxxgforthtest/c8/gcc-4.2.0/install/bin:65431] gforth-fast-0.7.0 --no-dynamic onebench.fs
sieve bubble matrix fib
0.204 0.252 0.132 0.400
[a4:~/a5/xxxgforthtest/c8/gcc-4.2.0/install/bin:65432] gforth-fast-0.7.0 --dynamic --ss-number=0 --ss-states=0 onebench.fs
sieve bubble matrix fib
0.176 0.220 0.088 0.240

>I.e., the values for
>the Haswell seem to be very close together to me ...

The main cost of the NEXTs on such wide CPUs is in the branch
mispredictions, and Haswell has a very good indirect branch predictor.
If you want to see something with a relatively simple indirect branch
predictor, here's an Athlon 64 X2 4400+:

[c6:~/xxxgforthtest/c8/gcc-4.2.0/install/bin:45856] gforth-fast-0.7.0 --no-dynamic onebench.fs
sieve bubble matrix fib
0.384 0.572 0.668 0.744
[c6:~/xxxgforthtest/c8/gcc-4.2.0/install/bin:45857] gforth-fast-0.7.0 --dynamic --ss-number=0 --ss-states=0 onebench.fs
sieve bubble matrix fib
0.196 0.244 0.136 0.352

>Do you happen to know the processor clock for the A53 (or year)?

1536MHz, the machine (Odroid C2) is from 2016, the SoC (AmLogic S905)
from 2015, and the core (Cortex-A53) was announced in 2012.

> I.e.,
>it seems to be half the speed of a processor which seems to be at least
>4 or 5 years older ...

Of course. It's a low-end smartphone SoC, so it has low performance
because it must be slower than high-end smartphone SoCs, and because
even the high-end smartphone SoCs are designed with a relatively tight
power and thermal budget in mind. I.e., the Haswell, the Xeon, your
Phenom, and the Athlon 64 X2 are allowed to consume 80W or more, while
a smartphone CPU is limited to maybe 5W (and less for sustained
power).

Anton Ertl

unread,
Nov 14, 2017, 9:39:47 AM11/14/17
to
Rod Pemberton <EmailN...@voenflacbe.cpm> writes:
>Of course, gforth-itc would be preferred for these tests, but that gives
>the same results for both command lines.

gforth-itc is for those who want to run a piece of software that
requires ITC; implementing various optimizations would subvert the
purpose of gforth-itc, so it does not, and these options are ignored.

Moreover, gforth-itc also maintains all the information useful for
debugging and is therefore a variant of the debugging engine gforth,
not gforth-fast. If you want to see how much difference
primitive-centric DTC vs. primitive-centric ITC vs. classical ITC
makes, use

gforth --no-dynamic #primitive-centric DTC
gforth-itc #primitive-centric ITC
gforth-itc -e "' , is compile," #classic ITC for later-compiled code.

Note that the precompiled stuff is primitive-centric in any case.

With Gforth-0.7.0 and the Xeon X3460, this produces:

[a4:~/a5/xxxgforthtest/c8/gcc-4.2.0/install/bin:65437] gforth-0.7.0 --no-dynamic onebench.fs
sieve bubble matrix fib
0.332 0.424 0.196 0.668
[a4:~/a5/xxxgforthtest/c8/gcc-4.2.0/install/bin:65438] gforth-itc-0.7.0 onebench.fs
sieve bubble matrix fib
0.324 0.420 0.224 0.520
[a4:~/a5/xxxgforthtest/c8/gcc-4.2.0/install/bin:65439] gforth-itc-0.7.0 -e "' , is compile," onebench.fs
sieve bubble matrix fib
0.320 0.404 0.240 0.496

I.e., there is not that much difference between DTC and ITC on wide
CPUs with good branch prediction. Let's see how it works with worse
branch prediction (Athlon 64 X2 4400+):

[c6:~/xxxgforthtest/c8/gcc-4.2.0/install/bin:45859] gforth-0.7.0 --no-dynamic onebench.fs
sieve bubble matrix fib
0.592 0.764 0.856 0.884
[c6:~/xxxgforthtest/c8/gcc-4.2.0/install/bin:45860] gforth-itc-0.7.0 onebench.fs
sieve bubble matrix fib
0.752 0.920 0.996 1.040
[c6:~/xxxgforthtest/c8/gcc-4.2.0/install/bin:45861] gforth-itc-0.7.0 -e "' , is compile," onebench.fs
sieve bubble matrix fib
0.732 0.920 0.992 1.024

Here the indirection has a measurable cost. Primitive-centric in
itself does not make much of a difference, it is just an enabler for
further optimizations.

Cecil Bayona

unread,
Nov 14, 2017, 12:24:03 PM11/14/17
to
Are your cross compilers available somewhere? I like looking at other's
code to get ideas.

--
Cecil - k5nwa

Paul Rubin

unread,
Nov 14, 2017, 1:06:59 PM11/14/17
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> I think you are missing EXECUTE. It requires you to put the
> information about how to execute a word in one cell.

Hmm, there are still parts of this I don't understand. From your list:

> defining word code address threaded code
> : docol call <body>
> constant docon lit <body @>
> variable dovar lit <body>
I'd expect to do these the classical way with <builds ... does> .


> user douser useraddr <body @>
> defer dodefer lit <body> perform
> +field dofield lit <body @> +

Not sure why these need special treatment? What is user?
I'll have to think about defer.

> value dovalue lit <body> @

VALUE could call a special primitive that flags the word in the
dictionary as a value, and TO would be another primitive that checks the
flag.

> abi-code doabicode abi-call <body>
> ;abi-code do;abicode ;abi-code-exec <xt>

Won't have these. Just write more primitives if you want to make
external calls.

I just now realize I've been misusing the word CODE to mean primitives.
I currently don't have CODE, so to add native code you have to write new
primitives and rebuild the interpreter.

> ? doextra

What's this?

> does> dodoesxt (new) does-xt <xt>

I was thinking DOES> would just remember HERE at the time that it runs,
saving the HERE in the dict entry of the word most recently defined.
Then <BUILDS would just create a new word and have it call the xt
that DOES> saved. Will that work?

> So it may be better to have the word type in the code field, and use
> classical threaded code instead of primitive-centric code,

What exactly do you mean by primitive-centric?

> or alternatively to put the word type in the top 3 bits of the cell,
> and have a more complex NEXT that deals with that.

I don't have any NEXT at all at the moment, but just an interpreter loop
that reads a cell, figures out what it is (primitive, literal, regular
ITC pointer), and runs it.

> Note that the first Forth implementations had similar memory
> constraints and used indirect threaded code.

They had resident text interpreters too! Chuck had to have been very
clever.

Lars Brinkhoff

unread,
Nov 14, 2017, 1:13:11 PM11/14/17
to
Cecil Bayona wrote:
> Are your cross compilers available somewhere? I like looking at
> other's code to get ideas.

Yes, there's a temporary repository at:
http://github.com/larsbrinkhoff/xForth

Paul Rubin

unread,
Nov 14, 2017, 1:29:49 PM11/14/17
to
a...@littlepinkcloud.invalid writes:
> ITC wasn't worth doing. The ability to use CODE was pretty much
> essential for any high-speed applications. The address interpreter
> was maybe a dozen instructions.

Hmm I just re-read http://www.bradrodriguez.com/papers/moving1.htm
and I think what I'm doing is actually closer to DTC than ITC. But I'll
think about switching to bytecode.

> It was quite practical to fit an application in a part with 4k of ROM;
> 256 bytes of memory were enough to support a main task and a couple of
> small background tasks. The core primitives (i.e. the Forth words
> belonging to the system rather than the application) could be cut down
> to less than 1k, leaving the rest for the application. Fully
> interactive development worked just fine.

Nice. 1k is smaller than what I'm likely to end up with, but I figure
it's cool if I can get it in 2k. Running in a 4k rom part sounds
feasible in that case. I haven't thought much about how to do tasking.

Did the 8051 use interrupt-driven character i/o? If your program looped
was there a way to ^C back to the interpreter or anything like that?

TG9541

unread,
Nov 14, 2017, 3:28:05 PM11/14/17
to
On Sunday, November 12, 2017 at 11:22:31 PM UTC+1, Paul Rubin wrote:
> I haven't made any attempt to get STM8 eForth working yet--the boards
> are still in their little baggies and I don't have an STLink to program
> them with. But I appreciate the advice and I'm sure I'll need it when
> the time comes.

Using STM8 eForth with Manfred Mahlow's e4thcom should be very simple. For a quick start check out the following code and run "make target" to resolve the dependencies: https://github.com/TG9541/forth-oled-display/tree/depend

> With sdcc -mstm8 it looks like my address interpreter is around 120
> bytes of asm code, not bad. I'd like to keep the whole thing in the
> under-3k range.

That looks feasible. STM8 eForth can be configured to use about 3 KiB (by sacrificing some words, and a great deal of the dictionary).

> Your wiki page says ucsim can emulate the STM8 but the ucsim page itself
> doesn't mention this: is STM8 supported?

Yes but you need a recent version. The how-to is here: https://github.com/TG9541/stm8ef/wiki/STM8S-Programming#sdcc-the-simulator-ucsim

The easiest way to do it is using a Docker container. Check .travis.yml for details.

> > By the way, I've had the idea to add a "bytecode" mode to STM8 eForth:
> > an STC Forth with variable length opcodes for a very simple VM.
> That sounds possibly worthwhile. I coded something like it a while
> back, inspired by the GA144.

The GA144 is the ultimate manifestation of minimalism. I've really got to get one.

> How much space does the eForth dictionary
> use on the STM8? I imagine getting rid of that and offloading more
> interactive functions (maybe even the text interpreter) to the host
> would make it smaller.

The W1209 is a "fat binary" with 5275 bytes (including display support, background task, and other features). Its dictionary with 159 words occupies 1184 bytes. The actual binary code is 4067 bytes large. It has 2941 bytes flash memory free.

The CORE binary (4002 bytes) is a basic eForth (without CREATE..DOES>, DO..+LOOP but with Forth Interrupts, and Flash support). Its dictionary with 113 words occupies 822 bytes. The actual binary code is 3180 bytes large. It has 4190 bytes flash memory free.

Without interrupts, and with user words in RAM only, the binary is even smaller (3704 bytes, 105 words). Note that you can mark most dictionary entries for removal from the binary in a configuration file.

Note that there is an ALIAS feature (which, I was told, is unique to STM8 eForth) that can create temporary dictionary entries. Aliases are automatically created during the build process and are part of the binary distribution. In e4thcom they can be used with the #require feature.

> Does STM8 eForth have a multitasker?

Kind of. For Arduino style processing loops it has a preemptive background task (e.g. 1 .. 10 ms) with character I/O redirect (e.g. EMIT to LED display, ?KEY from board pushbuttons). Interrupt routines, written in Forth, can preempt the background task, but they shouldn't use character I/O-words.

Cecil Bayona

Nov 14, 2017, 3:30:45 PM
On 11/14/2017 12:29 PM, Paul Rubin wrote:
> a...@littlepinkcloud.invalid writes:
>> ITC wasn't worth doing. The ability to use CODE was pretty much
>> essential for any high-speed applications. The address interpreter
>> was maybe a dozen instructions.
>
> Hmm I just re-read http://www.bradrodriguez.com/papers/moving1.htm
> and I think what I'm doing is actually closer to DTC than ITC. But I'll
> think about switching to bytecode.
>

A Token Threaded Forth can be quite economical in the use of target
space at the expense of a little speed of execution. Even the speed
aspect can be improved by using a jump table to decode and execute an
opcode in a fairly fast manner.
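
The jump-table idea can be illustrated with a minimal sketch. It is written in Python purely for readability, not in Forth or STM8 assembly, and the token numbers and handler names are hypothetical. Each token is one byte that indexes a table of handlers, so decoding costs a single indexed lookup:

```python
# Token-threaded sketch: one-byte tokens index a jump table of handlers.
# Token numbers and handler names are hypothetical, chosen for illustration.
HALT, LIT, DUP, ADD, MUL = range(5)


def run(code):
    """Interpret a sequence of tokens and return the final data stack."""
    stack = []
    ip = 0  # index of the next token to fetch

    def op_lit():
        nonlocal ip
        stack.append(code[ip])  # the inline operand follows the LIT token
        ip += 1

    def op_dup():
        stack.append(stack[-1])

    def op_add():
        b, a = stack.pop(), stack.pop()
        stack.append((a + b) & 0xFFFF)  # wrap to a 16-bit cell

    def op_mul():
        b, a = stack.pop(), stack.pop()
        stack.append((a * b) & 0xFFFF)

    # The jump table: decoding a token is a single indexed lookup.
    jump_table = [None, op_lit, op_dup, op_add, op_mul]

    while True:
        token = code[ip]
        ip += 1
        if token == HALT:
            return stack
        jump_table[token]()


# Equivalent of the Forth phrase: 3 DUP * 4 +
program = [LIT, 3, DUP, MUL, LIT, 4, ADD, HALT]
result = run(program)
```

On a real target the table would hold code addresses and the dispatch would be an indexed jump; the space saving comes from each compiled reference taking one byte instead of a 16-bit address.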

--
Cecil - k5nwa

Paul Rubin

Nov 14, 2017, 8:00:36 PM
TG9541 <thomas....@gmail.com> writes:
> For a quick start check out the following code and run "make
> target" to resolve the dependencies:
> https://github.com/TG9541/forth-oled-display/tree/depend

Nice, I saw that before and I'll definitely give it a try. But I think
I want to start with the emulator. For the real hardware, I haven't
figured out step 2 after "take the board out of the little baggie".

> The easiest way to do it is using a Docker container. Check
> .travis.yml for details.

I've never used Docker but I guess this is a good time to find out how ;).

> The GA144 is the ultimate manifestation of minimalism. I've really got
> to get one.

You might like Bernd Paysan's b16 design. Do you know about it?

https://bernd-paysan.de/b16.html

> The W1209 is a "fat binary" with 5275 bytes ... The actual binary
> code is 4067 bytes large. It has 2941 bytes flash memory free.

That isn't bad at all. My application should be able to fit in it.

>> Does STM8 eForth have a multitasker?
> Kind of. For Arduino style processing loops it has a preemptive
> background task (e.g. 1 .. 10 ms) ...

Cool, that is almost exactly what I want--to do something periodically at
around 100 Hz. Is the background task clocked by a hardware timer? I
don't care about accurate timing (10.000 ms vs. 10.3 ms or whatever),
but need the rate to be reasonably stable.

Paul Rubin

Nov 14, 2017, 9:18:59 PM
Paul Rubin <no.e...@nospam.invalid> writes:
> Not sure why these need special treatment? What is user?

Oh I think I see what you mean. A user variable is at a known offset in
the data block for the current task, but its address will change
depending what task it is. I think I can handle that with a primitive.
What if there is more than one kind of task, and the tasks want to use
their data areas in differing ways? I guess there could be a variable
saying what kind of task it was, that's set at compilation time so
the dictionary can assign offsets starting from 0 for each kind of task.
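
A rough sketch of that scheme, in Python for illustration only: the task kinds, variable names, and RAM layout below are all made up. The offset of each user variable is fixed at compile time per kind of task, and a single primitive adds it to the base address of whichever task is running:

```python
# Hypothetical model of per-task ("user") variables.
# Offsets are fixed at compile time; the base address depends on the task.
from dataclasses import dataclass

RAM = [0] * 64  # stand-in for target RAM, one entry per cell


@dataclass
class Task:
    kind: str   # which layout of user variables this task uses
    base: int   # start of this task's data block in RAM


# One offset table per kind of task, each starting from 0.
USER_OFFSETS = {
    "console": {"BASE": 0, "SPAN": 1},
    "blinker": {"COUNT": 0, "PERIOD": 1},
}


def user_addr(task, name):
    """The 'primitive': the task's base plus the variable's fixed offset."""
    return task.base + USER_OFFSETS[task.kind][name]


def user_store(task, name, value):
    RAM[user_addr(task, name)] = value


def user_fetch(task, name):
    return RAM[user_addr(task, name)]


# Two tasks of the same kind share offsets but not storage.
blink_a = Task("blinker", base=16)
blink_b = Task("blinker", base=32)
user_store(blink_a, "PERIOD", 10)
user_store(blink_b, "PERIOD", 250)
```

Only the offset tables need to live in the host dictionary; the target just needs the one primitive and a pointer to the current task's block.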

TG9541

Nov 15, 2017, 2:46:21 AM
On Wednesday, November 15, 2017 at 2:00:36 AM UTC+1, Paul Rubin wrote:
> For the real hardware, I haven't
> figured out step 2 after "take the board out of the little baggie".

I know, that's a complicated step. Like the decision with which foot first to touch the cold floor when getting out of bed in the morning ;-)

> I've never used Docker but I guess this is a good time to find out how ;).

You don't even need to use Docker: Travis-CI can do that for you (not interactively, though). Here is how: https://github.com/TG9541/stm8ef/wiki/STM8-eForth-Build-and-Test-Automation

> You might like Bernd Paysan's b16 design. Do you know about it?
> https://bernd-paysan.de/b16.html

I read about it before (also about Chuck Moore's design). It's highly interesting but I'm still at the point where I need to decide which bags to order before I decide in which order I should open them ;-) Maybe I don't need a bag at all, and can do the first steps with simulation.

> > code is 4067 bytes large. It has 2941 bytes flash memory free.
> That isn't bad at all. My application should be able to fit in it.

That's also my idea of the memory requirements of a typical SISO-control application in Forth. There is just so much math one needs, and just so many states for an UI with 3 pushbuttons and a LED display.

> Is the background task clocked by a hardware timer?

Yes, timer2 is used for that (but that can be changed easily).

> I don't care about accurate timing (10.000 ms vs. 10.3 ms or whatever),
> but need the rate to be reasonably stable.

The STM8S003x3 specs say that the HSI stays within 1% over the whole voltage and temperature range after initial user trimming. The specified absolute accuracy is just 5% but I've yet to see the first device that needs trimming because it was outside a 1.5% range. So yes, I think it will meet your requirements, even when using the wakeup-timer.
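
As a back-of-the-envelope check of what those tolerances mean for a nominal 10 ms tick, here is a small Python calculation. It assumes the tick is derived directly from the HSI, so the period scales inversely with the actual clock frequency; the function name is made up for this example:

```python
def period_range_ms(nominal_ms, tolerance):
    """Return (shortest, longest) tick period for a clock within +/- tolerance.

    Assumes the period scales inversely with clock frequency: a clock that
    runs fast shortens the tick, a clock that runs slow lengthens it.
    """
    return nominal_ms / (1 + tolerance), nominal_ms / (1 - tolerance)


trimmed = period_range_ms(10.0, 0.01)    # 1% after user trimming
untrimmed = period_range_ms(10.0, 0.05)  # 5% specified absolute accuracy

print("1%%: %.3f .. %.3f ms" % trimmed)
print("5%%: %.3f .. %.3f ms" % untrimmed)
```

So a trimmed part stays within roughly 9.90 to 10.10 ms, and even the untrimmed worst case of about 9.52 to 10.53 ms covers the "10.3 ms" figure mentioned in the question.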

Anton Ertl

Nov 15, 2017, 4:11:00 AM
Paul Rubin <no.e...@nospam.invalid> writes:
>an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>> I think you are missing EXECUTE. It requires you to put the
>> information about how to execute a word in one cell.
>
>Hmm, there are still parts of this I don't understand. From your list:
>
>> defining word code address threaded code
>> : docol call <body>
>> constant docon lit <body @>
>> variable dovar lit <body>
>I'd expect to do these the classical way with <builds ... does> .

Which leads to the question: How do you indicate a
<BUILDS...DOES>-created word if you can only indicate primitives and
colon definitions, and if you don't have a code field. And if you use
<BUILDS with an extra field pointing behind the DOES> for many words,
you lose the advantage of saving the code field for these words, and
incur an additional execution time penalty.

>> user douser useraddr <body @>
>> defer dodefer lit <body> perform
>> +field dofield lit <body @> +
>
>Not sure why these need special treatment?

You can use a general mechanism, such as the code field in classical
ITC, and then you don't need special treatment, but you want to get
rid of the code field; then they need special treatment, because they
behave differently.

> What is user?

A per-task variable.

>> ? doextra
>
>What's this?

I have no idea. Worse than undocumented.

>> does> dodoesxt (new) does-xt <xt>
>
>I was thinking DOES> would just remember HERE at the time that it runs,
>saving the HERE in the dict entry of the word most recently defined.
>Then <BUILDS would just create a new word and have it call the xt
>that DOES> saved. Will that work?

Prose is not a good programming language, but one can describe a
working version in the way you do.

BTW, using an xt-based does> implementation (rather than the classical
threaded-code-address based implementation) will improve the
performance for implementing, e.g., constants, where you can use the
primitive @ rather than having a threaded-code stub "@ ;". Likewise
for DEFER, where the primitive would be PERFORM; and you could add a
primitive @+ for implementing +FIELD and friends. So if you plan to
use DOES> for implementing many word types, that would be the way to
go.

>What exactly do you mean by primitive-centric?

Consider a piece of code

myvar @ mycon + mycolondef

In traditional ITC, this becomes ("cfa" means "code field address"):

<myvar-cfa>
<@-cfa>
<mycon-cfa>
<+-cfa>
<mycolondef-cfa>

In primitive-centric DTC, this becomes ("ca" means "code address")

<lit-ca> <myvar-body>
<@-ca>
<lit-ca> <mycon-body @>
<+-ca>
<call-ca> <mycolondef-body>

So you only have primitives (and their immediate parameters) in the
threaded code, so you can use DTC (ca instead of cfa) here.
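
A minimal sketch of how such primitive-centric code executes, in Python for illustration. Bound methods stand in for code addresses, and the class, the addresses, and the body of mycolondef (taken here to be DUP +) are assumptions, not part of any real Forth system:

```python
class VM:
    """Toy primitive-centric interpreter: every cell in the threaded code is
    either a primitive or an inline parameter of the preceding primitive."""

    def __init__(self):
        self.mem = {}     # data memory: address -> cell
        self.stack = []   # data stack
        self.rstack = []  # return stack
        self.code = []    # threaded code
        self.ip = 0

    def lit(self):                       # push the inline parameter
        self.stack.append(self.code[self.ip])
        self.ip += 1

    def fetch(self):                     # @
        self.stack.append(self.mem[self.stack.pop()])

    def add(self):                       # +
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a + b)

    def dup(self):
        self.stack.append(self.stack[-1])

    def call(self):                      # enter a colon definition
        target = self.code[self.ip]
        self.rstack.append(self.ip + 1)
        self.ip = target

    def exit(self):                      # ;
        self.ip = self.rstack.pop()

    def run(self, start):
        self.ip = start
        self.rstack.append(None)         # sentinel: final exit stops the loop
        while self.ip is not None:
            prim = self.code[self.ip]
            self.ip += 1
            prim()
        return self.stack


vm = VM()
MYVAR_BODY = 100          # hypothetical address of myvar's body
vm.mem[MYVAR_BODY] = 5
MYCON_VALUE = 37          # constant's value, fetched at compile time
vm.code = [
    vm.dup, vm.add, vm.exit,                 # 0: mycolondef (assumed DUP + ;)
    vm.lit, MYVAR_BODY, vm.fetch,            # 3: myvar @
    vm.lit, MYCON_VALUE, vm.add,             #    mycon +
    vm.call, 0,                              #    mycolondef
    vm.exit,
]
result = vm.run(3)
```

Because the interpreter only ever jumps to primitives, no cell needs a code field: the kind of word (variable, constant, colon definition) is resolved when the reference is compiled.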

For more details (in particular, how to implement EXECUTE and DOES>),
read

https://www.complang.tuwien.ac.at/papers/ertl02.ps.gz

a...@littlepinkcloud.invalid

Nov 15, 2017, 4:42:05 AM
Paul Rubin <no.e...@nospam.invalid> wrote:
>
> Nice. 1k is smaller than what I'm likely to end up with, but I figure
> it's cool if I can get it in 2k. Running in a 4k rom part sounds
> feasible in that case. I haven't thought much about how to do tasking.

It depends on what you're using it for: multi-tasking was pretty much
essential for the industrial control applications we were targeting.

> Did the 8051 use interrupt-driven character i/o?

Oh yes, absolutely. It's the only way to go.

> If your program looped was there a way to ^C back to the interpreter
> or anything like that?

Sort of. The text interpreter ran on the host. If the target looped
you'd reset it. A reset was no big deal because you wouldn't lose
anything and could just carry on.

Andrew.

Kerr-Mudd,John

Nov 15, 2017, 5:12:37 AM
Paul Rubin <no.e...@nospam.invalid> wrote in
news:878tf8j...@nightsong.com:
http://www.dibsco.co.uk/forth/skywave/zx81-forth
"Needs at least 2k of memory".

So I was wrong to think that a (fairly minimal) Forth could be done in
1k.

Albert van der Horst

Nov 15, 2017, 8:29:43 AM
In article <2017Nov1...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:

>Paul Rubin <no.e...@nospam.invalid> writes:
>>What exactly do you mean by primitive-centric?
>
>Consider a piece of code
>
>myvar @ mycon + mycolondef
>
>In traditional ITC, this becomes ("cfa" means "code field address"):
>
><myvar-cfa>
><@-cfa>
><mycon-cfa>
><+-cfa>
><mycolondef-cfa>

ITC is the only implementation method I know well, and I can tell you
the above is misleading.
If I know the cfa of e.g. @, I can run @, no problem.
I just fetch the content of the cfa and I jump to that address.

If I know the cfa of a variable there is always dovar there to
jump to. So formally there is no information about the address of
the variable. It only works because there is a formal relation
between the cfa and the dfa, say one cell ahead.
But then it is no longer a cfa, but a pointer to a struct
containing a cfa and a dfa.

figForth was very aware of this.
The fig-Forth implementations introduced the work register
that points to what they called parameters. Jumping to
a "code field" is not enough. The work register must contain
valid information.

I got sick of the N^2 relations between all those fields
floating in memory space: NFA>SFA QFA>XFA in tforth in 1993,
where each of those words has to be coded. (There were even
"forget-fields" )
E.g. in figForth to go from a cfa to the next you have to
skip the parameter field, negotiate the name field
(find a byte with the m.s. bit up) find the link field and
then fetch its content (or some such).
I introduced the dea: the address of a struct
(object) where the information of a word sits, all fields in
a defined and documented order.
In this conceptual framework high level code is not a list of cfa's,
it is a list of dea's , xt's if you will.
It is not the same.

In DTC the same relation exists, but is obscured. A familiar
technique is to have an indirect call to an address containing
doconstant and then having the constant at the next address.
This constant is fetched via inspecting the return stack.
Looking close you don't just jump to the address, but also
pass a pointer to the constant object in an obscure irregular
(! because it is not related to how e.g. high level code
looks as a dictionary object) format.

So for ITC the above becomes

<myvar-dea>
<@-dea>
<mycon-dea>
<+-dea>
<mycolondef-dea>

or if you will

<myvar-xt>
<@-xt>
<mycon-xt>
<+-xt>
<mycolondef-xt>

A dramatic demonstration can be had in the ciforth implementation.
As it is now the cfa is the first field of the struct, so those addresses
can be confused. The offsets of the cfa dfa ffa lfa nfa sfa are
symbolic constants in the header. One can swap the offsets of cfa and say (sfa)
sourcefieldaddress, and the Forth still works. Does that make high
level code a list of sourcefield addresses? Of course not. It was
and still is a list of pointers to structs representing a Forth word.
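
The dea view can be sketched as follows, in Python for illustration. The field names follow the post (cfa, dfa, ffa, lfa, nfa, sfa), but the dataclass layout, the helper names, and the body of mycolondef (taken to be DUP +) are assumptions and not ciforth's actual implementation:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Dea:
    """One dictionary entry; the field order here is illustrative only."""
    code: Callable[["Dea"], None]   # cfa: routine that runs the word
    data: Any = None                # dfa: parameter(s) of the word
    flags: int = 0                  # ffa
    link: Optional["Dea"] = None    # lfa: previously defined word
    name: str = ""                  # nfa
    source: str = ""                # sfa


stack = []
latest = None


def execute(dea):
    # The routine receives the whole entry, like the fig-Forth work register.
    dea.code(dea)


def dovar(dea):
    stack.append(dea.data)          # data is a one-element list acting as the cell


def docon(dea):
    stack.append(dea.data)


def docol(dea):
    for word in dea.data:           # high-level code is a list of deas
        execute(word)


def do_fetch(dea):
    stack.append(stack.pop()[0])


def do_dup(dea):
    stack.append(stack[-1])


def do_add(dea):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def define(name, code, data=None):
    global latest
    latest = Dea(code=code, data=data, link=latest, name=name)
    return latest


fetch = define("@", do_fetch)
plus = define("+", do_add)
dup = define("DUP", do_dup)
myvar = define("myvar", dovar, [5])
mycon = define("mycon", docon, 37)
mycolondef = define("mycolondef", docol, [dup, plus])
main = define("main", docol, [myvar, fetch, mycon, plus, mycolondef])
execute(main)
```

Nothing in docol depends on where the code field sits inside the entry, which is the point being made: the thread holds references to whole entries, and the field order is an implementation detail.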

>
>In primitive-centric DTC, this becomes ("ca" means "code address")
>
><lit-ca> <myvar-body>
><@-ca>
><lit-ca> <mycon-body @>
><+-ca>
><call-ca> <mycolondef-body>
>
>So you only have primitives (and their immediate parameters) in the
>threaded code, so you can use DTC (ca instead of cfa) here.

No comment on this. I'm not into DTC.

Groetjes Albert

>- anton
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

rickman

Nov 15, 2017, 3:01:29 PM
I have to admit the STM8 snuck up on me. I thought I knew *something* about
nearly every CPU currently used but I missed the STM8. I thought it was an
8051 core. But when I read about it this seems to be a custom core. I
haven't found much to actually explain this without reading the full
instruction set manual. So am I right that the STM8 is a unique CPU
different from both the 8051 and the ARM? I suppose it is designed for
lowest possible cost rather than performance since ARMs are sold for less
than a dollar. This would explain their abundance on eBay CPU modules.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998

TG9541

Nov 15, 2017, 5:03:07 PM
On Wednesday, November 15, 2017 at 9:01:29 PM UTC+1, rickman wrote:
> So am I right that the STM8 is a unique CPU
> different from both the 8051 and the ARM? I suppose it is designed for
> lowest possible cost rather than performance since ARMs are sold for less
> than a dollar. This would explain their abundance on eBay CPU modules.

When the STM8 was designed 32bit cores were still relatively expensive, and there was a reason for creating an ST7 successor that could be used with the same peripheral IP as a Cortex chip. Later on, scale effects made it really cheap, cheap enough to give product designers a reason to use it instead of more sophisticated chips (2x10^9 units sold according to ST).

Paul Rubin

Nov 15, 2017, 5:19:36 PM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> Which leads to the question: How do you indicate a
> <BUILDS...DOES>-created word if you can only indicate primitives and
> colon definitions, and if you don't have a code field.

Is there a reason to do anything special about them at runtime?
The host would know which words were created by <BUILDS but they
would just run at the address recorded at compile time by the DOES>.
I guess the compiler would have to remember for each <BUILDS where
its corresponding DOES> was.
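
One way to read this, sketched in Python under stated assumptions: the host dictionary records what kind of word each name is and, for words made by a defining word, the address of the code after its DOES>. The class, the `LIT`/`CALL` stand-ins, and all addresses are hypothetical:

```python
# Host-side sketch: the target never sees word kinds, only compiled cells.
LIT, CALL = "lit", "call"  # stand-ins for the addresses of target primitives


class HostDict:
    def __init__(self):
        self.words = {}    # name -> ("prim", addr) | ("colon", addr)
                           #       | ("child", body_addr, does_addr)
        self.does_of = {}  # defining word -> address of the code after DOES>

    def add_prim(self, name, addr):
        self.words[name] = ("prim", addr)

    def add_colon(self, name, addr):
        self.words[name] = ("colon", addr)

    def record_does(self, definer, does_addr):
        # Remembered once, when the defining word itself is compiled.
        self.does_of[definer] = does_addr

    def create_child(self, definer, name, body_addr):
        # Run when the defining word is used to create a new word.
        self.words[name] = ("child", body_addr, self.does_of[definer])

    def compile_ref(self, name):
        """Cells the host would send for one reference to `name`."""
        kind, *info = self.words[name]
        if kind == "prim":
            return [info[0]]
        if kind == "colon":
            return [CALL, info[0]]
        body_addr, does_addr = info
        return [LIT, body_addr, CALL, does_addr]


host = HostDict()
host.add_prim("@", 0x0040)
host.add_colon("SQUARE", 0x0120)
host.record_does("ARRAY", 0x0200)
host.create_child("ARRAY", "BUF", 0x0300)
cells = host.compile_ref("BUF")
```

With this reading the target needs no runtime marker for such words; the cost is two extra cells per reference, which is the space and time trade-off raised in the quoted objection.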

> And if you use <BUILDS with an extra field pointing behind the DOES>

It's very possible that I'm misunderstanding something but I thought
this could be handled at compile time.

> and incur an additional execution time penalty.

I think I'm not worried about this, if the idea is to go for minimal
code space. If something has to run fast it's always possible to add
a primitive.

>>> user douser useraddr <body @>
>>> defer dodefer lit <body> perform
>>> +field dofield lit <body @> +

Can't these be handled like <BUILDS, i.e. just have the compiler know
what kind of word it is and generate appropriate code?

> For more details (in particular, how to implement EXECUTE and DOES>),
> read
>
> https://www.complang.tuwien.ac.at/papers/ertl02.ps.gz

Thanks, I'll look at this.

Paul Rubin

Nov 15, 2017, 5:20:35 PM
"Kerr-Mudd,John" <nots...@invalid.org> writes:
> http://www.dibsco.co.uk/forth/skywave/zx81-forth
> "Needs at least 2k of memory".
> So I was wrong to think that a (fairly minimal) Forth could be done in
> 1k.

That is an 8k rom cartridge that wants 2k of ram, according to the ad.
But, it is a fancy system with a code editor etc. I wouldn't call it
minimal.

Paul Rubin

Nov 15, 2017, 5:27:37 PM
rickman <gnu...@gmail.com> writes:
> So am I right that the STM8 is a unique CPU different from both the
> 8051 and the ARM?

Yes, it's an 8 bitter, successor to the ST7 with some changes to make
it better as a C compiler target. It has two or three 16 bit address
registers which sound useful for Forth. There is an SDCC target for it.

> I suppose it is designed for lowest possible cost rather than
> performance since ARMs are sold for less than a dollar.

I don't know why it's so cheap compared with say the AVR. It's just
another 8 bitter afaict.

> This would explain their abundance on eBay CPU modules.

I don't understand this so well either: I'd expect there would be
indigenous Chinese MCUs by now that are even cheaper.

rickman

Nov 15, 2017, 5:47:05 PM
That's the part that surprises me. Transistors are so small these days that
the area of an MCU devoted to the CPU itself is only a small portion of the
die, including the 32 bit CPUs. I fail to see how an 8 bit device can be
much more economical than a 32 bit MCU. I suppose with the lower pin count
and smaller memory devices the CPU is still a noticeable portion of the
chip. So the STM8 has become the new 4 bit CPU.

rickman

Nov 15, 2017, 5:55:45 PM
Paul Rubin wrote on 11/15/2017 5:27 PM:
> rickman <gnu...@gmail.com> writes:
>> So am I right that the STM8 is a unique CPU different from both the
>> 8051 and the ARM?
>
> Yes, it's an 8 bitter, successor to the ST7 with some changes to make
> it better as a C compiler target. It has two or three 16 bit address
> registers which sound useful for Forth. There is an SDCC target for it.
>
>> I suppose it is designed for lowest possible cost rather than
>> performance since ARMs are sold for less than a dollar.
>
> I don't know why it's so cheap compared with say the AVR. It's just
> another 8 bitter afaict.

It's all about die size and process complexity. Pin count adds considerably
to the cost so the low cost chips will all be limited in I/O.


>> This would explain their abundance on eBay CPU modules.
>
> I don't understand this so well either: I'd expect there would be
> indigenous Chinese MCUs by now that are even cheaper.

I don't know what China uses internally, but they have never been strong
innovators. Their forte seems to be production optimization. I suspect
there isn't much room for improvement in today's MCU market. It's not like
there is any shortage of manufacturers or MCU varieties.

It's interesting that Wikipedia has a pretty good overview of the ST6 and
ST7, but nothing on the ST8.

Paul Rubin

Nov 15, 2017, 7:54:38 PM
rickman <gnu...@gmail.com> writes:
> Transistors are so small these days

Well, that's because of the expensive high tech processes that they fab
those 32 bit chips with. If you use fewer transistors maybe you can use
an older cheaper process.

> I suppose with the lower pin count and smaller memory devices the CPU
> is still a noticeable portion of the chip. So the STM8 has become the
> new 4 bit CPU.

I wonder why they don't make more low pin count ARMs. One thing I like
about the STM8 is there's an 8k flash, 1k ram, 128 byte eeprom part in
an 8-SOIC package, costing under 30 cents in 1k qty from Digikey. There
are some AVRs and PICs in similar packages but they have much less
memory. STM8 also has some 3x3mm UFQFPN20 packages but they look hard
to hand solder (are they?).

Paul Rubin

Nov 15, 2017, 7:55:31 PM
rickman <gnu...@gmail.com> writes:
> It's interesting that Wikipedia has a pretty good overview of the ST6
> and ST7, but nothing on the ST8.

https://en.wikipedia.org/wiki/STM8

Paul Rubin

Nov 15, 2017, 8:24:56 PM
TG9541 <thomas....@gmail.com> writes:
>> figured out step 2 after "take the board out of the little baggie".
> I know, that's a complicated step.

Actually that probably *is* step 2. Step 1 is order an ST-LINK from
someplace and I haven't decided how to do that (ST board? Adafruit's
USB dongle? Chinese thingie?) and I don't want to buy more hardware for
this until I'm ready to use it.

> You don't even need to use Docker: Travis-CI can do that for you (not
> interactively, though). Here is how:
> https://github.com/TG9541/stm8ef/wiki/STM8-eForth-Build-and-Test-Automation

You mean I'm supposed to run the emulator in Github's CI system? Can I
connect to the emulated STM8 over the net and type Forth commands at it?
I think I'm better off getting the emulator running locally. I hope
that is not too complicated.

> Maybe I don't need a bag at all, and can do the first steps with
> simulation.

Yep, that's my approach too. I'm not even bothering with emulation
right now--just coding directly on Linux.

>> > code is 4067 bytes large. It has 2941 bytes flash memory free.
>> That isn't bad at all. My application should be able to fit in it.
> That's also my idea of the memory requirements of a typical
> SISO-control application in Forth.

I'm pleasantly surprised at how small the fairly powerful-sounding
stm8ef is. I know that Camelforth/4e4th on the MSP430 is around 8k and
Mecrisp is around 12k, so I would have expected a 4k Forth on an 8
bitter to be barely functional. That's why I got interested in the
tethered approach.

Do you think anyone cares about the speed of Forth on these things? I'm
thinking of switching to a bytecode interpreter, which can simplify
things and make the code even smaller. If a particular word must be
fast, the user can add a primitive. The STM8 runs at 16 MHz and is
pipelined, so it wouldn't surprise me if bytecode Forth on it is faster
than native code on the original 8051 etc.

> There is just so much math one needs, and just so many states for an
> UI with 3 pushbuttons and a LED display.

My thing needs a couple PWM outputs, probably a couple of A/D's for
speed control with analog knobs, and maybe an SPI output. Plus the
serial(?) port for the Forth console. Sound ok?

> The STM8S003x3 specs say that the HSI stays within 1% over the whole
> voltage and temperature range after initial user trimming.

Oh that's great. I don't care at all about absolute accuracy since the
user will twiddle a knob if they want to make it faster or slower. But
once set, the speed should be reasonably stable. 1% drift over a 10
minute period is probably ok, but that much variation in 10 seconds
could be a problem.

Thanks!

Paul Rubin

Nov 15, 2017, 8:33:10 PM
a...@littlepinkcloud.invalid writes:
> It depends on what you're using it for: multi-tasking was pretty much
> essential for the industrial control applications we were targeting.

Ok, that's good to know. I'd like to have it but haven't thought much
yet about how to do it.

> Sort of. The text interpreter ran on the host. If the target looped
> you'd reset it. A reset was no big deal because you wouldn't lose
> anything and could just carry on.

Did resetting clobber the ram? Was there a way to figure out what
happened if the target hung? I like the idea of having some debugging
features, like being able to single step the address interpreter from
the host console.

rickman

Nov 15, 2017, 8:40:51 PM
Paul Rubin wrote on 11/15/2017 7:54 PM:
> rickman <gnu...@gmail.com> writes:
>> Transistors are so small these days
>
> Well, that's because of the expensive high tech processes that they fab
> those 32 bit chips with. If you use fewer transistors maybe you can use
> an older cheaper process.

You aren't understanding the issues. They don't make small MCUs with state
of the art 20 nm processes because they are pad limited. The size of the
pads define the size of the die. If they go to a smaller process the die
size doesn't shrink because they still need the same perimeter for the pads.

Also, the die area required by the logic is defined mostly by the memory.
There are only a few thousand gates making up the CPU while the memory is
many thousands of bits and so much more area. Then there are the many
peripherals which consume significant area of the die. Reducing the CPU
from 32 bit to 8 bit *might* save 5 to 10% of the total logic area of the
die and does nothing to reduce the pad ring.


>> I suppose with the lower pin count and smaller memory devices the CPU
>> is still a noticeable portion of the chip. So the STM8 has become the
>> new 4 bit CPU.
>
> I wonder why they don't make more low pin count ARMs. One thing I like
> about the STM8 is there's an 8k flash, 1k ram, 128 byte eeprom part in
> an 8-SOIC package, costing under 30 cents in 1k qty from Digikey. There
> are some AVRs and PICs in similar packages but they have much less
> memory. STM8 also has some 3x3mm UFQFPN20 packages but they look hard
> to hand solder (are they?).

You can find ARMs in 8 pin packages and very low cost.

http://www.eenewseurope.com/news/nxp-puts-32bit-arm-core-8pin-package-39%C2%A2

Unless you are building thousands, why do you care whether the MCU costs $1 or $0.30?

rickman

Nov 15, 2017, 8:46:59 PM
I had found the ST6/ST7 page and looked for ST8 rather than STM8, thanks.

I see the STM8 is a very limited 8 bit micro. I can't see much reason for
working with it rather than an ARM unless you were building many of a project.

Paul Rubin

Nov 15, 2017, 9:01:28 PM
rickman <gnu...@gmail.com> writes:
> I see the STM8 is a very limited 8 bit micro. I can't see much reason
> for working with it rather than an ARM unless you were building many
> of a project.

You can buy complete STM8 dev boards online for 60 cents each, and they
are smaller than comparable ARM boards. I do see there are several
SOIC-8 ARM parts now including some fairly beefy ones (16k flash, 2k
ram) from Cypress, so that's good to know.

Paul Rubin

Nov 15, 2017, 9:24:23 PM
Paul Rubin <no.e...@nospam.invalid> writes:
> I think I'm better off getting the emulator running locally. I hope
> that is not too complicated.

It turns out to be very easy to compile and launch. I haven't actually
simulated any code with it yet, but I guess I will pretty soon. It
built fine on both x86 and arm32 Linux. So I guess it should be
runnable on raspberry pi or the like, if you wanted to do that.

rickman

Nov 15, 2017, 9:49:47 PM
The fact that it is $0.60 is just a "fun" thing. If I am working with an
MCU board I don't care if it is $5 or $10. I just want something that is
easy to use. Even as a hobby thing it isn't worth it to me to buy the
cheapest possible tool even though I do that sometimes. The few times I
regret it make it not worthwhile. Even the issue of waiting weeks for the
item can make it not worth the low price.

Paul Rubin

Nov 15, 2017, 10:34:37 PM
rickman <gnu...@gmail.com> writes:
> The fact that it is $0.60 is just a "fun" thing.

True.

> If I am working with an MCU board I don't care if it is $5 or $10.

Sure, if it's a onesie. I might want a few dozen or 100 of these
things, depending. Probably not 1000s though.

> Even as a hobby thing it isn't worth it to me to buy the cheapest

But what if part of the interest is in figuring out what you can do with
cheap hardware? It's not much different from liking to optimize code
for speed even when it's already fast enough.

TG9541

Nov 16, 2017, 12:42:13 AM
On Thursday, November 16, 2017 at 2:24:56 AM UTC+1, Paul Rubin wrote:
> Actually that probably *is* step 2. Step 1 is order an ST-LINK from
> someplace and I haven't decided how to do that (ST board? Adafruit's
> USB dongle? Chinese thingie?) and I don't want to buy more hardware for
> this until I'm ready to use it.

My advice is to invest $5 in China now, and simply forget to wait until a couple of baggies arrive. That works every time.
A shopping list is here:
https://github.com/TG9541/stm8ef/wiki/Breakout-Boards#getting-started

> > You don't even need to use Docker: Travis-CI can do that for you (not
> You mean I'm supposed to run the emulator in Github's CI system? Can I
> connect to the emulated STM8 over the net and type Forth commands at it?
> I think I'm better off getting the emulator running locally. I hope
> that is not too complicated.

I never tried opening an SSH channel to a node in the Travis-CI build farm, but I certainly wouldn't recommend doing it. I use it in "batch processing mode" (60s style computing) just as everybody else. Continuous Integration Service sounds a lot more fancy but that's what it is. Code in, quality result + logs out.

On the other hand, setting up Docker on a Debian/Ubuntu/Mint based machine should be a matter of minutes, and running the tg9541/docker-sdcc container should just work.

You can also install the tool chain locally without Docker, of course (the very simple Dockerfile shows how to do that).

> > Maybe I don't need a bag at all, and can do the first steps with
> > simulation.
> Yep, that's my approach too. I'm not even bothering with emulation
> right now--just coding directly on Linux.

Using simulation is a lot more difficult than using real hardware, especially when peripherals are involved. That's real work. I use it for running automated regression tests and for interactive debugging.

> I'm pleasantly surprised at how small the fairly powerful-sounding
> stm8ef is.

I'm fairly new to Forth, but the feedback I get from experienced Forthers is really good. Some of the code is highly optimized and you'll find it difficult to get the core words any smaller.

> Do you think anyone cares about the speed of Forth on these things?

No, but it's still surprisingly fast :-)

Check out the comments here:
https://weblambdazero.blogspot.de/2016/10/go-forth-with-arduino.html


> I'm
> thinking of switching to a bytecode interpreter, which can simplify
> things and make the code even smaller.

The overall memory efficiency might be better when using a bytecode approach, but the jump table and the "inner interpreter" are a major investment.

> If a particular word must be
> fast, the user can add a primitive. The STM8 runs at 16 MHz and is
> pipelined, so it wouldn't surprise me if bytecode Forth on it is faster
> than native code on the original 8051 etc.

I would be surprised if it wouldn't be much faster than an old 8051.

> My thing needs a couple PWM outputs, probably a couple of A/D's for
> speed control with analog knobs, and maybe an SPI output. Plus the
> serial(?) port for the Forth console. Sound ok?

Right, serial. Add some ADC! .. ADC@, and some scaling/filtering to this code:

https://gist.github.com/TG9541/321e58dd4c837bb3625d93bf845562f3

I just noticed that the SPI words need to be updated, but consider it done.

https://github.com/TG9541/stm8ef/blob/master/lib/hw/spi.fs

> 1% drift over a 10
> minute period is probably ok, but that much variation in 10 seconds
> could be a problem.

I'm pretty sure that this won't happen (unless your power supply is really lousy).

Paul Rubin

Nov 16, 2017, 2:55:14 AM
TG9541 <thomas....@gmail.com> writes:
> My advice is to invest $5 in China now, and simply forget to wait
> until a couple of baggies arrive. That works every time.

Haha, ok. There's also the problem that I can't order from AliExpress
because of an issue with my credit card, but maybe I can order from an
AliExpress vendor directly. Or tell you what: if I can get my Forth
running under simulation, I'll spring the $15 and order the Adafruit
dongle.

> I never tried opening an SSH channel to a node in the Travis-CI build

It's ok, I have the simulator installed now and think I want to use
that. My network access will be spotty for the next few days.

Are you able to run stm8ef under ucsim? I didn't notice anything in the
stm8ef wiki about that.

> Using simulation is a lot more difficult than using real hardware,
> especially when peripherals are involved. That's real work. I use it
> for running automated regression tests and for interactive debugging.

Well I just want to simulate a basic serial console at first. That will
show how much memory my Forth uses, etc. Once I can do that I'll
approach the real hardware.

> https://weblambdazero.blogspot.de/2016/10/go-forth-with-arduino.html

Nice article and Amforth is a good implementation. The author hangs
around here too, some of the time.

> The overall memory efficiency might be better when using a bytecode
> approach, but the jump table and the "inner interpreter" are a major
> investment.

Thanks, I'm leaning more towards the bytecode approach now. It
simplifies and maybe shrinks the target end, and it gets rid of some
stupid hacks I've done to figure out addresses of target words on the
host side.

> Right, serial. Add some ADC! .. ADC@, and some scaling/filtering to
> this code:
> https://gist.github.com/TG9541/321e58dd4c837bb3625d93bf845562f3

Thanks!

> I just noticed that the SPI words need to be updated, but consider it done.
> https://github.com/TG9541/stm8ef/blob/master/lib/hw/spi.fs

Thanks for that too!

>> 1% drift over a 10 minute period is probably ok, but that much
>> variation in 10 seconds could be a problem.
> I'm pretty sure that this won't happen (unless your power supply is
> really lousy).

It will be a battery pack or lipo cell, so good DC.

Paul Rubin

Nov 16, 2017, 3:03:05 AM
Paul Rubin <no.e...@nospam.invalid> writes:
> Are you able to run stm8ef under ucsim? I didn't notice anything in the
> stm8ef wiki about that.

Oops! I just found simload.sh. Now to figure out how to get it
working.

a...@littlepinkcloud.invalid

Nov 16, 2017, 5:02:55 AM
Paul Rubin <no.e...@nospam.invalid> wrote:
> a...@littlepinkcloud.invalid writes:
>> It depends on what you're using it for: multi-tasking was pretty much
>> essential for the industrial control applications we were targeting.
>
> Ok, that's good to know. I'd like to have it but haven't thought much
> yet about how to do it.
>
>> Sort of. The text interpreter ran on the host. If the target looped
>> you'd reset it. A reset was no big deal because you wouldn't lose
>> anything and could just carry on.
>
> Did resetting clobber the ram?

No. Static RAM, nothing would clear it.

> Was there a way to figure out what happened if the target hung?

Sure. It's an interactive Forth system: just debug it in the usual
way.

> I like the idea of having some debugging features, like being able
> to single step the address interpreter from the host console.

What for? Unless your address interpreter was wrong, which seems
pretty unlikely. I never found single-stepping debuggers very useful
for Forth.

Andrew.

rickman

Nov 16, 2017, 11:11:31 AM
Yeah, I don't pursue engineering much for the pure joy of spending hours
trying to squeeze every last drop of anything from a design. I've spent way
too many hours with that sort of thing when it was necessary. Read the
book, "The Soul of a New Machine". I'm the guy who quit trying to optimize
every last nanosecond from the critical path to go farm and not worry about
any time intervals shorter than a season. Well, almost.

TG9541

Nov 16, 2017, 1:59:38 PM
On Thursday, November 16, 2017 at 8:55:14 AM UTC+1, Paul Rubin wrote:
> running under simulation, I'll spring the $15 and order the Adafruit
> dongle.

This sounds like a plan - the Adafruit guys certainly deserve it!

> Are you able to run stm8ef under ucsim? I didn't notice anything in the
> stm8ef wiki about that.

It's here: https://github.com/TG9541/stm8ef/wiki/STM8S-Programming#sdcc-the-simulator-ucsim

> Well I just want to simulate a basic serial console at first. That will
> show how much memory my Forth uses, etc. Once I can do that I'll
> approach the real hardware.

I forgot to mention that execution speed isn't uCsim's strongest point. Expect 5% to 20% ...

> Nice article and Amforth is a good implementation. The author hangs
> around here too, some of the time.

AmForth is good. It would be even better on the STM8 (the AVR architecture is plain Harvard, while the STM8 "emulates" a von Neumann architecture).

> Thanks, I'm leaning more towards the bytecode approach now.

Great! I'm curious how it works out!

You're welcome! Don't expect it to work in uCsim, though :-)

> It will be a battery pack or lipo cell, so good DC.

You may want to look into using an STM8L chip - the brown-out-reset threshold of the STM8S family is 2.8V (sharp).

TG9541

Nov 16, 2017, 2:04:59 PM
simload.sh is a bit "convoluted" (running binaries for boards with "UART simulation" required injecting UART code using a breakpoint).

Better try an interactive session, see link in the previous post.

Paul Rubin

Nov 16, 2017, 3:09:43 PM
TG9541 <thomas....@gmail.com> writes:
> simload.sh is a bit "convoluted" (running binaries for boards with
> "UART simulation" required injecting UART code using a breakpoint).
> Better try an interactive session, see link in the previous post.

I'll see if I can do that, but how am I supposed to do console i/o from
the simulated program? When I compile printf("hello") with sdcc, the
sdcc linker complains that there is no _putchar, so I figure I have to
supply it myself.

I found a bunch of UART addresses in stm8device.inc in the eforth
distro, so I was going to figure out how stm8ef accesses the uart and
put similar code into my interpreter to implement KEY and EMIT.
The hope is to get an interactive interpreter (even a very minimal
one that can just handle the equivalent of "2 2 + .") running under
the simulator before trying to mess with actual hardware.

>>spring the $15 and order the Adafruit dongle.
> This sounds like a plan - the Adafruit guys certainly deserve it!

Heh, sounds good. You do of course know that the founder of Adafruit is
not a guy. I knew her slightly before she became a tech celebrity but
I'm sure she considers me a nobody now ;-).

TG9541

Nov 16, 2017, 4:10:43 PM
On Thursday, November 16, 2017 at 9:09:43 PM UTC+1, Paul Rubin wrote:
> I'll see if I can do that, but how am I supposed to do console i/o from
> the simulated program?

First terminal:
```
thomas@w500:~/source/stm8s/stm8ef$ sstm8 -tS103 -Suart=1,port=10000 out/MINDEV/MINDEV.ihx
uCsim 0.6-pre33, Copyright (C) 1997 Daniel Drotos.
uCsim comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
0> Loading from out/MINDEV/MINDEV.ihx
4818 words read from out/MINDEV/MINDEV.ihx
run
Simulation started, PC=0x008000
```

Second terminal:
```
thomas@w500:~/source/stm8s/stm8ef$ telnet localhost 10000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
uart[1] terminal display, press ^x to access control menu

STM8eForth 2.2.20
ok
```

> When I compile printf("hello") with sdcc, the
> sdcc linker complains that there is no _putchar, so I figure I have to
> supply it myself.

That's right. Try using STM8 eForth first (instant gratification).

> I found a bunch of UART addresses in stm8device.inc in the eforth
> distro, so I was going to figure out how stm8ef accesses the uart and
> put similar code into my interpreter to implement KEY and EMIT.

Look for "TX!" and "?RX", and for `.ifne HAS_RXUART+HAS_TXUART` in "COLD"

> The hope is to get an interactive interpreter (even a very minimal
> one that can just handle the equivalent of "2 2 + .") running under
> the simulator before trying to mess with actual hardware.

That should work. You can use STM8 eForth as a starting point (you'll need some Forth primitives, right?)

Kerr-Mudd,John

Nov 16, 2017, 4:54:12 PM
Paul Rubin <no.e...@nospam.invalid> wrote in
news:87k1yri...@nightsong.com:
Seems that isn't the one I remembered; it was a cassette version:

http://vi.vipr.ebaydesc.com/ws/eBayISAPI.dll?ViewItemDescV4&item=170898447530

gives a bit of description. Probably not in 1k though!

Paul Rubin

Nov 16, 2017, 6:50:29 PM
TG9541 <thomas....@gmail.com> writes:
> thomas@w500:~/source/stm8s/stm8ef$ telnet localhost 10000 ...
> STM8eForth 2.2.20
> ok

Nice, thanks, that worked! I was able to add 3+3 which was good enough
for me. The simulator runs at around 250k ticks/second on my laptop.
I'm planning to run it on an ARMv7 server (about like a Raspberry Pi 2)
so I expect it will be even slower then. But I think that's fine for
this low-level testing.

> That's right. Try using STM8 eForth first (instant gratification).

Cool, it works now. I had started fooling with it but hadn't yet
figured out what args to give the simload script. I also tried running
a C program with no putchar, that just wrote a bunch of A's into a
memory buffer, then tried to find the AAAAA.. in the hex dump, but
didn't find them. I'll have to figure that out too.

> Look for "TX!" and "?RX", and for `.ifne HAS_RXUART+HAS_TXUART` in "COLD"

Thanks, that will help.

> That should work. You can use STM8 eForth as a starting point (you'll
> need some Forth primitives, right?)

Sounds good. My interpreter is (for now) written completely in C, so
part of the exercise is to get the primitives and i/o working from the
SDCC output. I'll certainly use STM8EF as a reference. Primitives are
implemented in a very hacky way: for example, the "dup" primitive looks
like:

void CODE_dup() { push(top()); }

During the build process, a Python script scans (on Linux with gcc) the
a.out symbol table, finds all the names that start with "CODE_", and
remembers the addresses in a host-side dictionary (another Python
script). I plan to similarly scan the .map file for SDCC output.
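
A minimal sketch of that scanning step, assuming the symbol table has already been dumped as text in the usual `nm` layout (hex address, type letter, name). The function name collect_primitives and the sample addresses are made up for illustration; the real build script may work quite differently.

```python
def collect_primitives(nm_text, prefix="CODE_"):
    """Map Forth word names to code addresses from nm-style symbol text."""
    words = {}
    for line in nm_text.splitlines():
        fields = line.split()
        # A defined symbol looks like "<hex address> <type> <name>".
        # Undefined symbols have no address field and are skipped.
        if len(fields) != 3:
            continue
        address, _symbol_type, name = fields
        if name.startswith(prefix):
            # Strip the prefix so CODE_dup is remembered as "dup".
            words[name[len(prefix):]] = int(address, 16)
    return words


# Hypothetical listing; these addresses are invented for the example.
sample = """\
0000000000001139 T CODE_dup
0000000000001150 T CODE_drop
                 U putchar
0000000000001170 T main
"""

primitives = collect_primitives(sample)
for word, address in sorted(primitives.items()):
    print("%-6s $%04X" % (word, address))
```

For SDCC output the same idea should apply to the .map file, though the line layout differs, and C symbols there carry a leading underscore (as with _putchar), so the prefix would presumably become "_CODE_".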

So to add 6+5 and print the answer, instead of "6 5 + ."
you'd say something like

$6 $5 $1234 X $3456 X

$1234 is the address of the + primitive and $3456 is the "." primitive,
and X is a special command that runs the EXECUTE primitive (pops the
stack and calls the address as a C function). The "." primitive will
then send something like

$000B .

to the UART, which is the hex code for 11, followed by "." to print it.

This output is supposed to be fed into a host-side Forth that does the
actual decimal conversion and printing. At the moment the host side is
Python but I might try connecting Gforth to it to execute the
target-originated commands.
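
Both directions of that host-side traffic can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are invented, the word addresses are the ones from the example above, and cells are taken to be 16 bits so that "." prints a signed value.

```python
def to_target_tokens(source, dictionary):
    """Translate e.g. "6 5 + ." into the hex-literal / X stream for the target."""
    out = []
    for token in source.split():
        if token in dictionary:
            # Known word: send its code address, then X to EXECUTE it.
            out.append("$%04X" % dictionary[token])
            out.append("X")
        else:
            # Anything else must be a decimal number; the target reads only hex.
            out.append("$%X" % (int(token) & 0xFFFF))
    return " ".join(out)


def run_target_output(text, emit=print):
    """Interpret commands sent back by the target, e.g. "$000B ."."""
    stack = []
    for token in text.split():
        if token.startswith("$"):
            stack.append(int(token[1:], 16))
        elif token == ".":
            value = stack.pop()
            if value >= 0x8000:      # treat the 16-bit cell as signed
                value -= 0x10000
            emit(str(value))         # decimal conversion happens on the host
        else:
            raise ValueError("unknown command from target: %r" % token)
    return stack


words = {"+": 0x1234, ".": 0x3456}   # addresses from the example above
print(to_target_tokens("6 5 + .", words))
run_target_output("$000B .")
```

A token that is neither in the dictionary nor a decimal number raises ValueError here; a real host would more likely report "word not found" and carry on.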

The target only knows how to read and print hexadecimal numbers plus run
a few hardwired commands like X (there is no dictionary on the target).
X is the only really required command since it lets you run the @ and !
primitives through their (known) hex addresses but I currently support a
few more like "," (comma) for convenience. So this should keep the
target download very small, while allowing as fancy a host-side text
interpreter as desired.
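
To make the target's share of the work concrete, here is a rough Python model of that command loop: hex literals are pushed, and X pops an address and calls whatever lives there. The class name TinyTarget and the two primitives wired to $1234 and $3456 are assumptions for illustration only; the real target is C compiled with SDCC and calls actual code addresses.

```python
class TinyTarget:
    """Toy model of the target: a stack, hex input, and the X command."""

    def __init__(self):
        self.stack = []
        self.output = []          # stands in for bytes written to the UART
        # Made-up addresses standing in for compiled primitives.
        self.code = {0x1234: self.plus, 0x3456: self.dot}

    def plus(self):
        b = self.stack.pop()
        a = self.stack.pop()
        self.stack.append((a + b) & 0xFFFF)   # 16-bit cells

    def dot(self):
        # The target does no decimal conversion: it sends hex plus "."
        # and leaves the printing to the host.
        self.output.append("$%04X ." % self.stack.pop())

    def feed(self, line):
        for token in line.split():
            if token.startswith("$"):
                self.stack.append(int(token[1:], 16) & 0xFFFF)
            elif token == "X":
                # EXECUTE: pop an address and call the primitive there.
                self.code[self.stack.pop()]()
            else:
                raise ValueError("target has no command %r" % token)


target = TinyTarget()
target.feed("$6 $5 $1234 X $3456 X")
print(target.output)
```

Note that there is no name lookup anywhere in the model, which is the point of the scheme: everything that needs a dictionary stays on the host.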

If I switch to a bytecode interpreter, the primitives will be selected
with a dense switch statement (so the compiler will generate a jump
table), and it should become easier to write the main bytecode loop in
assembler to make better use of the machine registers.
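
A rough illustration of that table-driven dispatch, written in Python rather than C or assembler. The opcode numbering, the one-byte literal encoding, and the three primitives are all invented for the example; the list indexed by opcode plays the role of the jump table a C compiler would generate from a dense switch.

```python
LIT, ADD, DUP = 0, 1, 2   # hypothetical opcode numbering


def run_bytecode(program):
    """Execute a list of byte-sized opcodes and return the final data stack."""
    stack = []
    pc = 0

    def op_lit():
        nonlocal pc
        stack.append(program[pc])   # a one-byte literal follows the opcode
        pc += 1

    def op_add():
        b = stack.pop()
        a = stack.pop()
        stack.append((a + b) & 0xFFFF)   # wrap to a 16-bit cell

    def op_dup():
        stack.append(stack[-1])

    # Dense table indexed by opcode: dispatch is a single indexed call.
    table = [op_lit, op_add, op_dup]

    while pc < len(program):
        opcode = program[pc]
        pc += 1
        table[opcode]()
    return stack


print(run_bytecode([LIT, 6, LIT, 5, ADD]))
```

With this layout a new primitive costs one table entry and one opcode number, and the host no longer needs to know any target code addresses.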

Paul Rubin

Nov 17, 2017, 5:29:08 PM
TG9541 <thomas....@gmail.com> writes:
> thomas@w500:~/source/stm8s/stm8ef$ sstm8 -tS103 -Suart=1,port=10000 out/MINDEV/MINDEV.ihx...
> run
> Simulation started, PC=0x008000

Fwiw, the above works fine on my x86-64 laptop but fails with

0> Loading from out/MINDEV/MINDEV.ihx
4794 words read from out/MINDEV/MINDEV.ihx
run
Simulation started, PC=0x008000
Stop at 0x0087db: (0) Invalid instruction 0x0075
F 0x0087db
Simulated 4591 ticks in 0.011967 sec, rate=0.047954
0>

under arm32 Linux. I may try to debug it sometime but for now will just
use x86.

TG9541

Nov 18, 2017, 2:22:20 AM
Paul, that's interesting! Which STM8 eForth baseline, and which revision of uCsim are you using?

Paul Rubin

Nov 18, 2017, 3:21:18 AM
TG9541 <thomas....@gmail.com> writes:
> Paul, that's interesting! Which STM8 eForth baseline, and which
> revision of uCsim are you using?

I'm not sure how to tell the stm8ef baseline, but I downloaded it from
github a night or two ago. The md5sum of forth.asm in the stm8ef
directory is ade958ddd0a89f830a1ee918edb26feb if that helps at all.

The sdcc build is 3.6.9 #10185 from just a few days ago. I originally
saw an error with an older sdcc, so I tried installing the latest sdcc
to see if it still happened. It gets further now, but crashes with the
error I pasted earlier. I'll be happy to check anything else you want
or give you access to my ARM server if you want to examine the crash
yourself. The server is a Scaleway C1 (www.scaleway.com), basically a
small ARM board in a data center for 3 euro a month.

Your earlier post about finding the serial port code in forth.asm was
very helpful in understanding how the port works, so thanks again for
that. My C program can read and write the port under simulation now.
I'll try to get the tethered interpreter (slightly) working under
simulation during the weekend. If that happens, I'll order the ST-link
dongle from Adafruit as promised. Do I also have to order an FTDI cable
(I don't have one) or does the ST-link take care of passing the STM8
serial port back to the laptop USB? Thanks!

Paul

TG9541

Nov 18, 2017, 6:04:48 AM
On Saturday, November 18, 2017 at 9:21:18 AM UTC+1, Paul Rubin wrote:
> > Paul, that's interesting! Which STM8 eForth baseline, and which
> > revision of uCsim are you using?
>
> I'm not sure how to tell the stm8ef baseline, but I downloaded it from
> github a night or two ago. The md5sum of forth.asm in the stm8ef
> directory is ade958ddd0a89f830a1ee918edb26feb if that helps at all.

I can't identify ade958d, unfortunately.

> The sdcc build is 3.6.9 #10185 from just a few days ago. I originally
> saw an error with an older sdcc, so I tried installing the latest sdcc
> to see if it still happened.

Please try using the revision of uCsim that works in my CI chain
"svn checkout svn://svn.code.sf.net/p/sdcc/code/trunk/sdcc@993"

> I'll be happy to check anything else you want

On the uCsim console, please try to make a disassembly around the point where it crashed (dc 0x87db minus a single digit value).

> or give you access to my ARM server if you want to examine the crash
> yourself. The server is a Scaleway C1 (www.scaleway.com), basically a
> small ARM board in a data center for 3 euro a month.

We can do this if everything else fails :-) I find uCsim difficult to debug, and I hope to get the author of uCsim interested in looking into it.

By the way, Scaleway looks nice. I'm looking into moving my old Ubuntu-based server into Scaleway IaaS.

> that. My C program can read and write the port under simulation now.

Nice, you're making good progress!

> I'll try to get the tethered interpreter (slightly) working under
> simulation during the weekend. If that happens, I'll order the ST-link
> dongle from Adafruit as promised. Do I also have to order an FTDI cable
> (I don't have one) or does the ST-link take care of passing the STM8
> serial port back to the laptop USB? Thanks!

I'm not aware of serial interface features in the ST-Link firmware. Unfortunately, you'll need a serial cable. I try to maintain a small stock of cheap CH340 dongles (see baggie trick in previous post above).