double tasking with two interpreters

Paul Rubin

May 27, 2012, 6:47:27 PM
I'm looking at the MSP430, whose instruction architecture is sort of
like a PDP-11, with less regularity in the address modes, but the upside
is that it has 16 registers (a few of them special purpose) instead of 8.

While most programs are single task and some are multi-tasking with N
tasks, a significant bunch (say for bidirectional i/o on some port) use
exactly two tasks. A classic threaded Forth uses 4 or 5 registers in the
address interpreter and a multitasker has to swap these around to some
storage area on task switch, burning cycles and memory. The MSP430
variant I'm looking at has just 128 bytes of ram, so it's not plentiful.

I'm wondering if it's a known, sensible method, to simply have two
copies of the address interpreter, one of them using (say) R0 through
R4, and the other using R5 through R9, to hold the DP, RP, TOS, etc.
Then a task switch would just mean jumping from one interpreter to the
other.

Is this silly?
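
As a rough illustration of the idea, here is a toy Python model (not MSP430 code) with one shared register file and two copies of a tiny address interpreter, each hard-wired to its own register bank. The bank assignments (R4..R6 and R8..R10), the word set, and the names `make_interpreter` and `run` are assumptions made for this sketch; on a real MSP430, R0 to R3 are the PC, SP, SR and constant generator, so the banks would have to come from R4 upward.

```python
# Toy model: one shared register file, two interpreters bound to disjoint banks.
REGS = [0] * 16
MEM = [0] * 64          # word-addressed RAM shared by both tasks


def make_interpreter(ip, sp, tos):
    """Build one copy of the primitives, hard-wired to registers ip/sp/tos."""
    def lit(code):                  # push the inline literal that follows
        MEM[REGS[sp]] = REGS[tos]
        REGS[sp] += 1
        REGS[tos] = code[REGS[ip]]
        REGS[ip] += 1

    def add(code):                  # add second-on-stack into TOS
        REGS[sp] -= 1
        REGS[tos] += MEM[REGS[sp]]

    def pause(code):                # task switch: nothing to save or restore
        return "switch"

    def halt(code):
        return "halt"

    prims = {"LIT": lit, "+": add, "PAUSE": pause, "HALT": halt}

    def step(code):
        """NEXT: fetch the word at IP, advance IP, execute it."""
        word = code[REGS[ip]]
        REGS[ip] += 1
        return prims[word](code)

    return step


def run(code_a, code_b):
    """Run two tasks cooperatively; return the final TOS of each."""
    REGS[:] = [0] * 16
    step_a = make_interpreter(ip=4, sp=5, tos=6)
    step_b = make_interpreter(ip=8, sp=9, tos=10)
    REGS[5], REGS[9] = 0, 32        # separate data stack areas in MEM
    tasks = [(step_a, code_a), (step_b, code_b)]
    current, halted = 0, [False, False]
    while not all(halted):
        step, code = tasks[current]
        result = step(code)
        if result == "halt":
            halted[current] = True
        if result in ("switch", "halt") and not halted[1 - current]:
            current = 1 - current   # "jump to the other interpreter"
    return REGS[6], REGS[10]
```

In this model a task switch only changes which `step` function runs next; no register is copied. The cost is visible in `make_interpreter`: every primitive exists once per bank.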

Elizabeth D. Rather

May 27, 2012, 7:58:56 PM
Well, I think there are simpler ways. Assume you have one register
(called U) pointing to the status area of the "current" user or task,
and one register each for the current task's S and R. User variables
contain the parameters that control its interpreter (and other things);
they are accessed via the U register. Assuming further that all task
switches take place in Forth words such as STOP or PAUSE (i.e., not
interrupt code), then all you need to do is push R onto the data stack,
save S in the user area, and change U to the next task. Very fast task
switch, and the current task has the use of all the registers.
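
A minimal Python sketch of that switch, under some assumptions: the user-area layout (`SAVED_S` and `LINK` offsets), the upward-growing stacks, and the step that parks I on the return stack before the switch are choices made for this model, not a description of any particular Forth system.

```python
SAVED_S, LINK = 0, 1    # assumed user-area layout: saved S, then next task's U


def pause(cpu, mem):
    """Cooperative task switch using only U, S, R and I."""
    # Suspend the current task.
    mem[cpu["R"]] = cpu["I"]            # park I on the return stack
    cpu["R"] += 1
    mem[cpu["S"]] = cpu["R"]            # push R onto the data stack
    cpu["S"] += 1
    mem[cpu["U"] + SAVED_S] = cpu["S"]  # save S in the user area
    cpu["U"] = mem[cpu["U"] + LINK]     # change U to the next task
    # Wake the next task: the same steps in reverse.
    cpu["S"] = mem[cpu["U"] + SAVED_S]
    cpu["S"] -= 1
    cpu["R"] = mem[cpu["S"]]
    cpu["R"] -= 1
    cpu["I"] = mem[cpu["R"]]


def make_suspended(mem, u, s_base, r_base, entry):
    """Lay out a task's stacks as if it had just executed pause()."""
    mem[r_base] = entry                 # I waiting on the return stack
    mem[s_base] = r_base + 1            # R waiting on the data stack
    mem[u + SAVED_S] = s_base + 1
```

Only one word per task (the saved S) lives in the user area; everything else rides on the task's own stacks.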

I agree, the MSP430 has a lovely instruction set.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Paul Rubin

May 27, 2012, 9:35:03 PM
"Elizabeth D. Rather" <era...@forth.com> writes:
> Well, I think there are simpler ways. Assume you have one register
> (called U) pointing to the status area of the "current" user or task,
> and one register each for the current task's S and R.

OK, but now you no longer have the optimizations of keeping TOS and the
instruction pointer in registers, most of the machine registers are
being left unused, and you're consuming more memory with that stuff that
could have been in registers. Avoiding all that was the idea of having
a separate register subset for each task.

BruceMcF

May 27, 2012, 11:14:27 PM
On May 27, 9:35 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
If you have one register for the User page, one for S and R, and eight
registers available for use, you could have a shadow register for TOS
and S and push IP and R onto S.

So there's 7 registers in use: ,R ,S ,S' U TOS IP and IP'.

If you are toggling, you can have the two distinct task switches as
independent task switch operations, with the pending task switch
stored in the task switch vector register. So rather than a task
switch vector per user page, there's only one RAM location for the
task-switch. When task switch 1 runs, it stores the vector for task
switch 2 and vice versa.

So the task switch would be:

Task 1:
(1) Store Task2 task switch in SWITCH vector location
(2) Stores User1 base in U
(3) Jump to common task switch ...

Task 2:
(1) Store Task1 task switch in SWITCH vector location
(2) Stores User2 base in U
(3) Jump to common task switch ...

Common:
(1) Push TOS then ,R on S
(2) Swap S and S'
(3) Swap IP and IP'
(4) NEXT
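
The vector toggling in the steps above can be sketched in Python as follows. The class name `Machine`, the user-area addresses 0 and 16, and the `log` list are hypothetical; `common` merely records which user area is active and stands in for the pushes, swaps and NEXT.

```python
USER1, USER2 = 0, 16    # hypothetical user-area base addresses


class Machine:
    def __init__(self):
        self.u = USER1                  # task 1 is running at start
        self.switch = self.to_task2     # the single pending-switch vector
        self.log = []

    def to_task1(self):
        self.switch = self.to_task2     # (1) store Task2 switch in SWITCH
        self.u = USER1                  # (2) store User1 base in U
        self.common()                   # (3) jump to common task switch

    def to_task2(self):
        self.switch = self.to_task1
        self.u = USER2
        self.common()

    def common(self):
        # Placeholder for: push TOS and R on S, swap S/S', swap IP/IP', NEXT.
        self.log.append(self.u)

    def pause(self):
        self.switch()                   # jump through the SWITCH vector
```

Each switch routine re-arms the vector for the other one, so a single RAM cell replaces a per-task link field.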

I don't know that processor, but generically, if there is no register
swap instruction, there is a free register to execute the register
swaps.

Elizabeth D. Rather

May 28, 2012, 12:38:49 AM
You can certainly keep TOS in a register... just one more thing to push
on the stack and pop off again (1 instruction each). As for I, it is set
from the top of the return stack when a task wakes up.

And the rest of the machine registers are hardly "unused" -- they're
available for the code that's running.

When we're addressing the task of implementing Forth on a new processor,
register allocation and usage is the first and most important
consideration in the design. I suggest that you download the SwiftX
evaluation version for the MSP430 and spend some time reading the
manual, in the parts that describe register utilization and the
multitasker. It'll give you a lot of ideas. That is, of course, a cross
compiler, but as I recall it can run an interpreter in the target.

Rod Pemberton

May 28, 2012, 12:08:01 PM
"Paul Rubin" <no.e...@nospam.invalid> wrote in message
news:7xmx4tc...@ruckus.brouhaha.com...
My first impression is: "No, it's not silly." But, I'm not familiar with
MSP430. Is there a "known, sensible method" for it? I don't know.


I'd think the idea is sound if:

1) both sets of the "4 or 5 registers" are not corrupted by other processes,
tasks, applications, etc., such as the OS, i.e., preserved. That way, you
don't need to save and restore those registers between task switches.
That reduces overhead.

2) you are using two sets of stacks. I.e., each task has its own
data/parameter stack and return/control-flow stack. That prevents one task
from corrupting the other upon a task switch. Of course, you've stated that
two of those registers will be DP and RP. So, you may have taken care of
this issue already.

If done that way, to switch tasks, you'd only need to switch stacks and use
the other register set. Of course, every low-level word or primitive that
directly accesses a specific register will need two versions ... one for
each register set. That could be an issue. You might try to make sure only
the low-level address interpreter word(s), e.g., NEXT, use the registers,
and the low-level Forth words, i.e., stack words, e.g., R> >R DUP SWAP,
don't use registers ... If you can do that, then you only need to duplicate
a small number of words, one for each register set. But, I'm not exactly
sure how you'd limit register usage that much. Some of the registers, e.g.,
IP, W, and TOS, are basically independent of Forth words, but others, e.g.,
DP and RP, aren't. Forth stack words are dependent on DP and RP, yes? And,
there are quite a handful of Forth stack words, yes? E.g., if >R needs to
access both DP and RP to move data, you'd need a >R specific for each
register set, yes? How do you get >R to use, say, register R3 for task #1
and use, say, register R8 for task #2? This sort of implies to me that DP
and RP might need to be/use the same two registers for both tasks, and be
saved/restored on each task switch. This way you wouldn't have to rewrite
all those stack operations. Or, maybe you need to code a branch which uses
a flag into each stack word to select which register to use...

Am I correctly understanding what you are trying to do?


Rod Pemberton






Paul Rubin

May 28, 2012, 12:21:11 PM
"Rod Pemberton" <do_no...@notemailntt.cmm> writes:
> My first impression is: "No, it's not silly." But, I'm not familiar with
> MSP430. Is there a "known, sensible method" for it? I don't know.

Well, what I actually meant by "known sensible method" was "do other
systems do it that way?".

>
> I'd think the idea is sound if:
>
> 1) both sets of the "4 or 5 registers" is not corrupted by other processes,
> tasks, applications, etc such as the OS, i.e, preserved.

We're talking about a microcontroller with 128 bytes of ram and 2k of
program flash. There are no other processes, OS, or anything like that ;-).

> Of course, every low-level word or primitive that
> directly accesses a specific register will need two versions

That's a good point. Maybe there could be a common entry sequence for
such words, that canonicalizes the register assignment. On the other
hand it would impose a few instructions of overhead on each such call,
including for frequent words like DUP or SWAP, to save a few
instructions on the presumably less frequent operation of task
switching.

Thanks.

Rod Pemberton

May 28, 2012, 12:42:44 PM
"Rod Pemberton" <do_no...@notemailntt.cmm> wrote in message
news:jq07rf$rcp$1...@speranza.aioe.org...
> "Paul Rubin" <no.e...@nospam.invalid> wrote in message
> news:7xmx4tc...@ruckus.brouhaha.com...
...

> Some of the registers, e.g.,
> IP, W, and TOS, are basically independent of Forth words, but others, e.g.,
> DP and RP, aren't. Forth stack words are dependent on DP and RP, yes?

Sorry, I was thinking SP but wrote DP ...
That goes for the rest of post too, SP and RP.


RP


Rod Pemberton

May 28, 2012, 1:07:36 PM
"Rod Pemberton" <do_no...@notemailntt.cmm> wrote in message
news:jq09sj$1g3$1...@speranza.aioe.org...
Argh...

Maybe I should've asked if DP meant "data pointer" or "dictionary
pointer". If data, then my original reply was correct. If dictionary, then
my "sorry" reply is correct.


RP


BruceMcF

May 28, 2012, 1:11:33 PM
On May 28, 12:21 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> That's a good point.  Maybe there could be a common entry sequence for
> such words, that canonicalizes the register assignment.  On the other
> hand it would impose a few instructions of overhead on each such call,
> including for frequent words like DUP or SWAP, to save a few
> instructions on the presumably less frequent operation of task
> switching.

That's why you have one data stack and one return stack register in
use. If you want to have exactly two tasks, and want to avoid using
RAM for register storage, you can shadow a register. But unless that
processor has an atomic register swap, you'll need a spare register to
do the swap:

Store #U1/#U2 in U
Store ST in X
Store S in ST
Store X in S

... as opposed to:
Store S in U,0
Store #U1/#U2 in U
Store U,0 in S

So you save two RAM locations at the cost of dedicating a register.

You can get rid of the next task information storage entirely if you
are toggling two tasks, since you can *infer* the new U from the
current U. Easiest is to place U1 at address 0, and U2 immediately
following, so if each U has, say, 16 bytes, then in pseudo-assembly:

SWITCH:
LOAD [X] with [TOST]
LOAD [TOST] with [TOS]
LOAD [TOS] with [X]
LOAD ([R])-- with [IP]
LOAD ([S])-- with [R]
LOAD [X] with [ST]
LOAD [ST] with [S]
LOAD [S] with [X]
TEST [U] with #0
BREQ SWITCH1
LOAD [U] with #0
BRAN SWITCH2
SWITCH1:
LOAD [U] with #16
SWITCH2:
JUMP ([IP])++

That uses eight registers [S] [ST] [TOS] [TOST] [R] [U] [IP] [X]
... and if it's a cooperative toggle, you have one extra slot on the
current return and data stacks in between task switches, so the
occasional register hungry primitive can push TOS onto S and IP onto R
at the outset and pop them back at the end, and you have three work
registers.
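
One way to read the SWITCH listing is as the following Python model, with registers held in a dict and RAM in a list. The two pre-increment loads at the end, which restore R and IP for the incoming task, are an assumption of this sketch; the listing itself goes straight from the U toggle to JUMP ([IP])++.

```python
def switch(r, mem):
    """Model of SWITCH for two tasks whose user areas sit at 0 and 16."""
    # Swap TOS with its shadow through the scratch register X.
    r["X"] = r["TOST"]
    r["TOST"] = r["TOS"]
    r["TOS"] = r["X"]
    # LOAD ([R])-- with [IP]: store IP at R, then post-decrement R.
    mem[r["R"]] = r["IP"]
    r["R"] -= 1
    # LOAD ([S])-- with [R]: store R at S, then post-decrement S.
    mem[r["S"]] = r["R"]
    r["S"] -= 1
    # Swap S with its shadow ST through X.
    r["X"] = r["ST"]
    r["ST"] = r["S"]
    r["S"] = r["X"]
    # Infer the new U from the current one: 0 becomes 16, anything else 0.
    r["U"] = 16 if r["U"] == 0 else 0
    # Assumed restore for the incoming task: pop R from S, then IP from R.
    r["S"] += 1
    r["R"] = mem[r["S"]]
    r["R"] += 1
    r["IP"] = mem[r["R"]]
```

Calling `switch` twice returns every register except the scratch X to its starting value, which is a quick way to check that the save and restore halves mirror each other.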

Elizabeth D. Rather

May 28, 2012, 1:24:36 PM
On 5/28/12 6:21 AM, Paul Rubin wrote:
> "Rod Pemberton"<do_no...@notemailntt.cmm> writes:
>> My first impression is: "No, it's not silly." But, I'm not familiar with
>> MSP430. Is there a "known, sensible method" for it? I don't know.
>
> Well, what I actually meant by "known sensible method" "do other systems
> do it that way?".
>
>>
>> I'd think the idea is sound if:
>>
>> 1) both sets of the "4 or 5 registers" is not corrupted by other processes,
>> tasks, applications, etc such as the OS, i.e, preserved.
>
> We're talking about a microcontroller with 128 bytes of ram and 2k of
> program flash. There are no other processes, OS, or anything like that ;-).

That's pretty small! So, when you say you want to run "two interpreters"
I assume you mean "address interpreters"? Usually an unmodified
"interpreter" in Forth means text interpreter, which you definitely
don't have room for!

SwiftX could certainly generate a cross-compiled target program for
that, but it would be helpful to have a larger MSP430 around for
interactive testing.

>> Of course, every low-level word or primitive that
>> directly accesses a specific register will need two versions
>
> That's a good point. Maybe there could be a common entry sequence for
> such words, that canonicalizes the register assignment. On the other
> hand it would impose a few instructions of overhead on each such call,
> including for frequent words like DUP or SWAP, to save a few
> instructions on the presumably less frequent operation of task
> switching.

We usually assign a few registers to running the Forth VM: S and R
(stack pointers), I (interpreter pointer) and U (user pointer). T (TOS)
if there are enough registers. An ITC needs one more, W (word pointer,
the word being called, and also the access into the data space of
words). The rest (accumulators) are designated as scratch, and always
available to words without saving and restoring.

Paul Rubin

May 28, 2012, 2:11:23 PM
"Elizabeth D. Rather" <era...@forth.com> writes:
> That's pretty small! So, when you say you want to run "two
> interpreters" I assume you mean "address interpreters"? Usually an
> unmodified "interpreter" in Forth means text interpreter,

Yes, I mentioned in the original post, two address interpreters.
I figure the text interpreter would be on a tethered host.

> SwiftX could certainly generate a cross-compiled target program for
> that, but it would be helpful to have a larger MSP430 around for
> interactive testing.

There is a bigger MSP430 with iirc 16k of program flash and 512 bytes of
ram, that can apparently run an interactive Forth (Camelforth based,
www.4e4th.eu - there was a thread about it a while back).

Apparently recent TI Launchpads are being shipped with the bigger
processor, though it's still advertised as coming with the smaller one.
I've continued to think in terms of the smaller one for "production"
purposes.

If you made an evaluation download of a tethered SwiftForth for the
Launchpad board, it could get to be pretty popular, I imagine.

Coos Haak

May 28, 2012, 3:39:53 PM
Op Mon, 28 May 2012 13:07:36 -0400 schreef Rod Pemberton:
DP is the user variable that contains the value of HERE

: HERE DP @ ;
: ALLOT DP +! ;

--
Coos

CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html

Elizabeth D. Rather

May 28, 2012, 5:23:15 PM
The default target kernel for a tethered SwiftX (that's the
cross-compiler series) is about 6K, so it still wouldn't fit on the
basic Launchpad. It's possible to strip that down to <1K, but that
doesn't leave a lot of functionality.

For only $59 you can get the EZ430-RF2500 (MSP430F2274, 32K flash, 1K
SRAM), and SwiftX runs fine on that. It would give you a lot more room
for interesting development.

Paul Rubin

May 28, 2012, 5:36:22 PM
"Elizabeth D. Rather" <era...@forth.com> writes:
> The default target kernel for a tethered SwiftX (that's the
> cross-compiler series) is about 6K, so it still wouldn't fit on the
> basic Launchpad. It's possible to strip that down to <1K, but that
> doesn't leave a lot of functionality.

Hmm, I'm surprised it's that large, but if the new Launchpads
start "officially" using the bigger (16k) chip, then it's fine.

> For only $59 you can get the EZ430-RF2500 (MSP430F2274, 32K flash, 1K
> SRAM), and SwiftX runs fine on that.

I think you know what a Launchpad costs, so the EZ430-RF2500 is pretty
expensive by comparison.

Elizabeth D. Rather

May 28, 2012, 6:52:27 PM
On 5/28/12 11:36 AM, Paul Rubin wrote:
> "Elizabeth D. Rather"<era...@forth.com> writes:
>> The default target kernel for a tethered SwiftX (that's the
>> cross-compiler series) is about 6K, so it still wouldn't fit on the
>> basic Launchpad. It's possible to strip that down to<1K, but that
>> doesn't leave a lot of functionality.
>
> Hmm, I'm surprised it's that large, but if the new Launchpads
> start "officially" using the bigger (16k) chip, then it's fine.

Well, it "throws in" a set of things that seem most useful. Since it's
supplied in source, it's entirely configurable. And there's a "strip"
function that you can apply when you have your application all finished
and running that automatically deletes words that are never called.

>> For only $59 you can get the EZ430-RF2500 (MSP430F2274, 32K flash, 1K
>> SRAM), and SwiftX runs fine on that.
>
> I think you know what a Launchpad costs, so the EZ430-RF2500 is pretty
> expensive by comparison.

Yeah, but still not much. It really all depends on what one wants to use
it for.

Paul Rubin

May 28, 2012, 8:14:09 PM
"Elizabeth D. Rather" <era...@forth.com> writes:
> Well, it "throws in" a set of things that seem most useful. Since it's
> supplied in source, it's entirely configurable. And there's a "strip"
> function that you can apply when you have your application all
> finished and running that automatically deletes words that are never
> called.

I see, it's sort of a runtime library, and it's preloaded by default so
that when you poke at it interactively, everything you need is there
ahead of time. I could imagine a cross compiler keeping up with you as
you type--so when you refer to a word, it goes and loads it for you on
the fly--but that's likely more trouble than it's worth.

Meanwhile I do see that the current Launchpad Quick Start Guide

* http://www.ti.com/lit/ml/slac432a/slac432a.pdf

says "LaunchPad includes a pre-programmed MSP430G2553 device" (that is
the bigger device with the 16k of flash). The marketing stuff still
says it comes with the smaller (2k) part. So this is cool.

>> I think you know what a Launchpad costs, so the EZ430-RF2500 is pretty
>> expensive by comparison.
>
> Yeah, but still not much. It really all depends on what one wants to
> use it for.

Yeah, if I were a hardware developer intending an end target of a custom
circuit, using a fancy dev system would be worth it, but I'm more
interested in running stuff directly on Launchpads.

Meanwhile I see that TI has just released source code for the PC side of
the Launchpad boot loader:

http://www.43oh.com/2012/05/ti-releases-bsl-for-the-msp430/

That might be useful (either directly or as guidance for how the process
works) if you want to support the board with Swift.

Elizabeth D. Rather

May 28, 2012, 9:33:27 PM
On 5/28/12 2:14 PM, Paul Rubin wrote:
> "Elizabeth D. Rather"<era...@forth.com> writes:
>> Well, it "throws in" a set of things that seem most useful. Since it's
>> supplied in source, it's entirely configurable. And there's a "strip"
>> function that you can apply when you have your application all
>> finished and running that automatically deletes words that are never
>> called.
>
> I see, it's sort of a runtime library, and it's preloaded by default so
> that when you poke at it interactively, everything you need is there
> ahead of time. I could imagine a cross compiler keeping up with you as
> you type--so when you refer to a word, it goes and loads it for you on
> the fly--but that's likely more trouble than it's worth.

Yes, it's a runtime library with lots of capabilities, 34 files last
time I counted. It's really a lot easier, I think, for you to look at
the list of things loaded and, once you've made some progress on your
application, take out stuff you know you don't need. Here's the list for
the EZ:

\ FILES INCLUDED BY BUILD

INCLUDE %SWIFTX\SRC\MSP430\REG_22x4 \ MSP430F2274 hardware equates
INCLUDE %SWIFTX\SRC\MSP430\CONFIG \ Common configuration
INCLUDE CONFIG \ Target configuration
INCLUDE %SWIFTX\SRC\MSP430\USER \ User variables
INCLUDE %SWIFTX\SRC\MSP430\CORE \ Core word set
INCLUDE %SWIFTX\SRC\CORE \ Common core words
INCLUDE %SWIFTX\SRC\MSP430\EXTRA \ Miscellaneous extensions
INCLUDE %SWIFTX\SRC\MSP430\STRING \ Core string operators
INCLUDE %SWIFTX\SRC\STRING \ Core string operators
INCLUDE %SWIFTX\SRC\MSP430\EXCEPT \ Exception handling
INCLUDE %SWIFTX\SRC\MSP430\DOUBLE \ Double-precision numbers
INCLUDE %SWIFTX\SRC\DOUBLE \ Double-precision numbers
INCLUDE %SWIFTX\SRC\MSP430\MATH \ Core math operators
INCLUDE %SWIFTX\SRC\MIXED \ Mixed-precision operators
INCLUDE %SWIFTX\SRC\MSP430\OPT \ Initialize code optimizer
INCLUDE %SWIFTX\SRC\VIO \ Vectored I/O functions
INCLUDE %SWIFTX\SRC\EXCEPT \ Common exception handling
INCLUDE %SWIFTX\SRC\OUTPUT \ Core & facility output functions
INCLUDE %SWIFTX\SRC\OUTPUT2 \ Double output functions
INCLUDE %SWIFTX\SRC\NUMBER \ Numeric input conversion
INCLUDE %SWIFTX\SRC\METHODS \ Methods and VALUE
INCLUDE %SWIFTX\SRC\MSP430\TASKER \ Multitasker
INCLUDE %SWIFTX\SRC\TOOLS \ Debug tools
INCLUDE %SWIFTX\SRC\DUMP1 \ Memory dump
INCLUDE %SWIFTX\SRC\MSP430\VECTORS_RAM \ Interrupt vectors
INCLUDE %SWIFTX\SRC\MSP430\LPM \ Low Power Mode control
INCLUDE %SWIFTX\SRC\MSP430\XTL \ JTAG cross-target link
INCLUDE %SWIFTX\SRC\MSP430\STEPPER \ Single-step debug support
INCLUDE %SWIFTX\SRC\ACCEPT \ Generic terminal input
INCLUDE %SWIFTX\SRC\MSP430\TIMERA-ALT \ Timer A timing functions
INCLUDE %SWIFTX\SRC\TIMING \ Common timing functions
INCLUDE %SWIFTX\SRC\MSP430\FLASH \ Resident flash programming
INCLUDE APP \ **YOUR APPLICATION LOADED HERE**
INCLUDE %SWIFTX\SRC\MSP430\START \ Common initialization
INCLUDE %SWIFTX\SRC\MSP430\EZ430-RF2500\START \ Power-up init.

(sorry, I had to squinch the lines up to prevent wraparound in my email
editor).

> Meanwhile I do see that the current Launchpad Quick Start Guide
>
> * http://www.ti.com/lit/ml/slac432a/slac432a.pdf
>
> says "LaunchPad includes a pre-programmed MSP430G2553 device" (that is
> the bigger device with the 16k of flash). The marketing stuff still
> says it comes with the smaller (2k) part. So this is cool.

What does "preprogrammed" mean? Presumably some sort of debugger with
the target support for the bootloader. SwiftX wouldn't use any of that
stuff (it has its own equivalents), so I wonder if it would be possible
to reprogram it? Probably locked, though.

>>> I think you know what a Launchpad costs, so the EZ430-RF2500 is pretty
>>> expensive by comparison.
>>
>> Yeah, but still not much. It really all depends on what one wants to
>> use it for.
>
> Yeah, if I were a hardware developer intending an end target of a custom
> circuit, using a fancy dev system would be worth it, but I'm more
> interested in running stuff directly on Launchpads.

The EZ430-RF2500 is awfully small and cute, too, and a lot more capable.

> Meanwhile I see that TI has just released source code for the PC side of
> the Launchpad boot loader:
>
> http://www.43oh.com/2012/05/ti-releases-bsl-for-the-msp430/
>
> That might be useful (either directly or as guidance for how the process
> works) if you want to support the board with Swift.

We'd be a lot more interested if there were a more useful target processor.

Paul Rubin

May 28, 2012, 9:53:35 PM
"Elizabeth D. Rather" <era...@forth.com> writes:
> \ FILES INCLUDED BY BUILD
> ...

Lots of stuff, looks useful.
>> says "LaunchPad includes a pre-programmed MSP430G2553 device"
> What does "preprogrammed" mean? Presumably some sort of debugger

I think it means it's programmed with an LED blinking demo application.
The boot loader if I understand correctly is in ROM and communicates
with a formerly closed PC application, the source code of which is just
released, making it easier to write the host side of a tethered system.

>> the Launchpad boot loader: ..
> We'd be a lot more interested if there were a more useful target processor.

The target processor seems quite nice. 16k of flash, 0.5k of ram,
counters and timers and uarts galore, a/d and d/a converters, etc. This
is quite a bit better than the AVR8 stuff in the lower Arduinos from
what I can tell, or the 8051, etc. There are also some even fancier
MSP430s that won't fit in the Launchpad but that there are also
inexpensive TI boards for.

Rugxulo

May 29, 2012, 3:22:39 AM
Hi,

On May 28, 12:24 pm, "Elizabeth D. Rather" <erat...@forth.com> wrote:
> On 5/28/12 6:21 AM, Paul Rubin wrote:
>
> > We're talking about a microcontroller with 128 bytes of ram and 2k of
> > program flash.  There are no other processes, OS, or anything like that ;-).
>
> That's pretty small! So, when you say you want to run "two interpreters"
> I assume you mean "address interpreters"? Usually an unmodified
> "interpreter" in Forth means text interpreter, which you definitely
> don't have room for!

I hate to intrude, esp. since I have no tips, but I just had to
mention this (though caveat that I don't know its internals well, and
I'm no embedded pro):

http://en.wikipedia.org/wiki/Atari_2600

MOS 6507 @ 1.19 MHz, 128 bytes RAM, 4 kB ROM, introductory price: 199
USD, retail availability Oct. 14, 1977 (U.S.)

Heck, even later, we weren't much further along, comparatively:

http://en.wikipedia.org/wiki/Atari_Lynx

MOS 65SC02 @ 3.6 MHz w/ Suzy (16-bit custom CMOS cpu @ 16 MHz), 64 kb
RAM, 128-512k ROM, $179 USD, 1989 (U.S.)

...

My point is: sheesh, they still sell things that are as weak as the
2600? I've vaguely heard woes about how "painful" it was to code for
the 2600, esp. since it had so little RAM. Heck, even the ROMs were
too small, which is one of the main reasons (among others) why the
official 2600/VCS "port" of Pac-Man sucked so bad (4 kb ROM) while the
later port of Ms. Pac-Man (8 kb ROM) was much much better.

Okay, I'm way out of my element here, just found it interesting. ;-)

Anton Ertl

May 29, 2012, 4:19:14 AM
Paul Rubin <no.e...@nospam.invalid> writes:
[Only two tasks]
>I'm wondering if it's a known, sensible method, to simply have two
>copies of the address interpreter, one of them using (say) R0 through
>R4, and the other using R5 through R9, to hold the DP, RP, TOS, etc.
>Then a task switch would just mean jumping from one interpreter to the
>other.

There are some consequences:

If you want to do traditional indirect or direct-threaded code, you
don't just have to duplicate the primitives, but everything that calls
them that you use in the given task. And you would have to compile
your code specifically for the task. And you have to enhance the
dictionary structure and compiler to support all that. That does not
sound attractive.

An alternative would be to have a different inner interpreter. I
think a good option would be indirect threading with two code fields.
I.e., the header would look like this:

Name field
Link field
Code Field 1 (used by interpreter 1)
Code Field 2 (used by interpreter 2)
Body

The CFAs/XTs (in colon definition bodies, and for passing to EXECUTE)
would always point to code Field 1. Interpreter 2 would add the
offset for using Code Field 2 inside NEXT (depending on the CPU this
might not cost extra). CREATE/DOES> is left as an exercise to the
reader.
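
A small Python model of the two-code-field layout may make this concrete. It is only a sketch: the name and link fields are omitted, `Bank` stands in for one task's registers and stacks, and `make_code`, `make_next`, `define`, `build` and `execute` are names invented here. Each header holds two code fields; XTs always point at the first, and the second interpreter adds an offset of 1 inside NEXT.

```python
class Bank:
    """One task's private 'registers' (IP, W) and stacks."""
    def __init__(self):
        self.ip = self.w = 0
        self.data, self.ret = [], []


def make_code(bank):
    """One copy of the machine-code routines, hard-wired to a single bank."""
    def docol():                        # enter a colon definition
        bank.ret.append(bank.ip)
        bank.ip = bank.w + 2            # body starts after both code fields

    def exit_():
        bank.ip = bank.ret.pop()

    def dup():
        bank.data.append(bank.data[-1])

    def plus():
        bank.data.append(bank.data.pop() + bank.data.pop())

    return {"docol": docol, "exit": exit_, "dup": dup, "+": plus}


def make_next(bank, mem, cf_offset):
    def next_():
        bank.w = mem[bank.ip]           # W = XT, which points at Code Field 1
        bank.ip += 1
        mem[bank.w + cf_offset]()       # interpreter 2 uses cf_offset = 1
    return next_


def define(mem, code1, code2, name, body=()):
    """Append [Code Field 1, Code Field 2, body...] and return the XT."""
    xt = len(mem)
    mem.extend([code1[name], code2[name]])
    mem.extend(body)
    return xt


def build():
    mem, b1, b2 = [], Bank(), Bank()
    c1, c2 = make_code(b1), make_code(b2)
    xt = {n: define(mem, c1, c2, n) for n in ("exit", "dup", "+")}
    xt["double"] = define(mem, c1, c2, "docol",
                          [xt["dup"], xt["+"], xt["exit"]])
    return mem, (b1, make_next(b1, mem, 0)), (b2, make_next(b2, mem, 1)), xt


def execute(bank, next_, mem, xt):
    """Run one XT to completion by threading through a one-cell stub."""
    mem.append(xt)
    bank.ip = len(mem) - 1
    depth = len(bank.ret)
    next_()
    while len(bank.ret) > depth:
        next_()
    mem.pop()
```

The colon definition `double` is compiled once and shared by both tasks; only the machine-code routines exist in two copies.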

I don't think that this method is known, although there is some
related work in implementing Prolog's read/write modes in
interpreters.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/

Elizabeth D. Rather

May 29, 2012, 4:34:59 AM
The MSP430G2553 looks usable, although it's awfully convenient to have a
little more RAM to download code for testing before you add it to the
kernel in flash. The smaller processors don't seem very useful, though.

Stephen Pelc

May 29, 2012, 7:52:52 AM
On Sun, 27 May 2012 15:47:27 -0700, Paul Rubin
<no.e...@nospam.invalid> wrote:

>While most programs are single task and some are multi-tasking with N
>tasks, a significant bunch (say for bidirectional i/o on some port) use
>exactly two tasks. A classic thread Forth uses 4 or 5 registers in the
>address interpreter and a multitasker has to swap these around to some
>storage area on task switch, burning cycles and memory. The MSP430
>variant I'm looking at has just 128 bytes of ram, so it's not plentiful.
>
>I'm wondering if it's a known, sensible method, to simply have two
>copies of the address interpreter, one of them using (say) R0 through
>R4, and the other using R5 through R9, to hold the DP, RP, TOS, etc.
>Then a task switch would just mean jumping from one interpreter to the
>other.

Although it's good intellectual effort to ask how to fit into such
constrained environments, you should also ask why you should do so.

You can have a Cortex-M0 at lower price and power with more memory
and resources. US$0.55 in volume from Nuvoton or ST.

MSP430 is becoming more popular because the Launchpads cost $4.30.
We got Nuvoton and some STM32 boards for less. My Raspberry Pi
cost a bit more but came in a much smaller box than a Launchpad.

Stephen (the modernist one)

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

Albert van der Horst

May 29, 2012, 8:56:34 AM
In article <jq09sj$1g3$1...@speranza.aioe.org>,
That (and SSP) is the reason I promote naming those DSP and RSP.

>
>
>RP
>
>


--
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Albert van der Horst

May 29, 2012, 8:52:14 AM
In article <7xmx4tc...@ruckus.brouhaha.com>,
With all due respect yes. The greatest problem with 128 bytes of
RAM is to fit 2 stacks and a user area. With 2 tasks those areas
have to be duplicated.

Groetjes Albert

Stephen Pelc

May 29, 2012, 9:29:19 AM
On 29 May 2012 12:52:14 GMT, Albert van der Horst
<alb...@spenarnc.xs4all.nl> wrote:

>>Is this silly?
>
>With all due respect yes. The greatest problem with 128 bytes of
>RAM is to fit 2 stacks and a user area. With 2 tasks those areas
>have to be duplicated.

The real problem is the assumption of 128 bytes. Why deal with
the 1970s in the 2010s?

Stephen

BruceMcF

May 29, 2012, 11:29:45 AM
On May 29, 9:29 am, stephen...@mpeforth.com (Stephen Pelc) wrote:
> The real problem is the assumption of 128 bytes. Why deal with
> the 1970s in the 2010s?

Because those are the devices on the sub-$10 board in question?

Rod Pemberton

May 29, 2012, 11:55:58 AM
"Albert van der Horst" <alb...@spenarnc.xs4all.nl> wrote in message
news:m4sbr...@spenarnc.xs4all.nl...
> In article <7xmx4tc...@ruckus.brouhaha.com>,
> Paul Rubin <no.e...@nospam.invalid> wrote:
> > I'm looking at the MSP430, whose instruction architecture is sort of
> > like a PDP-11, with less regularity in the address modes, but the upside
> > is that it has 16 registers (a few of them special purpose) instead of
> > 8. [...] The MSP430 variant I'm looking at has just 128 bytes of ram,
> > so it's not plentiful. [...]
> >
> >Is this silly?
>
> With all due respect yes. The greatest problem with 128 bytes of
> RAM is to fit 2 stacks and a user area. With 2 tasks those areas
> have to be duplicated.
>

16 registers
128 bytes of ram
2KB of rom

I think he already decided to cross-compile. I'd assume his dictionary and
program are both entirely in rom and not in ram. I.e., fixed dictionary,
fixed program, no user areas. If so, I'd think the ram is almost entirely
available for stacks.

He could try 32 bytes per stack for 4 stacks. He could adjust them as
needed. I.e., if control-flow uses more stack, then, 48 bytes each for 2
control-flow stacks and 16 bytes each for 2 data stacks, or vice-versa.
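
The arithmetic behind those splits is easy to check with a few lines of Python. The helper names and the 2-byte cell size (the MSP430 being a 16-bit machine) are assumptions of this sketch.

```python
RAM_BYTES = 128     # RAM on the small part discussed in the thread
CELL_BYTES = 2      # assumed cell size on a 16-bit machine


def ram_left(data_stack, return_stack, tasks=2, variable_cells=0):
    """Bytes of RAM remaining after per-task stacks and shared variables."""
    used = tasks * (data_stack + return_stack) + variable_cells * CELL_BYTES
    return RAM_BYTES - used


def stack_depth(stack_bytes):
    """How many cells fit in a stack of the given size in bytes."""
    return stack_bytes // CELL_BYTES
```

Both `ram_left(32, 32)` and `ram_left(16, 48)` come out to 0, so either split uses the whole 128 bytes and any RAM variables would have to come out of the stack sizes.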

I'm not sure how much stack a typical Forth program uses. I know I'm using
two 256 byte stacks on a 32-bit machine without issues, so far ... That's
nothing. I set them really low while I was getting the interpreter working.
I.e., I'd guess that's roughly equivalent to two 64 byte stacks on an 8-bit
machine.

However, all that depends on his Forth system variables. I have eleven
variables and six constants in mine so far. Does he need a DP (dictionary
pointer) or LAST if his dictionary is static/fixed? He needs data/parameter
stack pointer and return/control-flow stack pointer. The initial S0/SP0 or
R0/RP0 can be hardcoded. He may or may not need >IN or BLK. So, he
might need around 10 or so ram locations for variables.
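
A quick sketch of the arithmetic behind the splits above, written as a small Python check. The function name is made up for illustration, and the 2-byte cell and the "10 locations" figure for variables are assumptions based on the 16-bit MSP430 and the estimate in this post.

```python
def budget(stack_sizes, variable_cells=0, ram_bytes=128, cell_bytes=2):
    """Return (bytes used, bytes left) for the given stack sizes in bytes."""
    used = sum(stack_sizes) + variable_cells * cell_bytes
    return used, ram_bytes - used

# Four equal stacks: 2 tasks x (data stack + return stack).
print(budget([32, 32, 32, 32]))        # (128, 0)
# Skewed split: 48 bytes for two of the stacks, 16 bytes for the other two.
print(budget([48, 48, 16, 16]))        # (128, 0)
# Equal split plus roughly 10 cells of system variables held in RAM.
print(budget([32, 32, 32, 32], 10))    # (148, -20)
```

Both splits already consume all 128 bytes, so under these assumptions any system variables kept in RAM would have to come out of the stack allocation or be moved into registers.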

He also has a lot of registers. He could move some of those Forth system
variables into registers. Or, he could shift stack items into registers.
He could keep constants, like zero, in a register. E.g., he could keep the
three top stack items of the data stacks in registers. That would eliminate
much manipulation of the ram portion of the stack. I.e., DUP, SWAP, OVER,
etc. will operate only on the registers. Of course, the sequences for DROP,
pushing a value to the stack, and shifting a stack (ROLL) become larger.
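
A minimal Python model of the idea of caching the top three data stack items in registers, counting how often the RAM part of the stack is touched. The class and its names are hypothetical, not taken from any existing Forth. It also shows a caveat: SWAP never touches RAM, but DUP and OVER have to spill one item once all three registers are occupied.

```python
class CachedStack:
    """Data stack with its top three items held in 'registers' (regs[0] is TOS)."""

    def __init__(self):
        self.regs = []          # at most 3 cached items
        self.ram = []           # spilled items; ram[-1] was spilled most recently
        self.ram_accesses = 0   # number of RAM reads and writes so far

    def push(self, x):
        if len(self.regs) == 3:
            self.ram.append(self.regs.pop())   # spill the deepest register
            self.ram_accesses += 1
        self.regs.insert(0, x)

    def pop(self):
        x = self.regs.pop(0)
        if self.ram:
            self.regs.append(self.ram.pop())   # refill from RAM
            self.ram_accesses += 1
        return x

    def swap(self):
        self.regs[0], self.regs[1] = self.regs[1], self.regs[0]

    def dup(self):
        self.push(self.regs[0])

    def over(self):
        self.push(self.regs[1])


s = CachedStack()
s.push(1)
s.push(2)
s.swap()                 # registers only
s.dup()                  # third register was free, still no RAM access
print(s.ram_accesses)    # 0
s.over()                 # registers full: one item is spilled to RAM
print(s.ram_accesses)    # 1
```

How much this saves in practice depends on how deep the stack usually runs relative to the number of cached items.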


What can he do to reduce rom usage?

1) use DTC (or ITC) instead of STC
2) implement the inner/address interpreter but do not implement the
outer/text interpreter
3) eliminate the dictionary headers, i.e., non-searchable
4) avoid Forth words that could be more difficult to implement,
e.g., ROLL DOES>
5) limit characters to A-Z and 0-9
6) compute characters instead of using a table
...


Rod Pemberton



BruceMcF

unread,
May 29, 2012, 12:00:12 PM5/29/12
to
On May 27, 6:47 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> I'm looking at the MSP430, whose instruction architecture is sort of
> like a PDP-11, with less regularity in the address modes, but the upside
> is that it has 16 registers (a few of them special purpose) instead of 8.

> While most programs are single task and some are multi-tasking with N
> tasks, a significant bunch (say for bidirectional i/o on some port) use
> exactly two tasks.

A presumption I've been making here is that the Forth runtime is for
compact code, and the system is headerless without a compiler-
interpreter. A sub-Forth94 interpreter-only can of course be squeezed
into very little RAM with a 31-byte text input buffer, but within
anything less than 8K ROM the cost-benefit of the headers is hard to
see.

A point here is that this would not be *generic* task switching where
you are adding and removing tasks from a round-robin, but dedicated
task switching: in a 2K ROM, 128 byte device, you are going to have
one dedicated set of tasks to perform in a dedicated sequence.

So have a main task and side tasks, and define the side tasks in words
with stack effects of:
( -- )

Now, a task switch from the main task to the side task can be
(assuming one dedicated next-task-register [NT]):

PAUSE:
(1) IP>R
(2) NT>IP
(3) NEXT

First thing each side task does is load [NT] with the execution
address of the next side task in turn, then it does its thing. When it
returns, it returns back to the main task already in progress.

One [DSP], one [RSP], one [IP], one [TOS], one [NT] and eleven
available registers. After working out how many registers are needed
as registers by the processes, the balance are used as fast variables,
conserving RAM.
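
The PAUSE sequence above can be sketched as a toy threaded-code interpreter in Python. This is only a model under stated assumptions: code is a list of Python callables, IP is a (list, index) pair, and the names VM, say and set_nt are invented for the illustration. The side tasks are ordinary ( -- ) words that first reload [NT] and then return with EXIT.

```python
class VM:
    def __init__(self):
        self.rstack = []   # return stack of saved (code, index) pairs
        self.ip = None     # instruction pointer as a (code, index) pair
        self.nt = None     # the dedicated next-task register [NT]
        self.log = []

    def run(self, code):
        self.ip = (code, 0)
        while self.ip is not None:
            code, i = self.ip
            self.ip = (code, i + 1)   # NEXT: advance IP, then execute the cell
            code[i](self)


def PAUSE(vm):
    vm.rstack.append(vm.ip)   # (1) IP>R
    vm.ip = (vm.nt, 0)        # (2) NT>IP; (3) NEXT is the run loop itself


def EXIT(vm):
    vm.ip = vm.rstack.pop() if vm.rstack else None


def say(text):
    return lambda vm: vm.log.append(text)


def set_nt(code):
    def prim(vm):
        vm.nt = code
    return prim


# Two side tasks that hand [NT] to each other in turn.
side_a = [None, say("A"), EXIT]
side_b = [set_nt(side_a), say("B"), EXIT]
side_a[0] = set_nt(side_b)

main = [say("m1"), PAUSE, say("m2"), PAUSE, say("m3"), PAUSE, EXIT]

vm = VM()
vm.nt = side_a
vm.run(main)
print(vm.log)   # ['m1', 'A', 'm2', 'B', 'm3', 'A']
```

In this model PAUSE is effectively a call through [NT], which is why no registers or stacks need to be swapped.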

Paul Rubin

unread,
May 29, 2012, 2:17:39 PM5/29/12
to
steph...@mpeforth.com (Stephen Pelc) writes:
> Although it's good intellectual effort to ask how to fit into such
> constrained environments, you should also ask why you should do so.

That's what's available on these boards ;-).

> You can have a Cortex-M0 at lower price and power with more memory
> and resources. US$0.55 in volume from Nuvoton or ST.

I looked into this a little, I saw some around 70 cents in qty 5000 at
digi-key, still pretty good. But, the package is quite a bit larger and
I think the power consumption is higher, than what's available in
MSP430's, or (even smaller) AVR8's, PIC's, etc. There is an AVR in a
2*2mm package that can run an RTC on around 2 microwatts of power.[1]
MSP430's are a bit bigger, but still pretty small.

> MSP430 is becoming more popular because the Launchpads cost $4.30.
> We got Nuvoton and some STM32 boards for less.

Interesting. Do they have anything like the retail accessibility of the
Launchpads, and comparable FOSS toolchains and so on?

> My Raspberry Pi cost a bit more but came in a much smaller box than a
> Launchpad.

Lucky you, RPi's are about $200 on ebay right now. Apparently new
orders currently are being quoted November delivery dates.

Aside from that, the RPi is a tiny workstation using something like 300
mA at 5 volts, while a Launchpad is more easily embeddable and uses
around 1 mA on unregulated 2-3V when active, and near zero power when
idle.

> The real problem is the assumption of 128 bytes. Why deal with
> the 1970s in the 2010s?

In that case, remind me again why we're talking about Forth ;-).

----

[1] http://embeddedsystemnews.com/atmels-tinyavr-flash-avr-microcontroller-package.html

Elizabeth D. Rather

unread,
May 29, 2012, 2:23:32 PM5/29/12
to
On 5/29/12 5:55 AM, Rod Pemberton wrote:
> "Albert van der Horst"<alb...@spenarnc.xs4all.nl> wrote in message
> news:m4sbr...@spenarnc.xs4all.nl...
...
>> With all due respect yes. The greatest problem with 128 bytes of
>> RAM is to fit 2 stacks and a user area. With 2 tasks those areas
>> have to be duplicated.
>>
>
> 16 registers
> 128 bytes of ram
> 2KB of rom

Well, if you've followed other parts of this thread you'll see that he's
found larger versions of the MSP430 that work in this board.

> I think he already decided to cross-compile. I'd assume his dictionary and
> program are both entirely in rom and not in ram. I.e., fixed dictionary,
> fixed program, no user areas. If so, I'd think the ram is almost entirely
> available for stacks.
>
> He could try 32 bytes per stack for 4 stacks. He could adjust them as
> needed. I.e., if control-flow uses more stack, then, 48 bytes each for 2
> control-flow stacks and 16 bytes each for 2 data stacks, or vice-versa.
>
> I'm not sure how much stack a typical Forth program uses. I know I'm using
> two 256 byte stacks on a 32-bit machine without issues, so far ... That's
> nothing. I set them really low while I was getting the interpreter working.
> I.e., I'd guess that's roughly equivalent to two 64 byte stacks on an 8-bit
> machine.

It's quite possible to run simple programs with 32 bytes per stack (16
items, on a 16-bit processor).

A terminology caution: the "control flow stack" mentioned in ANS Forth
is actually the Data Stack used during compilation, not the Return Stack
at run-time as you seem to imply.

> However, all that depends on his Forth system variables. I have eleven
> variables and six constants in mine so far. Does he need a DP (dictionary
> pointer) or LAST if his dictionary is static/fixed? He needs data/parameter
> stack pointer and return/control-flow stack pointer. The initial S0/SP0 or
> R0/RP0 can be hardcoded. He may or may not need >IN or BLK. So, he
> might need around 10 or so ram locations for variables.

He is not compiling, so certainly needs no dictionary management.
However, HERE can specify data space, e.g. a buffer for temporary use.
We don't know if he wants to manage text input, although I suspect not.
That would take some space.

> He also has a lot of registers. He could move some of those Forth system
> variables into registers. Or, he could shift stack items into registers.
> He could keep constants, like zero, in a register. E.g., he could keep the
> three top stack items of the data stacks in registers. That would eliminate
> much manipulation of the ram portion of the stack. I.e., DUP, SWAP, OVER,
> etc. will operate only on the registers. Of course, the sequences for DROP,
> pushing a value to the stack, and shifting a stack (ROLL) become larger.

See the discussion elsewhere in this thread about register usage.

> What can he do to reduce rom usage?
>
> 1) use DTC (or ITC) instead of STC

ITC is usually smaller than DTC on a 16-bit processor.

> 2) implement the inner/address interpreter but do not implement the
> outer/text interpreter
> 3) eliminate the dictionary headers, i.e., non-searchable

Goes with ditching the text interpreter, and standard practice on
embedded systems.

> 4) avoid Forth words that could be more difficult to implement,
> e.g., ROLL DOES>

ROLL should be avoided on the grounds of good programming practice :-)
But DOES> actually saves space in the target, as it is a means of
avoiding repeating code sequences.

> 5) limit characters to A-Z and 0-9

Can't imagine how that helps. An 8-bit char handles all ASCII, just not
extended chars.

> 6) compute characters instead of using a table

Doesn't everyone?

Dirk Bruehl

unread,
May 29, 2012, 2:25:03 PM5/29/12
to mik....@googlemail.com, Di...@bruehlconsult.com
On 29 Mai, 07:52, stephen...@mpeforth.com (Stephen Pelc) wrote:
>
> You can have a Cortex-M0 at lower price and power with more memory
> and resources. US$0.55 in volume from Nuvoton or ST.
>

I never heard about Nuvoton before, so I googled for Nuvoton, and I
read "Nuvoton Technology Corp. was founded upon future expectation to
create a new era by innovative inspiration." Nuvoton is a 2008 Winbond
spin off based in Taiwan.

They have a myriad of different types of ARM micros at Nuvoton, and a
myriad of distributors is listed on their webpage - but no part
related link.

The smallest Nuvoton ARM device is $2.18, the MSP430G2553 is $2.64,
each per single unit @ DigiKey.
Operating current of the Nuvoton Mini51 series is 4mA @ VDD = 3.3V at
12 MHz,
operating current of TI's MSP430G2553 is 3mA @ VDD = 3.3V at 12 MHz.
Both have sleep modes, and while the MSP430 Standby Mode is specified
with 0.5 μA,
the Mini51 Standby Mode is specified with 10 μA.
The smallest Mini51 is 33-pin QFN, the MSP430G2553 is the same size or
20pin TSSOP, and is available as 20pin DIP, which is important for
people like me.

> MSP430 is becoming more popular because the LaunchPads cost $4.30.
> We got Nuvoton and some STM32 boards for less.

Stephen, please share your knowledge about these Nuvoton and some
STM32 boards which you got for less. I found two boards at www.nuvoton.com,
the NuTiny-Mini51 SDK ($28.50 @ DigiKey) and the Nu-LB-Mini51 ($48.45
@ DigiKey). Ten times as much as the LaunchPad - more powerful, I am
sure, but that's not the point.

STM ARM boards start with $7.99 @ Mouser, $49.60 @ DigiKey.

How much would I have to pay to get a board for educational purposes
containing Forth?

Regards,
Dirk Bruehl.

> My Raspberry Pi
> cost a bit more but came in a much smaller box than a Launchpad.
>
> Stephen (the modernist one)
>
> --
> Stephen Pelc, stephen...@mpeforth.com

Paul Rubin

unread,
May 29, 2012, 2:39:49 PM5/29/12
to
Albert van der Horst <alb...@spenarnc.xs4all.nl> writes:
> With all due respect yes. The greatest problem with 128 bytes of
> RAM is to fit 2 stacks and a user area. With 2 tasks those areas
> have to be duplicated.

I think it's not too bad. A GA144 node has 10 data stack and 8 return
stack slots or so, which don't seem to be its biggest constraint, and
this processor is sort of comparable. In the two-task setup if the code
follows reasonable practices, it should be possible to statically figure
out the max stack depths for each task, and allocate memory accordingly.
3-5 words (2 bytes/word) for each stack (especially with a bit of
compiler inlining to save return slots) is likely to be enough for the
sort of thing I have in mind, plus there might be a few static
variables. Programming in assembler it's probably feasible for some
sensible programs to use no ram at all, other than the registers.
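
A minimal sketch of working out the maximum stack depth statically, assuming straight-line code (no loops, branches or calls) and a small table of stack effects. The table and the function name are illustrative only; a real tool would need the effects of every word in the program.

```python
# (items consumed, items produced) for a few common words.
EFFECTS = {
    "LIT": (0, 1), "DUP": (1, 2), "DROP": (1, 0), "SWAP": (2, 2),
    "OVER": (2, 3), "+": (2, 1), "@": (1, 1), "!": (2, 0),
}


def max_depth(words, effects=EFFECTS):
    """Return the peak data stack depth reached by a straight-line word list."""
    depth = peak = 0
    for w in words:
        consumed, produced = effects[w]
        if depth < consumed:
            raise ValueError(f"stack underflow at {w}")
        depth += produced - consumed
        peak = max(peak, depth)
    return peak


peak = max_depth(["LIT", "LIT", "OVER", "+", "SWAP", "DROP"])
print(peak, "cells,", peak * 2, "bytes")   # 3 cells, 6 bytes
```

With a peak of 3 to 5 cells per stack at 2 bytes per cell, each stack needs only 6 to 10 bytes, which is the order of magnitude suggested above.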

BruceMcF

unread,
May 29, 2012, 3:26:59 PM5/29/12
to
On May 29, 2:39 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Programming in assembler it's probably feasible for some
> sensible programs to use no ram at all, other than the registers.

The prospect for several tasks using no *extra* RAM goes up if your
task switching relies on:
( -- )
... side tasks, since other than the stacks, you have effectively 11
cellwide variables to work with before you need to go out to RAM.

For the stacks, the relevant stack depth (for both stacks) is the
greater of (a) the deepest that the main task stack gets, and (b) the
depth of the main task stack when it PAUSEs plus the deepest a side
task goes. So if the deepest a side task goes is four cells, and the
deepest that the main task is when calling PAUSE is four cells, seven
cells plus TOS will do, and the main task can be as much as eight deep
between PAUSEs. Similar for the RSTACK, though with likely lower amounts
both ways.
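
The depth rule above, written out as a small Python check. The function name is made up, and the 2-byte cell is an assumption based on the 16-bit MSP430.

```python
def required_depth(main_max, main_at_pause, side_max):
    """Cells needed on a stack shared by a main task and ( -- ) side tasks."""
    return max(main_max, main_at_pause + side_max)


cells = required_depth(main_max=8, main_at_pause=4, side_max=4)
ram_cells = cells - 1                      # TOS lives in a register
print(cells, ram_cells, ram_cells * 2)     # 8 7 14
```

So the example in the post works out to seven RAM cells (14 bytes) plus the TOS register for the data stack.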

Bernd Paysan

unread,
May 29, 2012, 4:12:27 PM5/29/12
to
Rod Pemberton wrote:

> "Paul Rubin" <no.e...@nospam.invalid> wrote in message
>> Is this silly?
>
> My first impression is: "No, it's not silly."

I think this is silly. Why not just exchange the few registers you need
in the PAUSE code? For TOS+SP+RP+IP, you need a total of 12 moves (or
xors if you don't like tmp registers ;-) to get the two sets exchanged.
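
A small Python illustration of the count: four register pairs, three XORs per pair, twelve operations in total. The register names with the "2" suffix and all the values are placeholders chosen for the example.

```python
def xor_swap(regs, a, b):
    """Swap two distinct registers with three XORs and no temporary."""
    regs[a] ^= regs[b]
    regs[b] ^= regs[a]
    regs[a] ^= regs[b]
    return 3   # operations used


regs = {"TOS": 1, "SP": 0x0240, "RP": 0x0280, "IP": 0xF800,
        "TOS2": 2, "SP2": 0x0220, "RP2": 0x0260, "IP2": 0xF900}

ops = sum(xor_swap(regs, r, r + "2") for r in ("TOS", "SP", "RP", "IP"))
print(ops)                               # 12
print(regs["TOS"], hex(regs["IP"]))      # 2 0xf900
```

The version with a temporary register costs the same three moves per pair, so it is also twelve operations.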

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Bernd Paysan

unread,
May 29, 2012, 4:16:31 PM5/29/12
to
Paul Rubin wrote:
> Apparently recent TI Launchpads are being shipped with the bigger
> processor, though it's still advertised as coming with the smaller
> one.

You actually get both - the big and the small one.

Stephen Pelc

unread,
May 29, 2012, 4:15:31 PM5/29/12
to
On Tue, 29 May 2012 11:17:39 -0700, Paul Rubin
<no.e...@nospam.invalid> wrote:

>> MSP430 is becoming more popular because the Launchpads cost $4.30.
>> We got Nuvoton and some STM32 boards for less.
>
>Interesting. Do they have anything like the retail accessibility of the
>Launchpads, and comparable FOSS toolchains and so on?

There are a gazillion free and FOSS toolchains for ARM and Cortex.
MPE has broken them all at one time or another. Start with Code
Sourcery. But note that significant Forth developments come from
the vendors, especially for embedded systems.

In terms of hobby access, look for the STM32F4 Discovery board (yum,
yum) and the STM32F0 board. Whether it's $4.30 or $12, who cares?
Personally, I think it's a USAnian disease to worry about hardware
price alone. At board level, the cost of silicon is rapidly declining
towards zero, leaving the cost of the board dependent on the PCB
and the volume.

Yes, you probably can achieve slightly less idle power with an
MSP430, but what matters is system consumption. There are several
people, e.g. Energy Micro, who will argue that system consumption
on a Cortex-M3 is better than that of an MSP430.

We have clients who use ARMs for underwater instrumentation, and
they report that the new hardware uses less overall power than the
previous 16 bit hardware. The 32 bit systems are just so much more
power efficient at doing the crunching.

>> The real problem is the assumption of 128 bytes. Why deal with
>> the 1970s in the 2010s?
>
>In that case, remind me again why we're talking about Forth ;-).

Because Forth moved on significantly about 10-15 years ago and the
functionality per 100kb of Forth cannot be beat. Even with the
text interpreter left in. Once you accept the changes of the
modern world, you can stop worrying about 128 bytes on an
embedded system. You can have a full-blown text interpreter and
life becomes quite delightful again.

Stephen

Elizabeth D. Rather

unread,
May 29, 2012, 4:57:05 PM5/29/12
to
On 5/29/12 10:16 AM, Bernd Paysan wrote:
> Paul Rubin wrote:
>> Apparently recent TI Launchpads are being shipped with the bigger
>> processor, though it's still advertised as coming with the smaller
>> one.
>
> You actually get both - the big and the small one.
>

What's unclear to me is the availability of the big one for user code.

Paul Rubin

unread,
May 29, 2012, 5:23:19 PM5/29/12
to
"Elizabeth D. Rather" <era...@forth.com> writes:
> What's unclear to me is the availability of the big one for user code.

Yes, it is available for user code. "Preprogrammed" means it comes in
the box with a demo application that blinks the LEDs. It's flash memory
so you can overwrite that program with your own program. Take a look at
4e4th.eu for a Forth-preprogrammed version (click the UK flag at the
upper right of the white box with the German text).

John Passaniti

unread,
May 29, 2012, 5:24:36 PM5/29/12
to
On Tuesday, May 29, 2012 9:29:19 AM UTC-4, Stephen Pelc wrote:
> The real problem is the assumption of 128 bytes.
> Why deal with the 1970s in the 2010s?

Many of the systems I've worked on in the past-- and many that I continue to work on-- are multiprocessor systems. The main processor may be a big 32-bit microcontroller, but it's not the only processor in the system. In one recent system, we offloaded the processing needed for the front panel (32 12-segment LED bargraphs, a character LCD, a bunch of switches) onto a small 8-bit microcontroller, freeing the main processor of updating all that and making the interface simpler. The same system had a network audio card that for legacy reasons had a parallel bus interface, but the module on that card was serial. Slapping a tiny 8-bit microcontroller there was the way we bridged those two worlds. Another system had a small 8-bit microcontroller that acted as a system-level supervisory watchdog.

Here's another example: We're considering using the i.MX28 (an ARM-based SoC from Freescale) for an upcoming project. Unfortunately, its SPI implementation is painful, and to get around that pain we're going to need either to write a lot of clever code on the i.MX28 or to do something externally. I'm advocating going external and we'll likely use a small 8-bit microcontroller with a lot of I/O pins to handle the SPI issues and absorb a lot of the "glue" logic that we need.

The people who believe that 8-bit microcontrollers-- especially tiny ones-- are dead are the people who apparently don't have to be concerned with costs or who haven't yet figured out that factoring in hardware (by using multiprocessing) can make systems simpler and easier.

Dirk Bruehl

unread,
May 29, 2012, 5:57:13 PM5/29/12
to mik....@googlemail.com, Di...@bruehlconsult.com
On 28 Mai, 21:53, Paul Rubin <no.em...@nospam.invalid> wrote:
> ... There are also some even fancier
> MSP430'S that won't fit in the Launchpad but that there are also
> inexpensive TI boards for.

Are you writing about TI's MSP-EXP430FR5739 Experimenter Board ?

DB.

Mark Wills

unread,
May 30, 2012, 4:59:26 AM5/30/12
to
Not at all. Why use an 8-bit part, if you can buy a 16-bit part for
the same price? Have you actually looked at the price of the MSP430
family in quantity? They're cheap as chips. They pack a whole range of
IO features, and have flash and RAM all inside the chip, reducing
board layout costs and PCB sizes, power consumption, and heat
dissipation. They're every bit as good as any 8-bit MCU I can recall,
though being 16-bit makes them a lot better IMHO. 8-bit processors
rapidly become a fiddly pain in the arse if you want to do any math -
16 bit makes it a bit easier, especially as they usually come with 32
bit multiply and divide, whereas most 8 bit parts have no MUL/DIV at
all.

Regarding your 8 bit MCU driving 32 12 segment displays, how, and why?
Why wouldn't you use discrete bar drivers? That's what they're for.
Did your 8 bit MCU have 32 ADCs? Clearly not. So how did you multiplex
32 audio signals into (possibly one?) ADC channel, and how did you
multiplex the output to each individual bar display? And how did you
do this fast enough that the displays were accurate? I presume the bar
displays were scaled in dB? How did you multiplex the input, perform
the calculation (I guess it could be a simple look-up), and then
multiplex the output fast enough to update 32 bar graphs in real time?

Albert van der Horst

unread,
May 30, 2012, 5:34:52 AM5/30/12
to
In article <7xlika6...@ruckus.brouhaha.com>,
But the GA144 is not programmed in Forth! It is programmed in restricted
colorforth aka GA144 assembler.

Paul Rubin

unread,
May 30, 2012, 6:02:35 AM5/30/12
to
Albert van der Horst <alb...@spenarnc.xs4all.nl> writes:
> But the GA144 is not programmed in Forth! It is programmed in restricted
> colorforth aka GA144 assembler.

Maybe I should just implement the GA "exchange" or coroutine jump
instruction. It swaps the program counter with the top of the return
stack, making no attempt to have separate stacks or registers per task.
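
A toy Python model of such a coroutine jump, assuming code is a list of callables and the instruction pointer is a (list, index) pair. EX here is only a sketch of "swap the program counter with the top of the return stack", not a description of the actual GA144 hardware. All other names are invented for the example.

```python
class VM:
    def __init__(self):
        self.data = []     # the single shared data stack
        self.rstack = []   # the single shared return stack
        self.ip = None
        self.log = []

    def run(self, code, partner):
        self.rstack.append((partner, 0))   # partner's entry point on top of R
        self.ip = (code, 0)
        while self.ip is not None:
            code, i = self.ip
            self.ip = (code, i + 1)
            code[i](self)


def EX(vm):
    # Swap the instruction pointer with the top of the return stack.
    vm.ip, vm.rstack[-1] = vm.rstack[-1], vm.ip


def HALT(vm):
    vm.ip = None


def lit(n):
    return lambda vm: vm.data.append(n)


def emit(vm):
    vm.log.append(vm.data.pop())


producer = [lit(1), EX, lit(2), EX, lit(3), EX, HALT]
consumer = [emit, EX, emit, EX, emit, EX]

vm = VM()
vm.run(producer, consumer)
print(vm.log)   # [1, 2, 3]
```

Because both routines share one data stack, the producer can hand values to the consumer simply by leaving them on the stack before the jump.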

Paul Rubin

unread,
May 30, 2012, 6:05:14 AM5/30/12
to
Mark Wills <markrob...@yahoo.co.uk> writes:
> though being 16-bit makes them a lot better IMHO. 8-bit processors
> rapidly become a fiddly pain in the arse if you want to do any math -
> 16 bit makes it a bit easier, especially as they usually come with 32
> bit multiply and divide,

The cheap msp430's that run in the launchpad don't have mul/div either.
The multiplier in the models that have it is fairly fast as such things
go, though.

Mark Wills

unread,
May 30, 2012, 6:19:27 AM5/30/12
to
On May 30, 11:05 am, Paul Rubin <no.em...@nospam.invalid> wrote:
You haven't said what your application actually *is*. (Apologies if I
missed it). Are you *sure* you even need to run two tasks? Could it
not be done in one task with a state machine?

Arnold Doray

unread,
May 30, 2012, 10:10:57 AM5/30/12
to
Isn't that similar to R stack manipulation?

AIUI, the coroutine jump method is not meant for toggling between
stateful tasks, since state is lost. For toggling between tasks, wouldn't
you have to add saving the registers, which is equivalent to
Elizabeth's suggestion?

I can't see how you can get away with using just one pair of data/return
stacks, unless you assume that each task either leaves a clean stack or
does not require the contents of the stack to be preserved during a
switch. Or you could copy the stack to some safe area during the switch.
Much easier to have two pairs of stacks. 128 bytes is plenty, I think.

Cheers,
Arnold



John Passaniti

unread,
May 30, 2012, 10:20:41 AM5/30/12
to
On Wednesday, May 30, 2012 4:59:26 AM UTC-4, Mark Wills wrote:
> Not at all. Why use an 8-bit part, if you can buy a
> 16-bit part for the same price? Have you actually
> looked at the price of the MSP430 family in quantity?

Yep, we have. And when you actually take the time to compare the actual features we are using (number of GPIOs, amount of RAM, amount of flash) the Freescale 8-bit parts still win. That won't always be the case, but in our specific case, it is.

The Freescale 8-bit parts we're using also win in other ways. First, we're very familiar with them, know their quirks, and have loads of experience with the tools; time to market matters. Second, we're already using those 8-bit microcontrollers in other designs, so we can drive up the quantities. Third, we're extremely happy with our supply chain and support FAE's and we get price breaks because of *other* products we buy.

> [...] 8-bit processors rapidly become a fiddly pain in the arse
> if you want to do any math - 16 bit makes it a bit easier,
> especially as they usually come with 32 bit multiply and divide,
> whereas most 8 bit parts have no MUL/DIV at all.

The Freescale parts we're using (in the HCS08 family) do have 8x8 multiply and 16x8 divide. But that doesn't matter at all to us. First, we're not programming them in assembly language, but in C. And the C compiler's libraries provide not only full 32x32 multiply and divide but also floating point. Second, we're not using these microcontrollers for math-intensive operations. Our use is primarily for control. That being said, even the slow floating point library routines are many times faster than what we need.

> Regarding your 8 bit MCU driving 32 12 segment
> displays, how, and why? Why wouldn't you use
> discrete bar drivers? That's what they're for.
> Did your 8 bit MCU have 32 ADCs? Clearly not.
> So how did you multiplex 32 audio signals into
> (possibly one?) ADC channel, [...]

The front panel connects to the rest of the system using an SPI channel. Like in all reasonable audio equipment, no audio is going to the front panel. These systems are digital audio signal processors and power amplifiers; there are separate DSPs handling the processing of audio. Related to metering, it's the DSPs that handle peak detection, level hold, and ballistics. The front panel is only responsible for getting metering data and displaying it. This frees the DSPs and main microcontroller from having to service the front panel. As for why we don't use dedicated bar drivers: it's less expensive to have the little 8-bit microcontroller drive that LED matrix directly.

> [...] And how did you do this fast enough that the
> displays were accurate? I presume the bar
> displays were scaled in dB? How did you multiplex
> the input, perform the calculation (I guess it
> could be a simple look-up), and then multiplex the
> output fast enough to update 32 bar graphs in real
> time?

The DSP handles the math. The 8-bit microcontroller handles the control aspects of the front panel.

For many applications-- especially control-oriented ones-- 8-bit microcontrollers are still more than enough to get the job done, let you leverage existing experience, have excellent support, and often come in at lower prices. Do they always make sense? No, of course not. We had a design that started off with an 8-bit HCS08 microcontroller (a QE128), but in that specific application we did actually have to do real-time impedance calculations. That required being able to sample at 48kHz, which the part could do, but it couldn't keep up with all the other duties that we placed on it. So in that case, we moved to the MCF51 version of the part, which has the exact same pin-out but replaces the 8-bit CPU with a 32-bit CPU. Recompile, fix some minor portability issues, and presto-- it works fine with plenty of margin.

You're free to continue to offer suggestions, but your Derren Brown skills of mind-reading are pretty poor. I'll be happy to discuss details of the design in private email. But really, the details aren't important. This sub-thread is about the reality that these days many hardware designs are best achieved not with a singular processor, but with multiple processors each dedicated to a subsystem. And there, depending on what you're trying to accomplish, there are still many places where 8-bit microcontrollers win.

Paul Rubin

unread,
May 30, 2012, 11:24:26 AM5/30/12
to
Mark Wills <markrob...@yahoo.co.uk> writes:
> You haven't said what your application actually *is*. (Apologies if I
> missed it). Are you *sure* you even need to run two tasks? Could it
> not be done in one task with a state machine?

Well I'm thinking sort of generically. Yes of course it's always
possible to write these things with state machines, but using tasks or
coroutines is often more natural.
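
To illustrate the trade-off, here is the same made-up job (LED on for 2 ticks, off for 3, repeated) written both ways in Python. The job and all names are invented for the example. In the coroutine version each yield plays the role of PAUSE, and the position in the code replaces the explicit state variable and counter.

```python
def blink_state_machine(ticks):
    """Explicit state machine: state and counter are kept by hand."""
    state, count, out = "ON", 0, []
    for _ in range(ticks):
        out.append(1 if state == "ON" else 0)
        count += 1
        if state == "ON" and count == 2:
            state, count = "OFF", 0
        elif state == "OFF" and count == 3:
            state, count = "ON", 0
    return out


def blink_task():
    """Coroutine version: control flow carries the state."""
    while True:
        for _ in range(2):
            yield 1
        for _ in range(3):
            yield 0


def run_task(task, ticks):
    return [next(task) for _ in range(ticks)]


print(blink_state_machine(7))        # [1, 1, 0, 0, 0, 1, 1]
print(run_task(blink_task(), 7))     # [1, 1, 0, 0, 0, 1, 1]
```

Both produce the same output; the difference is only in where the bookkeeping lives.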

Paul Rubin

unread,
May 30, 2012, 11:34:04 AM5/30/12
to
Arnold Doray <inv...@invalid.com> writes:
> Isn't that similar to R stack manipulation?

I'm not sure how to do that operation with standard Forth words.
You have to be able to jump into the middle of another word.

> AIUI, the coroutine jump method is not meant for toggling between
> stateful tasks, since state is lost.

They'd no longer be independent tasks. Each would have to know what the
other was doing. But, they could pass stack args to each other, which
is nice.

> I can't see how you can get away with using just one pair of data/return
> stacks, unless you assume that each task either leaves a clean stack or ...

No I think it works out quite nicely. It's not the same thing as
independent multitasking, but it has very low overhead, especially if
the top of the return stack is in a register, so you'd just swap two
registers.

Paul Rubin

unread,
May 30, 2012, 12:12:56 PM5/30/12
to
steph...@mpeforth.com (Stephen Pelc) writes:
>>> We got Nuvoton and some STM32 boards for less. ...
> There are a gazillion free and FOSS toolchains for ARM and Cortex.

I remember looking into the STM32 boards and finding they need
proprietary tools, but maybe I missed something. I certainly didn't
think of them as operating in the same space as the Launchpad or Arduino
(i.e. very low power control-oriented device). Maybe I'm wrong about
the toolchain:

http://cu.rious.org/make/stm32f4-discovery-board-with-linux/

It looks attractive other ways too:

http://www.ecnmag.com/products/2011/10/stm32f4-discovery-kit-now-available

> In terms of hobby access, look for the STM32F4 Discovery board (yum,
> yum) and the STM32F0 board. Whether it's $4.30 or $12, who cares?

The STM32F4 appears to be about $15, still very affordable. The
Launchpad is attractive in that I can load some code, attach a battery
to it, and slap it into something without worrying about it for a long
time. It can probably run for months on a penlight cell.

>>In that case, remind me again why we're talking about Forth ;-).
> Because Forth moved on significantly about 10-15 years ago

I tried to figure out what significant change came to Forth 10-15 years
ago, and found:

Forth has been a recognized programming language since the
1970's. ColorForth is a redesign of this classic language for the
21st century. It also draws upon a 20-year evolution of minimal
instruction-set microprocessors. Now implemented to run under
Windows, it can also stand-alone without an operating system.

I'm not sure if that's what you had in mind ;-)

> you can stop worrying about 128 bytes on an embedded
> system. You can have a full-blown text interpreter and life becomes
> quite delightful again.

It looks like I can have a text interpreter on the bigger Launchpad chip
(16k flash, 0.5k ram), which could be pretty cool. Now if there were
just an ultra-cheap Bluetooth interface, I could control it from a
phone. I don't know why the embeddable ones I can find are so
expensive. Hmm, maybe I could use an audio interface instead (plug
cable between Launchpad and the phone's mic/headphone socket, with
low-speed software modems at each end).

Dirk Bruehl

unread,
May 30, 2012, 11:20:51 PM5/30/12
to mik....@googlemail.com, Di...@bruehlconsult.com
On 30 Mai, 12:12, Paul Rubin <no.em...@nospam.invalid> wrote:
>
> The STM32F4 appears to be about $15, still very affordable.  The
> Launchpad is attractive in that I can load some code, attach a battery
> to it, and slap it into something without worrying about it for a long
> time.  It can probably run for months on a penlight cell.

If $15 is not a problem, you may look at TI's MSP-EXP430FR5739
Experimenter Board for $29 with 16KB FRAM / 1KB SRAM - you can run it
on a penlight cell much longer, FRAM values are safe even without
battery! Maybe you only need a supercap to run it for several months!

Shortly after I started with Forth in 1984 I read about FRAM, and
recently the FRAM microcontroller, the micro of my dreams, came into
life: In my opinion the MSP430FR5739 is the ideal microcontroller to
run Forth on it - doesn't have FLASH, only mostly nonvolatile RAM
(16k) to store programs and data. That's why I encouraged Michael to
put a Forth on it: look at http://www.camelforth.com - Contributed:
MSP430 FRAM (MSP-EXP430FR5739) or at http://www.forth-ev.de/repos/CF430FR/

There is enough room on it to establish your special double task
feature.

DB.

David Kuehling

unread,
May 31, 2012, 4:09:45 AM5/31/12
to
>>>>> "Albert" == Albert van der Horst <alb...@spenarnc.xs4all.nl> writes:

> In article <7xmx4tc...@ruckus.brouhaha.com>,
> Paul Rubin <no.e...@nospam.invalid> wrote:
[..]
>> I'm wondering if it's a known, sensible method, to simply have two
>> copies of the address interpreter, one of them using (say) R0 through
>> R4, and the other using R5 through R9, to hold the DP, RP, TOS, etc.
>> Then a task switch would just mean jumping from one interpreter to
>> the other.
>>
>> Is this silly?

> With all due respect yes. The greatest problem with 128 bytes of RAM
> is to fit 2 stacks and a user area. With 2 tasks those areas have to
> be duplicated.

Some of the German Forthers I met at Linuxtag suggested not even giving
the second task its own stacks, and instead running it like an interrupt
on top of the stack of the currently running word. Yes, the second task
would need to balance the stacks at PAUSE. A very acceptable compromise for a
task blinking LEDs or doing some simple state-machine I/O processing.
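
To make the "balance stacks at PAUSE" rule concrete, here is a rough
Python model of the idea, not Forth and not anybody's actual
implementation. The names pause, blink_step and main_word are invented
for the sketch. The side task borrows the one shared data stack while
the main task still has live values on it, and the check in pause()
stands in for the discipline the programmer has to keep by hand.

```python
# Sketch only: all names here are hypothetical, chosen for illustration.
data_stack = []      # the single data stack shared by both tasks
led_state = [0]      # stands in for an LED port bit
led_log = []         # record of every value "written" to the LED

def blink_step():
    """Side task: one step of an LED blinker, using the shared stack."""
    data_stack.append(led_state[0])   # push current LED state
    data_stack.append(1)              # push toggle mask
    mask = data_stack.pop()
    state = data_stack.pop() ^ mask   # toggle the bit
    led_state[0] = state
    led_log.append(state)

def pause():
    """Run the side task on top of whatever the main task has stacked."""
    depth = len(data_stack)
    blink_step()
    # The side task must be stack-neutral, or the main task's data is lost.
    assert len(data_stack) == depth, "side task left the stack unbalanced"

def main_word():
    """Main task: sums 1..3, pausing with live data on the stack."""
    data_stack.append(0)              # running total
    for n in (1, 2, 3):
        data_stack.append(n)
        pause()                       # side task runs above total and n
        top = data_stack.pop()
        data_stack.append(data_stack.pop() + top)
    return data_stack.pop()

total = main_word()
```

The main task never notices the blinker, because each blink_step pops
exactly what it pushed. The price is that the side task cannot keep
anything on the stack from one PAUSE to the next.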

David
--
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205 D016 7DEF 5323 C174 7D40

Paul Rubin

May 31, 2012, 5:07:43 AM5/31/12
to
David Kuehling <dvdk...@gmx.de> writes:
> Some of the German Forthers I met at Linuxtag suggested to not even give
> the second task its own stacks. Instead running it like an interrupt on
> top of the stack of the currently running word. Yes the second task
> would need to balance stacks at PAUSE. Very acceptable compromise for a
> task blinking LEDs or doing some simple state-machine I/O processing.

Yes, this is what I'm planning to do now, basically the colorforth EX
instruction (swap P and top of R). I might make an RTUCK primitive
for when a task does want to save something before pausing but it's
probably saner to allocate a few registers for these purposes.
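
As a rough illustration of "swap P and top of R", here is a toy address
interpreter in Python. It is only a model under assumptions: the VM
class, emit, start_side and the two threads are made up for the sketch
and are not colorForth code. Each thread is a list of cells, an address
is a (thread, index) pair, and co exchanges the interpreter pointer
with the top of the return stack.

```python
# Sketch only: a toy address interpreter, every name is hypothetical.
class VM:
    def __init__(self):
        self.ip = None       # interpreter pointer: (thread, index)
        self.rstack = []     # return stack
        self.out = []        # trace of what ran, in order

    def run(self, thread):
        self.ip = (thread, 0)
        while self.ip is not None:
            code, i = self.ip
            self.ip = (code, i + 1)   # advance past the cell first
            code[i](self)             # then execute it

def emit(tag):
    """Build a primitive that records `tag` in the trace."""
    return lambda vm: vm.out.append(tag)

def co(vm):
    """Coroutine switch: exchange IP with the top of the return stack."""
    vm.ip, vm.rstack[-1] = vm.rstack[-1], vm.ip

def halt(vm):
    vm.ip = None

side = [emit("s1"), co, emit("s2"), co, emit("s3"), co]

def start_side(vm):
    """Park the side task's entry point on the return stack."""
    vm.rstack.append((side, 0))

main = [start_side, emit("m1"), co, emit("m2"), co, emit("m3"), co, halt]

vm = VM()
vm.run(main)
```

The two threads alternate m1 s1 m2 s2 m3 s3, and the whole task switch
is the one exchange in co. Nothing else is saved, so whatever a task
wants to keep across the switch has to live somewhere else.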

Albert van der Horst

May 31, 2012, 7:44:45 AM5/31/12
to
In article <7xbol5e...@ruckus.brouhaha.com>,
The difference between
    XCHG  %ESI, [%EBP]
and
    XCHG  %ESI, %EBP

Come on, don't you have other things to worry about?

I have used it and I can tell you, the manipulation to have
the data available is the main overhead.

I use the name CO in ciforth, maybe Chuck's name `` ;: '' is better.
Has it caught on (that name, I mean)?

Albert van der Horst

May 31, 2012, 7:32:26 AM5/31/12
to
In article <2f556466-f84d-4bd0...@googlegroups.com>,
John Passaniti <john.pa...@gmail.com> wrote:
<SNIP>
>
>You're free to continue to offer suggestions, but your Derren Brown skills
>of mind-reading are pretty poor. I'll be happy to discuss details of the
>design in private email. But really, the details aren't important. This
>sub-thread is about the reality that these days many hardware designs are
>best achieved not with a singular processor, but with multiple processors
>each dedicated to a subsystem. And there, depending on what you're trying
>to accomplish, there are still many places where 8-bit microcontrollers win.

I respect your judgement. Still, in one of my last jobs they were
moving all control to work via a dedicated industrial PC that emulates
PLCs. The main program was in Java, and the bottom line was that I
couldn't count 1 ms pulses myself in Java and had to write a remote
control for the PLC and write the low-level part in a different,
PLC-compatible language. This makes a Forther cringe, probably.

Still, somehow I thought it made a lot of sense. What they were moving
away from was a couple of boards with 8-bit controllers. They faced
the problem that those go end of life. Then the board has to be
redesigned and the program has to be adapted to a new processor.

Of course the PLC-computer has a lot of plug-in modules for different
controls. Presumably these are black boxes with ... your 8-bit
microcontrollers. But now the responsibility for personal injury and
mission-critical aspects is with very experienced designers of black
boxes.

Alex McDonald

May 31, 2012, 10:02:00 AM5/31/12
to
On May 31, 12:44 pm, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:
> In article <7xbol5el8z....@ruckus.brouhaha.com>,
> Paul Rubin  <no.em...@nospam.invalid> wrote:
>
> >Arnold Doray <inva...@invalid.com> writes:
> >> Isn't that similar to R stack manipulation?
>
> >I'm not sure how to do that operation with standard Forth words.
> >You have to be able to jump into the middle of another word.
>
> >> AIUI, the coroutine jump method is not meant for toggling between
> >> stateful tasks, since state is lost.
>
> >They'd no longer be independent tasks.  Each would have to know what the
> >other was doing.  But, they could pass stack args to each other, which
> >is nice.
>
> >> I can't see how you can get away with using just one pair of data/return
> >> stacks, unless you assume that each task either leaves a clean stack or ...
>
> >No I think it works out quite nicely.  It's not the same thing as
> >independent multitasking, but it has very low overhead, especially if
> >the top of the return stack is in a register, so you'd just swap two
> >registers.
>
> The difference between
>     XCHG  %ESI, [%EBP]
> and
>     XCHG  %ESI, %EBP
>
> Come on, don't you have other things to worry about?

With the execution frequency of the XCHG, it may be a concern. The
first XCHG, with a memory operand, implies a LOCK, which (depending on
processor; later x86 processors appear better in this respect) is
reported to slow down operations significantly. I've not done any
benchmarking on this; YMMV.

BruceMcF

May 31, 2012, 10:17:03 AM5/31/12
to
Yes, as long as your side-tasks are ( -- ) stack-effect words, that'll
work ~ push the side-routine xt on the return stack, call CO, and as soon
as the co-routine completes, execution resumes after CO. PAUSE can
push a dedicated side-task register to the return stack, but you
get co-routine support as well.

The side-task chain could work the same way either way ~ each side-
task over-writes the side-task register with the xt of the next side-
task in the chain, goes about its business, and when it's done,
execution of the main task loop picks up where it was paused.

In the bigger processor, you could walk along a side-task queue or
chain, but for the smaller processor, baking the task sequence in is
most space-efficient.
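
A rough Python model of the baked-in chain, just to show the shape of
it. The task names (poll_rx, poll_tx, blink) and the side_task variable
standing in for the dedicated register are assumptions for the sketch,
not taken from any real system. Each side task names its successor
before doing its own work, so there is no queue to walk and no table
in RAM.

```python
# Sketch only: task names and the "register" variable are hypothetical.
log = []
side_task = None   # plays the role of the dedicated side-task register

def poll_rx():
    global side_task
    side_task = poll_tx    # bake in the successor before doing any work
    log.append("rx")

def poll_tx():
    global side_task
    side_task = blink
    log.append("tx")

def blink():
    global side_task
    side_task = poll_rx    # the last task closes the ring
    log.append("led")

def pause():
    """Run one side task; it returns straight into the main loop."""
    side_task()

side_task = poll_rx        # prime the register with the first side task
for step in range(4):      # main task loop
    log.append(f"main{step}")
    pause()
```

Every pause() costs one indirect call through the register, and the
order of the side tasks is fixed in the code itself, which is the
space-for-flexibility trade described above.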

John Passaniti

Jun 1, 2012, 10:07:58 AM6/1/12
to
On Thursday, May 31, 2012 7:32:26 AM UTC-4, Albert van der Horst wrote:
> I respect your judgement. Still in one of
> my last jobs they were moving all control
> to work via a dedicated industrial PC that
> emulates PLC's. [...]

You don't have to respect my judgement. The only thing you (and that's the generic "you" here) have to respect is that different embedded systems have different requirements and constraints. So there is no singular right answer and that answer will change over time. Statements about the irrelevance of 8-bit microcontrollers appear to be lost on chip manufacturers who in response to demand are still producing them. Statements about the design decision to not factor embedded system hardware by using multiple microcontrollers appear to be lost on most every non-trivial design I've come across in the past 25 years.