PFS173 - Inexpensive MCU for Forth

gnuarm.del...@gmail.com

Oct 15, 2018, 10:40:57 AM

This MCU has a real lack of registers with only an accumulator. Everything is done to/from memory with a single indexed addressing instruction which is only load/store. Can a viable Forth be written for it?

http://www.padauk.com.tw/upload/doc/PFS173_datasheet_v000_EN_20180816.pdf

I realize code can be output by a cross-compiler and optimized for the instruction set. I'm wondering how painful the code would be. On the other hand I expect it to be horrible for any other language too.

BTW, in a 14 pin SOP (I think) it is only 0.05 USD qty 100. Not bad.

1. 3KW MTP program space (programmable more than 1,000 times)
2. 256 Bytes data space
3. A 16-bit timer
4. Two 8-bit timers with PWM function
5. One set of three 11-bit SuLED (Super LED) PWM generators and counters
6. Provide a comparator
7. Band-Gap circuit provides 1.20V reference voltage
8. Up to 13 channels of 8-bit precision ADC (one of the channels is from the band-gap voltage)
9. ADC reference voltage: external input, internal VDD
10. Maximum 18 IO pins

Rick C.

a...@littlepinkcloud.invalid

Oct 16, 2018, 6:13:39 AM

gnuarm.del...@gmail.com wrote:

> This MCU has a real lack of registers with only an accumulator.
> Everything is done to/from memory with a single indexed addressing
> instruction which is only load/store. Can a viable Forth be written
> for it?

It'd be hard. Some kind of umbilical system a la chipFORTH might work,
but it's a very 8-bit processor, even more so than 8051.

Andrew.

Lars Brinkhoff

Oct 16, 2018, 7:36:12 AM

> 1. 3KW MTP program space (programmable more than 1,000 times)
> 2. 256 Bytes data space

This is the kind of low-end part my Forth cross compiler targets.

a...@littlepinkcloud.invalid

Oct 16, 2018, 11:14:58 AM

That's not the real problem. The 8051 would be described as similar to
the above, and the 8051 is an OK Forth target.

Andrew.

Paul Rubin

Oct 20, 2018, 6:38:34 PM

gnuarm.del...@gmail.com writes:
> http://www.padauk.com.tw/upload/doc/PFS173_datasheet_v000_EN_20180816.pdf
> ... in a 14 pin SOP (I think) it is only 0.05 USD qty 100. Not bad.

> 1. 3KW MTP program space (programmable more than 1,000 times)
> 2. 256 Bytes data space...
> This MCU has a real lack of registers... Can a viable Forth be
> written for it?

Besides handling the addressing modes you also have to decide the data
cell size. It's an 8-bit MCU with very little memory, but Forth
traditionally uses data cells that are the same size as addresses and
that are big enough for typical integers, i.e. 16 bits. I remember that
some Dutch Forthers have written Forths with 8-bit data cells (that
means the data stack is 1 byte wide) but I don't know how usable those
were. Do you want 8 bit cells, or 16 bit, or something else?
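
To make the cell-size tradeoff concrete, here is a minimal sketch in Python (purely illustrative, not PFS173 code) of what a 16-bit cell costs on an 8-bit accumulator machine: every 16-bit add becomes two 8-bit adds chained through the carry. The helper names `add8` and `add16` are made up for this example.

```python
def add8(a, b, carry_in=0):
    """One 8-bit add as an accumulator would do it: returns (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16(x, y):
    """A 16-bit cell add built from two 8-bit adds, low byte first."""
    lo, carry = add8(x & 0xFF, y & 0xFF)
    hi, _ = add8(x >> 8, y >> 8, carry)   # carry out of the high byte is dropped
    return (hi << 8) | lo


print(hex(add16(0x00FF, 0x0001)))  # 0x100
```

With 8-bit cells the second `add8` and the extra load/store traffic disappear, at the price of cells that cannot hold a program address.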

> I realize code can be output by a cross-compiler and optimized for the
> instruction set. I'm wondering how painful the code would be. On the
> other hand I expect it to be horrible for any other language too.

I'd guess that the usual Forth VM using subroutines (words) with deeply
nested factoring, passing parameters on the data stack, and doing
intermediate calculations as stack operations, would take a considerable
hit in speed and maybe code density.

In C for something like this, you'd use locals freely and avoid indirect
function calls and reentrancy. An MCU-oriented C compiler would then
possibly allocate a pseudo-register for each local or intermediate
result in the program, analyze the whole program's call graph, and use
graph coloring (or whatever) to assign static memory cells to the
pseudo-registers so they'd be re-used without interfering with each
other. It would also inline functions that were called only once, etc.
You'd maybe still take a hit compared to writing in assembler, where you'd
probably judiciously use the accumulator for parameter and return values
in the lowest level functions, and use memory slots at the higher
levels.

It's not too hard to write Forth in a style where the compiler can
statically know the stack picture (i.e. don't use things like ?DUP).
The compiler could then do something like the above C example, but it
seems like an abstraction inversion: what do you get from it?

I have no real experience programming devices this small so don't have a
clear sense of whether carefully written assembly code is different from
what's reasonable for a compiler. But, this Padauk device has more
program and data memory than small PIC and AVR parts that are often
programmed in C. So it all seems doable in typical MCU applications
where even a significant efficiency hit won't matter much.
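
The static allocation idea described above can be sketched roughly as follows. This is a simplified overlay scheme rather than real graph coloring, and it assumes a call graph with no recursion. The function names and byte counts in the example are invented for illustration.

```python
def overlay_offsets(calls, sizes, root):
    """Give each function a fixed base offset for its locals.

    calls: dict mapping a function to the list of functions it calls
           (must be acyclic, i.e. no recursion).
    sizes: dict mapping a function to the bytes of locals it needs.
    A function's block starts after the block of every possible caller,
    so functions that are never active together can share memory.
    """
    offsets = {}

    def visit(fn, base):
        if fn in offsets and base <= offsets[fn]:
            return                      # already placed at least this high
        offsets[fn] = base
        for callee in calls.get(fn, []):
            visit(callee, base + sizes[fn])

    visit(root, 0)
    total = max(offsets[fn] + sizes[fn] for fn in offsets)
    return offsets, total


# Hypothetical program: names and sizes are made up.
calls = {"main": ["read_adc", "update_display"], "update_display": ["to_bcd"]}
sizes = {"main": 2, "read_adc": 3, "update_display": 4, "to_bcd": 2}
offsets, total = overlay_offsets(calls, sizes, "main")
print(offsets, total)
```

In this example `read_adc` and `update_display` share the same addresses because neither calls the other, so the program needs 8 bytes of static storage instead of the 11 it would need if every local had its own slot.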

a...@littlepinkcloud.invalid

Oct 21, 2018, 9:08:43 AM

Paul Rubin <no.e...@nospam.invalid> wrote:

> In C for something like this, you'd use locals freely and avoid
> indirect function calls and reentrancy. An MCU-oriented C compiler
> would then possibly allocate a pseudo-register for each local or
> intermediate result in the program, analyze the whole program's call
> graph, and use graph coloring (or whatever) to assign static memory
> cells to the pseudo-registers so they'd be re-used without
> interfering with each other. It would also inline functions that
> were called only once, etc.

I don't think it'd be practical. Even if it did work, you'd run out of
code and data memory very quickly. What would work, IMO, is a Forthish
macro-assembler with carefully chosen macros for your needs.

> I have no real experience programming devices this small

Hmm. :-)

I've written a Forth for 8051-class processors which worked well with
4 or 8k of RAM memory and 128-256 bytes of internal RAM. It worked out
OK and even allowed multiple tasks to run in that space. However, I
can't see a way to make Forth work well on this ISA and fit moderately
complex applications in the available space.

Andrew.

Paul Rubin

Oct 22, 2018, 4:52:35 PM

a...@littlepinkcloud.invalid writes:
> I can't see a way to make Forth work well on this ISA and fit
> moderately complex applications in the available space.

Yeah, I think this is a situation where for Forth (or C or whatever) to
be competitive, its code footprint has to be comparable to assembler.
If you're shipping a million units of your gadget and the Forth program
fills the 3k of program flash but rewriting it in assembler lets it fit
in 2k so you can replace the expensive 5 cent MCU with a cheaper 4 cent
version, the economics say to use assembler.

But, it seems to me that this part is pretty capacious for lots of MCU
applications. I imagine the proverbial fancy digital wristwatch with an
LCD, some pushbuttons, stopwatch and alarm features, etc.; and a few
dozen bytes of ram plus the 3k program space seems like plenty for that.

Can I ask how many levels of call stack your 8051 programs typically
used, counting only words that were called from more than one place?
The idea is if a word is called from only one place, the compiler can
inline it, saving a couple of instructions and a return stack slot.
It's similarly possible for an analytical compiler to mostly eliminate
the data stack.

Also, maybe something like the famous Apple II "Sweet 16" VM could run
on this part.
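
As a rough illustration of the inlining idea above (a word called from only one place can be inlined), a cross compiler could count call sites before generating code. The sketch below assumes each definition is just a list of the words it calls. The dictionary contents are hypothetical.

```python
from collections import Counter


def inline_candidates(definitions, roots=("main",)):
    """Return the words that have exactly one call site in the whole program."""
    call_sites = Counter(
        callee
        for body in definitions.values()
        for callee in body
        if callee in definitions
    )
    return sorted(w for w in definitions if call_sites[w] == 1 and w not in roots)


# Hypothetical application: "delay" is called from two places, so it stays a call.
definitions = {
    "main": ["init", "loop"],
    "init": ["delay"],
    "loop": ["read", "show", "delay"],
    "read": [],
    "show": [],
    "delay": [],
}
print(inline_candidates(definitions))  # ['init', 'loop', 'read', 'show']
```

Only the words left over after this pass need return stack slots, which is why the call depth counted over multiply-called words is the interesting number.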

a...@littlepinkcloud.invalid

Oct 23, 2018, 4:00:54 AM

Paul Rubin <no.e...@nospam.invalid> wrote:
> a...@littlepinkcloud.invalid writes:
>> I can't see a way to make Forth work well on this ISA and fit
>> moderately complex applications in the available space.
>
> Yeah, I think this is a situation where for Forth (or C or whatever) to
> be competitive, its code footprint has to be comparable to assembler.

You can get Forth to be tighter than straightforward assembler by
generating byte tokens. The Forth core can be squeezed into maybe 750
bytes and the app fits in the rest of memory.
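
A minimal sketch of what "byte tokens" means, written in Python for readability rather than for the PFS173: each primitive and each colon word is named by a single byte, and a small inner interpreter dispatches on it. The token numbering, the `USER` threshold and the 8-bit cell width are assumptions made for this example, not anything taken from a real Forth.

```python
LIT, DUP, ADD, EXIT = 0, 1, 2, 3   # primitive tokens (hypothetical numbering)
USER = 0x80                        # tokens >= USER index the table of colon words


def run(words, entry):
    """Interpret byte-token code starting at words[entry]; return the data stack."""
    data, rstack = [], []
    code, pc = words[entry], 0
    while True:
        tok = code[pc]
        pc += 1
        if tok == LIT:                 # next byte is an inline literal
            data.append(code[pc])
            pc += 1
        elif tok == DUP:
            data.append(data[-1])
        elif tok == ADD:               # 8-bit cells: wrap at 256
            b = data.pop()
            a = data.pop()
            data.append((a + b) & 0xFF)
        elif tok == EXIT:
            if not rstack:
                return data
            code, pc = rstack.pop()
        elif tok >= USER:              # call a colon word: one byte per call
            rstack.append((code, pc))
            code, pc = words[tok - USER], 0
        else:
            raise ValueError(f"unknown token {tok}")


words = [
    bytes([DUP, ADD, EXIT]),                   # word 0, DOUBLE:  DUP +
    bytes([LIT, 21, USER + 0, USER + 0, EXIT]),  # word 1:  21 DOUBLE DOUBLE
]
print(run(words, 1))  # [84]
```

The density argument is that every call to `DOUBLE` costs one byte in the application, while the interpreter loop itself is a fixed cost paid once.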

> If you're shipping a million units of your gadget and the Forth program
> fills the 3k of program flash but rewriting it in assembler lets it fit
> in 2k so you can replace the expensive 5 cent MCU with a cheaper 4 cent
> version, the economics say to use assembler.
>
> But, it seems to me that this part is pretty capacious for lots of MCU
> applications. I imagine the proverbial fancy digital wristwatch with an
> LCD, some pushbuttons, stopwatch and alarm features, etc.; and a few
> dozen bytes of ram plus the 3k program space seems like plenty for that.

I doubt it. It certainly wouldn't fit if the watch had a bitmap
display.

> Can I ask how many levels of call stack your 8051 programs typically
> used, counting only words that were called from more than one place?

That's very hard to say: I never measured it.

> The idea is if a word is called from only one place, the compiler can
> inline it, saving a couple of instructions and a return stack slot.
> It's similarly possible for an analytical compiler to mostly eliminate
> the data stack.
>
> Also, maybe something like the famous Apple II "Sweet 16" VM could run
> on this part.

Forth is better for this job than Sweet 16. It's more expressive,
easier to write, and it's just as compact.

Andrew.

gnuarm.del...@gmail.com

Oct 23, 2018, 9:36:55 PM

On Monday, October 22, 2018 at 4:52:35 PM UTC-4, Paul Rubin wrote:
> a...@littlepinkcloud.invalid writes:
> > I can't see a way to make Forth work well on this ISA and fit
> > moderately complex applications in the available space.
>
> Yeah, I think this is a situation where for Forth (or C or whatever) to
> be competitive, its code footprint has to be comparable to assembler.
> If you're shipping a million units of your gadget and the Forth program
> fills the 3k of program flash but rewriting it in assembler lets it fit
> in 2k so you can replace the expensive 5 cent MCU with a cheaper 4 cent
> version, the economics say to use assembler.

You are not only correct, but absolutely right. However... you can write and debug the app 10 times faster in Forth than assembly, then hand optimize the code to reduce it to 2k and still have done it faster and more accurately.

Rick C.

Paul Rubin

Oct 24, 2018, 5:51:52 PM

a...@littlepinkcloud.invalid writes:
> You can get Forth to be tighter than straightforward assembler by
> generating byte tokens. The Forth core can be squeezed into maybe 750
> bytes and the app fits in the rest of memory.

Good point, that might be worth a try.

>> proverbial fancy digital wristwatch... a few dozen bytes of ram plus
>> the 3k program space seems like plenty for that.
> I doubt it. It certainly wouldn't fit if the watch had a bitmap display.

I was thinking of the traditional kind with a segmented display. A few
K of code went a long way back when memory was expensive. The Padauk
part has no BCD arithmetic per se, but it has a nibble swap instruction
and an auxiliary carry flag that detects nibble over/underflow on
ordinary addition/subtraction, so they must have had stuff like this in
mind.
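
For illustration, this is roughly how an auxiliary (half) carry gets used for packed BCD: add in binary, then correct each nibble that overflowed past 9. The sketch is in Python and models the flag by hand. It is not PFS173 code, and `bcd_add` is a made-up name.

```python
def bcd_add(a, b):
    """Add two packed-BCD bytes (0x00..0x99); return (result, carry_out)."""
    total = a + b
    # Auxiliary carry: the low nibbles overflowed out of bit 3.
    half_carry = ((a & 0x0F) + (b & 0x0F)) > 0x0F
    if half_carry or (total & 0x0F) > 9:
        total += 0x06                  # fix up the ones digit
    carry = total > 0x99
    if carry:
        total += 0x60                  # fix up the tens digit
    return total & 0xFF, int(carry)


# Seconds rolling over on a clock display: 59 + 1 -> 60, 99 + 1 -> 00 carry 1.
print([hex(x) for x in bcd_add(0x59, 0x01)])  # ['0x60', '0x0']
print([hex(x) for x in bcd_add(0x99, 0x01)])  # ['0x0', '0x1']
```

Keeping counters in BCD means each nibble maps directly to one display digit, which is where a nibble swap instruction comes in handy.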

> Forth is better for this job than Sweet 16. It's more expressive,
> easier to write, and it's just as compact.

Could be. This old page is kind of interesting, comparing the code density
of different VMs, though they are for fancier systems:

http://www.1strecon.org/downloads/Forth_Resources/ByteCodeInterpretters_4_TinyComputers.pdf