Single-pass or multi-pass Forth compiler

minf...@arcor.de

unread,

Jan 16, 2023, 6:50:28 AM1/16/23

to

Historically Forth features a single-pass compiler that is triggered e.g.
when the interpreter hits a : colon. Some control flows require code
back-patching to resolve jump targets. This can be awkward.

OTOH by going over the code in several passes offers easy paths for
simplification and optimization. e.g.
- elimination of comments
- resolving of labels for jump targets
- intermediate language e.g. program as database of instruction lists
- peephole optimization, superinstructions
- tail call elimination

Is there any Forth out there that compiles high-level definitions in this way?
Anybody willing to share some other experience in this field?

Jan Coombs

unread,

Jan 16, 2023, 10:05:04 AM1/16/23

to

On Mon, 16 Jan 2023 03:50:26 -0800 (PST)
"minf...@arcor.de" <minf...@arcor.de> wrote:

> - intermediate language e.g. program as database of instruction lists

Maybe this one in 'Holon Forth' and 'Fifth':

Planet Holonforth Wolf Wejgaard EuroForth 2016
http://www.euroforth.org/ef16/papers/wejgaard.pdf

1989-09-00 Paper presented at the EuroFORML conference 1989
Not Screens nor Files but Words Wolf Wejgaard
https://holonforth.com/ef89.html

Further papers at:
https://www.holonforth.com/papers.html

Ref to Fifth was in:
A Fresh Look at the Forth Dictionary
Wolf Wejgaard
Presented at the EuroFORTH conference 1995
https://www.holonforth.com/ef95.html

Jan Coombs
--

Anton Ertl

unread,

Jan 16, 2023, 11:41:20 AM1/16/23

to

Passes in the original sense have gone out of fashion once RAM sizes
were big enough. Compilers often have intermediate representations
for purposes such as analysis and optimization, and for modularization
purposes.

iForth and VFX have an intermediate representation that they use for
inlining. AFAIK they don't use it for elimination of comments, jump
target resolution, peephole optimization, superinstructions, or
tail-call elimination.

Gforth uses the threaded-code representation (and its relocatable
variant) either directly, or as a step to a mixture of native code and
threaded-code. It also buffers the threaded code for a straight-line
piece of code before generating native code, to allow better code
generation.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net

Hans Bezemer

unread,

Jan 16, 2023, 11:43:08 AM1/16/23

to

On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:
> OTOH by going over the code in several passes offers easy paths for
> simplification and optimization. e.g.
> - elimination of comments

I can't think of a Forth compiler that DOESN'T. It simply ignores the stuff:

: \ blk @ IF >in @ c/l /f 1+ c/l * >in ! EXIT THEN source >in ! drop ; immediate

> - resolving of labels for jump targets

Again - I can't think of a Forth compiler that doesn't. References are placed on
some stack and used as target addresses when jumping back or as source
addresses to be patched later when jumping forward.

> - peephole optimization, superinstructions
4tH has a peephole optimizer used for strength reduction, constant folding and
dead code elimination. It only looks back a single instruction, but can in some
circumstances work recursive.

> - tail call elimination
4tH's optimizer does that one as well.

> Is there any Forth out there that compiles high-level definitions in this way?
> Anybody willing to share some other experience in this field?

It works fine and transparent - unless you insist to meddle with the return stack
in non-portable ways.

https://sourceforge.net/p/forth-4th/wiki/Optimization/

Hans Bezemer

Marcel Hendrix

unread,

Jan 16, 2023, 12:39:29 PM1/16/23

to

On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:

> Historically Forth features a single-pass compiler that is triggered e.g.
> when the interpreter hits a : colon. Some control flows require code
> back-patching to resolve jump targets. This can be awkward.
>
> OTOH by going over the code in several passes offers easy paths for
> simplification and optimization. e.g.

[..]

> Is there any Forth out there that compiles high-level definitions in this way?

Using an intermediate representation is far more powerful, as one can
do recursive 'context-sensitive' inlining (delay compiling as long as possible).

In iForth's iSPICE package I'll eventually use run-time code generation (e.g.
to eliminate pointer-chasing). This delays instruction selection to the limit.
However, it only works if the target application is sufficiently unsophisticated,
or has predictable execution behavior (like SPICE). I'm sure that there would
be many snags would I try it for Forth itself.

-marcel

minf...@arcor.de

unread,

Jan 16, 2023, 2:29:35 PM1/16/23

to

I seem to remember Holon running on DOS. Last time I looked I got the impression
that it was a nice mixture of Tcl and Forth, with most fun parts included from Tcl libraries.
Good work, not criticizing, but running an interpreter through another interpreter
seemed rather heavy to me.

Nevertheless the "modules/groups/dictionary/words in a database" is an appealing
concept.

minf...@arcor.de

unread,

Jan 16, 2023, 2:47:15 PM1/16/23

to

Interesting. I never considered lazy evaluation for Forth.

Without climbing up to Haskell's heights, there are simple "lazy evaluators in Python
(copied from Stackoverflow):
######
In a nutshell, lazy evaluation means that the object is evaluated when it is needed,
not when it is created.
In Python 2, range will return a list - this means that if you give it a large number,
it will calculate the range and return at the time of creation:
>>> i = range(100)
>>> type(i)
<type 'list'>
In Python 3, however you get a special range object:
>>> i = range(100)
>>> type(i)
<class 'range'>
Only when you consume it, will it actually be evaluated - in other words, it will only return
the numbers in the range when you actually need them.
######

As knee reflex ISTM to have many similarities with returning macros. It would be
quite easy in Forth to return xt's of words containing EVALUATEs or [[ ]] gforthisms
for delayed evaluation.

Gerry Jackson

unread,

Jan 16, 2023, 3:29:24 PM1/16/23

to

My sytem, that has never been released (and won't be), which was
developed from about 2005, compiles Forth code to an intermediate form,
carries out some optimisations, then generates the executable code. I
did it this way (I think) to:
- widen the range of optimisations beyond simple peephole optimisation
- making it more readily portable to different processors and OS's (one
front end, n back ends).
- generating different types of executable code including machine code.

and probably other reasons that I've forgotten.

I did peephole optimisations and super-instructions but then found other
things took priority e.g. using the system, so I've never got round to
getting on with other aims.

It did make things such as quotations easier as it just involves
suspending the outer definition, compiling the quotation, then resuming
the suspended definition. Nested as much as desired.

It enabled me to experiment with other capabilities such as nested colon
definitions, declaration of variables inside a definition, different
entry points, not that I've used them for real.

--
Gerry

Lars Brinkhoff

unread,

Jan 17, 2023, 1:39:51 AM1/17/23

to

Leeloo says: multipass.

none albert

unread,

Jan 17, 2023, 2:36:05 AM1/17/23

to

In article <c211daee-ca84-4ec0...@googlegroups.com>,

Marcel Hendrix <m...@iae.nl> wrote:
>On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:
>> Historically Forth features a single-pass compiler that is triggered e.g.
>> when the interpreter hits a : colon. Some control flows require code
>> back-patching to resolve jump targets. This can be awkward.
>>
>> OTOH by going over the code in several passes offers easy paths for
>> simplification and optimization. e.g.
>[..]
>> Is there any Forth out there that compiles high-level definitions in this way?
>
>Using an intermediate representation is far more powerful, as one can
>do recursive 'context-sensitive' inlining (delay compiling as long as possible).
>
>In iForth's iSPICE package I'll eventually use run-time code generation (e.g.
>to eliminate pointer-chasing). This delays instruction selection to the limit.

What is "pointer chasing"?

>
>-marcel
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

none albert

unread,

Jan 17, 2023, 2:50:48 AM1/17/23

to

In article <74bc3d99-4179-47d7...@googlegroups.com>,

minf...@arcor.de <minf...@arcor.de> wrote:
>Historically Forth features a single-pass compiler that is triggered e.g.
>when the interpreter hits a : colon. Some control flows require code
>back-patching to resolve jump targets. This can be awkward.
>
>OTOH by going over the code in several passes offers easy paths for
>simplification and optimization. e.g.
>- elimination of comments
>- resolving of labels for jump targets
>- intermediate language e.g. program as database of instruction lists
>- peephole optimization, superinstructions
>- tail call elimination

My optimiser goes through the steps (experimental)

1. find stack effect for all primitive words (through a sophisticated
assembler analysis) ( + stack effects properties)
2. find stack effects for other words, ( + stack effects properties)
3. optimise words from the bottom up,high level, a generalisation of
constant folding
4. inline the machine code. A number of blocks that jumps to each
other results
5. optimise the blocks, individually and as a whole.

It results (in some examples) to a speed up comparably to gforth
versus mpeforth.

Highly experimental.

>
>Is there any Forth out there that compiles high-level definitions in this way?
>Anybody willing to share some other experience in this field?

Go to my site below. forthlectures.html View lecture 5.
(This has been announced several times.)

Tail call elimination?
If you're programming for speed, you probably avoid tail calls.
(Not that it is hard to do.)

Groetjes Albert

minf...@arcor.de

unread,

Jan 17, 2023, 3:11:33 AM1/17/23

to

Gerry Jackson schrieb am Montag, 16. Januar 2023 um 21:29:24 UTC+1:
> My sytem, that has never been released (and won't be), which was
> developed from about 2005, compiles Forth code to an intermediate form,
> carries out some optimisations, then generates the executable code. I
> did it this way (I think) to:
> - widen the range of optimisations beyond simple peephole optimisation
> - making it more readily portable to different processors and OS's (one
> front end, n back ends).
> - generating different types of executable code including machine code.
>
> and probably other reasons that I've forgotten.
>
> I did peephole optimisations and super-instructions but then found other
> things took priority e.g. using the system, so I've never got round to
> getting on with other aims.
>
> It did make things such as quotations easier as it just involves
> suspending the outer definition, compiling the quotation, then resuming
> the suspended definition. Nested as much as desired.

So it compiles the innermost definitions/quotations first before compiling
outer definitions?

The "classic" way is to jump over quotations, nesting included.
From the reference test:
T{ : q1 [: 1 ;] ; q1 EXECUTE -> 1 }T
T{ : q2 [: [: 2 ;] ;] ; q2 EXECUTE EXECUTE -> 2 }T

Did you experiment with enclosures?

minf...@arcor.de

unread,

Jan 17, 2023, 3:23:42 AM1/17/23

to

Anton Ertl schrieb am Montag, 16. Januar 2023 um 17:41:20 UTC+1:
> iForth and VFX have an intermediate representation that they use for
> inlining. AFAIK they don't use it for elimination of comments, jump
> target resolution, peephole optimization, superinstructions, or
> tail-call elimination.
>
> Gforth uses the threaded-code representation (and its relocatable
> variant) either directly, or as a step to a mixture of native code and
> threaded-code. It also buffers the threaded code for a straight-line
> piece of code before generating native code, to allow better code
> generation.

The VFX docs speak of code inlining vs source inlining, the latter now
improved through tokenizing. This looks similar to gforth's inlining method.

Anton Ertl

unread,

Jan 17, 2023, 4:14:42 AM1/17/23

to

The original source inliner of VFX had the same correctness problem as
EVALUATE-based macros. AFAIK The "tokenizing" resolves the names,
BASE etc. when the code is tokenized and therefore does not have this
problem.

>This looks similar to gforth's inlining method.

Gforth currently does not inline colon definitions. The native-code
approach has been called "selective inlining" (of primitives) by
Piumarta, and is probably similar to what VFX documents as "code
inlining", but I normally don't call it inlining.

minf...@arcor.de

unread,

Jan 17, 2023, 5:01:43 AM1/17/23

to

Anton Ertl schrieb am Dienstag, 17. Januar 2023 um 10:14:42 UTC+1:
> "minf...@arcor.de" <minf...@arcor.de> writes:
> >Anton Ertl schrieb am Montag, 16. Januar 2023 um 17:41:20 UTC+1:
> >> iForth and VFX have an intermediate representation that they use for
> >> inlining. AFAIK they don't use it for elimination of comments, jump
> >> target resolution, peephole optimization, superinstructions, or
> >> tail-call elimination.
> >>
> >> Gforth uses the threaded-code representation (and its relocatable
> >> variant) either directly, or as a step to a mixture of native code and
> >> threaded-code. It also buffers the threaded code for a straight-line
> >> piece of code before generating native code, to allow better code
> >> generation.
> >
> >The VFX docs speak of code inlining vs source inlining, the latter now
> >improved through tokenizing.
> The original source inliner of VFX had the same correctness problem as
> EVALUATE-based macros. AFAIK The "tokenizing" resolves the names,
> BASE etc. when the code is tokenized and therefore does not have this
> problem.

Macro correctness is an interesting topic as it can depend on changed
machine states. Off topic, but correct selection of what has to be included
in saved closure enviroments falls into the same problem category.

I wonder if the Holon approach (keeping everything in a realtime source
database) did not suffer from the same dilemma.

dxforth

unread,

Jan 17, 2023, 5:23:58 AM1/17/23

to

On 16/01/2023 10:50 pm, minf...@arcor.de wrote:
> Historically Forth features a single-pass compiler that is triggered e.g.
> when the interpreter hits a : colon. Some control flows require code
> back-patching to resolve jump targets. This can be awkward.

m/c assemblers went from two-pass to one-pass. Trust forth to want to go
in the opposite direction :)

Gerry Jackson

unread,

Jan 17, 2023, 5:47:27 AM1/17/23

to

On 17/01/2023 08:11, minf...@arcor.de wrote:
> Gerry Jackson schrieb am Montag, 16. Januar 2023 um 21:29:24 UTC+1:
>> My sytem, that has never been released (and won't be), which was
>> developed from about 2005, compiles Forth code to an intermediate form,
>> carries out some optimisations, then generates the executable code. I
>> did it this way (I think) to:
>> - widen the range of optimisations beyond simple peephole optimisation
>> - making it more readily portable to different processors and OS's (one
>> front end, n back ends).
>> - generating different types of executable code including machine code.
>>
>> and probably other reasons that I've forgotten.
>>
>> I did peephole optimisations and super-instructions but then found other
>> things took priority e.g. using the system, so I've never got round to
>> getting on with other aims.
>>
>> It did make things such as quotations easier as it just involves
>> suspending the outer definition, compiling the quotation, then resuming
>> the suspended definition. Nested as much as desired.
>
> So it compiles the innermost definitions/quotations first before compiling
> outer definitions?
>
> The "classic" way is to jump over quotations, nesting included.

Yes a crude, but effective, solution imposed by the traditional single
pass compilation.

> From the reference test:
> T{ : q1 [: 1 ;] ; q1 EXECUTE -> 1 }T
> T{ : q2 [: [: 2 ;] ;] ; q2 EXECUTE EXECUTE -> 2 }T
>
> Did you experiment with enclosures?

Not with the built in quotations. But before quotations were
standardised I did develop an ANS Forth implementation of closures in
2011 that I can let you have if you are interested.

--
Gerry

minf...@arcor.de

unread,

Jan 17, 2023, 7:18:58 AM1/17/23

to

Thank you for your kind offer! I just sent you a private mail.

Matthias Koch

unread,

Jan 17, 2023, 10:06:07 AM1/17/23

to

I found multiple passes necessary in Mecrisp-Across to re-join different control flow branches that use a different local register allocation without relying on canonical fallback. If the number of stack elements held in registers at the joint differ, one needs to use the smaller number of stack elements, which may require additional passes to achieve.

Using the larger number of elements causes non-existing elements to be popped from the stack, at least temporarily. Of course, in valid Forth code, these elements will be pushed back, but may cause reads/writes beyond the beginning of the stack area in memory. Simply adding a guard space at the beginning of the stack would be a solution to keep it single-pass, but in combination with nested interrupts, this may be riscy. I solved this with multiple passes in the compiler, and generate elegant code along the way.

Note this compiler is capable of register allocation across ?dup and similiar constructs.

minf...@arcor.de

unread,

Jan 18, 2023, 4:54:04 AM1/18/23

to

Looks like some 'spill and iterate' algorithm. Did you consider spilling costs?
I imagine that moves between registers and memory take MUCH more time
than some little register reshuffling at control flow joints.

Matthias Koch

unread,

Jan 18, 2023, 5:16:21 AM1/18/23

to

Can you point me to a text describing the solution or existing implementation? I am reshuffling registers as much as possible without spilling them to the stack, but I do not know on how to handle the situation without partial spilling (and subsequent reshuffling of the common numer of elements in registers) when I have a larger total amount of stack elements in registers than in the other branch, as different branches may have different depth stack effects, and these elements do not exist on the stack at all for one of the two branches.

minf...@arcor.de

unread,

Jan 18, 2023, 6:17:46 AM1/18/23

to

Matthias Koch schrieb am Mittwoch, 18. Januar 2023 um 11:16:21 UTC+1:
> Can you point me to a text describing the solution or existing implementation? I am reshuffling registers as much as possible without spilling them to the stack, but I do not know on how to handle the situation without partial spilling (and subsequent reshuffling of the common numer of elements in registers) when I have a larger total amount of stack elements in registers than in the other branch, as different branches may have different depth stack effects, and these elements do not exist on the stack at all for one of the two branches.

I am certainly much less experienced than you are in this field. I just happen to know
that Chaitin's algortihm includes spilling until new liveliness analysis comes up with
an acceptably colored graph. But AFAIK optimal spilling is still a thing for heuristics
or brute force iteration. Until today there is no known 'formula' for it.

Hence my question out of curiosity whether you have gained new insights. It had not
been clear whether you do liveliness analysis at all.

Add to this that spilling costs are not constant, given the caching effects in many CPUs.
I found only one lecture note that at least mentions this (but does not deal with it later on):
https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/lectures/17/Slides17.pdf

Of course it remains also a design decision of where for a small Forth compiler for
embedded systems reasonable usability ends and where overkill starts.

minf...@arcor.de

unread,

Jan 18, 2023, 6:47:01 AM1/18/23

to

P.S. since you won't do graph coloring, perhaps this paper may be of more interest because
it also treats with data flow analysis:
http://www.christianwimmer.at/Publications/Wimmer10a/Wimmer10a.pdf

Matthias Koch

unread,

Jan 18, 2023, 7:10:55 AM1/18/23

to

These slides are a very nice intro to register allocation difficulties - but register spilling therein happens when "we have more elements than registers", whereas in Mecrisp-Across I have enough registers for most of the reasonable code. For example, 2OVER gets 6 deep, so for an embedded MSP430 Forth, it is OK to start spilling at 10 elements deep because of lack of registers. The variable liveness analysis is a different kind of beast than a stack. I do not try to perform liveness analysis at all.

What I do is: When stack elements are handled first time, these are popped from the stack(s) and put in registers, and the compiler tries to keep these in registers unless forced to get back to canonical state by non-optimiseable structures.

This is done in Mecrisp-Stellaris RA (data stack only) and Mecrisp-Quintus with the loadable Acrobatics compiler extension (both data and return stack) which is written in Forth itself, my recommendation if you want a starting point. These do canonical fallback when encountering control structures.

Mecrisp-Across is one step ahead and keeps the current set of registers when control flow forks execution, and tries to re-merge the two resulting register sets at the joint, if possible by reshuffling one of the branches to get back to a common register set. When not, it spills the additional elements from the larger branch (which may require an additional pass in my compiler), and reshuffles the remaining ones. This is not done because of lack of registers, but because of the nature of the stacks and the limited scope of the compiler. At the end of a definition, or before an unknown immediate word is executed, a full canonical fallback is done.

A curious mind one out there wrote a nice blog post analysing the output of the Mecrisp-Across compiler:

https://joldosh.blogspot.com/2019/12/optimized-msp430-assembly-in-mecrisp.html

Certainly there is no caching, as I am targeting tiny classic MSP430 chips like the MSP430F2012 with Mecrisp-Across, therefore register spills are the same cost under all circumstances.

Caching effects apply when running Mecrisp-Stellaris or Mecrisp-Quintus hosted on ARM/RISC-V/MIPS Linux (and thanks to Robert Clausecker FreeBSD), but the family of Mecrisp Forths is designed for deep embedded and the hosted ports are not the main focus of my efforts.

Matthias Koch

unread,

Jan 18, 2023, 7:18:08 AM1/18/23

to

> P.S. since you won't do graph coloring, perhaps this paper may be of more interest because
> it also treats with data flow analysis:
> http://www.christianwimmer.at/Publications/Wimmer10a/Wimmer10a.pdf

Thanks, looks nice!
By the way, I do not even use SSA intermediate form.