Optimising parsed words

Alex McDonald

unread,

Jul 12, 2007, 5:35:20 PM7/12/07

to

I'm developing an optimiser for the new version of win32forth that
employs the method of simulating at compile time (using smart compile,)
the stack effects of basic blocks. Anton has several papers that discuss
this method (the RAFTS papers) and I believe VFX using a similar (the
same?) technique.

http://www.complang.tuwien.ac.at/papers/ertl&pirker96.ps.gz

I've grasped the idea of "before" and "after" stacks, and building a
data flow graph from the primitives, with the following fairly simple
classes of words;

Class 1. execute stack manipulation words that have no side effects
(such as modifying memory) such as DUP SWAP etc.
Class 2. don't execute, but add to the DFG by popping/pushing based on
the word's stack effects and point to the operands on the stack.
Class 3. terminate at other words and build a DFG for the basic block

The bit I can't get my noddle around are parsing words like TO and S" .
Here's the output from my intelligent compile, parse tree for TO

0 value y ok
: x 10 to y ;

outputs the tree for x of

10 literal
to ( 4781984 literal ! )
; ( exit ( unnest ) [ )

The ( ) indicate the compile, depth. For S"

: x s" abcdefg" ;

outputs the tree

s" ( sliteral ( 2literal ( 4781989 literal 7 literal ) ) )
; ( exit ( unnest ) [ )

Question: TO parses, so to be part of the basic block and extract the
LITERAL, it needs to fall into class 1 -- it needs to be executed to
extract the information. But it has side effects; hence it should really
be class 2 or 3 and terminate a basic block.

S" parses too, but it refers to static values as it generates a pair of
LITERALs, so I think it can be in class 1.

Terminating on TO seems a bit early; there are a lot of optimisations
that can be carried out on it, should I be able to work out how to
handle it.

Where am I getting this wrong?

--
Regards
Alex McDonald

Elizabeth D Rather

unread,

Jul 12, 2007, 6:06:33 PM7/12/07

to

The parsing behavior of S" and TO is restricted to compilation and
interpretation. The execution behavior, which is accurately represented
above, doesn't include any parsing. For TO all you have is two
literals, a value and an address, and a !. Of course, it does modify
memory by doing the !.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310-491-3356
5155 W. Rosecrans Ave. #1018 Fax: +1 310-978-9454
Hawthorne, CA 90250
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Alex McDonald

unread,

Jul 12, 2007, 7:06:45 PM7/12/07

to

Sorry, I may have confused with the phrase in "TO ... needs to be
executed to extract the information". I meant that the compilation
behaviour needs to be performed. From the parse tree;

: x 10 to y ;

10 literal
to ( 4781984 literal ! )
; ( exit ( unnest ) [ )

you can see the compilation behaviour of TO is n literal ! where n is
the value address. Effectively the lowest level of the TO part of the
tree represents the primitives that get added to the DFG.

The trouble I'm having is that although [10 literal] [n literal] are
both rooted in the DFG, the ! is not rooted, as it can only represent
operations that have output stack operands, so ! doesn't have a way of
connecting to the "after" data stack.

Aha, I've just spotted a note in the paper above that says ! and other 0
output stack effect words have to be kept on a separate list. It doesn't
say how they're dealt with; I'll have to get some pencil and paper and
experiment.

--
Regards
Alex McDonald

Anton Ertl

unread,

Jul 13, 2007, 6:20:04 AM7/13/07

to

Alex McDonald <bl...@rivadpm.com> writes:
>The trouble I'm having is that although [10 literal] [n literal] are
>both rooted in the DFG, the ! is not rooted, as it can only represent
>operations that have output stack operands, so ! doesn't have a way of
>connecting to the "after" data stack.
>
>Aha, I've just spotted a note in the paper above that says ! and other 0
>output stack effect words have to be kept on a separate list. It doesn't
>say how they're dealt with; I'll have to get some pencil and paper and
>experiment.

You just treat them as roots in the graph. The order in which your
process the various roots is (partially) determined by the dependences
through memory.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2007: http://www.complang.tuwien.ac.at/anton/euroforth2007/

Anton Ertl

unread,

Jul 13, 2007, 7:32:05 AM7/13/07

to

Alex McDonald <bl...@rivadpm.com> writes:
>I'm developing an optimiser for the new version of win32forth that
>employs the method of simulating at compile time (using smart compile,)
>the stack effects of basic blocks. Anton has several papers that discuss
>this method (the RAFTS papers) and I believe VFX using a similar (the
>same?) technique.
>
>http://www.complang.tuwien.ac.at/papers/ertl&pirker96.ps.gz

For the modern Intel/AMD CPUs I would probably leave the instruction
scheduling away for simplicity (the hardware performs scheduling, and
instruction scheduling tends to increase the register pressure). I
might be tempted to leave the graph away and generate code directly,
but that may cost quite a bit of performance (no instruction
selection, only register allocation); one could get a part of
instruction selection back through some variant of peephole
optimization.

>I've grasped the idea of "before" and "after" stacks, and building a
>data flow graph from the primitives, with the following fairly simple
>classes of words;
>
>Class 1. execute stack manipulation words that have no side effects
>(such as modifying memory) such as DUP SWAP etc.
>Class 2. don't execute, but add to the DFG by popping/pushing based on
>the word's stack effects and point to the operands on the stack.
>Class 3. terminate at other words and build a DFG for the basic block

These are things that should be done at the COMPILE, level or lower.

>The bit I can't get my noddle around are parsing words like TO and S" .
>Here's the output from my intelligent compile, parse tree for TO
>
>0 value y ok
>: x 10 to y ;
>
>outputs the tree for x of
>
>10 literal
>to ( 4781984 literal ! )
>; ( exit ( unnest ) [ )
>
>The ( ) indicate the compile, depth. For S"
>
>: x s" abcdefg" ;
>
>outputs the tree
>
>s" ( sliteral ( 2literal ( 4781989 literal 7 literal ) ) )
>; ( exit ( unnest ) [ )
>
>Question: TO parses, so to be part of the basic block and extract the
>LITERAL, it needs to fall into class 1 -- it needs to be executed to
>extract the information. But it has side effects; hence it should really
>be class 2 or 3 and terminate a basic block.

As you show, TO and S" should never be passed to "COMPILE," in these
examples. Instead TO should POSTPONE and eventually "COMPILE,"
LITERAL and "!", and S" should eventually "COMPILE," LITERAL twice, so
your compiler should do whatever is appropriate for compiling these
words.

If your system supports code like "POSTPONE TO", "['] TO COMPILE,"
etc., then your "COMPILE," could see an xt associated with TO (the xt
of the compilation semantics of TO in the first case, and the xt of
the interpretation semantics in the second case). Since that xt
typically represents a colon definition, "COMPILE," should compile a
call to that colon definition.

Alex McDonald

unread,

Jul 13, 2007, 10:48:26 AM7/13/07

to

Anton Ertl wrote:
> Alex McDonald <bl...@rivadpm.com> writes:
>> I'm developing an optimiser for the new version of win32forth that
>> employs the method of simulating at compile time (using smart compile,)
>> the stack effects of basic blocks. Anton has several papers that discuss
>> this method (the RAFTS papers) and I believe VFX using a similar (the
>> same?) technique.
>>
>> http://www.complang.tuwien.ac.at/papers/ertl&pirker96.ps.gz
>
> For the modern Intel/AMD CPUs I would probably leave the instruction
> scheduling away for simplicity (the hardware performs scheduling, and
> instruction scheduling tends to increase the register pressure). I
> might be tempted to leave the graph away and generate code directly,
> but that may cost quite a bit of performance (no instruction
> selection, only register allocation); one could get a part of
> instruction selection back through some variant of peephole
> optimization.

I've tried that, and there's too much stack juggling that needs
eliminated at the peephole stage (in fact, I was doing it immediately
prior to code generation to avoid having to walk and re-write the code
area). The flow graph will be useful to eliminate much of this.

Instruction scheduling is beyond me right now :-)

>
>> I've grasped the idea of "before" and "after" stacks, and building a
>> data flow graph from the primitives, with the following fairly simple
>> classes of words;
>>
>> Class 1. execute stack manipulation words that have no side effects
>> (such as modifying memory) such as DUP SWAP etc.
>> Class 2. don't execute, but add to the DFG by popping/pushing based on
>> the word's stack effects and point to the operands on the stack.
>> Class 3. terminate at other words and build a DFG for the basic block
>
> These are things that should be done at the COMPILE, level or lower.

Lower, in my case.

The code passed the postpone.fs test last time I checked; I'll see if it
still does so after the latest round of changes I've made. The tree
shown above is actually generated by the execution of the compile-token
pair, not COMPILE, itself, so it currently contains the higher level
constructs. The lowest level in parens is the part that COMPILE, sees,
so in the case of TO, it sees the pair [n literal] and ! .

>
> - anton