Actually I'm reading a paper about Knuth's MMIX
virtual processor. It's a register-file-based
RISC architecture and I'm quite surprised.
http://www-cs-faculty.stanford.edu/~knuth/mmix.html
(http://mmixmasters.sourceforge.net/)
I have great respect for the work of Knuth and
now have trouble understanding why he thinks of
this architecture as well suited to the third
millennium.
I thought I had understood why stack-based
architectures are simple and easy to develop
software with, but now I'm confused.
Does this mean that stack-based architectures
are wrong, at least wrong for the future?
What is your opinion?
best wishes
Andreas
Remember that MIX and MMIX are intended to be reasonably
representative of "typical" architectures. This is necessary so that
people can easily understand how to translate the programs for their
own machine.
However, you don't want to be too realistic and have to put up with
the warts and weirdnesses that complicate real hardware. For example, having a
large number of registers in MMIX avoids having to deal with issues
like spilling variables.
Using a stack based architecture for this work would have two bad
effects:
1. It wouldn't be possible to describe how to do things on a register
machine, and
2. It would reduce the size of Knuth's audience.
Andrew.
I'm surprised that you are surprised, but I expect that
you have started an entertaining thread.
> now have trouble understanding why he thinks of
> this architecture as well suited to the third
> millennium.
1. academic thinking where cost is not a factor.
2. it is the direction most other people are going.
3. he has little if any understanding of hardware.
4. he is looking to hardware people to solve his
software problems.
He gives five reasons from the sixties about why he
thinks it is useful to think and write in machine language:
1. to understand how high level language constructs are
actually implemented on machines.
2. it is short and easy to grasp.
3. people should have some idea of what the underlying
hardware is like "otherwise the programs they write will be
pretty weird."
4. machine language is necessary in any case.
5. to compare different architectures fairly.
(sort of the opposite rationale of ANS Forth's focus
on portability and avoiding those details and sort of
what I always say about machineForth.)
Then he cops out completely by using a fake artificial
academic architecture. ;-) It negates his five reasons
for wanting to use actual machine language.
> I thought I had understood why stack-based
> architectures are simple and easy to develop
> software with, but now I'm confused.
You were confused before if you thought that
there were a lot of other people who understood that.
You are just highlighting the theme that I often
forward in this newsgroup. Ideas about how
software should work are related to ideas about how
hardware should work. Ideas about how hardware should
work are related to ideas about how software should
work.
> Does this mean that stack-based architectures
> are wrong, at least wrong for the future?
They are 'wrong' if all your thinking is based
upon memory or register arrays in hardware. They
are right if your thinking is based upon stacks.
If you are thinking C and register (array) based
hardware then stack architectures are "wrong."
> What is your opinion?
The opposite ends are the register vs stack
approaches. Stack ideas map poorly to the
more expensive register based hardware because
most of the expensive hardware is wasted.
Addressable register ideas map terribly to
stack hardware because the addressable register
hardware that is needed just isn't there.
For low cost, simplicity of hardware and software,
performance on stack code, or performance/cost,
stack hardware is hard to beat. When cost is no
object, for complexity of hardware and software,
performance on backwards-compatible (C) code,
gullible consumers, and inflated profit margins
in the hardware and software, addressable-register
hardware is hard to beat.
It's like the business mantra from the 80s, "Greed
is good." (or was that Greed is God? ;-)
best wishes,
Jeff Fox
Warum sollte man etwas einfach machen, wenn man es wunderschön
kompliziert machen kann?
(Why should one make something simple when one can make it
wonderfully complicated?)
I have much respect for people who have done sw/hw
development for a long time. I expect a big
pool of experience and want to learn from it.
The quality of this newsgroup (the people
who are its backbone) is very good. I'm happy
to be able to assimilate that, and I appreciate it.
Well, Knuth is a person rich in experience (like
you and many others here).
He has spent many years in software development,
and now I see a big gap between these two worlds.
> but I expect that
> you have started an entertaining thread.
> 1. academic thinking where cost is not a factor.
If he is a scientist (and I expect he is), then he
cannot ignore other techniques. If he has not
ignored stack-based techniques but never mentions
them, then it was his decision to leave them out,
and this is bewildering to me.
Okay, cost is not a factor, but complexity is.
Every scientist strives for the simplest possible
solution :-)
> 2. it is the direction most other people are going.
> 3. he has little if any understanding of hardware.
> 4. he is looking to hardware people to solve his
> software problems.
>
> He gives five reasons from the sixties about why he
> thinks it is useful to think and write in machine language:
>
> 1. to understand how high level language constructs are
> actually implemented on machines.
> 2. it is short and easy to grasp.
> 3. people should have some idea of what the underlying
> hardware is like "otherwise the programs they write will be
> pretty weird."
> 4. machine language is necessary in any case.
> 5. to compare different architectures fairly.
>
> (sort of the opposite rationale of ANS Forth's focus
> on portability and avoiding those details and sort of
> what I always say about machineForth.)
>
> Then he cops out completely by using a fake artificial
> academic architecture. ;-) It negates his five reasons
> for wanting to use actual machine language.
So he is tergiversating.
>
> > I thought I had understood why stack-based
> > architectures are simple and easy to develop
> > software with, but now I'm confused.
>
> You were confused before if you thought that
> there were a lot of other people who understood that.
Probably. But understanding and working with a stack
architecture is quite easy, and if it is easy
for me then it must be even easier for scientists.
I'm not sure, but I think that register-based
processors are not even well suited for Java.
> You are just highlighting the theme that I often
> forward in this newsgroup. Ideas about how
> software should work are related to ideas about how
> hardware should work. Ideas about how hardware should
> work are related to ideas about how software should
> work.
100% agreed, this is what I have understood so far.
>
> > Does this mean that stack-based architectures
> > are wrong, at least wrong for the future?
>
> They are 'wrong' if all your thinking is based
> upon memory or register arrays in hardware.
But this thinking is more complex, and complexity
grows exponentially, I guess.
> It's like the business mantra from the 80s, "Greed
> is good." (or was that Greed is God? ;-)
>
> best wishes,
> Jeff Fox
>
> Warum sollte man etwas einfach machen, wenn man es wunderschön
> kompliziert machen kann?
How true :-)
Thank you
Andreas
I think of Java as more a subset of C++ than a
stack based environment. It is usually implemented
on register based architectures. There has been some
work on Java hardware, which would of course include
a stack, but it is quite different from Forth.
Look at how Patriot spent nearly a decade modifying
a design from its humble ShBoom Forth chip beginning
to a much more complicated PSC1000 Java chip; compare
the two user manuals.
There is considerable debate about Java chips.
The conventional RISC chip makers point to some
pretty crummy Java chip attempts. Java chip
designers point to their better results but
most people focus on what is most common.
best wishes,
Jeff Fox
"In questions of science the authority of a thousand
is not worth the humble reasoning of a single individual."
Galileo Galilei
Andreas Klimas wrote:
>
> Jeff Fox wrote:
> >
> > Andreas Klimas wrote:
> > > Actually I'm reading a paper about Knuth's MMIX
> > > virtual processor. It's a register-file-based
> > > RISC architecture and I'm quite surprised.
> >
> > I'm surprised that you are surprised,
I'm also surprised ... about how quickly both of you judge
Knuth's effort. To me it seems that his design is pretty
flexible and allows a lot of different usage options.
E.g. to split the 256 registers into globals and
a register stack...
--
Helmut Leitner lei...@hls.via.at
Graz, Austria www.hls-software.com
It is not just pretty flexible, it is very flexible and
for a "full range of usage options". I would like one, if
someone else is paying for it. ;-) But it isn't real,
the ramifications of the specified complexity would lead
to so much other complexity in hardware that it would be a
very expensive design in hardware. It is a software writer's
fantasy. And it is theoretically a very powerful design.
But mostly it is a nice fit to his specified goal of demonstrating
the stuff in machine language. ;-) It is fine for his
purposes and it does look like many other people's fantasies.
That is one of his purposes for it.
My point is that it is not specialized for Forth, or
demonstrating how simple stack based code, Forth, is
on simple stack based hardware. My only judgment of
Knuth's work is in the context of how it is sort of
the opposite of a demonstration of Forth. That was what
Andreas asked as far as I could tell.
Instead of demonstrating how beautifully simple stack
stuff can be it demonstrates how wonderfully complex
the whole thing can be. After all, why make it simple
when one can make it wonderfully complex?
Anyone care to code and compile MMIX and run some
benchmarks on Knuth's MMIX machine code for
comparisons? :-)
Best Wishes,
Jeff Fox
>Warum sollte man etwas einfach machen, wenn man es wunderschön
>kompliziert machen kann?
:-)
shorter:
Warum einfach, wenns auch kompliziert geht?
Why do it simple if it can be done complicated?
Seems the credo of modern management :-)
Bye from Germany
Wolfgang
--
FORTHing @ work | *Cheap* ...pick any
Dipl.-Ing. Wolfgang Allinger | *Fast* *Good* ... *two* of them
Germany ------------------------------------
## KreuzPunkt XP2 R ## | reply address set
Well, let's have a look into the world of OO.
I don't have much experience in C++, but in
Smalltalk and Java, if you see well-written
OO code, methods become quite small, maybe
1 to 3 lines. With so few lines, temporaries
simply are not needed (at least in Smalltalk).
This is even true for well-factored C code.
So, if one writes very small methods (or words,
or whatever), a powerful call/return and data stack
become the issue. I'm not an authority; I
simply try to understand, in this context, what
the benefit of a register-based architecture is
or could be.
I know nothing about hardware, but I see the complexity
of the Intel series: there are thousands of operations
and addressing modes. I can't believe
that all of this is really needed. I would prefer fast
data and return stacks instead.
Forth has a long and well-known history of playing
it this way; others simply ignore this fact. It
is the same story as Sun with Java: they never
looked at Smalltalk, and built a terrible
OO implementation. Many years of experience were
wasted.
I think we are talking about orders of magnitude more
complexity, and I simply ask what we get back.
Probably one should ask Mr. Knuth directly.
BTW, I have great respect for Mr. Knuth's work.
best wishes
--
Andreas Klimas
Of course. It's often easier to come up with a complex answer than to
understand the problem well enough to come up with a simple answer. It's
partly due to poor management, and partly due to poorly trained engineers.
In my career, I've worked with a small handful of good engineers and exactly
one good manager.
--
-GJC
-gch...@TheWorld.com
-War is the last resort of the incompetent.
>Then he cops out completely by using a fake artificial
>academic architecture. ;-) It negates his five reasons
>for wanting to use actual machine language.
Is this something new, or the virtual machine from The Art of Computer
Programming? I thought it was very close to the 6502 (but I haven't looked at
it in many years and was irritated by it at the time -- it seemed out of place
with the analysis in the rest of the texts).
Charlie Springer
Jeff Fox wrote:
>
> Helmut Leitner wrote:
> > I'm also surprised ... about how quickly both of you judge
> > Knuth's effort. To me it seems that his design is pretty
> > flexible and allows a lot of different usage options.
> > E.g. to split the 256 registers into globals and
> > a register stack...
>
> It is not just pretty flexible, it is very flexible and
> for a "full range of usage options". I would like one, if
> someone else is paying for it. ;-)
Don't we pay for Pentiums all the time?
> But it isn't real,
> the ramifications of the specified complexity would lead
> to so much other complexity in hardware that it would be a
> very expensive design in hardware.
I'm no processor expert, but I doubt that it would be more
expensive than existing complex designs.
> It is a software writer's fantasy.
> And it is theoretically a very powerful design.
Agreed.
> But mostly it is a nice fit to his specified goal of demonstrating
> the stuff in machine language. ;-) It is fine for his
> purposes
What do we expect from him? To build something that is
not working towards his purposes?
> and it does look like many other people's fantasies.
> That is one of his purposes for it.
I don't see much interest in MMIX. I doubt that Knuth cares
for the fantasies of other people. In a way, he is very much
on an ego trip in his own - fantastic - world.
But people often don't follow him.
E.g. nobody does literate programming.
> My point is that it is not specialized for Forth, or
> demonstrating how simple stack based code, Forth, is
> on simple stack based hardware.
Ok. I understand your disappointment. But this should
not lead us to saying "...how can he be so silly...".
It would be better to implement FORTH on MMIX and
show how well it works (even) on this platform.
> My only judgment of
> Knuth's work is in the context of how it is sort of
> the opposite of a demonstration of Forth. That was what
> Andreas asked as far as I could tell.
I read him as
"...how can Knuth be so silly..."
> Instead of demonstrating how beautifully simple stack
> stuff can be it demonstrates how wonderfully complex
> the whole thing can be. After all, why make it simple
> when one can make it wonderfully complex?
Usually I see the complexity of systems as a proof that
people don't understand or don't care. In Knuth's case
it's different: IMO he is beyond complexity, living in
a world where any problem can be solved and any solution
is simple.
IIRC there's an emulator for MMIX. It should be possible to retarget an
existing metacompiler to this "hardware." Wouldn't that be funny(1), Forth
running on MMIX even before Knuth has his C compiler ready (ISTR MMIX is to
be used in one of the last books in the series, the one on compiler writing
etc.)
-marcel
(1) Really funny, nothing but praise for Knuth from me.
I don't think he has such delusions. Even the Alpha people only
claimed a horizon of 25 years.
Anyway, the ideas he put into MMIX are mostly accepted as good ideas
for a high-performance workstation/server-targeted architecture in the
computer architecture community. And MMIX also has many similarities
to many currently popular architectures, and I think that is helpful
for his purpose.
>I thought I had understood why stack-based
>architectures are simple and easy to develop
>software with, but now I'm confused.
>
>Does this mean that stack-based architectures
>are wrong, at least wrong for the future?
I am not sure what you mean by wrong; it depends on the goals. The
goal in designing a RISC architecture was to be able to build
implementations in 50-500mm^2, supported by a sophisticated compiler,
that get best performance on SPEC CPU, TPC-C and/or other
workstation/server benchmarks and also have competitive performance on
other workstation and server applications.
Simplicity and ease of (assembler) programming are only secondary
virtues in that setting.
For other sets of goals, stack-based architectures may be better.
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
Simply, what characterized the good manager?
Rick Hohensee
> >I thought I had understood why stack-based
> >architectures are simple and easy to develop
> >software with, but now I'm confused.
> >
> >Does this mean that stack-based architectures
> >are wrong, at least wrong for the future?
>
> I am not sure what you mean by wrong; it depends on the goals. The
> goal in designing a RISC architecture was to be able to build
> implementations in 50-500mm^2, supported by a sophisticated compiler,
> that get best performance on SPEC CPU, TPC-C and/or other
> workstation/server benchmarks and also have competitive performance on
> other workstation and server applications.
I don't think about hardware, I think more about software.
The goal in this case is to be able to develop reliable
and maintainable software with a minimum of effort.
I don't have much experience with Forth and stack computers;
currently I'm writing my first real business application in Forth.
Until now Smalltalk was my ideal toolset for achieving the mentioned
goals, and probably Forth could be as good or better.
Anyway, register-based architectures are better suited for
C-style languages with long methods and many temporaries,
I guess.
Andreas
Yes, MMIX is horrible. MIX was pretty good for 1962, but hasn't aged well.
MMIX won't age as well as MIX, since RISC was born obsolete.
The above is reflected in the stark horror of the MMIX assembler. It is
the usual rustic atrocity, with Hollerith Card syntax, the attendant
non-mnemonics and so on. Assemblers of this type are only instructional to
the extent that they convey some history, and actually requiring the use
of such fossils is excessive. My suspicion is that this style of assembler
is a holdover from assemblers written without an assembler, which is an
important historical point, but that's all. I have attempted to convey to
one of The Professor's volunteer assistants, vlad...@acm.org, that a more
modern assembler such as osimplay might be more appropriate for an
instructional work. osimplay however, like RISC, was also born obsolete,
and why this is the case should concern this newsgroup.
osimplay exists and is for the 80386 only because I don't have any stack
machines. The lack of stack machines on the desktop should not be an
impediment to The Professor, however, since his goal is to convey timeless
truths of programming, and thus he winds up implementing virtual machines
anyway. A virtual RISC is not doing his students any favors that I can
see. Register twiddling is not a timeless truth of programming any more
than stack-dancing is. More compellingly, MMIX is not something the diligent
student can implement in silicon or programmable physical logic, and stack
machines are. The Professor's work generally on algorithms and so on
deserves such treatment, and would be particularly educational as
silicon, an approach that is much less likely with a RISC basis.
A Forth engine is the best distillation of timeless computing truths to
hardware. A Forth engine leaves you worrying about what matters to a
greater extent than any other architecture, stack-dancing
notwithstanding. Forth's appeal to engineers should translate undiminished to
students, even CS students. The Professor and the Forth community have an
opportunity here.
Rick Hohensee
ftp://ftp.gwdg.de/pub/cLIeNUX/interim/ABOUT
Rick Hohensee
> shorter:
>
> Warum einfach, wenns auch kompliziert geht?
Why write it the short way, when you can write it the long one?
> Seems the credo of modern management :-)
And, sadly, applications programming. And development environments.
And programming languages. And... :-)
> Bye from Germany
Tschuesz.
He saw his most important job as making it possible for us to do our
jobs. This was in a corporate research center and he knew that he only
needed to point us in "the right direction", and then keep the rest of the
organization from interfering.
He didn't impose his answers, even when he had suggestions to
contribute.
There was a young man of Japan
Whose limericks never would scan.
When told this was so,
he replied, "Yes I know,
But I always try to cram as many syllables
into the last line as ever I possibly can."
Who was it who wrote "Please excuse this long letter. I didn't have time
to write a short one."? Brahms?
Jerry
--
Engineering is the art of making what you want from things you can get.
-----------------------------------------------------------------------
That was MIX. MMIX is a newer creation.
The present letter is a very long one, simply
because I had no leisure to make it shorter.
Blaise Pascal
--
George Morrison
Aberdeen, Scotland
Thanks for the reference. Looks as though MMIX is going to be less
otherworldly than MIX. Rather interesting to have a virtual computer with
2048 bytes of RAM divided up as 256 64-bit registers. Reminds me of some
early small IBM computers from the 1950's with their OPCODE X Y Z coding
scheme. Basically programmed them using the equivalent of a spreadsheet.
Well, it's either MMIX or Pseudocode when an academic explains algorithms.
As he says, if you want algorithms explained in popular languages, go buy
their book. Don't blame Knuth that there's no "Algorithms in Forth" book.
MMIX is not a stack versus register thing. It's a Knuth thing.
Chuck says that Forth is 2 Stacks, a Dictionary, and Blocks. But what is the
purpose of Forth? In his Byte article he says that it is to increase
productivity. How? In modern terms, as an IDDE for XP. Careful design aided
by prototyping, and incremental coding with early testing. 'They' call it
new, but Chuck has been advocating this for 30 years. Chuck designed the
Forth VM as the simplest way to accomplish this, and the Forth language
evolved as the simplest way to talk to the VM. Why complexify this?
Walter Rottenkolber
>
>"Wolfgang Allinger" <A...@business.forth-ev.de> wrote in message
>news:8L00O...@business.forth-ev.de...
>>
>> Why do it simple if it can be done complicated?
>>
>> Seems the credo of modern management :-)
> Of course. It's often easier to come up with a complex answer
>than to understand the problem well enough to come up with a simple
>answer.
It will also impress other people, telling them "it was soooooo
complicated, but I managed it" :-)
Another example is (German) patent claims. Nobody admits that he had
the idea while taking a shower, but lies that it was looooooong
work :-)
>It's partly due to poor management, and partly due to poorly
>trained engineers. In my career, I've worked with a small handful of
>good engineers
Same with me :-)
>and exactly one good manager.
Lucky guy, I never had a good manager while I was an employee. So I
decided to become self-employed (is that the right wording?)
Bye from Germany
>>>Warum sollte man etwas einfach machen, wenn man es wunderschön
>>>kompliziert machen kann?
>> shorter:
>>
>> Warum einfach, wenns auch kompliziert geht?
>Why write it the short way, when you can write it the long one?
1 point for you :-)
>> Bye from Germany
>Tschuesz.
Seems you were in Germany, so
Tschuess Wolfgang
For the others: Tschuess is a slang word coming from the French adieu.
The French occupied the Rhineland long enough :-(
So adieu became atschoe...atschuess...tschuess (Tschüß :-)
sometimes also tschoe.
"Gary Chanson" <gcha...@no.spam.TheWorld.com> wrote in message
news:a718r1$pqa$1...@pcls4.std.com...
>
Every MMIX instruction is 32 bits. Even H3sm, which scales easily to the
basic bus dimensions of MMIX and well beyond, is less than 8 bits per
primitive. In fact there's room for a not-a-call bit. Current
implementations of H3sm are not emulators in a sense analogous to an MMIX
emulator. H3sm is subroutine threaded, so actually primitives are
full-blown calls, but.... For comparison to MMIX, it or similar would have
to be an opcode interpreter, with a distinct call mechanism for
non-primitives. This is probably well worth doing for a stack machine on
the scale of MMIX for a direct comparison, in which comparison the stack
machine should significantly outperform MMIX, in a setting the world can
comprehend.
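To make the density claim concrete, here is a sketch in Forth (not
H3sm itself; the 4-per-cell packing and the opcode values are made up
for the illustration) of how four 8-bit stack-machine primitives fit
in the same 32-bit cell that holds a single MMIX instruction:

   HEX
   : PACK4 ( op0 op1 op2 op3 -- cell )   \ op0 executes first
      18 LSHIFT SWAP 10 LSHIFT OR SWAP 8 LSHIFT OR OR ;
   : OP@ ( cell i -- op )   \ extract the i-th packed opcode
      8 * RSHIFT FF AND ;
   DECIMAL
   \ e.g.  1 2 3 4 PACK4 0 OP@ .   prints 1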
Failing that, performance metrics are knowable for simple code. Knuth's
interest in the performance of code sometimes seems to exceed his concern
for what the code does. The MMIX machine model has affiliated performance
predictions or estimates. This could be another valuable comparison point
between register machines and stack machines, if a succinct 64 bit stack
machine model (or a H3sm, where bus sizes are hidden somewhat) existed.
Something somewhere between H3sm as it is now, with some operating system
support stuff perhaps (very little though), or just a 64 bit ANS-like
Forth, crossed with something like a real hardware emulator such as what
Jeff does for F21 and so on, could make a point that Forth has been having
trouble making.
Knuth is the establishment, and MMIX is basically an averaging of the
established hardware, but The Professor might not want to be teaching
noticeably slow, bloated, near-unimplementable stuff if he can be shown
concisely just how much better stack machines are, how easily they can be
implemented as some form of hardware, and so on.
Rick Hohensee
That's right, he makes the same mistakes as the IA-64 designers. His
register file is also way too large and has an additional
indirection via the stack pointer into the "local" section of the
register file. It looks nice from the C compiler builder's point of
view, though.
Why are these killers for high-performance computers?
a) Register file access slows down quadratically with increasing size
of the register file, since both R and C go up with the line length.
Sort-of-solution: hierarchically partitioned register file with
strong drivers for the upper levels (like McKinley L1 cache). Drives
up the area and power budget, and you throw away the benefit of
splitting up the register file (faster access *and* more ports - with
the hierarchical solution, your access time is still higher, and you
completely throw away the "more ports" feature).
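A back-of-the-envelope version of that scaling (a sketch, assuming
wire resistance and capacitance both grow linearly with the line
length L, and L grows linearly with the number of registers N in a
linear layout):

   t_{access} \propto R \cdot C, \qquad R \propto L, \quad C \propto L
   \quad \Longrightarrow \quad t_{access} \propto L^2 \propto N^2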
b) Additional indirection makes register renaming (for OOO) a pain in
the ass. That's IMHO the basic reason why there's no OOO SPARC yet.
If you have some small subroutines, a state-of-the-art OOO processor
will have several of them decoded and renamed in flight. With SPARC
and the fixed sized windows, an implementation would still be
feasible, but with variable sized register windows like MMIX and
IA-64, the complexity of the register renaming part would be quite
interesting ;-).
c) Context switch time can be ignored when you just look at
basic algorithms ;-).
However, I think that a simplistic version of MMIX can be implemented
in a larger FPGA. These things have block RAMs with an access time
comparable to a single LUT delay, so the 2k register file with
additional indirection isn't a problem. Though they are
single-ported, you can clock the access much faster than the rest of
the CPU, so you get your 3 ports back.
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
Here is another quote from a philosophical Frenchman that
should be on the mind of Forth programmers at all times.
La perfection est atteinte non quand il ne reste rien à ajouter,
mais quand il ne reste rien à enlever.
(You know you've achieved perfection in design, not when you
have nothing more to add, but when you have nothing more to
take away.)
-- Antoine-Marie-Roger de Saint-Exupery
http://www.westegg.com/exupery/
--
Michael Coughlin m-cou...@attbi.com Cambridge, MA USA
You're talking about a paper toy, designed to present algorithm
design in enough detail to show the kinds of choices and
optimizations a programmer can face, as if the design were
going to be popping out of chip ovens tomorrow.
The older MIX was a lot more fun. I enjoyed trying to write
correct programs for a processor that could be implemented
with bits or trits and byte sizes that weren't completely
defined, but I'm not going to make the mistake of confusing
a gedanken experiment with anywhere near an implementable
or even desirable chip specification.
-------------------
Geoff
Is this in use today? I haven't been involved with Forth for years, so
I'm almost completely out of touch. I worked on this method for a
while, only to discover someone else had beaten me to it. It's quite
elegant how a stack technique can simplify code optimization.
-mike
I dunno, but welcome back. Do you remember who beat you to it?
I find that sort of thing tends to be memorable :o)
Rick Hohensee
Very larger. Simplistic how? Fake all that each-register-is-an-FPU
stuff with one FPU?
Rick Hohensee
Or as paraphrased by Leo F. Brodie, (Thinking Forth?)
Brevity is the soul, the essence, the very apotheosis of wit.
;o)
Same difference.
Rick Hohensee
Shhhh! MPE's VFX compiler uses a technique that sounds similar to
the one you describe. There are a couple of papers about the VFX
compiler on the EuroForth web site linked from www.forth.org.
If you can point to earlier papers I would be grateful. As I
have stated before, I believe that there is very little in the
VFX compiler that is actually new, we just applied classical
compiler techniques to Forth.
Stephen
--
Stephen Pelc, s...@mpeltd.demon.co.uk
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeltd.demon.co.uk - free VFX Forth downloads
[ for a register machine ]
> It's quite
> elegant how a stack technique can simplify code optimization.
Do you know of any literature on optimizing for a stack machine
that supports memory ops? Surprisingly, I find that it is much more
difficult to do than for a pure register machine. The problem is
with temporary results. When you find out a temporary is not necessary
or can be fused with another temp, all references to the physical
stack must be recomputed. This is a nightmare. I'm sorry if this is
unclear, it is difficult to explain.
-marcel
>
> "Bernd Paysan" <bernd....@gmx.de> wrote in message
>> That's right, he makes the same mistakes as the IA-64 designers.
>> His register file is also way too large and has an additional
>> indirection via the stack pointer into the "local" section of the
>> register file. It looks nice from the C compiler builder's point of
>> view, though.
>
> You're talking about a paper toy, designed to show algorithm
> design in enough detail to show the kinds of choices and
> optimizations a programmer can face, as if the design were
> going to be popping out of chip ovens tomorrow.
Don definitely says that MMIX is intended to be an implementable
design. It might not get actually implemented, but it should be
possible. Don talked to several serious chip-designers (Patterson,
some of the senior Alpha designers), who made suggestions. One of the
important points about using MMIX instead of a high level language is
that you can accurately predict how many cycles a program takes to
run, and the MMIX simulator takes a great deal of effort to really
simulate things like caches and so on.
The mistakes (like the too-large register file with additional
indirection) are not ones Knuth made alone; he was inspired by
architectures like IA-64 or the AMD 29K.
I don't think that is Knuth's goal with MMIX; last I heard, his
preferred way of software development is to use CWEB, and I
don't think he has replaced that with MMIX assembly programming in
general. MMIX is primarily designed for TAOCP examples, and I think
maintainability is not so important there.
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
For IA-64 these may be mistakes, for MMIX I don't think so. I think
MMIX is more like Playdoh, i.e. an exploratory architecture that
offers most of the possibilities that might be interesting (although
having all of them in one commercial architecture is probably not a
good idea). You can then go ahead and explore different subsets
easily. E.g., to explore the effect of having fewer registers just
use fewer registers; similar for splitting the register file into
integer and FP regs, eliminating the register stack or restricting it
to fixed-size windows, etc.
>Why are these killers for high-performance computers?
>
>a) Register file access slows down quadratically with increasing size
>of the register file, since both R and C go up with the line length.
For a given drive strength and linearly arranged register file, yes.
However, you will probably adjust the drive strength to the length of
the lines you want to drive (I think this costs time logarithmic in
the amount of current you want to drive; OTOH, you may have to make
the lines wider/thicker, which increases C, but decreases R), and you
have a plane to arrange the transistors (which asymptotically reduces
the line lengths to the sqrt of the number of registers).
Given the physical register file size of the Pentium 4 (IIRC 128
regs), I don't think that's the problem with IA-64.
>b) Additional indirection makes register renaming (for OOO) a pain in
>the ass. That's IMHO the basic reason why there's no OOO SPARC yet.
>If you have some small subroutines, a state-of-the-art OOO processor
>will have several of them decoded and renamed in flight. With SPARC
>and the fixed sized windows, an implementation would still be
>feasible, but with variable sized register windows like MMIX and
>IA-64, the complexity of the register renaming part would be quite
>interesting ;-).
How about doing the architectural renumbering before the renaming?
Ok, you would have to know early in the pipeline what the offset is;
this could be achieved with appropriate encoding and by treating the
offset change instructions as special cases (e.g., you don't need to
rename the offset register). The alternative would be to suffer a
several-cycle pipeline bubble on changing offsets, and maybe use an
offset predictor to reduce the frequency of these bubbles.
Crossposted and followups to comp.arch.
My compiler design (which is all in my head at the moment) will be doing
something like this. Stack operations will just change "pointers" to
stack elements, whether they are in registers or on the stack in memory.
At the end of a primitive, each stack element will either be in its
proper place in memory, or in a register. That should make it easy to
clean up at the end of a block (I'm trying to keep it a simple, one-pass
compiler).
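A minimal sketch of that bookkeeping in Forth (hypothetical names; a
real compiler would track a register number or memory slot per stack
position): compile-time stack words then just permute a small table
of location descriptors instead of emitting code.

   \ Entry i describes the item at depth i from the top:
   \ 0 = in its proper memory slot, n>0 = in register n.
   CREATE LOCS 16 CELLS ALLOT
   : LOC ( i -- addr )  CELLS LOCS + ;
   \ A compile-time SWAP exchanges the two descriptors; no code emitted.
   : VSWAP ( -- )  0 LOC @  1 LOC @  0 LOC !  1 LOC ! ;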
>Shhhh! MPE's VFX compiler uses a technique that sounds similar to
>the one you describe. There are a couple of papers about the VFX
>compiler on the EuroForth web site linked from www.forth.org.
>If you can point to earlier papers I would be grateful. As I
>have stated before, I believe that there is very little in the
>VFX compiler that is actually new, we just applied classical
>compiler techniques to Forth.
>
>Stephen
I'm virtually certain I read it in either Forth Dimensions or The
Journal of Forth Application and Research. I remember one issue
devoted to optimization, although I'm not sure the article on the
virtual stack was in it. I do remember reading an article on edge
optimization, looking at whether I could bolt it on the front of my
optimizer, and realizing the virtual stack took care of most of the
optimizations (or maybe all) that I could do with edge optimization.
That makes me think the virtual stack article appeared later. In any
case I plan on digging out my collection of Forth literature and
searching through it, this weekend if not sooner.
-mike
>I dono, but welcome back. Do you remember who beat you to it?
>I find that sort of thing tends to be memorable :o)
>
>Rick Hohensee
Unfortunately, I have a horrible memory for names. But it wasn't a
huge matter to me. I had read a book on the Bulldog compiler, which
optimized code for Long Instruction Word machines, and I got the
optimizer bug. So it was just a personal project. When I saw the
article I was actually a bit happy because I thought someone would do
the heavy lifting for me and I could buy their forth. Didn't happen
though.
-Mike
Are you talking about getting an address with SP@ and then accessing
into the stack relative to that? That certainly is a problem. (That's
forbidden by the Standard, isn't it?) Words like PICK and ROLL were a
problem if the number on the top of the stack wasn't known at compile
time. My virtual stack tagged values that were known at compile time
and could deal with that. If PICK or ROLL were encountered with an
indeterminate value, the virtual stack would be flushed and a
traditional PICK or ROLL would be performed.
But I guess that doesn't sound like what you're talking about. Oh
geez, I think I see. Do you mean optimizing for a physical stack
machine, like the old Novix chip? I thought about that and realized
what I was working on was no help at all for that case.
I remember seeing some articles for peephole optimization of Forth.
But that's about it.
-mike
>
>My compiler design (which is all in my head at the moment) will be doing
>something like this. Stack operations will just change "pointers" to
>stack elements, whether they are in registers or on the stack in memory.
>At the end of a primitive, each stack element will either be in its
>proper place in memory, or in a register. That should make it easy to
>clean up at the end of a block (I'm trying to keep it a simple, one-pass
>compiler).
>
>--
>George Morrison
>Aberdeen, Scotland
My compiler never got beyond the toy stage; it just output assembly
code so I could look at it and decide how good it was.
If you tag values that are determinate, you can do some extra
optimizations. Say you push the base address of an array and add some
fixed value to it: you can perform the addition at compile time. It might
be good to start off this way, because you don't need to code these
optimizations from the start; you can just let your compiler stick
values into registers, and come back later and add them.
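In standard Forth you can see the effect of such folding by doing it
by hand: the bracketed sequence runs while compiling, so only a single
literal is compiled (ARRAY and THIRD are names made up for the
example).

   CREATE ARRAY 16 CELLS ALLOT
   \ The address arithmetic happens at compile time; THIRD compiles
   \ to a single literal push.
   : THIRD ( -- addr )  [ ARRAY 2 CELLS + ] LITERAL ;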
One thought I had while working on this was to create a sort of
universal stack to register translator: the machine-independent part
would be told how many registers were available and would feed a
machine specific compiler with a sort of modified Forth which included
register ALU ops. Then folks would just have to implement the coder
for specific machines.
-mike
> One thought I had while working on this was to create a sort of universal
> stack to register translator: the machine-independent part would be told how
> many registers were available and would feed a machine specific compiler with
> a sort of modified Forth which included register ALU ops. Then folks would
> just have to implement the coder for specific machines.
That's almost exactly what I do! Unfortunately, there are CPU's with both
registers _and_ a stack (e.g. x86 FPU). The stack compiler poses problems
that seem to necessitate a completely different approach. So I think there
is still room for a few original ideas.
-marcel
Maybe you are thinking of
@Article{almy86,
author = "Thomas Almy",
title = "Compiling {Forth} for Performance",
journal = jfar,
year = "1986",
volume = "4",
number = "3",
pages = "379--388",
annote = "A batch Forth compiler for the 8086 and the Z80. It
produces machine code in executable files. It uses
peephole optimization and keeps up to two values from
the top of the stack in registers."
}
That's the earliest one I know of.
You are thinking of the 387 architecture, right? Well, there are the
various optimization manuals of the manufacturers. My impression is:
For the P5 (Pentium), you could treat it as a register machine and
expand the register-register operations to a stack-based operation
plus fxch; this only makes sense if you want to reorder the operations
or for optimizing FP-stack stack manipulations, otherwise just treat
it as a stack machine.
For newer implementations (e.g., P6, K7), just use them as stack
machines, i.e., no particular optimization. You may combine FLDs with
operations; this will probably help the K7 slightly and may hurt the
P6 slightly.
If you want your FP stack to be more than 8 items deep and cannot get
the overflow handling to work, you will probably have to store/reload
the items in compiler-generated code.
> Surprisingly, I find that it is much more
>difficult to do than for a pure register machine. The problem is
>with temporary results. When you find out a temporary is not necessary
>or can be fused with another temp, all references to the physical
>stack must be recomputed. This is a nightmare. I'm sorry if this is
>unclear, it is difficult to explain.
It is unclear. Please give an example.
> [ for a register machine ]
>> It's quite
>> elegant how a stack technique can simplify code optimization.
> Do you know of any literature on optimizing for a stack machine
> that supports memory ops? Surprisingly, I find that it is much more
> difficult to do than for a pure register machine.
What do you mean by "a stack machine that supports memory ops"? I
can't see what issue you're referring to.
Andrew.
For an x86 FPU FSWAP -> fxch etc. (most standard Forth words are
one or two instructions). But you can also do fstp <eff. address> and
fld <eff. address>. Apart from using these for F! and F@, you need
them for spills and fills to a memory stack.
-marcel
It's already on my TODO list 8-)
>For an x86 FPU FSWAP -> fxch etc. (most standard Forth words are
>one or two instructions). But you can also do fstp <eff. address> and
>fld <eff. address>. Apart from using these for F! and F@, you need
>them for spills and fills to a memory stack.
>
>-marcel
>
OK, THAT'S what you meant. Yes that is a problem, once you've pushed a
bunch of stuff on the FP stack, if you've got to flush some values to
a stack in memory they're on the bottom.
On the Pentium, when you switch to MMX mode, are the contents of the
FP registers preserved? You might be able to get at the values to save
them to memory that way. Other than that, nothing comes to mind.
-mike
>That's almost exactly what I do! Unfortunately, there are CPU's with both
>registers _and_ a stack (e.g. x86 FPU). The stack compiler poses problems
>that seem to necessitate a completely different approach. So I think there
>is still room for a few original ideas.
>
>-marcel
It would be great to have a portable model of an advanced Forth
compiler. Might it be possible to revisit the old days when Fig-Forth
was popping up on everything with a CPU in it? I wish you luck!
I haven't written x86 code since the 16-bit days, so I don't know how
things may have improved. But do you think the compiler needs to be
aware of the differences between registers intended as data or address
registers? When I was working on the virtual stack idea, it appeared
the compiler should deal with this (you had cpu's like the 68000 and
32000 then). With the RISC philosophy, it doesn't seem necessary.
-mike
Great! I think I started out with a rather large set of tags, but
reduced the number as I went on. As I alluded to in another post, it
seemed like a good idea to tag anything that looked like it might be
an address.
-mike
I have been thinking about optimisation of Forth for a long time.
One of the important conclusions is that Forth optimisation must be
global, i.e. transcending the boundaries of words, which tend to
be small.
I came up with this. (A chapter of my attempt at a thesis, hence
the reference to AI.) It is certainly not complete; loop and recursion
optimisation are missing, although I have some ideas about them.
I have finished the analyser, using ciforth as a test bed.
(Section "Data collecting".)
The "optimisation by recursive inlining" (see bottom) has been
implemented for linear code only, i.e. no control structures.
This is a program that will find out all stack effects and
optimisation classes for the Forth kernel words.
In a separate post I will give some examples and code.
----------8<--------------------8<---------------------
Introduction
============
You may wonder why an optimizer for a computer language would be
considered an AI application. This optimizer is not so much for a
particular language as related to a Computer Intelligence that has
insight into her own code.
Different types of optimisations interfere, and finding one's way
through this certainly requires some heuristics. The bottom line is
that an optimiser qualifies as an AI application.
Properties
----------
A Forth system is a database of small programs. It is worthwhile to
investigate what properties these small programs (words) might have.
The flag field of a word allows this information to be added to the
header. A certain combination of flags allows a particular optimisation.
Definitions
-----------
An annihilator is a word that only deletes an item from the stack.
Examples are DROP 2DROP NIP RDROP.
A juggler reorders the stack without adding or removing items.
Examples are SWAP 2SWAP ROT.
A duplicator copies an item from the stack. Examples are DUP OVER
2DUP.
Optimisations
-------------
Optimisations are manipulations of a program's source or intermediate
code to improve the speed of the resulting program. In other respects
the result is inferior. Symbolic debugging - one of Forth's strong
points - goes down the drain. (The name "optimisation" is a
misnomer.)
* Constant folding is a well known technique in optimisation. It
means that if an operator works on constants the result may be
replaced by a constant that is calculated at compile time. In
Forth we generalise this to folding. Folding refers to all words
that can be replaced by simpler words in case they receive
constant data on the stack.
* Reordering is not so much an optimisation per se, but it allows
other optimisations to kick in. As a rule of thumb constants are
moved to the top of the stack, where they fall prey to folding.
Reordering might also eliminate a juggler.
* Annihilation is the elimination of a whole sequence of
operations. In Forth sometimes the result of a calculation is
dropped. Depending on the properties of the calculation, the
calculation itself can be removed. This type of annihilation is
related to an annihilator. Another type is related to conditional
branching where the condition is known at compile time. Code known
to be skipped is removed.
* Inlining means replacing a Forth word with its constituents. This
technique is very important in Forth, more so than in other
languages, due to the small size of Forth words. Inlining is
always a winner in speed, and mostly also a winner with
regard to space.
Even more important is the fact that inlining allows folding to be
applied across constituent words. This applies to high level and
low level code alike.
Inlining high level code is trivial. A further inlining stage
replaces a high level definition that only calls code words, by a
code definition which concatenates the code words.
Data collecting
---------------
In order to add introspective information to a Forth, in the first
place the machine code words must be analysed, because ultimately
everything is defined in terms of code words. For this purpose the
code words are disassembled using a disassembler that allows the parts
of a disassembled instruction to be readily inspected. A Postit-Fixup
assembler and disassembler is well suited.
* By inspecting register words in the disassembly, register usage
can be accumulated. This information is then added to the header.
* At the same time information about whether memory or I/O ports are
accessed for read or write can be collected. It turns out to be
useful to distinguish between input and output side effects.
Here the things to look out for are the MOVE and IN/OUT
instructions, operations that access memory (in fact all
operations that do not target registers) and special instructions
like the string operations on Pentia.
* Finally the stack effect can be deduced from the number of POP's
and PUSH'es. And the use of the return stack can be marked, which
mostly warrants a special treatment.
After all code words have been analysed, the stack effects and
register usage can be concluded for all high level words. The stack
effect of a high level word is the composition of the stack effects
of its constituents. The register usage of a high level word is the
logical or of the register usage of the constituents, as are its side
effects.
There will no doubt be exceptions. It is wrong to cater for too
many exceptional situations in such a heuristic tool. Instead, the
exceptions are filled in by hand before the automated collection is
started; it fills in only as-yet-unknown items. Of course it helps to
have a simple and straightforward Forth to begin with.
Purpose
-------
A computer intelligence in general will use optimisation to generate a temporary
definition that is much faster, and retain all the valuable partial
information about words.
In normal non-AI applications, words are recursively replaced by
faster words, and those eventually by code definitions. Meanwhile words
that are no longer directly used in the final application are
eliminated. For space conservation, headers may be removed as well,
provided no dictionary lookup is needed in the application.
Implementation
==============
The following implementation notes apply to a 32-bit Pentium Forth
where a full cell (4 bytes, 0..3) is reserved for the flags. They must
be considered an example. The information about a word, i.e. its
optimisation opportunities and stack effect, sits in the flag field.
Whenever nothing is filled in in the flag field, it means unknown.
This applies to the stack effect as well as to the optimisation flags.
Stack effects
-------------
The information about the stack effect sits in byte 3 of the
flag field. The highest nibble of this byte applies to input:
it is the number of stack items popped, plus one. The lowest nibble
likewise indicates the number of pushed items, plus one. 0 means an
unknown (not yet analysed) stack effect. 0FH indicates a variable
number of items.
The stack effect is added in three steps. For all low level words
the stack effect is found by counting pops and pushes. Irregular stack
effects are then corrected by hand, as are those of irregular high
level words. Finally, all high level stack effects are derived from
the stack effects of their constituents.
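For concreteness, a sketch of that encoding in Forth (assuming, as
the "correction by 11H" remark below suggests, that both nibbles
carry a +1 bias so that 0 can mean "unknown"):

   HEX
   : ENCODE-EFFECT ( #pops #pushes -- byte )
      1+ SWAP 1+ 4 LSHIFT OR ;         \ both nibbles biased by 1
   : #POPS ( byte -- n )    4 RSHIFT 1- ;
   : #PUSHES ( byte -- n )  0F AND 1- ;
   DECIMAL
   \ e.g. + pops 2 and pushes 1:  2 1 ENCODE-EFFECT  ( gives 32H )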
Code words are analysed by disassembling the code that is pointed to
by the code field, up to the first "next" code encountered. For each
instruction the opcode, which is the first part of its disassembly, is
looked up in a set of possible pop and push instructions.
Irregularities are just tabulated in the source code of the analyser.
Words are recognized by their code field. High level words are
either created by ":" or by a "CREATE .. DOES>" construction. They
are recognised by the code field containing DOCOL or DODOES
respectively. For both the data field points to a chain of high level
calls, i.e. a number of such calls possibly with inlined data and
ending in a "(;)", the word compiled by ";". (The result of this "high
level next" is to return control is returned to the word that called
this one.) For a linear chain of calls the stack effect is calculated
as follows:
* Start with an effect of ( 0 - 0 ).
* For each constituent:
Subtract the constituent's pops from the running output count. If
the output count goes negative, add its absolute value to the inputs
and make it zero. Then add the constituent's pushes to the running
output count. (Correction by 11H is not yet done.)
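A sketch of this rule in Forth, on unpacked counts (plain in/out cell
pairs rather than biased nibbles, to keep it readable):

   : COMPOSE-EFFECT ( in1 out1 in2 out2 -- in out )
      >R  -                    \ in1 surplus, surplus = out1-in2
      DUP 0< IF  -  0  THEN    \ a deficit raises the input count
      R> + ;                   \ add the second word's pushes
   \ e.g. +  ( 2 - 1 )  followed by DUP  ( 1 - 2 ):
   \ 2 1 1 2 COMPOSE-EFFECT  leaves 2 2, i.e. ( 2 - 2 )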
The following exceptions to a linear chain have special treatment:
* LIT BRANCH'es and SKIP are followed by inline data that must be
taken care of
* A BRANCH or 0BRANCH forward is always taken, analysing just one
path through a definition: the shortest one. A more sophisticated
way is to analyse all paths and conclude a variable outcome if it
is not consistent or any of the paths contains a variable
constituent.
* If the stack effect of a constituent is variable, the result is
variable, overruling any other outcome
* If the stack effect of a constituent is unknown, the result is
unknown, overruling any other outcome except variable.
* For a CREATE .. DOES> word the linear chain pointed to by the
DOES> pointer is analysed. However the stack effect is initialised
to ( 0 - 1) to reflect the passing of the data pointer to the
DOES> part.
* '<SOME-WORD> EXECUTE has the stack effect of <SOME-WORD> . Other
occurrences of EXECUTE lead to a variable stack effect. Later on
we will leave this to the optimiser, but at the stage of analysing
the kernel this is useful, especially because all usage of EXECUTE
in the kernel is of this type.
* '<SOME-WORD> CATCH has the stack effect of <SOME-WORD> plus an
extra output. Other occurrences of CATCH lead to a variable stack
effect. So a word is treated as if exceptions do not occur. This
is okay because the stack effect is not relevant in case of
exceptions.
A high level word is recognised by its code field address containing
DOCOL , i.e. the nesting routine for the interpreter. A CREATE ..
DOES> word is detected by its code field address containing DODOES ,
i.e. the common code that starts up words defined by compiler extension.
All other words are treated as code.
The whole of Forth is treated as follows:
* Fill in the exceptions.
* Fill in the code words.
* Sweep repeatedly through the dictionary, from earliest to latest:
for each unknown stack effect, try to find it by discriminating
between DODOES , DOCOL and other words. Stop if no progress is
made any more.
Hopefully everything is known by then; if not, we must add to the
exceptions and repeat the above process.
A simple sequence is one that doesn't reach into the
stack outside of what is defined within the sequence.
Optimisation classes
--------------------
As has been pointed out, the optimisation class of a word is
indicated by a bit set in the flags field. Each bit set to true opens
up a particular opportunity for optimisation. Further a sequence has a
certain class if each constituent has that class. For example, if one
of the words called does a store, the sequence is found to do a store
and the optimisations that would be allowed by "no stores" are blocked.
So the optimisation class of a sequence is the logical OR of the
optimisation classes of the constituents. This can be computed
efficiently by bit-wise OR operations.
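Note the polarity this implies: for the OR-combination to work, a set
bit must record the violation ("does store"), and the "no store"
property holds when the bit is clear. A sketch in Forth (hypothetical
bit assignments, not ciforth's actual ones):

   1 CONSTANT DOES-STORE          \ clear bit = property holds
   2 CONSTANT DOES-FETCH
   4 CONSTANT TOUCHES-STACK
   : SEQ-OC ( oc1 oc2 -- oc )  OR ;   \ class of a two-word sequence
   : NS? ( oc -- f )   \ no side effects at all?
      [ DOES-STORE DOES-FETCH OR TOUCHES-STACK OR ] LITERAL AND 0= ;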
The no store bit.
.................
The "no store" bit would better be named "no output side effect" bit.
It indicates that the outside world doesn't change by executing this
word. Again not that the stacks and internal registers are inside.
Note that fetching from an input port has an output side effect, (as
well as an input side effect.)
The following optimisation are opened up:
* In combination with an annihilator. If the output of a "no store"
sequence is annihilated, the whole sequence and the annihilator
may be left out. Example: BASE CELL+ XX NIP becomes XX
* In combination with a juggler. If the outputs of "no store"
sequence are juggled, the sequences themselves may be juggled,
eliminating the juggler. Example: XX CELL+ BASE SWAP becomes
BASE XX CELL+
* In combination with a duplicator. Again a sequence may be
duplicated and the duplicator eliminated. This is not an
optimisation, except for the duplication of constants. Going the
other direction can be an optimisation. Two identical sequences
with no output side effect can be replaced by one and a duplicator.
Example (for a recursive definition with stack effect ( 1 - 1 )
and no side effects): 12 RECURSE DUP becomes 12 RECURSE 12 RECURSE
(eliminating the duplicator), and
12 RECURSE 12 RECURSE becomes 12 RECURSE DUP (introducing the
duplicator).
The no fetch property.
......................
The "no fetch" bit would better be named "no input side effect" bit.
It indicates that the outside world affects the outcome of this word.
Input side effects are weaker than output side effects and the
knowledge that they are absent allows less optimisation.
The no stack effect property.
.............................
The "no stack effect fetch" bit refers to absolute accesses of the
stacks, i.e. where the data or return stack are not used as stacks.
Typical examples are DEPTH and RSP. These words are rare but prevent
considerable optimisation.
The no side effect property.
............................
The combination of the "no store", "no fetch" and "no stack effect"
properties is quite common. Such a word is said to have the "no side
effect" property. The combination allows substantially more
optimisation than each alone. We will use the abbreviation NS for this
important concept. Examples are CONSTANT's, VARIABLE's, operators like
+ or NEGATE, and all stack manipulations: jugglers, annihilators,
duplicators.
NS-words are amenable to folding:
* If a NS-sequence has only constant inputs, it may be run at
compile time. Its inputs and the code sequence may be replaced by
the resulting constant outputs. Example: After "12 4 3 SWAP * +"
is replaced by 24.
* If a NS-sequence has no inputs, it may be run at compile time and
replaced by the resulting constant outputs. The difference with
the preceeding example is that the sequence starts with 12 instead
of *. Any literals are of course NS.
On closer inspection the second condition is equivalent to the first.
It is the easier one to implement.
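This folding can be checked by hand in any standard Forth: the
bracketed NS-sequence runs at compile time and leaves a single
literal to be compiled.

   : FOLDED ( -- n )  [ 12 4 3 SWAP * + ] LITERAL ;
   FOLDED .   \ prints 24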
Associativity.
..............
An operator with two inputs and one output, a so-called "binary
operator", can have, in addition to NS, the property of associativity.
This refers to a situation where three operands are involved. Examples
are OR and + , but not F+ . In the following we will denote an
associative operator by % . Associativity allows the sequence "x % y %"
to be replaced with "x y % %", where it may be that "x y %" can be
folded into a constant. Example: (assuming a 64-bit Forth) "CELL+
folded into a constant. Example: (assuming a 64-bit Forth) "CELL+
CELL+" is first inlined to "8 + 8 +" then associated to "8 8 + +" then
folded to "16 +".
Note that it is not necessary to look for other patterns, in view of
other transformations that are done.
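The same hand-folding shows the effect of the associativity rewrite;
once "8 + 8 +" has been reassociated to "8 8 + +", the constant part
collapses at compile time (BUMP2 is a name made up for the example):

   : BUMP2 ( addr -- addr' )  [ 8 8 + ] LITERAL + ;   \ was: 8 + 8 +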
Short circuit evaluation.
.........................
Another optimisation applicable to binary NS-operators is short
circuit evaluation. This is the situation where the result is known,
while only one of the operands is known, such as "FFFF AND" "FFFF OR"
"0 +" "0 XOR" and "0 *". Some of these operations can be just dropped,
while in other cases the result is known and the other operand
(possibly non-constant) can be dropped.
Optimisation by recursive inlining
----------------------------------
A word is optimized by first optimizing its constituents, then
inlining the constituents and applying any optimisation opportunities,
like folding, that open up.
In more detail we have the following steps:
* Check. First of all check whether the item has been optimised
already. We do in fact a "depth first" optimisation, so the words
lowest in the call hierarchy are optimised first. It is important
to only attempt optimisation once. This cuts the recursion short.
* Recurse. For all constituent words of this definition, do an
optimisation, such as defined in these steps.
* Inline. Build up an executable sequence in the dictionary. Inline
the constituent words, keeping track of all opportunities to
optimise. Try to build up a sequence of NS-words that starts with
constants and where each word following doesn't consume more
inputs than are available. Consequently the outputs are available
as constants.
* Breakdown. When a sequence of NS-words breaks down, we have
identified a sequence that can be run at compile time. This
sequence is run, and removed from the output sequence. Then the
output of the run is compiled, as a sequence of constants.
* Special opportunities. After each inlining, the constituent is
checked for whether it allows special optimisations. Here the likes of
the associativity optimisation are done, by looking for an "operand
% operand %" pattern. In a fashion similar to the breakdown of an
NS-sequence, the tail of the sequence that is being built up is
replaced.
* Replace. After inlining is finished, the sequence is now attached
to the word we are optimizing to replace the original sequence.
Maybe the original code is kept if no folding took place and the
sequence is longer than a certain limit.
* Mark properties. The current word is marked as optimised. Its stack
effect and its optimization classes are derived from its
constituents and added to the flags header.
--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
To suffer is the prerogative of the strong. The weak -- perish.
alb...@spenarnc.xs4all.nl http://home.hccnet.nl/a.w.m.van.der.horst