I have been interested in Forth ever since I got hold of a cheap
Jupiter Ace many years ago. However, I am not really a Forth
programmer and would describe myself as a recreational Forther.
Recently I have been playing with Camel Forth running on CP/M and,
because I changed some of the internal code, had to single-step
through it with a debugger. What I saw in the debugger really
changed my view of Forth and its performance.
Up until now I have always thought of Forth as small, neat, and
efficient. But during one of my debug sessions I had to
single-step through every machine instruction in the Forth boot
sequence and by the end of the session any feeling of efficiency
was gone. The language still looked small and neat to me, but
comparing it with machine code (the de facto standard on Z80 and
CP/M) it almost looked like bloatware.
It is the overhead of threading that looks inefficient. Stepping
the Forth machine's instruction pointer along a thread with NEXT,
pushing the pointer onto the return stack when entering a new
thread, and popping it off at the end is all overhead. On the Z80
this overhead uses more computing time than the actual program
that is trying to run.
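
As a rough illustration of the mechanism described above, here is a minimal sketch in Python of a threaded inner interpreter. It is not Z80 code and not CamelForth's actual implementation; the names `run`, `lit`, `TWO_PLUS` and `PROGRAM` are made up for the example. It counts the NEXT, enter and exit steps separately from the primitives that do real work.

```python
# Toy model of a threaded-code inner interpreter (illustrative only;
# not CamelForth's real code). A "thread" is a list of words; a word is
# either a Python function (a primitive) or another list (a colon word).

def run(thread, stack):
    """Execute a thread and return (useful_steps, overhead_steps)."""
    rstack = []            # return stack of (thread, ip) pairs
    ip = 0                 # instruction pointer into the current thread
    useful = overhead = 0
    while True:
        if ip == len(thread):          # end of thread: pop caller's pointer
            if not rstack:
                return useful, overhead
            thread, ip = rstack.pop()
            overhead += 1
            continue
        word = thread[ip]              # NEXT: fetch the word, advance ip
        ip += 1
        overhead += 1
        if callable(word):             # primitive: the actual work
            word(stack)
            useful += 1
        else:                          # colon word: push ip, enter new thread
            rstack.append((thread, ip))
            thread, ip = word, 0
            overhead += 1

def lit(n):
    return lambda s: s.append(n)

def plus(s):
    b, a = s.pop(), s.pop()
    s.append(a + b)

def one_plus(s):
    s.append(s.pop() + 1)

TWO_PLUS = [one_plus, one_plus]             # : 2+  1+ 1+ ;
PROGRAM = [lit(3), lit(4), plus, TWO_PLUS]  # 3 4 + 2+

stack = []
useful, overhead = run(PROGRAM, stack)
print(stack, useful, overhead)
```

Every word costs at least one overhead step, and each nested thread adds two more, which is the pattern the single-stepping session exposed.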
An extreme example is the word 1+. To increment the register that
holds the top of the stack takes 6 T cycles on the Z80, but when
this has been done the overhead to move to the next word in the
program takes 38 T's. That's an appalling ratio of more than 6 lots
of overhead for 1 lot of useful work.
More typical words have much better ratios because the overhead of
moving from one word to the next remains fixed at 38 T's while the
amount of useful work goes up. The word + does 29 T's of work and
! does 49 T's. At the far end of the scale is UM/MOD which does
hundreds of T's of useful work for the same fixed overhead of 38 T
cycles when it finishes.
However, in my programs words like +, ! and 1+ appear a lot more
often than the rather rare UM/MOD, so overall I suspect that Forth
code running on my CP/M system will run at about one half or one
third the speed of assembler code that is trying to do the same
job. It is that half speed or one third speed that feels
inefficient to me.
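
The arithmetic behind that estimate can be sketched in a few lines of Python, using only the T-state figures quoted above. The equal-weight instruction mix is an assumption made for illustration, not measured data, and the model assumes an assembler version would do the same work with no dispatch cost at all.

```python
# T-state figures quoted in the post for CamelForth on the Z80.
NEXT_T = 38                       # fixed cost of moving to the next word
WORK_T = {"1+": 6, "+": 29, "!": 49}

def slowdown(work_t, next_t=NEXT_T):
    """Total time divided by useful time for one word."""
    return (work_t + next_t) / work_t

for name, work in WORK_T.items():
    print(f"{name:>2}: overhead/work = {NEXT_T / work:.2f}, "
          f"slowdown = {slowdown(work):.2f}x")

# Hypothetical instruction mix (an assumption, not measured data):
# equal numbers of 1+, + and ! executed back to back.
mix_work = sum(WORK_T.values())
mix_total = mix_work + NEXT_T * len(WORK_T)
print(f"mix slowdown = {mix_total / mix_work:.2f}x")
```

With this mix the slowdown comes out between 2x and 3x, which is where the "one half or one third the speed" guess lands.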
Brad Rodriguez, who wrote Camel Forth, made a very good job of it
and he also documented it so well in 'Moving Forth' that it is
easy to understand how it works. (Brad's combination of source
code and good documentation should be used as an object lesson to
teach programmers that the documents they write are more
valuable than their code.) Looking at Brad's work I can't see any
way to reduce the overhead of running Forth on a Z80 processor.
The upshot of all this is that I now feel that I don't want to
write any real applications in Forth if they are going to run so
slowly. I couldn't imagine writing a text editor in it and I
certainly wouldn't use it for operating system code. It might be
a useful tool for writing a small program where speed doesn't
matter or for prototyping an algorithm before coding it in
assembler, a little bit like hacking a shell script together under
UNIX and then recoding it in C when a full speed version is
needed.
So is this Forth's role on a Z80: a kind of scripting language?
Thanks
dawa
---
Perth, Western Australia
---
This world is but a dream.
Paradise, delusion, or nightmare: take your pick.
---
If you want to reply by email then change the p's to m's
So, you've seen one Forth, and judged it on what you think decides
execution speed.
Your broad conclusions are not correct. Stick around, or rummage
through DejaNews.
-marcel
Not necessarily. My most recent (but still ancient) unpublished
Z80-CP/M Forth used subroutine threading with inlining, and was
reasonably efficient. Maybe I should put it up somewhere, but it
uses a coding style I find a bit embarrassing now (extreme
decomposition into modules), and I have no way of checking that it
would run on anything now existing.
--David
David N. Williams          Phone:  1-(734)-764-5236
University of Michigan     Fax:    1-(734)-763-2213
Physics Department         Email:  David.N....@umich.edu
Ann Arbor, MI 48109-1120   Office: 3421 Randall Laboratory
>Up until now I have always thought of Forth as small, neat, and
>efficient. But during one of my debug sessions I had to
>single-step through every machine instruction in the Forth boot
>sequence and by the end of the session any feeling of efficiency
>was gone. The language still looked small and neat to me, but
>comparing it with machine code (the de facto standard on Z80 and
>CP/M) it almost looked like bloatware.
Before you can judge the efficiency of anything, you have to judge its
effectiveness. Which is the more efficient road, the expressway
heading downtown or the gravel road which it crosses over on the way?
The one that gets you where you are going is going to be more
effective, and you should not start judging efficiency until you have
narrowed down to the roads that get you there.
As you read the articles which discuss the process of bringing up
CamelForth on a variety of 8-bit microprocessors, keep some
effectiveness considerations in mind --
(1) what is the effectiveness of investing extra effort in writing tight
assembly code which will run once, and whose execution speed will be
dominated by the time required to load the code from disk?
(2) what is the effectiveness of the Forth development cycle compared to
a Z80 assembler development cycle?
(3) if you can isolate a specific section of code where a substantial
portion of the execution time in your application code is spent in
pushing and pulling the return stack, is there anything stopping you
from writing that section, and that section alone, in machine
language?
I'd reckon that small and neat is a major part of effective for a boot
sequence. You may well save more time on small in run-once code than
you lose roaming around the return stack.
(
----------
Virtually,
Bruce McFarling, Newcastle,
ec...@cc.newcastle.edu.au
)
Jerry
--
When a discovery is new, people say, "It isn't true."
When it becomes demonstrably true, they say, "It isn't useful."
Later, when its utility is evident, they say, "So what? It's old."
a paraphrase of William James
-----------------------------------------------------------------------
dawa wrote:
>
> It is the overhead of threading that looks inefficient. Stepping
> the Forth machine's instruction pointer along a thread with NEXT,
> pushing the pointer onto the return stack when entering a new
> thread, and popping it off at the end is all overhead. On the Z80
> this overhead uses more computing time than the actual program
> that is trying to run.
>
> So is this Forth's role on a Z80: a kind of scripting language?
>
I think if you ask him Brad will agree with your analysis. The marriage
of Forth to the Z80 is less than ideal. Brad's favourite 8 bit machine
is the 6809 where you can perform a Forth NEXT in 1 instruction. Every
processor maps to Forth differently. You have to choose whether it is
what you need.
On the other hand if you use Forth as a dynamic linker/debugger for your
Z80 project you can hand code the whole project and get the performance
back. My experience is that you would find the 80/20 rule applies and
you would not get a large benefit by using assembler for the whole
project.
--
Brian Fox Executive VP
Email : bf...@microtronix.ca
Web site: http://www.microtronix.ca/
MICROTRONIX SYSTEMS LTD. Ph: (519) 649-4900, Ext 134
955 Green Valley Road FAX (519) 649-0355
London Ont.
Canada N6N 1E4
> So is this Forth's role on a Z80: a kind of scripting language?
TCOM, which is PD and makes a nice addition to FPC, has support
for 8080 in tcom80.zip. The target code generated runs under a
Z80 emulator on the PC or on a real Z80. It could very easily
be extended to support Z80 specific features. If you want
a native code compiler with optimization that supports multiple
targets and has lots of sample code and examples and is free
you can get it at http://www.forth.org
Jeff Fox
>(2) what is the effectiveness of the Forth development cycle compared to
>a Z80 assembler development cycle?
In my experience:
For relatively simple definitions ("words" in FORTH lingo), they run
close together. This is the case as long as it is possible to pass
parameters and return results through the registers. Well, actually,
FORTH has a small advantage because in assembler, you may need
some register juggling to get the parameters into the correct register.
I programmed a complete assembler (actually a "monitor", it poked the
bytes directly into memory) this way.
But with a more complex program, as soon as the registers are filled up,
things change. In assembler, you may spend more than half of your time
on pushing and popping registers, in order to preserve their value after
a function call. For example, does a certain function leave the register
DE alone, or does it mess it up? This is really annoying and
counter-productive.
What I found striking is that this seems to happen around a sharp
threshold. A small increase in complexity, and you could be in it
knee-deep.
This problem doesn't occur when using FORTH, thanks to its stack.
--
Bart.
You have gotten a number of excellent responses to your question. If I may
summarize:
1) "Forth" the language must be considered independent of any particular
implementation strategy. There are numerous viable ways to implement Forth on a
particular platform, depending on your objectives. Brad, who is an excellent
programmer, was probably striving for architectural clarity and object
compactness, and willing to compromise some in performance.
2) As several people have noted, you can probably take this approach and tune
it to the performance needs of a particular application easily by coding the
critical functions.
3) As several others have noted, there are implementation strategies
(subroutine threading with direct code compilation) that can yield faster
performance at some potential cost in size and compiler complexity.
The essential characteristics of Forth include its rich command set (including
its explicit use of the stack and extreme syntactic simplicity), its modularity,
and its intrinsically interactive programming style. There are many
implementation strategies, all of which can produce a valid Forth programming
system.
> So is this Forth's role on a Z80: a kind of scripting language?
I and my company have used Forth professionally in literally hundreds of
applications over the past 25+ years. Many (if not most) of these applications
have had intense performance requirements, which we have always met. These have
indeed included both text editors and operating systems, plus instrumentation,
industrial controls, multiuser databases, and much more. The multitasker used
in our embedded system products, for example, is the fastest in the industry.
One of my favorite stories (which the regulars here will have heard before)
deals with an airport baggage handling system we re-wrote for American Airlines
a number of years ago. The prior system was all assembly language, and
expensive to maintain. They wanted a maintainable, high-level version, but
couldn't afford much degradation in performance because even the assembly
language version only handled 80 bags/min. while the spec had been 100
bags/min. In fact, our implementation was 25% _faster_ than the assembly
language version, easily meeting the spec.
Now, this certainly isn't because high-level Forth is faster than hand-coded
assembler; no one would claim that.
It was faster because what really determines how well an application performs
tends to be architectural, system design issues, and there Forth excels in its
ability to encourage architecturally simple (and, hence, efficient) designs. In
addition, our underlying OS handled interrupts, hardware control, and
multitasking faster than the prior version.
Cheers,
Elizabeth
--
===============================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310-372-8493
111 N. Sepulveda Blvd. Fax: +1 310-318-7130
Manhattan Beach, CA 90266
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
===============================================
The performance of the Forth you are running is a consequence of
trade-offs the implementor made in the initial design stage. Many
users of Z80s need code density more than they need speed. On
many 8 bit CPUs, Direct or Indirect Threaded code is a suitable
choice. On others Subroutine Threaded Code is more appropriate.
In nearly all cases, STC will be significantly faster for almost the
same code size. Inlining short sequences also helps.
The picture changes when a really good optimiser is added to STC.
Code size goes down, and performance goes way, way up. If you
want to see the impact of a good optimiser, download the ProForth
VFX trial version from our web site, and use DIS or DASM to
look at the generated code. There is also a set of benchmark
results and code on the web site.
Writing an optimiser for the Z80 will certainly improve performance
substantially.
--
Stephen Pelc, s...@mpeltd.demon.co.uk
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)2380 631441, fax: +44 (0)2380 339691
web: http://www.mpeltd.demon.co.uk - free ProForth downloads!
If you're a Z80 programmer, then convert CamelForth to STC, and see
if that suits your speed requirements.
CamelForth as implemented on the Z80 and published in The Computer
Journal is an excellent teaching tool, but certainly not a speed-
optimized implementation. That's what the source code is for.
Obviously, 'dawa' was modifying it anyway, so.. why don't you rewrite
it? { ..sounds like fun to me.. }
--
Douglas Beattie Jr. http://www2.whidbey.net/~beattidp/
Even after Elizabeth's grand post, there's one point I didn't see
in this thread.
You might have the option of porting your app from a Z80 to a
Forth engine someday, in which case, I am convinced, a stack machine
will trounce a register machine for the algorithms that take up most
of the world's CPU time. As a matter of language comparison one must
keep in mind that Forth is normally hobbled by non-Forth hardware.
Rick Hohensee
On Thu, 03 Feb 2000 18:38:24 GMT, bart....@skynet.be (Bart Lateur)
wrote:
> Bart.
You get stack manipulation words to compensate.
Simon - http://www.spacetimepro.com Control the World from a Parallel Port
Free Software Source Code - Free CNC Source Code
A reasonable suggestion, but I think it would actually make Forth
slower on a Z80 for the following reasons:
Brad's direct threaded code does NEXT in 38 T cycles. If I
changed NEXT to use a RET followed by a CALL then it would be
reduced to 27 T's. So far this looks like a win: chaining from
one word to the next will run 40% faster.
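
A quick check of that figure, as a sketch: the 38 T value is the one quoted above, and 10 T for RET plus 17 T for CALL nn are the standard Z80 timings. The variable names are made up for the example.

```python
DTC_NEXT_T = 38          # direct-threaded NEXT, figure from the post
RET_T, CALL_T = 10, 17   # Z80 RET and CALL nn timings in T states
stc_next_t = RET_T + CALL_T

time_saved = 1 - stc_next_t / DTC_NEXT_T   # fraction of dispatch time saved
rate_gain = DTC_NEXT_T / stc_next_t - 1    # extra dispatches per unit time
print(stc_next_t, f"{time_saved:.0%}", f"{rate_gain:.0%}")
```

The "40% faster" reading is the rate gain; measured as time saved per dispatch the improvement is closer to 29%.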
Unfortunately subroutine threading makes it very difficult to
implement the data stack. Once the Z80's stack is used as the
return stack by RET/CALL the data stack has to be fabricated by
the Forth implementation.
The cheapest way I can see of fabricating a stack would be to use
the HL register as a pointer, load and store values into (HL), and
increment and decrement the HL register to push and pop items off
the stack.
The extra overhead of this fabricated stack would be so expensive
that it would lose any benefit from making NEXT run faster. So my
conclusion is that subroutine threading would actually run slower
on a Z80 and CamelForth is about as good as you can get.
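
To put rough numbers on the fabricated stack, here is a sketch that compares the hardware PUSH/POP with a software stack addressed through HL, using standard Z80 T-state timings. The four-instruction sequences are one plausible way to move a 16-bit cell and are an assumption, not code taken from any of the Forths mentioned. The model covers only the raw push and pop cost; it does not count the cost of tying up HL, and it does not by itself settle whether subroutine threading is slower overall.

```python
# Z80 instruction timings in T states (standard documented values).
T = {"PUSH rr": 11, "POP rr": 10,
     "LD (HL),r": 7, "LD r,(HL)": 7, "INC HL": 6, "DEC HL": 6}

def cost(seq):
    """Sum the T states of a sequence of instructions."""
    return sum(T[i] for i in seq)

hw_push = cost(["PUSH rr"])
hw_pop = cost(["POP rr"])
# Hypothetical software stack growing downward through HL, 16-bit cells:
sw_push = cost(["DEC HL", "LD (HL),r", "DEC HL", "LD (HL),r"])
sw_pop = cost(["LD r,(HL)", "INC HL", "LD r,(HL)", "INC HL"])
print(hw_push, sw_push, hw_pop, sw_pop)
```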
Be careful with your assumptions there Marcel.
I actually looked thoroughly at all the freely available Forths
that I could find to run on CP/M:
- Laxton and Perry's F83
- Wonyong Koh's hforth
- Ed Smeda's DxForth
- Brad's CamelForth
- various FigForth implementations
After analysing all of these I selected CamelForth as one of
the most efficient. I say "one of" because I would judge hforth
to have the exact same efficiency because it uses the exact same
code for threading.
As you will see in my reply to Douglas Beattie I have even
considered subroutine threading and concluded that it would be
slower on a Z80 than the direct threading that is used in
CamelForth.
All of this is what I had in mind when I wrote in my original
post:
>Brad Rodriguez who wrote Camel Forth made a very good job of it
I know I didn't say this as explicitly as I am saying it here.
Anyway, the upshot is that I still believe my conclusions are
correct about Forth running under CP/M on a Z80.
>TCOM which is PD and makes a nice addition to FPC has support
>for 8080 in tcom80.zip. The target code generated runs under a
>z80 emulator on the PC or on real z80. It could very easily
>be extended to support Z80 specific features. If you want
>a native code compiler with optimization that supports multiple
>targets and has lots of sample code and examples and is free
>you can get it at http://www.forth.org
A nice suggestion but unfortunately I am looking for a Z80 Forth
that runs native on a CP/M box I have, rather than a cross
compiler.
I don't think so. I believe CamelForth is about as good as you
can get on a native Z80 Forth system. See my answers to Marcel
Hendrix and Douglas Beattie for reasons.
>2) As several people have noted, you can probably take this approach and tune
>it to the performance needs of a particular application easily by coding the
>critical functions.
Fair enough.
>3) As several others have noted, there are implementation strategies
>(subroutine threading with direct code compilation) that can yield faster
>performance at some potential cost in size and compiler complexity.
I don't think so. Instead I believe subroutine threading would
actually work out slower on a Z80. See my reply to Douglas
Beattie for reasons.
Direct code compilation could give some improvements, but then if
I want to make my compiler that complex why not work in C?
>One of my favorite stories (which the regulars here will have heard before)
>deals with an airport baggage handling system we re-wrote for American Airlines
>a number of years ago. The prior system was all assembly language, and
>expensive to maintain. They wanted a maintainable, high-level version, but
>couldn't afford much degradation in performance because even the assembly
>language version only handled 80 bags/min. while the spec had been 100
>bags/min. In fact, our implementation was 25% _faster_ than the assembly
>language version, easily meeting the spec.
>
>Now, this certainly isn't because high-level Forth is faster than hand-coded
>assembler; no one would claim that.
>
>It was faster because what really determines how well an application performs
>tends to be architectural, system design issues, and there Forth excels in its
>ability to encourage architecturally simple (and, hence, efficient) designs.
I would like to rebut this with one of your own stories from the
"Last of the Algorithm Developers" thread:
>Over a period of time, Chuck would reluctantly add feature
>after feature. At the point at which the software was almost
>acceptable, Chuck would wake up one morning with a "Eureka!"
>reaction and completely redesign the code for an implementation
>that incorporated all the functionality he had been persuaded was
>essential in a clean, consistent fashion that was much better
>than the patchwork resulting from the incremental add-ons.
I think this second story shows that you are wrong when you say
"Forth excels in its ability to encourage architecturally simple
(and, hence, efficient) designs." Instead I think it is a great
example of the need for planning and design.
If Hewhosenameischuck can make a poor system by incrementally
adding pieces instead of designing the whole of it first, then I
think the difference lies in the planning and design rather than
the programming language that is being used.
I believe that you succeeded so well with your airport baggage
handling system because you had a prior system that acted as a
prototype for you to follow.
All my experience in software development taught me that rewriting
a system produces a much better result than starting from scratch.
This is because the goals and criteria are well defined by the
existing system and because of this planning and design of the new
system are done thoroughly and done well.
Oh go on! Overcome your embarrassment and contribute something so
that other people can benefit from it.
I for one would be happy to run it on my CP/M boxes to check it
out.
The later version (2.5) can include a Forth interpreter/compiler
into the generated target. So the package can generate a
stand alone Forth for you. You can use it as a metacompiler
and then move to your stand alone Forth, you are not tied to
cross development that way. It is an 8080 package so could
be improved with Z80 specific optimizations. ;-)
Jeff Fox
OK. You came on rather strong with your first posting :-)
[..]
>As you will see in my reply to Douglas Beattie I have even
>considered subroutine threading and concluded that it would be
>slower on a Z80 than the direct threading that is used in
>CamelForth.
Subroutine-threading is slower than the other ones. But because one can
start inlining code and do some peepholing on that, the end result is much
faster. I know what I'm talking about: my first Forth was written like this.
It had the datastack pointed to by HL.
No CP/M though, I was a pure student at the time. Disk drives and 64K memory
were priced at several 1000K$ each IIRC ...
-marcel
You can't design unless you understand the system. You won't understand
the system unless you have programmed it. The result is the "plan one to
throw away" way to build programs, as written in The Mythical Man-Month.
Chuck does the design when the program almost works (90%), because then
he knows what's really necessary. That's wise, because everybody knows
that the first 90% take 10% of the time. The other 90% would take the
rest of the time, so the only way to make it faster is to throw away the
code before it rots and start from scratch, now with a clear idea how to
design.
I can't understand that 25 years after the first release of TMMM, people
still are wildly ignorant about these things, and think that you can do
design on paper before starting coding and suppose that you would
actually get something out of it.
Yes, good programs are designed/architected. Few of them are done so
before coding (then typically it's a well-understood problem).
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
Oooh! Isn't that rather close to the Wintel argument:
"The new version of the software will run fine if you will just
upgrade your CPU/RAM/disk/scanner/keyboard/desk"
:-)
Thanks for that. I will take a look at it.
>Unfortunately subroutine threading makes it very difficult to
>implement the data stack. Once the Z80's stack is used as the
>return stack by RET/CALL the data stack has to be fabricated by
>the Forth implementation.
>
>The cheapest way I can see of fabricating a stack would be to use
>the HL register as a pointer, load and store values into (HL), and
>increment and decrement the HL register to push and pop items off
>the stack.
Have a look at
1) caching TOS in a register
2) using IX as the data stack pointer, so that you can use base plus
offset addressing.
3) inlining simple definitions
4) writing an optimiser.
Anyone need a VFX optimiser for Z80/64180/EZ80/Rabbit?
CamelForth already does this, so a subroutine threaded Forth that
did this would not be any faster than CamelForth on this count.
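
For readers wondering what TOS caching buys, here is a small sketch using standard Z80 timings. The cached sequence (pop, add, copy the result back into the TOS register pair) is one plausible coding that happens to match the 29 T figure quoted earlier for +; the uncached sequence is a hypothetical alternative for comparison. Neither is claimed to be a listing from CamelForth.

```python
# Standard Z80 timings in T states.
T = {"POP rr": 10, "PUSH rr": 11, "ADD HL,rr": 11, "LD r,r'": 4}

def cost(seq):
    """Sum the T states of a sequence of instructions."""
    return sum(T[i] for i in seq)

# TOS cached in a register pair: one pop, add, copy result back (2 moves).
plus_cached = cost(["POP rr", "ADD HL,rr", "LD r,r'", "LD r,r'"])
# TOS kept in memory: two pops, add, one push.
plus_uncached = cost(["POP rr", "POP rr", "ADD HL,rr", "PUSH rr"])
print(plus_cached, plus_uncached)
```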
I really did mean it when I said in my original post:
>Brad Rodriguez who wrote Camel Forth made a very good job of it
Stephen also suggested:
>2) using IX as the data stack pointer, so that you can use base plus
>offset addressing.
If I fabricated a stack using IX as the pointer, rather than HL,
the instructions to access data off the stack would take three
times longer and the instructions to step the stack pointer would
take twice as long!
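
Those ratios can be checked against the standard Z80 timings. This sketch only compares single instructions; the variable names are made up for the example.

```python
# Standard Z80 timings in T states.
T = {"LD r,(HL)": 7, "LD r,(IX+d)": 19, "INC HL": 6, "INC IX": 10}

access_ratio = T["LD r,(IX+d)"] / T["LD r,(HL)"]   # data access via IX vs HL
step_ratio = T["INC IX"] / T["INC HL"]             # stepping the pointer
print(f"{access_ratio:.2f} {step_ratio:.2f}")
```

The access ratio is a little under three and the step ratio a little under two. One caveat: with base plus offset addressing the pointer may not need stepping between the two bytes of a cell, so the per-cell comparison can be less lopsided than the per-instruction one.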
>3) inlining simple definitions
>4) writing an optimiser.
Well if I want to go that far I may as well use a full-blown
compiler such as C or Pascal.
No, it is still my conclusion that CamelForth is about as good as
threaded Forth can get on a Z80 and it will run at one half to one
third the speed of assembly language code.
Thanks
That's a case of the cure being worse than the disease :) In fact inlining
with simple optimization is not 'going that far' -- it's easily implemented.
Turns a subroutine-threaded Forth into a native-code Forth, fast as blazes
and with all the benefits.
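
To give a feel for how little machinery that takes, here is a toy sketch in Python of inlining plus one peephole rule. Everything in it is hypothetical: the pseudo-instructions ("dpush tos" and so on), the word bodies, and the `INLINE_LIMIT` threshold are invented for illustration and do not come from any real Forth compiler.

```python
# Toy model of inlining in a subroutine-threaded compiler (illustrative;
# the pseudo-instructions, bodies and threshold are made up).
PRIMITIVES = {
    "1+":   ["inc tos"],
    "DUP":  ["dpush tos"],
    "DROP": ["dpop tos"],
    "+":    ["dpop tmp", "add tos,tmp"],
}
INLINE_LIMIT = 1   # inline bodies of at most this many instructions

def peephole(code):
    """Drop adjacent 'dpush tos' / 'dpop tos' pairs, which cancel out."""
    out = []
    for ins in code:
        if ins == "dpop tos" and out and out[-1] == "dpush tos":
            out.pop()
        else:
            out.append(ins)
    return out

def compile_word(names):
    """Turn a list of word names into a flat instruction list."""
    code = []
    for name in names:
        body = PRIMITIVES[name]
        if len(body) <= INLINE_LIMIT:
            code.extend(body)              # short body: copy it in place
        else:
            code.append(f"call {name}")    # long body: keep the call
    return peephole(code)

print(compile_word(["DUP", "DROP", "1+", "+"]))
```

Short words lose their call and return entirely, and the peephole pass removes work that inlining makes visible, such as a DUP immediately followed by DROP.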
--
Neal Bridges
<http://www.quartus.net> Quartus Handheld Software!
>>...
>>But with a more complex program, as soon as the registers are filled up,
>>things change. In assembler, you may spend more than half of your time
>>on pushing and popping registers, in order to preserve their value after
>>a function call. For example, does a certain function leave the register
>>DE alone, or does it mess it up? This is really annoying and
>>counter-productive.
>>What I found striking is that this seems to happen around a sharp
>>threshold. A small increase in complexity, and you could be in it
>>knee-deep.
>>This problem doesn't occur when using FORTH, thanks to its stack.
>You get stack manipulation words to compensate.
But as you build the application up, stack manipulation decreases
rather than increases.
I haven't yet explored the idea (and I may not, in the end), but there
might be some use to swapping register sets aside from interrupt
processing and the like.
Jerry
--
Engineering is the art of making what you want from things you can get.
-----------------------------------------------------------------------
Just to be clear: CamelForth was originally intended as an *educational*
project. To this end, I kept the set of machine-language primitives
fairly small -- about 70. (And let's please not open that particular
debate right now.) The Z80 implementation in particular was to be a
"classical" implementation with only two improvements: direct-threaded
code, and TOS in register. This does indeed suffer from a 100-200%
overhead over Z80 machine code. For many past Forth applications, such
an overhead has been tolerable.
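
The 100-200% overhead figure and the "one half to one third the speed" estimate earlier in the thread are the same claim in different units, as this small sketch shows. The function name is made up for the example.

```python
def relative_speed(overhead_pct):
    """Speed relative to machine code for a given percentage overhead."""
    return 1 / (1 + overhead_pct / 100)

for pct in (100, 200):
    print(pct, round(relative_speed(pct), 3))
```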
If ultimate speed is your goal, then you probably want a language
compiler which produces optimized native code. I'm sure Elizabeth or
Stephen would be happy to produce such a Forth compiler for the Z80 if
the price was right. (There's not a big market for CP/M software these
days.) However -- as Elizabeth has already pointed out -- algorithm
design can yield much greater improvements than compiler optimization.
Cheers,
Brad
--
Brad Rodriguez, Ph.D. T-Recursive Technology bj...@aazetetics.com
Email address spoofed to block spam. Remove "aa" from name & domain.
Embedded software & hardware design... http://www.zetetics.com/recursiv
See the CamelForth page at............ http://www.zetetics.com/camel
>I'm sure Elizabeth or
>Stephen would be happy to produce such a Forth compiler for the Z80 if
>the price was right. (There's not a big market for CP/M software these
>days.)
Z80 is not limited to CP/M. For example, the GameBoy is alleged to be
using a Z80 compatible processor (which started these threads anyway).
I guess the Z80 isn't too unpopular for microcontroller applications
either. I certainly prefer it over an 8051 any day.
--
Bart.
Well, I wouldn't call it "compatible"... No shadow registers, no IX,
no IY, new incremented and decremented indexed mode for HL register...
Greetings,
Jorge
GameBoy uses a 6502 clone.
--
KC5TJA/6, DM13, QRP-L #1447
Samuel A. Falvo II
Oceanside, CA
"Samuel A. Falvo II" <kc5...@garnet.armored.net> wrote in message
news:slrn8brtpe...@garnet.armored.net...