^Forth improvements

David N. Williams

Dec 11, 2002, 11:07:52 AM
I've upgraded the ^Forth to C translator for pfe to include
floating point words, with a few other code generation
revisions. This works mainly for MacOS X/ppc and linux/intel.
For the ppc, there are options to have TOS and FTOS in
registers. Here's the tree:

http://www-personal.umich.edu/~williams/archive/forth/hatforth/dir.html

Besides the traditional gforth integer benchmarks, I've added
Krishna Myneni's semiconductor laser benchmark (slbench) for
floating point:

http://www-personal.umich.edu/~williams/archive/forth/hatforth/slbench.fs.html
http://www-personal.umich.edu/~williams/archive/forth/hatforth/slbench.hf.html

On my ppc, the speedups (inverse sys time ratios) in the integer
benchmarks for translation with a register TOS, compared to
translation without, range from about 1.3 to about 1.7.

For slbench, the speedup for translation with an FTOS register
compared to translation without is about 1.08, whether or not
there's a TOS register. The total speedup for both in registers
compared to neither is about 1.3.
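
To make the TOS-register option concrete, here is a minimal C sketch of the same summing loop written both ways. It is not the translator's actual output: the names cell, sum_mem, and sum_tos are hypothetical, and a downward-growing data stack is assumed. An optimizing compiler may well reduce both to the same machine code; the point is only to show where the extra loads and stores come from.

```c
typedef long cell;   /* assumed cell type, for illustration only */

/* No TOS register: every stack item lives in memory; sp points at the top item. */
cell sum_mem(cell n)
{
    cell stack[8];
    cell *sp = stack + 8;        /* empty stack, grows downward */
    cell i;

    *--sp = 0;                   /* push the initial sum */
    for (i = 1; i <= n; i++) {
        *--sp = i;               /* push i: a store to memory */
        sp[1] += sp[0];          /* +: two loads and a store */
        sp++;                    /* drop the consumed item */
    }
    return *sp;
}

/* TOS cached in a local (a register candidate): sp points at the second item. */
cell sum_tos(cell n)
{
    cell stack[8];
    cell *sp = stack + 8;        /* nothing below the cached top yet */
    cell tos = 0;                /* the initial sum starts out as the cached top */
    cell i;

    for (i = 1; i <= n; i++) {
        *--sp = tos;             /* push i: spill the old top to memory, */
        tos = i;                 /*         the new top stays in the local */
        tos += *sp++;            /* +: one load, no store */
    }
    return tos;
}
```

Both functions return the same sum; only the memory traffic per stack operation differs.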

Gforth is pretty close to ^Forth on the integer benchmarks for
linux/intel -- in fact it wins for sieve and bubble.

I think the integer matrix benchmark results are kind of
interesting. Here's the ^Forth dialect for CMATRIX-MAIN1:

def: cmatrix-main1  "c_matrix_main_one"  ( -- )
  initiate-seed
  ima cinitiate-matrix
  imb cinitiate-matrix
  imr ima mat-byte-size bounds DO
    imb row-byte-size bounds DO
      j i cinnerproduct over ! cell+
    cell +LOOP
  row-size cells +LOOP
  drop
;def

The ^Forth dialect for MATRIX-MAIN is the same, except that the
words CINITIATE-MATRIX and CINNERPRODUCT, defined in

http://www-personal.umich.edu/~williams/archive/forth/hatforth/gfbench.hf.html

as hand-coded C, are replaced by INITIATE-MATRIX and
INNERPRODUCT, defined as high-level ^Forth for automatic
translation. The speedup for CMATRIX-MAIN1 compared to
MATRIX-MAIN is substantial, about 5.4 for the ppc results with a
TOS register (7.6 without), and about 22* for the intel results.
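
For readers who haven't seen the benchmark, the hot spot is an inner product of a matrix row with a matrix column. A hand-coded C version might look roughly like the sketch below. This is not the CINNERPRODUCT source from the page above: the function name, the long cell type, and the explicit n and stride parameters are assumptions for illustration, and the glue that exposes it as a Forth word is omitted.

```c
#include <stddef.h>

typedef long cell;   /* assumed cell type, for illustration only */

/* Inner product of a contiguous row with a column of a row-major matrix.
   row    points at the first element of the row
   col    points at the first element of the column
   n      is the number of elements to multiply and accumulate
   stride is the number of cells between successive column elements */
cell inner_product(const cell *row, const cell *col, size_t n, size_t stride)
{
    cell sum = 0;
    size_t k;

    for (k = 0; k < n; k++)
        sum += row[k] * col[k * stride];
    return sum;
}
```

In the loop nest above, the outer index J would supply the row address and the inner index I the column address.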

It's just common wisdom that you can get big speedups by
optimizing critical words. The point I'm advertising is that
the ^Forth translator makes it almost as easy to do that by
mixing in C as many Forths do by mixing in assembly language.
As far as I understand, TIMBRE does that (with C), too.

-- David

* Yep, I checked that multiple times, including checks that
MATRIX-MAIN and CMATRIX-MAIN1 actually calculate what they're
supposed to.

Krishna Myneni

Dec 11, 2002, 7:45:50 PM
"David N. Williams" wrote:
>
> I've upgraded the ^Forth to C translator for pfe to include
> floating point words, with a few other code generation
> revisions. [...]

> For slbench, the speedup for translation with an FTOS register
> compared to translation without is about 1.08, whether or not
> there's a TOS register. The total speedup for both in registers
> compared to neither is about 1.3.
> [...]

David,

What is the speedup in using the ^Forth generated pfe module
compared to running the original code in either pfe or gforth
on the same machine? Those who have not heard of ^Forth before
may be interested to know this.

Krishna

David N. Williams

Dec 11, 2002, 11:56:46 PM
Krishna Myneni wrote:
>
> What is the speedup in using the ^Forth generated pfe module
> compared to running the original code in either pfe or gforth
> on the same machine? Those who have not heard of ^Forth before
> may be interested to know this.

Yeah, I should have included a direct link to the benchmark
results:

http://www-personal.umich.edu/~williams/archive/forth/hatforth/bench-results.html

Here's part of that, expressed in terms of the speedup of ^Forth
translation compared to pfe and gforth interpretation.

For the ppc, ^Forth uses a TOS register, plus an FTOS register
for the sl benchmark. For the intel benchmarks, neither TOS nor
FTOS is in a register.

             ppc            intel
         -----------     -----------
         pfe  gforth     pfe  gforth
sieve    5.1    2.9      4.0   0.78
bubble   3.5    2.0      3.6   0.91
matrix   4.2    2.1      3.4   1.3
fib      4.0    2.3      4.8   1.1
sl       4.0    2.3      3.9   1.8
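
Each entry is the interpreter's run time divided by the ^Forth run time, so a value below 1 means the interpreter was faster. A trivial sketch of that calculation in C, with a hypothetical function name:

```c
/* Speedup of translated code over a baseline: baseline time / translated time.
   Both arguments are run times in the same unit; the result is dimensionless. */
double speedup(double baseline_seconds, double translated_seconds)
{
    return baseline_seconds / translated_seconds;
}
```

With made-up timings, speedup(4.0, 2.0) gives 2.0, and any result under 1.0 would mean the baseline won.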

It looks tricky to compare these results to those quoted by
Anton Ertl and Martin Maierhofer for their prototype f2c
translator on older 486 systems:

http://mips.complang.tuwien.ac.at/papers/ertl&maierhofer95.ps.gz

However, we do see that their translator was nearly as fast as
hand-coded C, whereas the ^Forth benchmarks are substantially
slower than hand-coded C. ^Forth optimization is not at all
sophisticated. But its speedup seems generally worth doing, and
it's easy to mix in hand-coded C to improve that.

-- David
