A couple of days back I had written to the Forum about switching from
LMI Forth to a newer well supported Forth for 68hc12 processor.
All the replies I got were immensely helpful. Finally it seems that
there are two companies which are in contention. SwiftX from Forth
Inc. and VFX Forth 6 from MPE. I tried searching on the net about any
comparitive reviews about these two Forth's, but I was unable to find
one.
I would be really greatful if you can point me to a comparison of the
two compilers or share you experience of using them.
Thanks in advance.
Regards
Nihar Desai
Some benchmark comparison testing is shown in my presentation,
"Introduction to Forth...", at the following link:
http://ccreweb.org/documents/programming/programming.html
You may download the presentation in one of several formats
or view the corresponding html files. Comparsion benchmarks
for several Forths are at the link
http://ccreweb.org/documents/programming/html/Forth-Presentation/text29.html
(Note: The benchmark for kForth has improved considerably, 20 to 50%,
depending on the application, for the latest release).
Krishna Myneni
Your benchmark info is very useful. I might suggest that you provide
not only the charts, but also the data printed. This allows for
comparisons that are not clear on a chart with such a wide range of
data. The bars for VFX and SwiftForth are both rather short and hard to
compare.
--
Rick "rickman" Collins
rick.c...@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.
Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
===snipped
> >
> > Some benchmark comparison testing is shown in my presentation,
> > "Introduction to Forth...", at the following link:
> >
> > http://ccreweb.org/documents/programming/programming.html
> >
> > You may download the presentation in one of several formats
> > or view the corresponding html files. Comparsion benchmarks
> > for several Forths are at the link
> >
> >
http://ccreweb.org/documents/programming/html/Forth-Presentation/text29.html
> >
> > (Note: The benchmark for kForth has improved considerably, 20 to 50%,
> > depending on the application, for the latest release).
>
> Your benchmark info is very useful. I might suggest that you provide
> not only the charts, but also the data printed. This allows for
> comparisons that are not clear on a chart with such a wide range of
> data. The bars for VFX and SwiftForth are both rather short and hard to
> compare.
>
I'd be interested in the benchmark data too; it appears that Win32Forth
didn't or couldn't run the matrix-mult benchmark, and I'd like to know what
version of Win32Forth was used for the test?
--
Regards
Alex McDonald
> ...
> Your benchmark info is very useful. I might suggest that you provide
> not only the charts, but also the data printed. This allows for
> comparisons that are not clear on a chart with such a wide range of
> data. The bars for VFX and SwiftForth are both rather short and hard to
> compare.
>
>
Tests run on 7 Jan. 2004
Bubble-Sort Matrix-Mult Sieve Fib
------------------------------------------------------
gcc 0.55 0.16 0.34 1.00
VFX Forth 1.01 0.42 0.81 0.64
iForth 1.18 0.54 0.63 1.17
SwiftForth 1.40 0.60 1.08 0.60
gforth-fast 3.7 1.7 2.7 3.6
gforth 4.1 2.1 3.0 4.1
Win32Forth 4.89 --- 3.73 5.22
kforth-fast 8.09 8.57 5.68 12.46
PFE 16.8 12.4 15.2 19.6
kforth 19.54 16.54 16.57 24.98
-----------------------------------------------------
Notes:
------
1) gforth, gforth-fast, PFE, and kforth-fast were timed under Linux
on a 330 MHz Pentium II. The others were timed under Windows 98,
running on the same machine (dual-boot system). The relative speed
of iForth to VFX Forth was communicated to me by Marcel Hendrix
and I used the supplied values to estimate the iForth
execution times on my platform.
2) Versions: gcc 3.3, VFX Forth 3.60, iForth 2.?, SwiftForth 2.2.2.9,
gforth-fast 0.6.0, gforth 0.6.0, Win32Forth 6.01,
kforth-fast (development version based on 1.0.14-2), PFE 0.32.94,
kforth 1.0.14-2
3) Win32Forth ran out of application memory when attempting to run
the matrix-mult benchmark.
KM
>
> I'd be interested in the benchmark data too; it appears that Win32Forth
> didn't or couldn't run the matrix-mult benchmark, and I'd like to know what
> version of Win32Forth was used for the test?
>
See my response to "rickman". Yes Win32Forth 6.01 failed to run the
matrix-mult benchmark, apparently because of dictionary overflow.
I didn't investigate to see if Win32Forth could be reconfigured to
have a larger dictionary space.
Cheers,
Krishna
The dictionary can be increased by running the meta-compiler. 6.01 is quite
an old version of W32F ( we're up to V6.08 on production versions and
V6.09.07 on beta ). The V6.09 versions feature a setup program which makes
it easy to re-meta-compile with a larger dictionary ( just follow the
instructions ) and use a fixed loadpoint which results in about 25-30 %
speed reduction ( probably a bit more with some other optimizations ).
George Hubert
| Bubble-Sort Matrix-Mult Sieve Fib
| ------------------------------------------------------
| gcc 0.55 0.16 0.34 1.00
| VFX Forth 1.01 0.42 0.81 0.64
| iForth 1.18 0.54 0.63 1.17
[..]
I protest. The algorithm used in the C code for gcc is (probably)
different from the Forth one. Then the comparison between gcc
and the rest is misleading.
-marcel
The C programs compiled by gcc were apparently
generated *from* the Forth code by using a program
called forth2c. If anything, the gcc results
given above are likely worse in execution time
than if the Forth algorithms were coded from
scratch in C. As an example, below is the C version
of the sieve benchmark. Compare it to the Forth
version, and it's hard to argue that this is somehow
optimized C code.
/* Generated by forth2c, (C) Martin Maierhofer 1994 */
/* size of data space, can be changed ! */
int _C_DATA_SIZE = 8192;
#include "forth.h" /* general forth definitions */
#include "forth2c.h" /* forth2c specific definitions */
Char flags[8212];
Char eflag[4];
Cell primes(void)
{
Cell _c_result;
Cell x0;
Cell x1;
Cell x2;
Cell x3;
Cell x4;
Cell r0;
Cell r1;
Cell r2;
Cell r3;
{
x0 = (Cell) (&flags[0]);
}
{ /* Literal */
x1 = 8190;
}
{ /* Literal */
x2 = 1;
}
{ /* fill */
UCell u;
Char *addr, c;
c = (Char) x2;
u = (UCell) x1;
addr = (Char *) x0;
memset(addr, c, u);
}
{ /* Literal */
x0 = 0;
}
{ /* Literal */
x1 = 3;
}
{
x2 = (Cell) (&eflag[0]);
}
{ /* fetch */
Cell *a;
a = (Cell *) x2;
x2 = *a;
}
{
x3 = (Cell) (&flags[0]);
}
{ /* do */
Cell bound, index;
index = x3;
bound = x2;
r0 = bound;
r1 = index;
}
label0:
{ /* i */
Cell n = r1;
x2 = n;
}
{ /* cfetch */
Char *a;
a = (Char *) x2;
x2 = (Cell) *a;
}
if (!x2) goto label1;
{ /* dup */
Cell n;
n = x1;
x1 = n;
x2 = n;
}
{ /* i */
Cell n = r1;
x3 = n;
}
{ /* plus */
Cell n1, n2, n;
n1 = x3;
n2 = x2;
n = n2 + n1;
x2 = n;
}
{ /* dup */
Cell n;
n = x2;
x2 = n;
x3 = n;
}
{
x4 = (Cell) (&eflag[0]);
}
{ /* fetch */
Cell *a;
a = (Cell *) x4;
x4 = *a;
}
{ /* less-than */
Cell n1, n2, n;
n1 = x4;
n2 = x3;
n = FLAG(n2 < n1);
x3 = n;
}
if (!x3) goto label2;
{
x3 = (Cell) (&eflag[0]);
}
{ /* fetch */
Cell *a;
a = (Cell *) x3;
x3 = *a;
}
{ /* swap */
Cell n1, n2;
n1 = x3;
n2 = x2;
x2 = n1;
x3 = n2;
}
{ /* do */
Cell bound, index;
index = x3;
bound = x2;
r2 = bound;
r3 = index;
}
label3:
{ /* Literal */
x2 = 0;
}
{ /* i */
Cell n = r3;
x3 = n;
}
{ /* cstore */
Char *a;
Cell c;
a = (Char *) x3;
c = (Char) x2;
*a = c;
}
{ /* dup */
Cell n;
n = x1;
x1 = n;
x2 = n;
}
{ /* +loop */
Cell step, index, bound;
index = r3;
bound = r2;
step = x2;
index += step;
if ((step > 0 && index < bound) ||
(step <= 0 && index >= bound))
{
r2 = bound;
r3 = index;
goto label3;
}
}
{ /* unloop */
}
goto label4;
label2:
{ /* drop */
}
label4:
{ /* swap */
Cell n1, n2;
n1 = x1;
n2 = x0;
x0 = n1;
x1 = n2;
}
{ /* one_plus */
Cell n;
n = x1;
x1 = n + 1;
}
{ /* swap */
Cell n1, n2;
n1 = x1;
n2 = x0;
x0 = n1;
x1 = n2;
}
label1:
{ /* Literal */
x2 = 2;
}
{ /* plus */
Cell n1, n2, n;
n1 = x2;
n2 = x1;
n = n2 + n1;
x1 = n;
}
{ /* loop */
Cell index, bound;
index = r1;
bound = r0;
index++;
if (index != bound)
{
r0 = bound;
r1 = index;
goto label0;
}
}
{ /* unloop */
}
{ /* drop */
}
{ /* exit */
_c_result = x0;
return (_c_result);
}
}
Cell benchmark(void)
{
Cell _c_result;
Cell x0;
Cell x1;
Cell x2;
Cell r0;
Cell r1;
{ /* Literal */
x0 = 0;
}
{ /* Literal */
x1 = 1000;
}
{ /* Literal */
x2 = 0;
}
{ /* do */
Cell bound, index;
index = x2;
bound = x1;
r0 = bound;
r1 = index;
}
label0:
{
Cell _C_locret;
_C_locret = primes();
x1 = _C_locret;
}
{ /* nip */
Cell n1, n2;
n2 = x1;
n1 = x0;
x0 = n2;
}
{ /* loop */
Cell index, bound;
index = r1;
bound = r0;
index++;
if (index != bound)
{
r0 = bound;
r1 = index;
goto label0;
}
}
{ /* unloop */
}
{ /* exit */
_c_result = x0;
return (_c_result);
}
}
void main(void)
{
Cell x0;
Cell x1;
{
x0 = (Cell) (&flags[0]);
}
{ /* Literal */
x1 = 8190;
}
{ /* plus */
Cell n1, n2, n;
n1 = x1;
n2 = x0;
n = n2 + n1;
x0 = n;
}
{
x1 = (Cell) (&eflag[0]);
}
{ /* store */
Cell *a, n;
a = (Cell *) x1;
n = x0;
*a = n;
}
{
Cell _C_locret;
_C_locret = benchmark();
x0 = _C_locret;
}
{ /* dot */
printf("%d ", x0);
fflush(stdout);
}
{ /* exit */
return;
}
}
-----------------
Forth version:
\ #! /usr/stud/paysan/bin/forth
DECIMAL
\ : SECS TIME&DATE SWAP 60 * + SWAP 3600 * + NIP NIP NIP ;
CREATE FLAGS 8190 ALLOT
variable eflag
\ FLAGS 8190 + CONSTANT EFLAG
: PRIMES ( -- n ) FLAGS 8190 1 FILL 0 3 EFLAG @ FLAGS
DO I C@
IF DUP I + DUP EFLAG @ <
IF EFLAG @ SWAP
DO 0 I C! DUP +LOOP
ELSE DROP THEN SWAP 1+ SWAP
THEN 2 +
LOOP DROP ;
: BENCHMARK 0 1000 0 DO PRIMES NIP LOOP ;
\ SECS BENCHMARK . SECS SWAP - CR . .( secs)
: main
flags 8190 + eflag !
benchmark .
;
\ HPPA/720, 50 MHz: user 3.90s
> Marcel Hendrix wrote:
>> Krishna Myneni <krishn...@compuserve.com> writes Re: Review of SwiftX and VFX Forth
[..]
>> I protest. The algorithm used in the C code for gcc is (probably)
>> different from the Forth one. Then the comparison between gcc
>> and the rest is misleading.
[..]
> The C programs compiled by gcc were apparently
> generated *from* the Forth code by using a program
> called forth2c. If anything, the gcc results
> given above are likely worse in execution time
> than if the Forth algorithms were coded from
> scratch in C. As an example, below is the C version
> of the sieve benchmark. Compare it to the Forth
> version, and it's hard to argue that this is somehow
> optimized C code.
Maybe Anton Ertl will care to make an authorative
comment here.
IMHO: check the dragon books. Using that enormous number
of variables and no pointers is exactly what a classical
C optimizer likes.
It's amazing that gcc can be *twice* faster than iForth
and VFX on the Sieve. I have looked very intensively into
that particular code and I couldn't improve iForth's output,
not even using in-line (or even pure) assembler. My experiments
with VC++ and the original(*) C Sieve code also didn't turn up
such large differences.
*) the algorithm used above is not the original but a variant
due to Bernd Paysan.
-marcel
It is the same algorithm, because the "C" program was generated
automatically by translating Forth source.
There are also manually-coded C programs for (mostly) the same
algorithms, and they resulted (on the 486) in very similar timings to
the translated code. Read all about the translator and the timings on
http://www.complang.tuwien.ac.at/papers/ertl%26maierhofer95.ps.gz
You can get the benchmarks (Forth code, forth2c-generated C code, and
manually written C code) at
http://www.complang.tuwien.ac.at/forth/bench.zip
and do your own timings and check the generated native code.
Somewhat updated results (from 1999, still on the 486) can be found at
http://www.complang.tuwien.ac.at/forth/performance.html
>IMHO: check the dragon books. Using that enormous number
>of variables and no pointers is exactly what a classical
>C optimizer likes.
At least it does not hurt the run-time of code produced by "gcc -O",
and it made the translator simpler (the idea was to generate it mostly
automatically from the Gforth primitive descriptions). It does not
make the code faster than using fewer variables and eliminating some
of the variable-to-variable copies. Gcc without optimization and
Borland C suck on the forth2c-generated code, however.
Representing the Forth stack as local C variables (i.e., hopefully
machine registers eventually) certainly results in faster code than
representing it as an array with a stack pointer.
>It's amazing that gcc can be *twice* faster than iForth
>and VFX on the Sieve. I have looked very intensively into
>that particular code and I couldn't improve iForth's output,
>not even using in-line (or even pure) assembler. My experiments
>with VC++ and the original(*) C Sieve code also didn't turn up
>such large differences.
Krishna Myeni produced the iforth numbers for his machine by scaling
some results you produced. This scaling can introduce an error
(especially since your results did not come from a P6-based CPU),
which could be an explanation for unexpected results. So better do
your own timings.
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
> CREATE FLAGS 8190 ALLOT
> variable eflag
>
> : PRIMES ( -- n ) FLAGS 8190 1 FILL 0 3 EFLAG @ FLAGS
> DO I C@
> IF DUP I + DUP EFLAG @ <
> IF EFLAG @ SWAP
> DO 0 I C! DUP +LOOP
> ELSE DROP THEN SWAP 1+ SWAP
> THEN 2 +
> LOOP DROP ;
> flags 8190 + eflag !
This is pretty ugly code. Not readily apparent without study:
1) The 0 3 on the first line must be maintained inside both DO loops and to
the finish. The 0 is the count (# of primes found), the 3 is just re-used
again and again.
2) The count is incremented after the ELSE using "swap 1+ swap". The two
swaps are needed because the "3" is in the way. Stack juggling.
3) All prime numbers between 3 and 16,381 are found. Since all even numbers
greater than 2 are non-prime, an array that is half the size of this range
can be used as the "sieve". The indices for this array are not the prime
numbers themselves, but are indexes to the primes ( prime=(2*index)+3 ).
4) The basic flow of the sieve is to step through an array of 8190 flags,
one at a time. Only if the flag is set, then indicate that a prime has been
found by incrementing the counter and then use an inner loop to mark each
multiple of that prime as being non-prime by clearing the corresponding
flags in the array. The "0 i c!" after the second DO clears the flags.
5) The DUPs peppered about are needed to maintain the count and the "3" on
the stack. And the final DROP is needed to get rid of the "3" so the count
is left on the stack at the end.
6) Quick, what is the 2 being added to after the second THEN and why?
Actually, this code is better than that first presented by Jim Gibreath in
the Sept. 1981 Byte article. Gilbreath claimed to need help in codeing it
in Forth and he still got it wrong. He corrected it in the Jan 1983 Byte
follow-up article, but it was still very ugly. Gilbreath commented that he
"didn't have the kind of brain" that could write Forth code. I haven't the
kind of brain that likes this example. With stuff like that Byte article
it's no wonder Forth has gotten a bum rap over the years. IMHO
Personally, I would code it with local variables even if it meant a (likely
small) speed hit. There is more to programming than drag speed contests.
Regards,
-Doug
>It's amazing that gcc can be *twice* faster than iForth
>and VFX on the Sieve. I have looked very intensively into
>that particular code and I couldn't improve iForth's output,
>not even using in-line (or even pure) assembler. My experiments
>with VC++ and the original(*) C Sieve code also didn't turn up
>such large differences.
If the tests are being run on a P3 or P4 try enabling the
P4 optimisations with the +IDATA switch in VFX Forth. This
keeps data well away from code. Without this switch I have
seen a 7:1 variation in a benchmark that uses a single
variable. Most C compilers keep data well away from code.
Stephen
--
Stephen Pelc, steph...@INVALID.mpeltd.demon.co.uk
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeltd.demon.co.uk - free VFX Forth downloads
Actually, it is faster with local variables, using Mops on a G3 iMac 500MHz
(YMMV):
8190 constant size
variable flags size allot
: sieve { \ prime k index cnt -- cnt }
\ prime k index cnt all initialized to 0
flags size 1 fill
BEGIN
size index >
WHILE
index flags + c@
IF
index dup + 3 + -> prime
1 ++> cnt
index prime + -> k
BEGIN
size k >
WHILE
0 k flags + c!
prime ++> k
REPEAT
THEN
1 ++> index
REPEAT cnt ;
: BENCHMARK 1000 0 DO sieve drop LOOP ;
0.317 seconds with locals
0.350 seconds with original code
A possibly rewarding exercise is to start with the clearest code you can
write, regardless of speed, and optimize by stages.
Leo Wong
--
http://www.albany.net/~hello/
"Truth will sooner come out from error than from confusion."
-- Francis Bacon, quoted in Barzun and Graff, The Modern Researcher,
6th edition, p. 308.
Using Gforth-fast, this version takes about 14% longer:
variable pcount
: primes ( -- n )
3 { step }
flags 8190 1 fill
0 pcount !
eflag @ flags
do
i c@
if step i + dup eflag @ < ( step+i bool)
if
eflag @ swap
do
0 i c!
step
+loop
else drop
endif
1 pcount +!
endif
step 2 + to step
loop
pcount @ ;
While we're speaking of ugly code, the Rot13 site
http://www.miranda.org/~jkominek/rot13/ has these C programs:
int main ()
{
register char byte, cap;
for(;read (0, &byte, 1);)
{
cap = byte & 32;
byte &= ~cap;
byte = ((byte >= 'A') && (byte <= 'Z') ? ((byte - 'A' + 13) % 26
+ 'A') : byte) | cap;
write (1, &byte, 1);
}
}
main(a){while(a=~getchar())putchar(~a-1/(~(a|32)/13*2-11)*13);}
I think this Forth version is easier to understand:
13 constant eol
: rot13
begin key dup dup toupper dup ( c c upcased upcased)
'N 'Z 1+ within swap
'A 'N within ( c c 0|-1 0|-1 )
- 13 * ( c c offset) + emit ( c)
eol =
until cr ;
A table lookup version.
13 constant eol
create table 256 chars allot
: populate ( addition start end -- )
1+ swap do i over + i chars table + c! loop drop ;
0 0 255 populate
13 char A char M populate
-13 char N char Z populate
13 char a char m populate
-13 char n char z populate
: rotchar ( ch -- ch' ) chars table + c@ ;
: rot13 ( -- ) begin key dup rotchar emit eol = until cr ;
--
George Morrison | george morrison gedamo demon co uk
Aberdeen, Scotland | . @ . . .
I copied the Forth benchmark code and 4 Forth compilers
to three of my machines and ran all combinations.
I also compiled SIEV.C from the Forth2C distributions with MS VC6,
without definining any SYMBOLS, but setting VC's switches for maximum
speed (i.e. framepointer removal etc.)
I checked the ASM file for SIEVE.exe and it looked correct to me
(that is, every variable was kept in registers, code looked extremely
efficient, no trace of the 'virtual machine' left).
This is the result for the Gforth Sieve (with eflag):
GForth Sieve | P54C 166 MHz Athlon 900 MHz P4 3 GHz HT
----------------+-------------------------------------------------
mxForth 3.0 | 0.767 sec 0.156 sec 0.069 sec
MS VC6 max. opt | 0.880 sec 0.160 sec 0.060 sec
iForth 2.0 | 1.792 sec 0.328 sec 0.122 sec
VFX 3.40.0851 | 2.093 sec 0.411 sec 0.172 sec
GForth 0.6.1 | 6.101 sec 0.990 sec 0.411 sec
So indeed, C is two times faster than iForth and VFX on all my hardware.
However, mxForth is equally fast as C. (mxForth's model allows it to use
one more register than iForth).
I had to leave an unexpectedly large gap between the flags array and
PRIMES on the P4.
Also I found that the original Gilbreath Sieve is quite a lot faster
than the GForth variant on most other Forth/hardware combinations.
-marcel
-- ---------------------------------------------------------------------
[DEFINED] TIMER-RESET 0= [IF] INCLUDE compat.fth [THEN]
( selection -- )
ANEW -sieve
BASE @ DECIMAL
8190 CONSTANT size
CREATE flags size ALLOT
flags size + CONSTANT eflag
1024 ALLOT ALIGN
( sel base -- ) SWAP VALUE sel
sel 0= [IF] CR .( GForth Sieve)
: PRIMES ( -- n )
flags size 1 FILL
0 3
eflag flags DO I C@ IF DUP I + DUP
eflag < IF eflag SWAP DO 0 I C! DUP +LOOP
ELSE DROP
THEN SWAP 1+ SWAP
THEN 2 +
LOOP DROP ;
[THEN]
sel 1 = [IF] CR .( Gilbreath Sieve)
: PRIMES ( -- )
flags size 1 FILL
0 size 0 DO I flags +
C@ IF I 2* 3 +
DUP I +
BEGIN DUP size
U< WHILE 0 OVER flags + C!
OVER +
REPEAT
2DROP 1+
ENDIF
LOOP ;
[THEN]
sel 2 = [IF] CR .( GForth + local Sieve)
: PRIMES ( -- n )
3 LOCALS| step |
0 ( pcount)
flags size 1 FILL
eflag flags
DO I C@ IF step I + DUP eflag < ( step+i bool)
IF eflag SWAP DO 0 I C! step +LOOP
ELSE DROP
ENDIF
1+ ( pcount)
ENDIF
step 2 + TO step
LOOP ;
[THEN]
sel 3 = [IF] CR .( Mops Sieve)
: PRIMES ( -- u )
0 0 0 LOCALS| prime k index |
flags size 1 FILL
0 ( cnt)
BEGIN
size index >
WHILE
index flags + C@
IF index 2* 3 + TO prime
1+ ( cnt)
index prime + TO k
BEGIN size k >
WHILE 0 k flags + C!
k prime + TO k
REPEAT
THEN
index 1+ TO index
REPEAT ;
\ G3 iMac 500 MHz with MOPS
\ 0.317 seconds with locals
\ 0.350 seconds with original code
[THEN]
: BENCHMARK ( -- )
BASE @ >R DECIMAL
24 HTAB TIMER-RESET
0 1000 0 DO PRIMES NIP LOOP DROP
.ELAPSED
R> BASE ! ;
benchmark
BASE !
That's slow for Gforth-0.6 (bad compiler/flags?). With an optimally
compiled version (gcc-2.95.*, configure --enable-force-reg) and
gforth-fast you should get something like:
Gforth 0.6 | 0.49s 0.20s
>Also I found that the original Gilbreath Sieve is quite a lot faster
>than the GForth variant on most other Forth/hardware combinations.
Looking at the code, it appears that the "Gforth sieve" uses +LOOP to
reduce the number of words in the inner loop, which helps
threaded-code systems and probably also simple native-code compilers,
but not sophisticated native-code compilers like iforth/mxforth/VFX
(i.e.most other:-); the slowdown may come from the fact that the +LOOP
test is more complex than U<, or from other differences.
I am not sure where the "Gforth sieve" comes from. Probable sources
are Bernd Paysan, or Marty Fraemans Forth versions of the Stanford
benchmarks (bubble and matrix come from there).
> sel 3 = [IF] CR .( Mops Sieve)
> : PRIMES ( -- u )
> 0 0 0 LOCALS| prime k index |
> flags size 1 FILL
> 0 ( cnt)
> BEGIN
> size index >
> WHILE
> index flags + C@
> IF index 2* 3 + TO prime
> 1+ ( cnt)
> index prime + TO k
> BEGIN size k >
> WHILE 0 k flags + C!
> k prime + TO k
> REPEAT
> THEN
> index 1+ TO index
> REPEAT ;
>
> \ G3 iMac 500 MHz with MOPS
> \ 0.317 seconds with locals
> \ 0.350 seconds with original code
Thanks for that, Marcel.
The "2*" you used instead of the "dup +" I used sped it up a bit more to
0.300 seconds. More readable too. Keeping the count on the stack had no
measurable effect on the time, on my setup.
Regards,
-Doug
> m...@iae.nl (Marcel Hendrix) writes:
>> This is the result for the Gforth Sieve (with eflag):
>> GForth Sieve | P54C 166 MHz Athlon 900 MHz P4 3 GHz HT
> ...
>> GForth 0.6.1 | 6.101 sec 0.990 sec 0.411 sec
> That's slow for Gforth-0.6 (bad compiler/flags?). With an optimally
> compiled version (gcc-2.95.*, configure --enable-force-reg) and
> gforth-fast you should get something like:
> Gforth 0.6 | 0.49s 0.20s
[..]
I did all tests under Windows NT/2000/XP, so I used the windows
binary (see thread in gforth mailinglist when Gforth 0.6 came out).
AFAIK this one was made by Bernd Paysan. I don't have the tools
to compile Gforth for windows myself.
The speeds you show indicate that Gforth and VFX produce code that is
equally fast...
-marcel
\ Slow version
16384 CONSTANT #nums \ includes 1900 prime numbers
CREATE nums #nums ALLOT
\ Address of n
: num ( n -- a ) nums + ;
\ Is n a prime number?
: prime? ( n -- flag ) num C@ ;
\ Multiples of n are not prime
: -multiples ( n -- )
DUP BEGIN OVER + DUP #nums U<
WHILE FALSE OVER num C! REPEAT 2DROP ;
: primes-a ( -- u )
nums #nums TRUE FILL
0 #nums 2 DO I prime? IF I -multiples 1+ THEN LOOP ;
\ Somewhat faster
8190 CONSTANT size \ consider this many numbers only
CREATE flags size ALLOT
flags size + CONSTANT eflag
: squared ( n -- n*n ) DUP * ;
\ square root after Wil Baden
: sqrt ( n1 -- n2 )
0 TUCK DO 1+ DUP 2* 1+ +LOOP ;
size 2* sqrt 2/ CONSTANT maxfactor
\ Increment count of primes
: +primes ( n1 a -- n1+1 a ) >R 1+ R> ;
\ Return number represented by ith flag
: >n ( i -- n ) 2* 3 + ;
\ Reset flags of odd multiples of n
: -flags ( n -- )
>n eflag OVER squared 3 - 2/ flags +
DO FALSE I C! DUP +LOOP DROP ;
: primes-b ( -- u )
flags size TRUE FILL
1 flags maxfactor 0 DO COUNT IF +primes I -flags THEN LOOP
size maxfactor DO COUNT IF +primes THEN LOOP DROP ;
Leo Wong
http://www.albany.net/~hello/
"There is more to life than increasing its speed" -- Gandhi
How much nicer is 'c than [CHAR] c
\ rot13.f Leo Wong, June 28, 2004 +
: between ( n1 n2 n3 -- flag ) 1+ WITHIN ;
: rot13 ( c1 -- c2 )
DUP [CHAR] a [CHAR] m between IF 13 + EXIT THEN
DUP [CHAR] n [CHAR] z between IF 13 - EXIT THEN
DUP [CHAR] A [CHAR] M between IF 13 + EXIT THEN
DUP [CHAR] N [CHAR] Z between IF 13 - EXIT THEN ;
: test ( ca u )
CR 0 ?DO COUNT rot13 EMIT LOOP DROP CR ;
s" Ybeq Wrfhf Puevfg, unir zrepl ba zr, n fvaare." test
Leo Wong
http://www.albany.net/~hello/
He lived. He loved. He prayed.
There is a more efficient, better factored, and better commented
version on the Forth Freak Wiki:
http://www.forthfreak.net/wiki/index.cgi?SieveOfEratosthenes
You do extra work by counting and setting flags in the same loop. You
only need sqrt(n)/2 iterations to set flags for prime numbers up to n.
It is better to have seperate words for setting the flags, counting
the primes, and listing the primes.
Ian
Thanks,
> There is a more efficient, better factored, and better commented
> version on the Forth Freak Wiki:
> http://www.forthfreak.net/wiki/index.cgi?SieveOfEratosthenes
The only thing it lacks is a final test before production :-)
\ : clear-multiples ( prime start -- prime )
\ maxp bounds do 0 I c! dup +loop ;
This writes all over the end of the flags array.
As a quick hack, it can be fixed with:
: clear-multiples ( prime start -- prime )
flags maxp + swap
do 0 I c! dup +loop ;
I guess the Wiki stuff was tried on a system where DATA and CODE are in
very widely separated areas or even in different 'segments.' On iForth
it overwrote its own code.
> You do extra work by counting and setting flags in the same loop. You
> only need sqrt(n)/2 iterations to set flags for prime numbers up to n.
> It is better to have seperate words for setting the flags, counting
> the primes, and listing the primes.
The overwriting implies clear-multiples does too much work. On my system
the new Sieve is slightly slower than Gilbreath's original.
-marcel
: +fill ( startvalue startaddress n -- )
0 DO OVER I + OVER C! CHAR+ LOOP 2DROP ;
: enumerated
CREATE ( startvalue n -- ) HERE SWAP DUP CHARS ALLOT +fill
DOES> ( n -- a ) SWAP CHARS + ;
0 256 enumerated table
CHAR A 13 + CHAR A table 13 +fill
CHAR N 13 - CHAR N table 13 +fill
CHAR a 13 + CHAR a table 13 +fill
CHAR n 13 - CHAR n table 13 +fill
With Marcel's correction it is quite speedily on my machine.
Leo Wong
--
http://www.albany.net/~hello/
"All artists argue about their ways of performing. This is taken for
granted." -- Jacques Barzun, Simple & Direct
Nope, the 3 is incremented by 2 each time. It's 2i+3.
The code wasn't written by me, but came with volksForth. I didn't change it
to allow comparison (if any, there were only modest changes). There are
several important mistakes. First of all, EFLAGS should not be a variable,
but a constant (it points to the end of the array). Then, the loop should
go to the square root of the last number (16k, so 128, which means "first
64 entries"). Counting the number of primes found should be separated. The
inner EFLAGS comparison is completely superfluous (when getting the maximum
index right). There's no factoring, but this function is screaming for
factoring:
Create flags 8190 allot
here Constant eflags
: init-flags ( -- ) flags 8190 1 fill ;
: clear-multiples ( n addr -- n )
eflags swap DO 0 I c! dup +LOOP ;
\ or: BEGIN 0 over c! over + dup eflags u>= UNTIL drop ;
: count-primes ( -- n )
0 eflags flags DO I c@ + LOOP ;
: list-primes ( -- ) 2 .
3 eflags flags DO I c@ IF dup . THEN 2 + LOOP drop ;
: find-primes ( -- ) 3 flags 64 bounds DO
I c@ IF dup I + clear-multiples THEN 2 +
LOOP drop ;
: primes ( -- 1899 ) init-flags find-primes count-primes ;
: benchmark 1000 0 DO primes drop LOOP ;
Note that this is not that much of an improvement, since the clear-multiples
still takes most of the time for the smaller numbers. Since many larger
numbers have several small factors, this is not a really efficient way to
do sieve.
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
Thanks for the fix! This was a leftover from replacing the pointer
eflags with the limit maxp. The broken code worked on GForth
(amazingly). I have updated the wiki page.
> > You do extra work by counting and setting flags in the same loop. You
> > only need sqrt(n)/2 iterations to set flags for prime numbers up to n.
> > It is better to have seperate words for setting the flags, counting
> > the primes, and listing the primes.
>
> The overwriting implies clear-multiples does too much work. On my system
> the new Sieve is slightly slower than Gilbreath's original.
>
> -marcel
Interesting! Does that imply that the overhead of a second loop to
count the flags is higher than the extra n-sqrt(n) calls to
clear-multiples? Just goes to show that profiling trumps theory.
Ian
> m...@iae.nl (Marcel Hendrix) wrote in message news:<8386342...@frunobulax.edu>...
>> ia...@quirkster.com (Ian Osgood) writes Re: Sieve (was Re: Review of SwiftX and VFX Forth)
>> > There is a more efficient, better factored, and better commented
>> > version on the Forth Freak Wiki:
>> > http://www.forthfreak.net/wiki/index.cgi?SieveOfEratosthenes
[..]
>> The overwriting implies clear-multiples does too much work. On my system
>> the new Sieve is slightly slower than Gilbreath's original.
> Interesting! Does that imply that the overhead of a second loop to
> count the flags is higher than the extra n-sqrt(n) calls to
> clear-multiples? Just goes to show that profiling trumps theory.
Apparently, yes.
This, however, is twice faster ( 607 ms instead of 1349 ms for 1000 times
on a P54c 166 MHz under Windows NT).
( ./examples/benchmar/sieveros.frt; Douglas Ross, FD Vol XII, No 2, page 6 )
#1000 constant #times
#90 constant root
#8190 constant size
CREATE flags size ALLOT
: DO-PRIME
flags size 1 FILL
root 1 DO flags I +
C@ IF I 2* 1+
DUP 2* SWAP DUP *
BEGIN DUP size 2*
U< WHILE 0 OVER 2/ flags + C!
OVER +
REPEAT
2DROP
ENDIF
LOOP ;
: PRIMES CR #times DEC. ." iterations." TIMER-RESET
#times 0 DO DO-PRIME LOOP
CR .ELAPSED ;
: #PRIMES DO-PRIME
1 size 1 DO flags I + C@ + LOOP . ." primes." ;
: .PRIMES 1 LOCAL cnt \ <size> --- <>
DEPTH 0= ABORT" No limit"
DUP size U> ABORT" above SIZE "
DO-PRIME
CR 2 8 U.R
1 DO
flags I + C@ IF I 2* 1+ 8 U.R
cnt 8 = IF CR CLEAR cnt
ELSE 1 +TO cnt
ENDIF
ENDIF
LOOP ;
-marcel
I don't like all those people tweaking the original benchmark to fit
their compiler or their coding habits. It also doesn't matter a bit
whether it is ugly code.
I would like to have a version that could be a model of good factoring,
and then see how it fares one different optimising compilers.
First decide on it. Then run it. Never allow to run changed versions.
Or at least hit everybody on the head when they are presenting benchmark
results of changed versions.
(I tried something like that with a recursion benchmark, where the
copyright tries to forbid distributing without notice of changed
versions.)
>Leo Wong
Groetjes Albert
--
--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
One man-hour to invent,
One man-week to implement,
One lawyer-year to patent.
I'm sorry to see that no one has responded to your real question.
Benchmarks involving running the Sieve on SwiftForth and other Windows
Forths have no bearing at all on your 68HC12 question! Not to mention that
performance benchmarks are only one measure of the usefulness of a Forth
product: ease of use, compactness of code, and library support are surely
also important issues.
I invite you to look at our web site www.forth.com for more information of
SwiftX, if you haven't already... you'll find some more information there,
and you can also order a $15 SwiftX evaluation CD that contains a lot of
detailed information, including a complete documentation set, and demo
versions of SwiftX for the 68HC11 and other targets.
Cheers,
Elizabeth
--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310-491-3356
5155 W. Rosecrans Ave. #1018 Fax: +1 310-978-9454
Hawthorne, CA 90250
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
==================================================
> How much nicer is 'c than [CHAR] c
If one were using F83 by LaXen & Perry, this would work:
=========================================================
1 width !
\ Since width is 1, only the first letter of the name and its
\ length will be stored.
: 'a ( -- ascii)
>in @ 1-
blk @ ?dup ( Are we loading a file?)
if block
else
tib ( No, input is from keyboard.)
then +
begin
dup c@ BL =
while
1-
repeat
c@
state @ if [compile] literal then
; immediate
31 width !
\ Now when Forth parses 'B or 'x or '= or any two-letter name
\ beginning with ' , the word we just defined will be executed.
===============================================================
Of course, this won't work when using an ANS Forth.
This creates a set of constants, 'A 'B etc.
: $'char s" char ! constant '!" ;
: ('char) $'char drop 2dup 5 chars + c! 17 chars + c! ;
: 'char 127 33 do i ('char) $'char evaluate loop ;
Of course, this won't work if your Forth is case insensitive (no
difference between 'A and 'a).
This c-ism is ill-advised in Forth, where ' is associated with
execution tokens, and 'something is conventionally either an execution
token, or an address that contains one.
I dislike
db char x char a char c 4
in a disassembler, (where it would run for pages on end),
but replace it with
db &x &a &c 4
which is quite common too.
"It's a fair cop guv, but society is to blame."
"Agreed. We will be charging them too."
-- Monty Python
arrest, v.t. Formally to detain one accused of unusualness.
-- Ambrose Bierce
For 1000 times counting the primes up to 16384 (or so, defining maxp
as 8191), it takes 0.15s instead of 0.25s with gforth-fast on a
2.26GHz Pentium 4. Very nice. I just wonder why it counts 1900
primes while the other version counts 1899.
In any case, I have put up the program with SQRT, changed maxp, and a
BENCHMARK word at
http://www.complang.tuwien.ac.at/forth/sieve-factored.fs
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
EuroForth 2004: http://www.complang.tuwien.ac.at/anton/euroforth2004/
Deadline (refereed): August 28, 2004; Conference: November 19-21, 2004
Ok, I have now tested under Windows ME, where the gforth-fast.exe from
our 0.6.2 package is really slow (0.5s-0.8s real time for siev.fs on a
2GHz Athlon 64), no CPU times available), and then taken the same
binary to a W2K machine, where I get 0.4s user time on a 2GHz Athlon
MP. More precisely, for the small benchmarks I get (user time):
siev bubble matrix fib
0.405 0.452 0.249 0.499 2GHz Athlon MP, gcc-3.2, W2K
0.29 0.34 0.14 0.37 2GHz Athlon MP, gcc-3.2, Linux
0.22 0.33 0.14 0.35 2GHz Athlon MP, gcc-2.95.1, Linux
So, for a good part Windows (or Cygwin) seems to be responsible for
the slowness. I wonder what's going on there; differences in system
time would not be a big surprise, but I have no explanation for
differences in user time.
Anyway, if I scale the 0.405s to a 900MHz machine, you should still be
seeing a slightly faster result with the more recent
http://www.complang.tuwien.ac.at/forth/gforth/gforth-0.6.2.exe
package.
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
If that is any help: a small sloppy C program of mine, not using a sieve
but a simple loop and primality test (by trial division), finds exactly
1900 primes lower than 16384. My guess is that the programs which find
only 1899 primes forget the number "2".
--Thomas Pornin
It depends on the stated conditions. The Sept. 1981 Byte Sieve article by
Jim Gilbreath, which may have popularized the Sieve as a benchmark, stated
the conditions as: "... which finds all of the prime numbers between 3 and
16,381."
Regards,
-Doug
Maybe a difference in the C compilers (or their associated libraries)
used to create GForth?
--
Julian V. Noble
Professor Emeritus of Physics
j...@lessspamformother.virginia.edu
^^^^^^^^^^^^^^^^^^
http://galileo.phys.virginia.edu/~jvn/
"For there was never yet philosopher that could endure the
toothache patiently."
-- Wm. Shakespeare, Much Ado about Nothing. Act v. Sc. 1.
As indicated above, for the first two lines the C compilers are the
same; I have also checked the code of some primitives, and it is the
same.
Since hardly any time is spent in libraries in these benchmarks, they
should not matter; and anyway, I am pretty sure that the Cygwin
library is derived from the glibc that's also used on Linux.
I agree that the generic (non language specific) algorithm should not
be changed for benchmarks. But I see nothing wrong with recoding that
algorithm. The same algorithm can be coded several different ways, at
least with Forth. For example: using locals vs stack gymnastics.
>It also doesn't matter a bit whether it is ugly code.
When the code gets published in a widely read magazine (as was Byte at
one time) it can leave a lasting positive or negative impression with
those not familiar with Forth.
Sorry for the late response but I only just now noticed this.
Regards,
-Doug