Mark Wills wrote:
>
> It uses locals. From some quick experiments, locals on a dumb (non-
> optimising) ITC makes things faster. The reason? All the stack juggling
> goes away. Those ROTs DUPs SWAPs etc are burning up time just getting
> the stack arranged. Locals on a dumb system takes that away. The
> argument doesn't hold on modern optimising systems.
I implemented TurboForth locals for LMI PC/Forth 3.2 (8086 ITC)
and ran a benchmark based on MOVBLK with and without locals.
A Pentium PC was used for the test.

Speed (I/O and CMOVE code replaced with 2DROP)

    locals       without locals
    63 sec       52 sec

Code size (less headers)

    locals       without locals
    122 bytes    92 bytes

Using an 8086 DTC Forth

    locals       without locals
    21 sec       9 sec

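
To put those figures in relative terms, here is a small sketch that works out the overhead of the locals version as a percentage of the non-locals version. The word name overhead% is made up for this illustration and is not part of the benchmark; it only assumes the standard words TUCK and */ and truncating integer division.

```forth
\ Hypothetical helper: overhead of the locals version, in percent.
\ Integer arithmetic only, so the result is truncated.
: overhead% ( with without -- pct )
  TUCK -        \ without diff
  100 ROT */ ;  \ diff*100/without

63 52 overhead% .    \ ITC speed:  prints 21
122 92 overhead% .   \ code size:  prints 32
21 9 overhead% .     \ DTC speed:  prints 133
```

So on these numbers the locals version is roughly a fifth slower on the ITC system, about a third larger, and well over twice as slow on the DTC system.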
The benchmark used for the test is shown below. I'll post the source for
the LMI TurboForth locals if anyone wants it and you have no objection.
Your website doesn't mention whether derivative works are permitted.

\ Screen 13
\ TEST2

: bcopy ( src dest -- )
  \ CR ." Copy " SWAP U. ." to " U.
  SWAP BLOCK SWAP BUFFER 1024 CMOVE UPDATE FLUSH
;

: movblk1 ( src dest cnt -- )
  ?DUP IF
    SWAP 2 PICK - >R OVER +
    R@ 0< IF SWAP 1 ELSE 1- -1 THEN
    R> 2SWAP DO I 2DUP + 2DROP ( bcopy) OVER +LOOP
  THEN 2DROP ;

\ Screen 14
\ TEST2

: movblk2 ( src dest cnt -- )
  LOCALS{ src dest cnt cpydir }
  SET cnt SET dest SET src
  1 SET cpydir src dest < IF
    cnt 1- +SET src cnt 1- +SET dest -1 SET cpydir
  THEN ( flush? ) cnt 0 ?DO
    \ 2 SPACES src . ." to " dest .
    \ src block drop 0 dest setblk 0 dirty flush
    \ src BLOCK dest BUFFER 1024 CMOVE UPDATE FLUSH
    src dest 2DROP
    cpydir +SET src cpydir +SET dest
  LOOP ;

\ Screen 15
\ TEST2

: T1 7 EMIT 20 0 DO
    0 0 DO 10 12 5 movblk1 12 10 5 movblk1 LOOP
  LOOP 7 EMIT ;

: T2 7 EMIT 20 0 DO
    0 0 DO 10 12 5 movblk2 12 10 5 movblk2 LOOP
  LOOP 7 EMIT ;
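
For anyone without the LMI locals who wants to try the locals version, a rough equivalent in Forth-2012 locals syntax might look like the sketch below. This is an assumption on my part, not the code that was benchmarked: it uses the standard {: ... :} and TO in place of LOCALS{, SET and +SET, the name movblk3 is made up, and it will not reproduce the timings above since the locals implementation is different.

```forth
\ Sketch only: movblk2 rewritten with Forth-2012 locals.
\ src dest cnt are taken from the stack; cpydir starts uninitialised.
: movblk3 ( src dest cnt -- )
  {: src dest cnt | cpydir :}
  1 TO cpydir
  src dest < IF                 \ copying upward: start from the last block
    cnt 1- src +  TO src
    cnt 1- dest + TO dest
    -1 TO cpydir
  THEN
  cnt 0 ?DO
    src dest 2DROP              \ stands in for the block copy, as in movblk2
    src cpydir +  TO src
    dest cpydir + TO dest
  LOOP ;

10 12 5 movblk3  12 10 5 movblk3   \ same calls as T2; leaves the stack empty
```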