Paul Rubin <no.e...@nospam.invalid> writes Re: Permutation Tensor
My hypothesis was wrong: the multiplication isn't important.
I completely rewrote the test because the loop overhead was too high
relative to the work done by the lcsymbol code. With the new test,
all non-local implementations run at approximately the same speed, and
the locals add a sizable overhead (because iForth doesn't inline words
that use locals).
Also interesting: lcsymbol4 is very fast, but only in 64-bit code, where
more free registers are available to the code generator.
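
The three lctable constants in the listing below are not explained in the post. Decoding them against the shifts in lcsymbol4 suggests that each cell holds one k plane, with eps+1 stored as a 2-bit field at bit offset 8*j + 2*i (zero-based indices). Here is a minimal sketch that rebuilds the constants under that reading, in standard Forth. The names eps0 and plane are hypothetical helpers, not part of the original code.

```forth
\ Sketch only: eps0 and plane are hypothetical names, not from the post.

: eps0 ( i j k -- n )            \ Levi-Civita symbol via (i-j)(j-k)(k-i)/2
  >r 2dup -                      \ i j i-j          R: k
  swap r@ - *                    \ i (i-j)(j-k)     R: k
  swap r> swap - *               \ (i-j)(j-k)(k-i)
  2/ ;

: plane ( k -- u )               \ pack eps0+1 for all i,j into 2-bit fields
  0                              \ k acc
  3 0 do                         \ outer loop index J is j
    3 0 do                       \ inner loop index I is i
      i j 3 pick eps0 1+         \ k acc field   (field is 0, 1 or 2)
      j 3 lshift i 2* + lshift   \ shift field to bit offset 8*j + 2*i
      or
    loop
  loop
  nip ;
```

Under this reading, `0 plane . 1 plane . 2 plane .` prints 1639701 1316133 1381905, matching the table.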
-marcel
-- ----------
ANEW -lcsymbol
\ For i,j,k in {1,2,3}, returns n = -1, 0, or 1
: lcsymbol ( i j k -- n )
  >r >r 10 * r> + 10 * r> +
  case
    123 of 1 endof
    231 of 1 endof
    312 of 1 endof
    132 of -1 endof
    213 of -1 endof
    321 of -1 endof
    0 swap
  endcase
;
4 BASE !  \ base 4 from here to DECIMAL: 123 means 1*16 + 2*4 + 3, and 10 is decimal 4
: lcsymbol2 ( i j k -- n )
  swap 2 lshift or swap 10 lshift or
  case
    123 of 1 endof
    231 of 1 endof
    312 of 1 endof
    132 of -1 endof
    213 of -1 endof
    321 of -1 endof
    0 swap
  endcase ;
DECIMAL
1 VALUE a
2 VALUE b
3 VALUE c
CREATE tab #64 CHARS ALLOT tab #64 CONST-DATA
MARKER -tab
: cases ( ix -- val )
  case [ 4 base ! ]
    012 of 2 endof
    120 of 2 endof
    201 of 2 endof
    021 of 0 endof
    102 of 0 endof
    210 of 0 endof
    [ decimal ]
    1 swap
  endcase ;
: lcsymbolX ( i j k -- n )
1-
swap 1- 2 lshift or
swap 1- 4 lshift or cases 1- ;
:NONAME tab #64 0 DO I cases OVER C! CHAR+ LOOP drop ; EXECUTE -tab
\ For i,j,k in {1,2,3}, returns n = -1, 0, or 1
: lcsymbol3 ( i j k -- n ) 1- swap 1- 2 lshift or swap 1- 4 lshift or tab + C@ 1- ;
create lctable 1639701 , 1316133 , 1381905 , lctable 3 cells CONST-DATA
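\ i, j, k are zero based! (TEST4 below passes 1 2 3, so k = 3 fetches
\ one cell past the end of lctable; only the timing is meaningful there.)
: lcsymbol4 ( i j k -- n )
  cells lctable + @       \ fetch the k^th tensor plane
  >r
  3 lshift over 2* + nip  \ compute the bit offset: 8*j + 2*i
  r> swap rshift 3 and 1-
;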
: eps params| i j k | ( i j k -- n ) i j - j k - * k i - * 2/ ;
: eps2 LOCALS| k j i | ( i j k -- n ) i j - j k - * k i - * 2/ ;
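\ weps and weps2 use + where eps uses *, presumably to time the formula
\ without multiplies; they are not valid symbols (the sum is always 0).
: weps params| i j k | ( i j k -- n ) i j - j k - + k i - + 2/ ;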
: weps2 LOCALS| k j i | ( i j k -- n ) i j - j k - + k i - + 2/ ;
: TEST ( n -- ) 0 SWAP 0 DO a b c lcsymbol + a b c lcsymbol + a b c lcsymbol + a b c lcsymbol + LOOP drop ;
: TEST2 ( n -- ) 0 SWAP 0 DO a b c lcsymbol2 + a b c lcsymbol2 + a b c lcsymbol2 + a b c lcsymbol2 + LOOP drop ;
: TEST3 ( n -- ) 0 SWAP 0 DO a b c lcsymbol3 + a b c lcsymbol3 + a b c lcsymbol3 + a b c lcsymbol3 + LOOP drop ;
: TEST4 ( n -- ) 0 SWAP 0 DO a b c lcsymbol4 + a b c lcsymbol4 + a b c lcsymbol4 + a b c lcsymbol4 + LOOP drop ;
: TESTe ( n -- ) 0 SWAP 0 DO a b c eps + a b c eps + a b c eps + a b c eps + LOOP drop ;
: TESTe2 ( n -- ) 0 SWAP 0 DO a b c eps2 + a b c eps2 + a b c eps2 + a b c eps2 + LOOP drop ;
: TESTwe ( n -- ) 0 SWAP 0 DO a b c weps + a b c weps + a b c weps + a b c weps + LOOP drop ;
: TESTwe2 ( n -- ) 0 SWAP 0 DO a b c weps2 + a b c weps2 + a b c weps2 + a b c weps2 + LOOP drop ;
: TOPTEST ( u -- )
2/ 2/ 1 UMAX LOCAL #times  \ each TEST loop pass makes 4 calls, so loop u/4 times (at least once)
CR ." \ lcsymbol : " timer-reset #times TEST .elapsed
CR ." \ lcsymbol2 : " timer-reset #times TEST2 .elapsed
CR ." \ lcsymbol3 : " timer-reset #times TEST3 .elapsed
CR ." \ lcsymbol4 : " timer-reset #times TEST4 .elapsed
CR ." \ eps : " timer-reset #times TESTe .elapsed
CR ." \ eps2 : " timer-reset #times TESTe2 .elapsed
CR ." \ weps : " timer-reset #times TESTwe .elapsed
CR ." \ weps2 : " timer-reset #times TESTwe2 .elapsed ;
\ 1,000,000,000 toptest ( 64bit code)
\ lcsymbol : 2.495 seconds elapsed.
\ lcsymbol2 : 2.217 seconds elapsed.
\ lcsymbol3 : 2.075 seconds elapsed.
\ lcsymbol4 : 1.778 seconds elapsed.
\ eps : 2.281 seconds elapsed.
\ eps2 : 7.390 seconds elapsed.
\ weps : 2.307 seconds elapsed.
\ weps2 : 7.264 seconds elapsed. ok
\ 1,000,000,000 toptest ( 32bit code)
\ lcsymbol : 2.400 seconds elapsed.
\ lcsymbol2 : 2.093 seconds elapsed.
\ lcsymbol3 : 2.069 seconds elapsed.
\ lcsymbol4 : 4.320 seconds elapsed.
\ eps : 2.265 seconds elapsed.
\ eps2 : 7.352 seconds elapsed.
\ weps : 2.298 seconds elapsed.
\ weps2 : 7.234 seconds elapsed. ok
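
The timings say nothing about whether the variants return the same values. As a sanity check, here is a minimal sketch in standard Forth that compares a CASE-based variant against the product formula over all 27 index triples in {1,2,3}. The names lcs-case, lcs-prod, triple, and agree? are hypothetical and chosen to avoid clashing with the listing. lcs-prod is a stack-only rewrite of eps, which is an assumption of this sketch rather than code from the post.

```forth
\ Sketch only: all names here are hypothetical, not from the post.

: lcs-case ( i j k -- n )        \ same decimal-key scheme as lcsymbol
  >r >r 10 * r> + 10 * r> +
  case
    123 of  1 endof   231 of  1 endof   312 of  1 endof
    132 of -1 endof   213 of -1 endof   321 of -1 endof
    0 swap
  endcase ;

: lcs-prod ( i j k -- n )        \ (i-j)(j-k)(k-i)/2 without locals
  >r 2dup - swap r@ - * swap r> swap - * 2/ ;

: triple ( n -- i j k )          \ decode n in 0..26 into indices in 1..3
  dup 9 / 1+ swap
  dup 3 / 3 mod 1+ swap
  3 mod 1+ ;

: agree? ( -- flag )             \ true if both variants match on all triples
  true
  27 0 do
    i triple lcs-case
    i triple lcs-prod
    = and
  loop ;
```

`agree? .` prints -1 (true) when the two variants agree on every triple. The same loop could be pointed at any other pair of words with the ( i j k -- n ) stack effect and one-based indices.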