Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Mersenne twister example code in x86 asm

237 views
Skip to first unread message

̇ęń.com Ray M. Ransom

unread,
Jul 2, 2004, 5:05:36 PM7/2/04
to
I needed an assembly implementation of the Mersenne twister pseudo-random
sequence generator. All I found on the Internet was the original horrid C
code replicated a million times and ported to assembly using the C compiler
"-S" method.

To get a concise and understandable version, I had to write it myself. This
is the base 32-bit, 624-sample version. Unlike previous versions I have
seen, it never pauses to pre-compute a block of samples. Each call computes
the 624th future sample and stores it in the array over the sample it has
just read. The twister code is 163 bytes and needs 626 dwords of data
storage. Output has been validated against the horrid C code on Makoto
Matsumoto' site.

The example code below contains the Mersenne subroutines, wrapped in my basic
Windoze console program template (available from
http://www.micosyen.com/winasm.php).

I am posting this here to serve as an example of a viable Windoze assembly
program. It took less than five minutes to incorporate the twister
subroutines into the template and generate the program. You couldn't do
better in C. I hope this is of some use to somebody.

=============================================================
.486 ;07/01/04
.model flat
locals
.code

;Windoze structures
COORD struc
x DW ?
y DW ?
COORD ends

CHARTYPE union
UnicodeChar DW ?
AsciiChar DB ?
CHARTYPE ends

KEY_EVENT_RECORD struc
wEventType DD ?
bKeyDown DD ?
wRepeatCount DW ?
wVirtualKeyCode DW ?
wVirtualScanCode DW ?
uChar CHARTYPE <>
dwControlKeyState DD ?
KEY_EVENT_RECORD ends

MOUSE_EVENT_RECORD struc
wEventType DD ?
dwMousePosition COORD <>
dwButtonState DD ?
MdwControlKeyState DD ?
dwEventFlags DD ?
MOUSE_EVENT_RECORD ends

;Constants
FROM_LEFT_1ST_BUTTON_PRESSED EQU 1
STD_INPUT_HANDLE EQU -10
STD_OUTPUT_HANDLE EQU -11

;Global variables (in BSS)
VARBLE struc
CONHAN DD ? ;Console screen buffer handle
KEYHAN DD ? ;Standard input handle
KEYCNT DD ? ;Input event count
CFLG DB ? 00000000B ;Command line flags
; |_____________ Invoked by Windoze
VARBLE ends

START: MOV EBP,offset VARBAS ;Point to variable base
XOR EAX,EAX
MOV ECX,offset BSSEnd ;Point to BSS end+1
MOV EDI,offset BSSBase ;Point to BSS base
SUB ECX,EDI ;Calculate BSS size
SHR ECX,2 ;Form dword fill pointer
REP STOSD ;Clear entire BSS
MOV ESI,offset ERRMSG ;Point to environment 'CMDLINE'
buffer
PUSH 256 ;Buffer size
PUSH ESI ;Buffer pointer
PUSH offset CmdLineStr ;Environment string pointer
CALL GetEnvironmentVariableA ;Get environment 'CMDLINE' value
MOV EAX,[ESI] ;Get 1st 4 chars of returned string
AND EAX,5F5F5F5FH ;Map to upper case
CMP EAX,0+'NIW' ;'WIN',0?
JNZ short @@1 ;No (Windoze 98 test
only)
OR [EBP+CFLG],00000001B ;Set "launched by Windoze" flag
@@1: CALL GetCommandLineA ;Get command line pointer
XCHG EAX,EDI ;Command line pointer to EDI
MOV ESI,EDI ;Command line pointer to ESI
SCHQUO: LODSB ;Get a command line character
TEST AL,AL ;EOS?
JZ short SETCON ;Yes
CMP AL,'"' ;Quote?
JNZ short SCHQUO ;No
OR [EBP+CFLG],00000001B ;Set "launched by Windoze" flag
SETCON: PUSH STD_OUTPUT_HANDLE
CALL GetStdHandle ;Get standard output handle
MOV [EBP+CONHAN],EAX ;Store console output buffer handle
PUSH STD_INPUT_HANDLE
CALL GetStdHandle ;Get standard input handle
MOV [EBP+KEYHAN],EAX ;Store standard input handle
CALL WINATR ;Set black on white if run by Windoze

;Example code begins here
; CALL GetTickCount ;Get system tick count
; BSWAP EAX
; XOR EAX,EBP ;Create initial default seed

MOV EAX,12345678H ;Initial seed
CALL InitMersenne ;Initialize Mersenne twister
MOV ECX,10000
GETRND: CALL MersenneTwist ;Get pseudo-random number
XCHG EAX,EBX ;Random number to EBX
CALL PRHEXD ;Print random number
CALL DMS
DB ' ',' '+80H ;Print 2 spaces
LOOP GETRND
;Example code ends here

TEST [EBP+CFLG],0000001B ;Invoked by Windoze?
JZ short KEYDWN ;No
MOV AL,3EH
CALL SETATR ;Set screen attribute
CALL DMS
DB ' Type any key or click in window to clos','e'+80H
MOV ESI,offset KEYBUF ;Point to input event buffer
LEA EDI,[EBP+KEYCNT] ;Pnt to input event count ret
variable
MOV dword ptr[EDI],1 ;Set key event count
WTKEY: PUSH EDI ;Input buffer event read count
pointer
PUSH 1 ;Input event buffer size
PUSH ESI ;Console input event buffer pointer
PUSH dword ptr[EBP+KEYHAN] ;Standard input handle
CALL ReadConsoleInputA ;Read console input event
MOVZX EAX,byte ptr[ESI+wEventType] ;Get input event index
DEC EAX ;Keyboard event?
JNZ short TSMOUS ;No
CMP dword ptr[KEYBUF+bKeyDown],0 ;Key down?
JZ short WTKEY ;No
JMP short KEYDWN
TSMOUS: DEC EAX ;Mouse event?
JNZ short WTKEY ;No
TEST byte ptr[ESI+dwButtonState],FROM_LEFT_1ST_BUTTON_PRESSED
JZ short WTKEY ;No
KEYDWN: XOR EDI,EDI ;Exit code
TRMNAT: PUSH EDI ;Exit code
JMP ExitProcess ;Terminate

;subr
CRDMS: CALL PRCRLF ;CR/LF
DMS: XCHG ESI,[ESP] ;Save ESI, get string return address
CALL PRTSTR ;Print string
XCHG ESI,[ESP] ;Save return address, restore ESI
RET

PRTSTR: LODSB ;Get a character
AND AL,7FH ;Mask off "end" flag
JZ short PSTRET ;EOS
CALL CONOUT ;Print character
CMP byte ptr[ESI-1],80H ;End?
JB short PRTSTR ;No
PSTRET: RET

PRHEXW: PUSH ECX
XOR ECX,ECX
MOV CL,4 ;Digit count
SHL EBX,16 ;Left justify output word
JMP short PRHEXZ
PRHEXD: PUSH ECX
XOR ECX,ECX
MOV CL,8 ;Digit count
PRHEXZ: ROL EBX,4 ;Right justify hi digit
MOV AL,BL ;Digit to AL
CALL PRDIG ;Print digit
LOOP PRHEXZ
POP ECX
RET

PRHEXB: PUSH EAX ;Save output byte
SHR AL,4 ;Right justify hi digit
CALL PRDIG ;Print hi digit
POP EAX ;Restore output byte
PRDIG: AND AL,0FH ;Mask for low nibble
ADD AL,90H
DAA
ADC AL,40H
DAA ;Convert to ASCII
JMP short CONOUT
PRCRLF: MOV AL,0DH
CALL CONOUT ;CR
MOV AL,0AH ;LF
CONOUT: PUSH ECX
PUSH EDX
PUSH EAX ;Save output character
MOV ECX,ESP ;Point to saved character
PUSH EAX ;Create temp chars-written return var
MOV EAX,ESP ;Temp variable pointer to EAX
PUSH 0 ;Reserved
PUSH EAX ;Chars-written return variable
pointer
PUSH 1 ;Character count
PUSH ECX ;Character or string pointer
PUSH [EBP+CONHAN] ;Console screen buffer handle
CALL WriteConsoleA ;Write character to console
POP EAX ;Throw away temporary variable
COTDON: POP EAX
POP EDX
POP ECX
RET

WINATR: MOV AL,0F0H
SETATR: TEST [EBP+CFLG],0000001B ;Invoked by Windoze?
JZ short @@1 ;No
PUSH ECX
PUSH EAX ;Screen attribute
PUSH [EBP+CONHAN] ;Console buffer handle
CALL SetConsoleTextAttribute
POP ECX
@@1: RET

;Mersenne Twister code begins here
N EQU 624
M EQU 397

;Randomize with initial seed in EAX
InitMersenne:
XOR ECX,ECX ;Clear count/array index
MOV EDI,offset mt ;Point to Mersenne table
MOV [EDI+Nindex-mt],ECX ;Clear current position index
MOV dword ptr[EDI+Mindex-mt],M ;Initialize recirculate index
STOSD ;Store seed at mt[0]
NITMER: INC ECX ;Adjust array index
MOV EBX,EAX ;Previous seed value to EBX (x)
SHR EBX,30 ;x>>30
XOR EAX,EBX ;x ^ (x>>30)
IMUL EAX,1812433253 ;((x ^ (x>>30))*1812433253
ADD EAX,ECX ;((x ^ (x>>30))*1812433253+i
STOSD ;Store seed table value
CMP ECX,N-1 ;Done?
JB short NITMER ;No
RET

;Return 32-bit pseudo-random number in EAX
MersenneTwist:
PUSH EBX
PUSH ECX
PUSH EDX
PUSH ESI
PUSH EDI
MOV ECX,offset mt ;Point to Mersenne table
MOV ESI,[ECX+Nindex-mt] ;Get current position index
MOV EDI,[ECX+Mindex-mt] ;Get recirculate position index
MOV EDX,[ECX+ESI*4] ;Get next raw output dword
PUSH EDX ;Save next raw output dword
LEA EBX,[ESI+1] ;Index to next position
CMP EBX,N ;Past table end?
JB short @@1 ;No
XOR EBX,EBX ;Wrap next index to table start
@@1: MOV EAX,[ECX+EBX*4] ;Get mt[Nindex+1]
SHL EAX,1 ;Throw away high dword top bit
SHL EDX,1 ;Lo dword top bit to C flag
RCR EAX,1 ;Combine two dwords (y)
SHR EAX,1 ;y >> 1 (lsb = 0?)
JNC short @@2 ;Yes
XOR EAX,9908B0DFH
@@2: XOR EAX,[ECX+EDI*4] ;mt[m] ^ y >> 1
MOV [ECX+ESI*4],EAX ;Write new seed value at cur position
INC EDI ;Adjust recirculate index
CMP EDI,N ;Past table end?
JB short @@3 ;No
XOR EDI,EDI ;Wrap next position to table start
@@3: MOV [ECX+Nindex-mt],EBX ;Update current position index
MOV [ECX+Mindex-mt],EDI ;Update recirculate position index
POP EAX ;Restore raw output dword
MOV EBX,EAX
SHR EAX,11
XOR EAX,EBX ;y ^= (y >> 11);
MOV EDX,EAX
SHL EAX,7
AND EAX,9D2C5680H
XOR EAX,EDX ;y ^= (y << 7) & 0x9d2c5680UL;
MOV EBX,EAX
SHL EAX,15
AND EAX,0EFC60000H
XOR EAX,EBX ;y ^= (y << 15) & 0xefc60000UL;
MOV EDX,EAX
SHR EAX,18
XOR EAX,EDX ;y ^= (y >> 18);
POP EDI
POP ESI
POP EDX
POP ECX
POP EBX
RET

.data?

BSSBase:
Nindex DD ? ;Current position index
Mindex DD ? ;Recirculate table index
mt DD N DUP(?) ;Mersenne table
;Mersenne twister code end

.data

CmdLineStr DB 'CMDLINE',0 ;Environment variable name

.data?

VARBAS VARBLE <> ;Global variables
KEYBUF KEY_EVENT_RECORD <> ;Input event buffer
ERRMSG DB 256 DUP(?) ;Error message buffer

.code
CodEnd:
.data
DatEnd:
.data?
BSSEnd:

extrn ExitProcess : Proc
extrn GetCommandLineA : Proc
extrn GetEnvironmentVariableA : Proc
extrn GetStdHandle : Proc
extrn GetTickCount : Proc
extrn ReadConsoleInputA : Proc
extrn SetConsoleTextAttribute : Proc
extrn WriteConsoleA : Proc
end START


wolfgang kern

unread,
Jul 3, 2004, 8:26:05 AM7/3/04
to

Ray M. Ransom posted useful code.

Thanks,
I like your coding style. ;)


Here is my small 1..32-bit random-generator:
(good for games, but not recommeded for table creation)
---
RANDOM:
MOV ecx [max_bits] ;limit to needs ie: 7 for a dice,(2^n-1)
RDTSC ;ignore edx
MOV edx,eax
XOR eax,eax
L1: ;reverse bit order
SHR edx ;LSB->Cy becomes:
RCL eax ;MSB<-Cy
DEC ecx ;
JNZ L1 ;
;scale it [ie: 0...7]
MOV cl,[divider] ;2^n ie: 3 (/8)
MUL [range] ;scale ie:*6 [ 0..42]
SHR eax,cl ; /8 [ 0...5]
INC eax ;now we got: [ 1...6]
MOV [RAND],eax ;and that's it.
RET
---
For 'odd' ranges the slower FMUL/FDIV scaling can be used instead also.

Of course this is not a mathematical correct randomness,
but an astonishing good statistical distribution if called on
user events as mouse-moves or key-strokes.

Even if called consecutive in a loop the windoze
mulitasking will change the cache-lines that often,
that the clock-count wont be reproducable.
[the only advantage of windoze I know :)]

__
wolfgang

The Wannabee

unread,
Jul 3, 2004, 9:57:30 PM7/3/04
to
På Sat, 3 Jul 2004 14:26:05 +0200, skrev wolfgang kern
<now...@nevernet.at>:

> [the only advantage of windoze I know :)]

:-))

also, other advantages it created dedicated programmers like you. Think,
if windows wore a very good OS, you would probably programm in it full
time, and KeSys might not exist.

;-)

>
> __
> wolfgang
>
>
>

--
Do not Upgrade.

wolfgang kern

unread,
Jul 4, 2004, 9:03:05 AM7/4/04
to

"The Wannabee" wrote:

| > [the only advantage of windoze I know :)]
| :-))
| also, other advantages it created dedicated programmers like you. Think,
| if windows wore a very good OS, you would probably programm in it full
| time, and KeSys might not exist.
| ;-)

Oh yes, beside the daily curses: many thanks to Big Bill !

__
wolfgang

C

unread,
Jul 5, 2004, 8:36:46 AM7/5/04
to
"Ray M. Ransom" <ray@mîcôs˙ęń.com> wrote in message news:<A6kFc.2390$6e7....@nwrddc03.gnilink.net>...

> To get a concise and understandable version, I had to write it myself. This
> is the base 32-bit, 624-sample version. Unlike previous versions I have
> seen, it never pauses to pre-compute a block of samples. Each call computes
> the 624th future sample and stores it in the array over the sample it has
> just read. The twister code is 163 bytes and needs 626 dwords of data
> storage. Output has been validated against the horrid C code on Makoto
> Matsumoto' site.

Neat, I have optimised the relevant parts of your code and
translated it to Luxasm (plus macros). It will probably be
placed in the standard Luxasm library soon, but for now here
it is...

;-------.________________________________________
;
; Pseudo Random Number Generator Abstract Class
# macro LXL.Random
# define _lxl..Random..Base
abstract class Random

abstract method reseed ( seed : 4 )
abstract method random () : 4

end_class
# end#macro
;________________________________________
;
; Mersenne Twister Pseudo Random Number Generator
; By Ray M. Ransom <ray micosyen.com> : 2004-07-02
; { alt.lang.asm : Mersenne twister example code in x86 asm }
; Optimisation & Luxasm conversion by C.R.Chafer : 2004-07-03
# macro LXL.Random.Twister
# ifndef _lxl..Random..Base
LXL.Random
# endif
class Random.Twister : Random ; Mersenne twister pseudo random number generator

variable n_index : dword ; Current index
variable m_index : dword ; Recirculate index
variable table : [624]dword ; Mersenne table
alias N : 624 ; Control values
alias M : 397

method create ( seed : 4 ) ; Initialise random number generator with seed

xor ecx, ecx ; Clear count/array index
mov ebp->m_index, Random.Twister.M:4;Initialise recirculate index
mov ebp->n_index, ecx ;0 ; Clear current position index
do
mov ebx, eax ; Previous seed value to EBX
mov ebp->table[ ecx*4 ], eax ; Store seet at mt[ecx]
shr ebx, 30 ; eax = ((ebx^(ebx>>30))*1812433253
add ecx, 1
xor eax, ebx
imul eax, eax, 1812433253
add eax, ecx
loop_until ecx >= Random.Twister.N-1; Loop till all entries filled
mov ebp->table[ ecx*4 ], eax ; Store seed at mt[ecx]

end_method

method reseed ( seed : 4 ) ; Reseed random number generator

xor ecx, ecx ; Clear count/array index
mov ebp->m_index, Random.Twister.M:4;Initialise recirculate index
mov ebp->n_index, ecx ;0 ; Clear current position index
do
mov ebx, eax ; Previous seed value to EBX
mov ebp->table[ ecx*4 ], eax ; Store seet at mt[ecx]
shr ebx, 30 ; eax = ((ebx^(ebx>>30))*1812433253
add ecx, 1
xor eax, ebx
imul eax, eax, 1812433253
add eax, ecx
loop_until ecx >= Random.Twister.N-1; Loop till all entries filled
mov ebp->table[ ecx*4 ], eax ; Store seed at mt[ecx]

end_method

method random () : 4 ; Get next random number (32-bit)
local raw_output : 4

mov esi, ebp->n_index ; Get current position index
mov edi, ebp->m_index ; Get recirculate position index
mov edx, ebp->table[ esi*4 ] ; Get next raw output dword

cmp esi, Random.Twister.N-1 ; Wrap to start, if past table end
lea ebx, [ esi + 1 ]
mov eax, ebp->table[ ebx*4 ] ; Get mt[n_index+1]
sbb ecx, ecx ; ecx = CF ? -1 : 0
mov esp->raw_output, edx ; Save raw output dword
and ebx, ecx ; ebx = ~ecx & ( esi + 1 )
add eax, eax ; Throw away bit 31
add edx, edx ; Load bit 31 to C flag
rcr eax, 1 ; Combine two dword (y)
shr eax, 1 ; CF = eax & 1, eax >>= 1
sbb ecx, ecx
xor eax, ebp->table[ edi*4 ] ; mt[m] ^ eax >> 1
and ecx, 9908b0dfh ; y ^= ( CF ? 9908b0dfh : 0 )
xor eax, ecx

cmp edi, Random.Twister.N-1 ; Wrap to start, if past table end
sbb ecx, ecx
add edi, 1 ; Adjust recirculate index, with wrap
mov eax, esp->raw_output ; Restore raw output dword
and edi, ecx
mov ebp->n_index, ebx ; Update current position index
mov ebp->m_index, edi ; Update recirculate position index

mov ebx, eax ; Cook raw output dword
shr eax, 11
xor eax, ebx ; y ^= ( y >> 11 )
mov edx, eax
shl eax, 7
and eax, 9d2c5680h
xor eax, edx ; y ^= ( y << 7 ) & 0x9d2c5680UL
mov ebx, eax
shl eax, 15
and eax, 0efc60000h
xor eax, ebx ; y ^= ( y << 15 ) & 0xefc60000UL
mov edx, eax
shr eax, 18
xor eax, edx ; y ^= ( y >> 18 )

end_method ; Serve eax = pseudo random number

end_class
# end#macro

C
2004-06-05

̇ęń.com Ray M. Ransom

unread,
Jul 5, 2004, 2:43:11 PM7/5/04
to
"C" wrote:

> ...I have optimised the relevant parts of your code and
> translated it to Luxasm...

Those are good optimizations on paper, but they have a mixed effect across
the range of Pentium-class machines (See speed comparison results below).
The version I posted is meant to run on all 32-bit x86 processors, so was not
optimized at all.

Your version produces the correct output, but it seems to have a hole in the
algorithm. In this section,

> cmp esi, Random.Twister.N-1 ; Wrap to start, if past table end
> lea ebx, [ esi + 1 ]
> mov eax, ebp->table[ ebx*4 ] ; Get mt[n_index+1]
> sbb ecx, ecx ; ecx = CF ? -1 : 0
> mov esp->raw_output, edx ; Save raw output dword
> and ebx, ecx ; ebx = ~ecx & ( esi + 1 )

EBX is supposed to point to the next element in the table, mod N. When N has
its maximum value, N+1 should be 0. I don't believe that to be the case in
the above code. The funny thing is, it still works for every case I tried.
Unless I am missing something, I still think it would be a good idea to move
the "mov eax, ebp->table[ ebx*4 ]" to the end of the instructions shown
above.

* * *

I ran some tests, comparing the speed of the original code to the optimized
version. The results below are the number clock cycles consumed by computing
1 million pseudo-random numbers. The test is repeated 20 times to help
filter out the aberrations caused by Windoze. All tests were run under
Windoze 98 on all machines except the P4, which runs Windoze 2000 (notice how
slow it is).

1.7 Ghz Pentium 4
Old New
031EBBF8 02C0BC74
032AA844 02D323DC
031A32D8 02C03C88
031C229C 02F9B264
031C4238 02BFB16C
0317C350 02BF6690
0318F954 02BF3184
031A91B8 02C3D6E8
0321FCCC 02BF4D8C
031C6D08 02C06010
0320DF24 02C17F44
0319FEB4 02C52014
031AE988 02F5A098
0318D91C 02C11060
031A0BBC 02CB184C
031EE0D4 02BF86BC
031856B0 02BFAE40
031AF540 02C01140
031B522C 02CE4110
031B9848 02D8FDD4

1.0 GHz AMD Thunderbird
Old New
02E4ED6E 02209010
029D0783 022345EE
02A13C58 021F5113
029D20C2 02208419
029AB1CE 02226167
029D45B6 021DC855
02A1F0F1 021F04F3
029D6709 022089DB
029EA347 02216990
029DDE2D 021D714B
029E8848 0220CC16
034F25FD 0222087D
029DECA5 0221A242
029D6934 021E41A5
029E15A1 02243F85
029BBD92 0221DE0B
029D1D68 022209CD
0299F016 0221A96F
029D03AE 02209F3F
029D0858 022159A2

600 MHz Pentium 3
Old New
0302FC5A 02C24AB6
0309BB9F 02B4423A
0307EE4A 02B52A7D
0302BA7B 02AF6836
03057023 02AE64E6
03014CF8 02AEAB59
038D1EC7 02AF6BB8
0300DF7D 02AEF6F1
0303DAEA 02AE22EF
0300BE72 02AE87AD
03037940 02AE42A5
03041D4B 02AECC4E
03045988 02AE3CBF
03018D07 02AE2ACA
0303A79F 02AEF296
030040DE 02AEF47B
0304471D 02AEF842
0300F3E0 02AE7BA7
03047A97 0337397D
03009E90 02AF1B56

233 MHz Pentium MMX
Old New
02DA690B 033BEFDB
02E39058 03361B6C
02E6217F 033347AD
02E18C36 03348BFC
02E0A2F3 03322A80
02DCFF81 032FBA64
02E3F1BD 0331DB59
02DF7BBC 03334CF7
02E3F48F 0331B299
02DCA231 03346114
02E61836 0333826E
02DFAB8A 0333803C
02E54D54 03326EB4
02DF670E 033396A1
02E46F12 03327B70
02DFE56C 033368EC
02E184E9 03343745
02DFC076 0333A5D7
02E4490E 03339766
02E02003 0331EEC2

200 MHz Pentium non-MMX
Old New
032393EB 031675E2
02F94C14 03153845
02FA50BA 03148A8C
02FC1D61 03102DA0
02FC1327 03136A20
02F76F59 03121E7C
02FDAC3D 0311DB58
02F8F423 0311451B
02FB7D09 03131297
02F7411B 031399B7
02FD07AC 0314D0B7
02FBC15D 0312CBD0
02FBA815 031288B6
02F6CBF0 03135310
02FCA547 0311DE69
02FB27DB 0316931F
02FD7CB2 0312FDEA
02F81BDE 0312AD65
02FC7C2E 03125689
02FB796A 0312EC9E

It is clear from the data that this optimization favors new fast processors
at the expense of those who really could benefit from it. These results are
consistent with numerous other similar tests I have run. I concluded a long
time ago that keeping your code small and simple (so long as you avoid
obvious stalls) has a much better effect, when averaged across all target
machines, than following the optimizing hints provided by Intel and AMD.


C

unread,
Jul 6, 2004, 1:02:25 PM7/6/04
to
"Ray M. Ransom" <ray@mîcôs˙ęń.com> wrote in message news:<3jhGc.20902$6e7....@nwrddc03.gnilink.net>...

> "C" wrote:
>
> > ...I have optimised the relevant parts of your code and
> > translated it to Luxasm...
>
> Those are good optimizations on paper, but they have a mixed
> effect across the range of Pentium-class machines (See speed
> comparison results below).

Yup, I made a couple of mistakes and compounded them by not
testing the code. I have reposted an improved version below
which should (and hopefully does) run better across the
earlier processors. (Maybe playing with the conditional
mov (MOVcc) could produce further improvements.) Optimisation
was using the P4 guide anyway, so I expect the best improvements
there anyway. (Unfortunatly it now looks almost nothing like
your code -- and I still have yet to test it.)

The main problem I have worked on is removing the ADD/ADD/RCR/SHR
sequence, which I expect is a major bottle next, preventing the
586 from pairing instructions and the P4 executes shifts & rotates
slowly anyway. Memory access is also more spread out to reduce
load on the MMU.

> Your version produces the correct output, but it seems to
> have a hole in the algorithm. In this section,
>
> > cmp esi, Random.Twister.N-1 ; Wrap to start, if past table end
> > lea ebx, [ esi + 1 ]
> > mov eax, ebp->table[ ebx*4 ] ; Get mt[n_index+1]

Yes, that line looks like a mistake -- even if not, the AGI
stall being caused will kill performance on 486 and 586
processors. (That is what you get when programming while
half asleep at 5 in the morning :-) .)

> > sbb ecx, ecx ; ecx = CF ? -1 : 0
> > mov esp->raw_output, edx ; Save raw output dword
> > and ebx, ecx ; ebx = ~ecx & ( esi + 1 )

method random () : 4 ; Get next random number (32-bit)

mov esi, ebp->n_index ; Get current position index, n
push esi ; Save n


cmp esi, Random.Twister.N-1 ; Wrap to start, if past table end

mov eax, 7fffffffh ; a = mt[n+1] & 7fffffffh
sbb ecx, ecx ; i = ( i<N-1 ? i+1 : 0 )


mov edx, ebp->table[ esi*4 ] ; Get next raw output dword

add esi, 1
push edx ; Save raw output dword
shr edx, 1 ; d = ( d >> 1 ) & ( 1<<30 )
and esi, ecx


mov edi, ebp->m_index ; Get recirculate position index

and eax, ebp->table[ esi*4 ]
shr eax, 1 ; a = (a>>1)^( a&1 ? 9909b0df : 0 )^mt[m]
mov ebp->n_index, esi ; Update current position index
sbb ecx, ecx
and edx, 40000000h


xor eax, ebp->table[ edi*4 ]

and ecx, 9909b0dfh
pop ebx
xor eax, ecx
cmp edi, Random.Twister.N-1 ; j = ( j<N-1 ? j+1 : 0 )
pop esi


sbb ecx, ecx
add edi, 1

xor edx, eax ; New seed, mt[n] = a^d
mov eax, ebx
shr ebx, 11 ; Cook raw output dword
xor ebx, eax ; y ^= ( y >> 11 )
mov ebp->table[ esi*4 ], edx ; Write new seed value at current pos
mov eax, ebx
mov edx, ebx
shl eax, 7
and edi, ecx
and eax, 9d2c5680h
xor eax, edx ; y ^= ( y<<7 ) & 9d2c5680


mov ebx, eax
shl eax, 15

mov ebp->m_index, edi ; Update recirculate index
and eax, 0efc60000h
xor eax, ebx ; y ^= ( y<<15 ) & efc60000

̇ęń.com Ray M. Ransom

unread,
Jul 7, 2004, 10:46:32 AM7/7/04
to
"C" wrote:

> ...The main problem I have worked on is removing the ADD/ADD/RCR/SHR
> sequence...

You might try:

and eax,7fffffffh
and edx,80000000h
or eax,edx
shr eax,1

Resisters are probably not the ones you now use.


0 new messages