Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

Optimize speed 8086 instruction "rep movsb" and "rep stosb"

124 views

Skip to first unread message

Phu Tran Hoang

unread,

Jul 22, 2022, 2:56:01 AM7/22/22

;Replace "rep movsb" by the following code
test di,1 ; alaign by word
jz $+4
movsb
dec cx

shr cx,1
rep movsw
jnc $+3
movsb

;Replace "rep stosb" by the following code
mov ah, al
test di,1 ; alaign by word
jz $+4
stosb
dec cx

shr cx,1
rep stosw
jnc $+3
stosb

wolfgang kern

unread,

Jul 22, 2022, 9:42:02 AM7/22/22

[jnc+1 ? stosb/stosw are only one byte code "AA/AB"]

Yes, pre- and post-aligning string operations are
the main speed-gain in my OS. It works with 32-bit
reduction/extension for any odd start and size.

But I also align source or destination to quad bounds.

TEST esi,3
JZ isAligned
... ;adjust for an aligned loop start here
isAligned:
SHR ecx,1 ;no action at all if ecx=0
JNC +1
LODSB
SHR ecx,1
JNC +2 ; +2 for use32
LODSW ; because prefix required here
REP LODSD ;falls through if ECX=Zero

and with similar dummy reads up front and at end it
can part-read disk sectors at any offset and size.
__
wolfgang

0 new messages