Optimizing the speed of the 8086 instructions "rep movsb" and "rep stosb"


Phu Tran Hoang

Jul 22, 2022, 2:56:01 AM
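; Replace "rep movsb" with the following code.
; Assumes cx > 0: with cx = 0 and an odd di, "dec cx" wraps to 0FFFFh.
test di,1       ; is the destination word-aligned?
jz $+4          ; jz is 2 bytes, so $+4 skips movsb and dec cx
movsb           ; copy one byte so di becomes even
dec cx

shr cx,1        ; cx = word count, CF = leftover byte
rep movsw       ; string instructions leave the flags unchanged
jnc $+3         ; jnc is 2 bytes, so $+3 skips the final movsb
movsb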



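; Replace "rep stosb" with the following code.
; Assumes cx > 0: with cx = 0 and an odd di, "dec cx" wraps to 0FFFFh.
mov ah,al       ; copy the fill byte into ah so stosw writes it twice
test di,1       ; is the destination word-aligned?
jz $+4          ; jz is 2 bytes, so $+4 skips stosb and dec cx
stosb           ; store one byte so di becomes even
dec cx

shr cx,1        ; cx = word count, CF = leftover byte
rep stosw       ; string instructions leave the flags unchanged
jnc $+3         ; jnc is 2 bytes, so $+3 skips the final stosb
stosb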
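
Both replacements above split the byte count the same way: an optional head byte to reach an even destination offset, a run of word moves, and an optional tail byte taken from the carry of "shr cx,1". As a rough illustration only, here is a minimal C model of that arithmetic. It is not the posted code: the names Split, split_count and copy_split are hypothetical, and the count == 0 guard is an assumption added here because the assembly does not have one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the three phases; dst_off plays the role of DI. */
typedef struct {
    unsigned head;   /* 0 or 1 byte to reach an even destination offset */
    unsigned words;  /* number of 16-bit moves ("rep movsw") */
    unsigned tail;   /* 0 or 1 leftover byte (carry from "shr cx,1") */
} Split;

Split split_count(uint16_t dst_off, uint16_t count)
{
    Split s = {0, 0, 0};
    if (count == 0)            /* assumption: guard not in the posted code */
        return s;
    if (dst_off & 1) {         /* test di,1 */
        s.head = 1;            /* movsb */
        count--;               /* dec cx */
    }
    s.words = count >> 1;      /* shr cx,1 */
    s.tail  = count & 1;       /* the bit shifted out into CF */
    return s;
}

/* Forward copy of count bytes using the split (non-overlapping buffers). */
void copy_split(uint8_t *dst, const uint8_t *src,
                uint16_t dst_off, uint16_t count)
{
    Split s = split_count(dst_off, count);
    /* The three phases must add up to the original byte count. */
    assert(s.head + 2u * s.words + s.tail == (unsigned)count);
    if (s.head)
        *dst++ = *src++;
    for (unsigned i = 0; i < s.words; i++) {
        memcpy(dst, src, 2);   /* one word move */
        dst += 2;
        src += 2;
    }
    if (s.tail)
        *dst = *src;
}
```

For example, split_count(1, 5) gives one head byte, two words and no tail byte, while split_count(0, 5) gives no head byte, two words and one tail byte.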

wolfgang kern

Jul 22, 2022, 9:42:02 AM
[jnc +1? stosb/stosw are single-byte opcodes, AAh/ABh]

Yes, pre- and post-aligned string operations are
the main speed gain in my OS. It works with 32-bit
reduction/extension for any odd start and size.

I also align the source or destination to quad (4-byte) bounds:

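TEST esi,3      ;low two bits clear = source is 4-byte aligned
JZ isAligned
... ;adjust for an aligned loop start here
isAligned:
SHR ecx,1       ;CF = bit 0 of the count; no action at all if ecx=0
JNC +1          ;skip the one-byte LODSB
LODSB
SHR ecx,1       ;CF = bit 1 of the count
JNC +2          ;+2 for use32, because LODSW needs a 66h prefix there
LODSW
REP LODSD       ;falls through if ECX=0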

With similar dummy reads up front and at the end, it
can part-read disk sectors at any offset and size.
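
The two SHR steps in the snippet above peel the low two bits off the byte count: bit 0 decides whether one LODSB runs, bit 1 decides whether one LODSW runs, and what remains is the REP LODSD iteration count. A minimal C model of that decomposition might look like the following. It is only a sketch of the arithmetic, not the OS code, and the names Peel and peel_count are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of how the byte count in ECX is decomposed. */
typedef struct {
    unsigned bytes;   /* 0 or 1: LODSB runs when bit 0 of the count is set */
    unsigned words;   /* 0 or 1: LODSW runs when bit 1 of the count is set */
    uint32_t dwords;  /* iterations left for REP LODSD */
} Peel;

Peel peel_count(uint32_t ecx)
{
    const uint32_t total = ecx;
    Peel p;
    p.bytes = ecx & 1;         /* first SHR ecx,1: CF = bit 0 */
    ecx >>= 1;
    p.words = ecx & 1;         /* second SHR ecx,1: CF = bit 1 */
    ecx >>= 1;
    p.dwords = ecx;            /* REP LODSD does nothing when this is 0 */
    /* The three parts must add up to the original byte count. */
    assert(p.bytes + 2u * p.words + 4u * p.dwords == total);
    return p;
}
```

A count of 11 (binary 1011) splits into one byte, one word and two dwords, and a count of 0 yields all zeros, which matches the "no action at all if ecx=0" comment.
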
__
wolfgang
