On Fri, 11 Nov 2016 11:09:27 +0100 "wolfgang kern" <now...@never.at> wrote:
> Edward Brekelbaum said (in part):
(from comp.lang.asm.x86)
I've also compiled such lists of various instruction features, and over
time it became a large document. This is about 80% of said document;
the other 20% is not posted for various reasons. You'll need a
fixed-width font. Also, some slight editing and reformatting was done
because of Usenet line wraps, so you may need to unwrap some lines.
wolfgang, the sections with similar headings to yours are somewhat
further down.
<--start-->
x86 instruction information
compiled and authored by Rod Pemberton
----
NOTE: For brevity, most of this document will use 16-bit registers
when there is a choice between either a 16-bit or 32-bit register.
e.g.,
ax - This means ax or eax.
NOTE: In general, this document has no 64-bit coverage.
NOTE: this document assumes some familiarity with x86 instructions.
e.g.,
r/m/16/32 - This means the instruction supports registers and
memory in 16 and 32 bit forms, but not 8 bit forms.
e.g.,
movs,movsb/w/d - This is an abbreviation for movs,movsb,movsw,movsd.
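The slash abbreviation above can be expanded mechanically. Below is a
minimal Python sketch of one way to do it. The helper names expand and
expand_list are made up for illustration, and the rule that the shortest
alternative decides how many trailing characters get swapped is an
assumption that happens to fit the mnemonic lists in this document. It
does not apply to operand notation such as r/m/16/32.

```python
def expand(abbrev):
    """Expand e.g. 'movsb/w/d' into ['movsb', 'movsw', 'movsd']."""
    first, *alts = abbrev.split("/")
    if not alts:
        return [first]
    # Assumption: the shortest alternative tells how many trailing
    # characters of the first mnemonic are replaced.
    n = min(len(a) for a in alts)
    stem = first[:-n]
    return [first] + [stem + a for a in alts]


def expand_list(text):
    """Expand a comma-separated list such as 'movs,movsb/w/d'."""
    out = []
    for item in text.split(","):
        out.extend(expand(item.strip()))
    return out


print(expand_list("movs,movsb/w/d"))   # ['movs', 'movsb', 'movsw', 'movsd']
print(expand("pusha/ad/f/fd"))         # ['pusha', 'pushad', 'pushf', 'pushfd']
```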
CPU instruction length:
386+ 15 bytes maximum, GP fault generated if exceeded
286 10 bytes maximum
86 no maximum - instruction size 1 to 6 bytes, plus any number of prefixes
registers:
8-bit: al,ah,bl,bh,cl,ch,dl,dh
16-bit: ax,bx,cx,dx,si,di,bp,sp
32-bit: eax,ebx,ecx,edx,esi,edi,ebp,esp
- Note that the eight 8-bit registers don't correspond with the
eight word or dword registers, but with the low and high bytes of
the first four registers only: ax,bx,cx,dx
- This register non-orthogonality affects the use of byte register
instructions together with: si,di,bp,sp, e.g., setcc
segments:
ss,cs,ds,es
fs,gs (386+)
default instruction segment:
"all" to ds: - except for ip, bp, sp, and for string instructions, di
ip to cs:
di to es: - for string instructions only, otherwise ds:
bp,sp to ss:
intended register usage:
ax - accumulator - used for calculations
bx - base
- used as a pointer for 16-bit indirect memory access
- used as a base address with the xlat,xlatb instruction
cx - count
- used with looping instructions and loop prefixes
- used with bitwise shift or rotate instructions
dx - data - used as an accumulator extension
si - source index - used with string instructions for reading
di - destination index - used with string instructions for writing
sp - stack pointer - used with stack instructions
bp - base pointer - used as the base address of a stack frame
ip - instruction pointer
intended segment usage:
cs - code segment
ds - data segment
es - extra segment
ss - stack segment
fs - segment
gs - segment
instruction overrides:
ss,cs,ds,es,fs,gs (0x36,0x2E,0x3E,0x26,0x64,0x65)
repne,repnz,rep,repe,repz (0xF2,0xF2,0xF3,0xF3,0xF3)
lock (0xF0)
0x66 - operand size
0x67 - address size
branch hint taken for jcc - 0x3E (ds override)
branch hint not taken for jcc - 0x2E (cs override)
sse etc. (0x66, 0xF2, 0xF3 - op size and rep)
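To make the prefix byte values concrete, here is a small Python sketch
that peels the prefixes off the front of one instruction's bytes. The
table and the function name split_prefixes are illustrative only. It
assumes a 386+ (on the 8086, 0x64-0x67 are not prefixes), and it does not
check prefix ordering, repeats, or the 15 byte limit.

```python
PREFIXES = {
    0x26: "es", 0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x64: "fs", 0x65: "gs",
    0x66: "opsize", 0x67: "addrsize",
    0xF0: "lock", 0xF2: "repne", 0xF3: "rep",
}


def split_prefixes(code):
    """Return (prefix_names, remaining_bytes) for one instruction."""
    names = []
    i = 0
    while i < len(code) and code[i] in PREFIXES:
        names.append(PREFIXES[code[i]])
        i += 1
    return names, bytes(code[i:])


# F3 A4    = rep movsb
# 26 8A 07 = mov al,es:[bx]
print(split_prefixes(bytes([0xF3, 0xA4])))          # (['rep'], b'\xa4')
print(split_prefixes(bytes([0x26, 0x8A, 0x07])))    # (['es'], b'\x8a\x07')
```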
16-bit register addressing modes:
al,cl,dl,bl,ah,ch,dh,bh (8-bit registers)
ax,cx,dx,bx,sp,bp,si,di (16-bit registers)
16-bit memory addressing modes:
ds:[bx+si+(no offset,8-bit offset,16-bit offset)]
ds:[bx+di+(no offset,8-bit offset,16-bit offset)]
ss:[bp+si+(no offset,8-bit offset,16-bit offset)]
ss:[bp+di+(no offset,8-bit offset,16-bit offset)]
ds:[si+(no offset,8-bit offset,16-bit offset)]
ds:[di+(no offset,8-bit offset,16-bit offset)]
ss:[bp+(8-bit offset,16-bit offset)] (NOTE: no 'no offset' form)
ds:[16-bit displacement]
(NOTE:above replaces 'no offset' form of bp+offset)
ds:[bx+(no offset,8-bit offset,16-bit offset)]
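The eight 16-bit memory forms are listed above in r/m encoding order
(0 through 7). The following Python sketch decodes a ModRM byte into the
notation used here, including the default segment. The function name
decode_modrm16 and the output format are made up for illustration; the
caller is assumed to pass the displacement bytes already assembled into
an integer.

```python
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]


def decode_modrm16(modrm, disp=0):
    """Describe the memory operand selected by a 16-bit ModRM byte."""
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        raise ValueError("mod=3 selects a register, not memory")
    if mod == 0 and rm == 6:
        # replaces the 'no offset' form of bp
        return "ds:[0x%04X]" % (disp & 0xFFFF)
    seg = "ss" if "bp" in RM16[rm] else "ds"
    if mod == 0:
        return "%s:[%s]" % (seg, RM16[rm])
    if mod == 1:
        d = disp & 0xFF
        if d >= 0x80:
            d -= 0x100              # 8-bit offsets are sign-extended
        return "%s:[%s%+d]" % (seg, RM16[rm], d)
    return "%s:[%s+0x%04X]" % (seg, RM16[rm], disp & 0xFFFF)


print(decode_modrm16(0x07))            # ds:[bx]
print(decode_modrm16(0x46, 0xFE))      # ss:[bp-2]
print(decode_modrm16(0x06, 0x1234))    # ds:[0x1234]
```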
32-bit register addressing modes:
al,cl,dl,bl,ah,ch,dh,bh (8-bit registers)
eax,ecx,edx,ebx,esp,ebp,esi,edi (32-bit registers)
32-bit memory addressing modes:
ds:[eax+(no offset,8-bit offset,32-bit offset)]
ds:[ecx+(no offset,8-bit offset,32-bit offset)]
ds:[edx+(no offset,8-bit offset,32-bit offset)]
ds:[ebx+(no offset,8-bit offset,32-bit offset)]
SIB forms (**) (no esp+offset mode - SIB encoding allows esp+offset)
ss:[ebp+(8-bit offset,32-bit offset)] (NOTE: no 'no offset' form)
ds:[32-bit displacement]
(NOTE:above replaces 'no offset' form of ebp+offset)
ds:[esi+(no offset,8-bit offset,32-bit offset)]
ds:[edi+(no offset,8-bit offset,32-bit offset)]
(**) two SIB forms (scaled index byte):
ds/ss:[(reg0a,32-bit displacement)+(reg1*n,none)]
ds/ss:[reg0b+(reg1*n,none)+(8-bit offset,32-bit offset)]
segment is ds, except if reg0a or reg0b is esp or ebp, then it's ss
reg0a=eax,ecx,edx,ebx,esp,esi,edi,32-bit displacement(ebp) (no ebp)
reg0b=eax,ecx,edx,ebx,esp,esi,edi,ebp
reg1=eax,ecx,edx,ebx,ebp,esi,edi,none(esp) (no esp)
n=1,2,4,8
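The special cases above (no 'no offset' form for ebp, esp needs a SIB
byte, esp can't be an index) are easier to see in an encoder. This is a
rough Python sketch, not a complete assembler: encode_mem32 and its
parameters are hypothetical names, reg is the 3-bit value for the ModRM
reg field, and segment selection is not encoded at all.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE = {1: 0, 2: 1, 4: 2, 8: 3}


def encode_mem32(reg, base=None, index=None, scale=1, disp=0):
    """Return ModRM (+SIB) (+disp) bytes for a 32-bit memory operand."""
    if index == "esp":
        raise ValueError("esp cannot be used as an index")
    disp32 = (disp & 0xFFFFFFFF).to_bytes(4, "little")
    if base is None and index is None:
        return bytes([(reg << 3) | 5]) + disp32     # mod=00, rm=101
    if base is None:
        mod, dbytes = 0, disp32         # SIB base=101: disp32, no base
    elif disp == 0 and base != "ebp":
        mod, dbytes = 0, b""            # 'no offset' form
    elif -128 <= disp <= 127:
        mod, dbytes = 1, bytes([disp & 0xFF])
    else:
        mod, dbytes = 2, disp32
    if index is None and base not in (None, "esp"):
        return bytes([(mod << 6) | (reg << 3) | REG32[base]]) + dbytes
    # rm=100 selects a SIB byte; index=100 in the SIB means 'none'
    idx = REG32[index] if index else 4
    bas = REG32[base] if base else 5
    sib = (SCALE[scale] << 6) | (idx << 3) | bas
    return bytes([(mod << 6) | (reg << 3) | 4, sib]) + dbytes


# mov eax,[ebx+esi*4+8] = 8B 44 B3 08
print((bytes([0x8B]) + encode_mem32(0, "ebx", "esi", 4, 8)).hex())  # 8b44b308
print(encode_mem32(0, "ebp").hex())          # 4500 (forced 8-bit offset of 0)
print(encode_mem32(0, "esp", disp=8).hex())  # 442408 (SIB required)
```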
xlat address mode:
xlat is equivalent to 'mov al,ds:[bx+al]'
So, xlat effectively adds an additional
address mode for 8086 of 'ds:[bx+al]'.
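As a quick illustration of the 'ds:[bx+al]' mode, here is a Python model
of xlat doing a hex digit lookup. The function name xlat and the
bytearray standing in for the 64 KiB ds segment are assumptions of the
sketch. Note that al is treated as unsigned.

```python
def xlat(mem, bx, al):
    """Return the new al: the byte at offset (bx + al) in the ds segment."""
    return mem[(bx + (al & 0xFF)) & 0xFFFF]


mem = bytearray(0x10000)                    # stand-in for the ds segment
mem[0x100:0x110] = b"0123456789ABCDEF"      # table at ds:0x100

al = xlat(mem, 0x100, 0x0B)
print(chr(al))                              # B
```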
registers by register encoding:
eax(0) ax(0) al(0) es(0)
ecx(1) cx(1) cl(1) cs(1)
edx(2) dx(2) dl(2) ss(2)
ebx(3) bx(3) bl(3) ds(3)
esp(4) sp(4) ah(4) fs(4)
ebp(5) bp(5) ch(5) gs(5)
esi(6) si(6) dh(6)
edi(7) di(7) bh(7)
registers by register group:
eax(0) ax(0) ah(4) al(0)
ecx(1) cx(1) ch(5) cl(1)
edx(2) dx(2) dh(6) dl(2)
ebx(3) bx(3) bh(7) bl(3)
esp(4) sp(4)
ebp(5) bp(5)
esi(6) si(6)
edi(7) di(7)
data movement:
The x86 has three (basic) locations where instructions can load
or store values: registers, stack, and memory.
From the diagram, note that there are instructions which will
move data between:
1) register and register
2) register and memory
3) register and stack
4) stack and memory
                    mov
     register       xchg       register
            +-----------------+
            |                 |
   push     |                 | mov
   pop      |                 | xchg
   mov [sp] |                 |
            |                 |
            +-----------------+
     stack       push mem       memory
                 pop mem
Note that multiple instructions (using the stack or registers) are
usually required to move data between:
1) memory and memory
2) stack and stack
One exception to this is the movs instruction. The movs instruction
allows memory to memory moves. The instruction has setup overhead
because it uses many hardcoded registers. The instruction is slow
but does much work. When used with a repeat prefix, it is faster
than a loop composed of other instructions.
The diagram ignores segment registers and I/O ports.
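For the movs case, the hardcoded register usage can be modeled in a few
lines of Python. This is only a sketch: rep_movsb is a made-up name, ds
and es are assumed to point at the same segment (one bytearray), and df
stands for the direction flag.

```python
def rep_movsb(mem, si, di, cx, df=0):
    """Model 'rep movsb'; returns the final (si, di, cx)."""
    step = -1 if df else 1
    while cx != 0:
        mem[di] = mem[si]               # es:[di] = ds:[si]
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx


mem = bytearray(b"hello" + bytes(11))
print(rep_movsb(mem, si=0, di=8, cx=5))     # (5, 13, 0)
print(bytes(mem[8:13]))                     # b'hello'
```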
fastest modes:
1) register,register
2) accumulator,immediate
3) register,immediate
'register,register' instructions:
adc,add,and,cmp,or,sbb,sub,test,xor
mov
arpl,bsf/r,cmpxchg,movsx,movzx,xchg
'accumulator,immediate' (or 'immediate,accumulator') mode instructions:
adc,add,and,cmp,or,sbb,sub,test,xor
in,out
'register,immediate' instructions:
adc,add,and,cmp,or,sbb,sub,test,xor
mov
rcl,rcr,rol,ror,sal,sar,shl,shr
hardcoded accumulator (al,ah,ax) instructions:
adc,add,and,cmp,or,sbb,sub,test,xor (*)-all
lods,stos,scas
in,out
mov(*)
cmpxchg,xchg(*),xlat,xlatb
div,idiv,imul,mul
cbw,cwde,cwd,cdq
aaa,aad,aam,aas,daa,das (64-bit obs.)
lahf,sahf,salc (64-bit obs.)
cpuid (eax, clobbers eax,ebx,ecx,edx)
(*) have a regular form and short form for accumulator
instructions with hardcoded ,0 or ,1 forms:
enter
rcl,rcr,rol,ror,sal,sar,shl,shr
hardcoded register instructions or prefixes (other than accumulator only):
xlat,xlatb (bx al)
jcxz,jecxz (cx)
rep,repe/z/ne/nz (cx)
loop,loope/z/nz/ne (cx)
rcl,rcr,rol,ror (cl)
sal,sar,shl,shr (cl)
shld,shrd (cl)
in,out (dx ax,al)
cwd,cdq,div,idiv,mul,imul (dx:ax,al)
lds,les,lfs,lgs,lss
mov (for crx,drx,trx - trx are obs.)
push,pusha/ad/f/fd (ss:sp)
pop,popa/ad/f/fd (ss:sp)
cmps,cmpsb/w/d (ds:si es:di)
movs,movsb/w/d (ds:si es:di)
ins,insb/w/d (es:di dx)
outs,outsb/w/d (ds:si dx)
lods,lodsb/w/d (ds:si ax,al)
scas,scasb/w/d (es:di ax,al)
stos,stosb/w/d (es:di ax,al)
enter,leave (bp ss:sp)
string or memory block instructions, with useable rep prefixes:
A) between memory and memory
cmps,cmpsb/w/d (ds:si es:di) - repe/z/ne/nz (cx)
movs,movsb/w/d (ds:si es:di) - rep (cx)
B) between memory and ax
lods,lodsb/w/d (ds:si ax,al) - rep (cx)
scas,scasb/w/d (es:di ax,al) - repe/z/ne/nz (cx)
stos,stosb/w/d (es:di ax,al) - rep (cx)
C) between memory and ports
ins,insb/w/d (es:di dx) - rep (cx)
outs,outsb/w/d (ds:si dx) - rep (cx)
instructions which load or store segment registers:
lds,les,lfs,lgs,lss
mov
push
pop (except cs - cs allowed on 8086/80186 only)
jmp,call,ret,iret (cs)
instructions that support mixed sizes (all imm8 except for movzx/sx):
add,adc,and,cmp,or,sub,sbb,xor r/m/16/32,imm8 (sign-extended)
rcl,rcr,rol,ror,sal,sar,shl,shr r/m/16/32,imm8
shrd,shld r/m/16/32,r/16/32,imm8
bt,btc/r/s r/16/32,imm8
imul r/16/32,imm8 (sign-extended)
imul r/16/32,r/m/16/32,imm8 (sign-extended)
push imm8 (sign-extended)
movzx/sx r/16/32,r/m/8/16 (zero-extended,sign-extended)
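The sign-extended imm8 forms are what let an assembler emit a shorter
instruction. The Python sketch below picks between the 0x83 (imm8) and
0x81 (imm16) encodings of 'add r16,imm' in 16-bit code. The helper names
are hypothetical, and the even shorter accumulator form (0x05) is
deliberately not considered.

```python
def sign_extend8(value, bits):
    """Sign-extend the low 8 bits of value to the given width."""
    value &= 0xFF
    if value & 0x80:
        value -= 0x100
    return value & ((1 << bits) - 1)


def encode_add_r16_imm(reg, imm):
    """Encode 'add r16,imm' using the imm8 form when it fits."""
    imm &= 0xFFFF
    modrm = 0xC0 | reg                  # mod=11, /0 = add, rm = register
    if sign_extend8(imm, 16) == imm:
        return bytes([0x83, modrm, imm & 0xFF])
    return bytes([0x81, modrm]) + imm.to_bytes(2, "little")


print(encode_add_r16_imm(3, -1).hex())      # 83c3ff   add bx,-1
print(encode_add_r16_imm(3, 0x80).hex())    # 81c38000 add bx,0x80
```

Note that 0x80 does not fit: as an imm8 it would sign-extend to 0xFF80.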
instructions that support imm8:
add,adc,and,cmp,or,sub,sbb,xor
rcl,rcr,rol,ror,sal,sar,shl,shr
push
mov
test
int
in,out
shld,shrd
bt,btc/r/s
enter
imul
instructions with +r form encoding (bytes in paren):
inc,dec (1)
push,pop (1)
xchg (1)
bswap (2)
mov imm (offset)
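In the +r forms the register number is simply added to a base opcode.
A small Python sketch follows, using the register numbers from the
encoding table earlier in this document. The dictionary and function
names are made up for illustration; the base opcodes are for the 16/32-bit
register forms.

```python
REG = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}
PLUS_R = {"inc": 0x40, "dec": 0x48, "push": 0x50, "pop": 0x58,
          "xchg": 0x90}                 # xchg here means 'xchg ax,reg'


def plus_r(op, reg):
    """One-byte +r form: base opcode plus register number."""
    return bytes([PLUS_R[op] + REG[reg]])


def mov_r16_imm16(reg, imm):
    """mov r16,imm16: 0xB8+r followed by the immediate."""
    return bytes([0xB8 + REG[reg]]) + (imm & 0xFFFF).to_bytes(2, "little")


def bswap(reg32):
    """bswap r32: 0x0F, 0xC8+r (two bytes)."""
    return bytes([0x0F, 0xC8 + REG[reg32[1:]]])


print(plus_r("push", "bx").hex())           # 53
print(mov_r16_imm16("si", 0x1234).hex())    # be3412
print(bswap("edx").hex())                   # 0fca
```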
instructions that support 8-bit instruction relative offsets:
jcc,jcxz,jecxz,jmp,loop,loope/z/nz/ne
[not call instruction]
instructions that support 16/32-bit instruction relative offsets:
call,jcc,jmp
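Relative offsets are measured from the end of the branch instruction,
not its start. Here is a hedged Python sketch for a short jmp (0xEB,
rel8) and a near call in 16-bit code (0xE8, rel16). The function names
and the addr/target parameters are illustrative only.

```python
def jmp_short(addr, target):
    """Encode 'jmp short target' for an instruction located at addr."""
    rel = target - (addr + 2)           # relative to the next instruction
    if not -128 <= rel <= 127:
        raise ValueError("target out of rel8 range")
    return bytes([0xEB, rel & 0xFF])


def call_near16(addr, target):
    """Encode 'call target' in 16-bit code for an instruction at addr."""
    rel = (target - (addr + 3)) & 0xFFFF
    return bytes([0xE8]) + rel.to_bytes(2, "little")


print(jmp_short(0x100, 0x100).hex())        # ebfe (jump to self)
print(call_near16(0x100, 0x200).hex())      # e8fd00
```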
instructions which can perform sign-extension:
add,adc,and,cmp,or,sub,sbb,xor r/m/16/32,imm8
cbw,cwde,cwd,cdq
movsx
imul
instructions which can perform zero-extension:
movzx
movd
instruction which requires a sign-extended argument:
idiv (i.e., requires cwd or cdq)
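Why idiv needs cwd first can be shown with a Python model of the 16-bit
case. This is a sketch under the usual assumptions (dx:ax is the 32-bit
dividend, quotient truncates toward zero); to_signed, cwd and idiv16 are
names invented for the example.

```python
def to_signed(value, bits):
    """Interpret the low 'bits' bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cwd(ax):
    """Return dx after cwd: all ones if ax is negative, else zero."""
    return 0xFFFF if ax & 0x8000 else 0x0000


def idiv16(dx, ax, divisor):
    """Model 'idiv r/m16': returns (ax=quotient, dx=remainder)."""
    dividend = to_signed((dx << 16) | ax, 32)
    d = to_signed(divisor, 16)
    if d == 0:
        raise ZeroDivisionError("#DE")
    q = abs(dividend) // abs(d)         # truncate toward zero
    if (dividend < 0) != (d < 0):
        q = -q
    r = dividend - q * d
    if not -0x8000 <= q <= 0x7FFF:
        raise OverflowError("#DE: quotient does not fit in ax")
    return q & 0xFFFF, r & 0xFFFF


ax = -7 & 0xFFFF
print([hex(v) for v in idiv16(cwd(ax), ax, 2)])  # ['0xfffd', '0xffff'] = -3 rem -1
print([hex(v) for v in idiv16(0, ax, 2)])        # ['0x7ffc', '0x1'] (wrong: dx not set)
```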
instructions with r/m/16/32 or r16/32 forms but without r/m/8 or r/8:
push,pop
bsf/r,bt,bts/r/c
jmp,call
bound
lar,lsl
lds,les,lfs,lgs,lss
lea
sldt,smsw
shrd,shld
instruction with r/m8 only form:
setcc
instruction with r32 only form:
bswap
instructions with r/m16 only forms:
arpl
lldt,lmsw
ltr,str,verr/w
sldt,smsw
instructions with r/m32 only forms:
movd
cvtsi2ss/d
instructions with only single operand forms:
neg,not
pop (no push)
mul (no imul)
div,idiv
inc,dec
bswap
setcc
lldt,lmsw
ltr,str,verr/w
sldt,smsw
instructions that do not accept an operand-size override:
lldt,lmsw
ltr,str
sldt
NOTE: only info from 386 manual
instructions (and prefixes) that have an exact one byte form:
segment overrides (es,cs,ss,ds,fs,gs)
operand size override (0x66)
address size override (0x67)
lock,rep,repe/z/ne/nz
push (es,cs,ss,ds)
pop (es,ss,ds)
aaa,aas,daa,das
inc,dec,push,pop r/16/32
pusha/ad/f/fd
popa/ad/f/fd
cmps,cmpsb/w/d
movs,movsb/w/d
ins,insb/w/d
outs,outsb/w/d
lods,lodsb/w/d
scas,scasb/w/d
stos,stosb/w/d
nop,wait,hlt
xchg,xlat,xlatb
cbw,cwde,cwd,cdq
ret,leave,retf,iret,iretd
int 3,into,int1
salc,sahf,lahf
in,out
clc,cld,cli,cmc,stc,std,sti
instructions that have an exact two byte form:
clts
invd,wbinvd
push (fs,gs)
pop (fs,gs)
bswap r/32
cpuid
rdtsc,rdmsr,rdpmc,wrmsr
rsm
syscall,sysret
sysenter,sysexit
emms,femms
ud0,ud1,ud2
instructions which preserve flags:
bound
bswap,xchg,xlat,xlatb
cbw,cdq,cwd,cwde
clts (doesn't modify EFLAGS, but CR0)
cpuid
enter,leave
esc (all FPU instructions)
dec (only CF), inc (only CF)
in,ins,lods,stos
invd,invlpg,wbinvd
jcc,cmovcc,fcmovcc,setcc
call,jmp,jcxz,ret
lahf,lds/es/ss/fs/gs
lgdt,lidt,lldt,lmsw
hlt,lock,wait
loop,loope/ne,loopz/nz
ltr,str
monitor,mwait
lea,mov
movs,movsx,movzx
not,nop,ud2
out,outs
pop,popa
push,pusha,pushf
rdmsr,rdpmc,rdtsc,wrmsr
rep,repe/ne
sgdt,sidt,sldt,smsw
instructions which modify flags:
aaa,aad,aam,aas,daa,das
adc,add,sbb,sub,xadd,dec,inc
and,neg,or,xor
arpl,bound
bsf/r,bt,bts/r/c
clc,cmc,cld,cli
cmp,test
cmps,scas
cmpxchg,cmpxchg8b
comisd,comiss,ucomisd,ucomiss
div,idiv,imul,mul
dec (except CF), inc (except CF)
int,into
fcomi,fcomip,fucomi,fucomip
iret
lar,lsl
mov crx/drx/trx (trx are obs.)
popf,popfd
rcl,rcr,rol,ror
rsm
sal,sar,shl,shr,shld,shrd
stc,std,sti
sahf
verr/w
instructions which modify the carry flag:
aaa,aas,daa,das
adc,add,sbb,sub,xadd
and(0),or(0),xor(0)
neg (CF=0 for zero)
clc(0),cmc(x),stc(1)
bt,bts/r/c
cmp,test(0)
cmps,scas
cmpxchg
comisd,comiss,ucomisd,ucomiss
fcomi,fcomip,fucomi,fucomip
imul,mul
iret,popf,rsm,sahf
rcl,rcr,rol,ror
sal,sar,shr,shl
shld,shrd
instructions which preserve the carry flag:
dec,inc
instructions which modify the zero flag:
aad,aam,daa,das
adc,add,sbb,sub,xadd,dec,inc
and,neg,or,xor
cmp,test
cmps,scas
bsf/r
cmpxchg,cmpxchg8b
comisd,comiss,ucomisd,ucomiss
fcomi,fcomip,fucomi,fucomip
iret,popf,rsm,sahf
lar,lsl,arpl
sal,sar,shr,shl
shld,shrd
verr/w
NOTE: neg sets CF=0 for zero (in carry flag section above)
arithmetic instructions which DO NOT update the carry flag:
inc,dec,div,idiv,not
won't generate partial flag stalls:
and,or,xor,add,adc,sub,sbb,cmp,neg
will generate partial flag stalls (when followed by lahf,sahf,pushf,pushfd):
inc,dec,test
bt,btc/r/s
bsf/r
clc,cld,cli
cmc
stc,std,sti
mul,imul
rcl,rcr,rol,ror,sal,sar,shl,shr
override restricted instructions:
cmps/cmpsb/w/d no override of ES segment
ins/insb/w/d no override of ES segment
movs/movsb/w/d no override of ES segment
scas/scasb/w/d no override of ES segment
stos/stosb/w/d no override of ES segment
mov crx ignores operand size override
monitor no operand size override, no rep/repne/repnz, no lock
popf/fd #UD if v86 with I/O priv. < 3, and op. size override
jcc #GP if invalid address due to operand size override
loop/loopcc #UD in RM if address size override
vmcall address size or segment ignored, #UD for operand size
vmlaunch address size or segment ignored, #UD for operand size
vmresume address size or segment ignored, #UD for operand size
vmclear operand size ignored
vmptrld operand size, equivalent to vmclear
vmptrst #UD for operand size
vmread #UD for operand size
vmwrite #UD for operand size
vmxoff #UD for operand size
vmxon operand size ignored
fxrstor #UD for lock override in RM and PM
fxsave #UD for lock override in RM and PM
atomic operations:
lock (atomic override prefix for multiprocessors - asserts lock signal)
lss,lds,les,lfs,lgs (loads a segment and offset in a single instruction)
mov ss (following instruction executes without interrupt, e.g., mov esp)
xchg (automatically locks for memory operands)
critical non-atomic operations:
mov crx (NMI and maskable interrupts must be disabled for atomic usage)
lock atomic override prefix can be used on (when they have a memory operand):
adc,add,sbb,sub,dec,inc
and,neg,not,or,xor
btc,btr,bts
xadd,xchg
cmpxchg,cmpxchg8b
486 serializing instructions:
(forces completion of all prior instructions on out-of-order-execution CPUs)
iret (non-privileged)
rsm (non-privileged, only in SMM)
wbinvd (privileged)
mov cr0 (privileged)
mov drx (privileged)
lmsw (won't serialize P5+)
exceptions: int xx,int1,int3,into,bound (won't serialize P5+)
branches: call,ret,retf,jmp,jcc (won't serialize P5+)
segment load: mov sr,pop sr,lds/es/fs/gs/ss (won't serialize P5+)
586+ serializing instructions:
(i.e., non-parallel execution for Pent,P4,P6,Xn)
(forces completion of all prior instructions on out-of-order-execution CPUs)
iret (non-privileged)
cpuid (non-privileged, won't serialize on 486, modifies registers)
rsm (non-privileged, only in SMM)
wbinvd (privileged)
mov crx (privileged, not with cr8, only cr0 for 486)
mov drx (privileged)
invd (won't serialize on 486)
invlpg (won't serialize on 486)
lgdt (won't serialize on 486)
lidt (won't serialize on 486)
lldt (won't serialize on 486)
ltr (won't serialize on 486)
wrmsr (not available on 486)
out (?)
protected mode (PM) setup instructions valid in real mode (RM)
clts,lidt,lgdt
protected mode (PM) setup instructions not valid in real mode (RM)
arpl,lldt,lsl,ltr,sldt,str,verr,verw
memory ordering instructions (P4,P6,Xn):
sfence (non-privileged, not available on 486)
mfence (non-privileged, not available on 486)
lfence (non-privileged, not available on 486)
rdtscp (non-privileged, not available on 486)
AMD64 obsolete instructions (64-bit obs.):
jmp far/call far - uses segment registers
inc/dec - single byte versions now REX prefixes
push/pop cs/ds/es/ss
lds/les
pusha,popa
into
bound
aaa,aas,aad,aam,daa,das
icebp
82h alias for 80h
sysenter,sysexit
arpl
salc
lahf (some cpu's)
sahf (some cpu's)
ss,ds,es
basic flags:
OF(11) Overflow Flag (usually set for signed overflow)
DF(10) Direction Flag (inc/decrement of SI/DI string instructions)
IF(9) Interrupt enable Flag (set recognizes interrupts)
SF(7) Sign Flag (usually set to sign of signed result)
ZF(6) Zero Flag (usually set when equal)
AF(4) Auxiliary carry Flag (set when nybble carry/borrow)
PF(2) Parity Flag (set for even, cleared for odd)
CF(0) Carry Flag (usually set for unsigned carry or borrow)
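To show how the arithmetic flags fall out of a single operation, here is
a Python sketch that computes them for a 16-bit add and packs them at the
bit positions listed above. add16_flags and pack are hypothetical helper
names; IF and DF are not touched by add, so they are left out of the
result.

```python
BIT = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}


def add16_flags(a, b):
    """Return (result, flags) for a 16-bit 'add a,b'."""
    a &= 0xFFFF
    b &= 0xFFFF
    full = a + b
    res = full & 0xFFFF
    flags = {
        "CF": int(full > 0xFFFF),                          # unsigned carry
        "PF": int(bin(res & 0xFF).count("1") % 2 == 0),    # low byte only
        "AF": int(((a ^ b ^ res) & 0x10) != 0),            # carry out of bit 3
        "ZF": int(res == 0),
        "SF": res >> 15,
        "OF": int(((a ^ res) & (b ^ res) & 0x8000) != 0),  # signed overflow
    }
    return res, flags


def pack(flags):
    """Pack a flags dict into a FLAGS-style integer."""
    return sum(v << BIT[k] for k, v in flags.items())


res, f = add16_flags(0x7FFF, 1)
print(hex(res), hex(pack(f)))       # 0x8000 0x894 (OF,SF,AF,PF set)
res, f = add16_flags(0xFFFF, 1)
print(hex(res), hex(pack(f)))       # 0x0 0x55 (ZF,AF,PF,CF set)
```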
8088/8086:
prefer the AX or AL register for shorter or faster instruction forms
prefer XCHG reg,reg to MOV reg,reg or a PUSH reg; POP reg sequence
use XLAT for tables
use XLAT for an additional address mode, i.e., DS:[BX+AL]
unroll loops, including REP MOVS, which reduces use of CX to avoid branches
quick guide to instruction timings
----
* pairable
/ partial pairing
F fxch pairable
-f no flag stall
+f preserves flags
+m mixed size
m memory
X 64-bit obsolete
d directpath 1 macrop
d2 directpath 2 macrop
v vectorpath 3+ macrops
timing: P5 486 386
1 cycle
----
nop * 1 1 3 +f d
mov * 1 1 2 +f d
mov m * 1 1 2/4 +f d
push * 1 1 2 +f +m d
pop * 1 4 4 +f d/d2
pop * 1 1 4 +f d/d2
lea * 1 1 2 +f d/v ;add,mul by 1,2,4,8
add * 1 1 2 -f +m d
sub * 1 1 2 -f +m d
and * 1 1 2 -f +m d
or * 1 1 2 -f +m d
xor * 1 1 2 -f +m d
shr * 1 3 3 +m d
shl * 1 3 3 +m d
sar * 1 3 3 +m d
sal * 1 3 3 +m d
cmp * 1 1 2 -f +m d ;sub - no write to register
test * 1 1 1 d ;and - no write to register
inc * 1 1 2 X d
dec * 1 1 2 X d
adc / 1 1 2 -f +m d
sbb / 1 1 2 -f +m d ;x-(x+CF)
ror 1 / 1 3 3 +m d
rol 1 / 1 3 3 +m d
rcr 1 / 1 3 9 +m d
rcl 1 / 1 3 9 +m d
jmpn / 1 3 7+ +f d
jcc / 1 1,3 3,7 +f d
push sr 1 3 2 X +f d2
neg 1 1 2 -f d ;0-x, CF=0 if x=0
not 1 1 2 +f d ;xor imm8 0xFF - sign extended
setcc 1 3 4 +f d
calln 1 3 7+ +f d2
bswap 1 1 - +f d
fld F
fst(p)
fchs F
fabs F
fcom(p)(pp) F
fucom(p)(pp) F
ftst F
fnop
fxch
wait 1 1-3 6+ +f
2 cycle
----
add m * 2 2,3 6,7 +m d
sub m * 2 2,3 6,7 +m d
and m * 2 1,3 6,7 +m d
or m * 2 2,3 6,7 +m d
xor m * 2 2,3 6,7 +m d
cmp m * 2 2 5,6 +m d
test m * 2 1,2 5,5 d
adc m / 2 2,3 6,7 +m d
sbb m / 2 2,3 6,7 +m d
push m 2 4 5 +f +m d2
setcc m 2 4 5 d
cwd 2 3 2 +f d
cdq 2 3 2 +f d
clc 2 2 2 d
stc 2 2 2 d
cmc 2 2 2 d
cld 2 2 2 d
std 2 2 2 d2
lods 2 5 5 +f v
lahf 2 3 2 X +f v
sahf 2 2 3 X d
xchg ax 2 3 3 +f d2
call r 2 5 10+ +f d2
jmp r 2 5 7+ +f d
retn 2 5 10+ +f d
fld m80
fst(p) m32/64
fldz
fld1
fnstcw m16
fincstp
fdecstp
ffree
3 cycle
----
shr m / 3 4 7 +m d
shl m / 3 4 7 +m d
sar m / 3 4 7 +m d
sal m / 3 4 7 +m d
pop m 3 6 5 +f v
inc m 3 3 6 X d
dec m 3 3 6 X d
neg m 3 3 6 d
not m 3 3 6 +f d
ror m 3 4 7 +m d
rol m 3 4 7 +m d
rcr m / 3 4 10 +m v
rcl m / 3 4 10 +m v
xchg 3 3 3 +f d2/v
cbw 3 3 3 +f d
cwde 3 3 3 +f d
stos 3 5 4 +f v
pop sr 3 3 2 X +f v
pushf 3/9 3/4 4 +f v
movsx 3 3 3 +f +m d
movzx 3 3 3 +f +m d
jmpf 3 13 12+ +f v
retn i 3 5 10+ +f d2
leave 3 5 4 d2
fadd(p) F
fsub(p)(r)(rp) F
fmul(p) F
fst(p) m80
fild m
4 cycle
----
popf 4/6 6/9 5 +f v
xlat 4 4 5 +f v
movs 4 7 7 +f v
scas 4 6 7 v
lds X +f
les X +f
lfs X +f
lgs X +f
lss X +f
shr cl +m d
shl cl +m d
sar cl +m d
sal cl +m d
ror cl +m d
rol cl +m d
shld v
shrd v
bt 4 3 3 +m d/v
retf +f d/d2
ficom
5 cycle
----
cmps 5 8 10 v
pusha 5 11 18/24 X +f v
popa 5 9 24 X +f v
shld m v
shrd m v
call m +f v
jmp m +f d/v
retn m +f d/d2
retf i +f d/d2
misc.
----
fdiv(p)(r)(rp) F (fxch pairable)
enter 11+ 14+ 10+ v
rep 8 7 5 (varies by instruction...)
loop 5 6 11+m v
salc
btc/r/s 7 6 6 +m d2/v
rcr cl 7/26 8/31 9/10 +m v
rcl cl 7/26 8/31 9/10 +m v
instructions by generation
----
186 -
SHR/ROT i immediate >1
BOUND
ENTER/LEAVE
INS/INSB/INSW
IMUL r,r,i
OUTS/OUTSB/OUTSW
PUSH i
POPA/POPAD
PUSHA/PUSHAD
286 -
ARPL
CLTS
LAR
LGDT
LIDT
LLDT
LMSW
LOADALL
LSL
LTR
SGDT
SIDT
SLDT
SMSW
STR
VERR
VERW
386 -
MOVZX
MOVSX
IMUL r,r
SHLD
SHRD
BT
BTR
BTS
BTC
BSF
BSR
SETcc
Jcc long-displacement
CDQ
CWDE
IRETD
LFS
LGS
LSS
MOVSD
OUTSD
POPFD
PUSHFD
MOV CRx
MOV TRx
MOV DRx
486 -
BSWAP
CPUID (some)
CMPXCHG
INVD
INVLPG
RSM
WBINVD
XADD
FSTSW AX
87 -
ST(0)-ST(7) registers
FSTCW mem
FLDCW mem
287 -
FSTSW AX
FSETPM
387 -
FCOS
FLDENVD
FNSAVED
FNSTENVD
FPREM1
FRSTORD
FSAVED
FSIN
FSINCOS
FSTENVD
FUCOM
FUCOMP
FUCOMPP
Pent -
CMPXCHG8B
CPUID
RDMSR
WRMSR
RSM
RDTSC
RDPMC (Pent w/MMX only)
MOVCRx
RSLDT
RVTS
SVDC
Pent2 -
SYSENTER
SYSEXIT
PPro -
CMOVcc
FCMOVcc
FCOMV
FCOMI
FCOMIP
FUCOMI
FUCOMIP
RDPMC
UD2
MMX -
MM0-MM7 registers
MOVD
MOVQ
PACKSSDW
PACKSSWB
PACKUSWB
PADDB
PADDW
PADDD
PADDSB
PADDSW
PADDUSB
PADDUSW
PAND
PANDN
PCMPEQB
PCMPEQW
PCMPEQD
PCMPGTB
PCMPGTW
PCMPGTD
PMADDWD
PMULHW
PMULLW
POR
PSLLD
PSLLW
PSLLQ
PSRAD
PSRAW
PSRLW
PSRLD
PSRLQ
PSUBB
PSUBW
PSUBD
PSUBSB
PSUBSW
PSUBUSB
PSUBUSW
PUNPCKHBW
PUNPCKHDQ
PUNPCKHWD
PUNPCKLBW
PUNPCKLWD
PUNPCKLDQ
PXOR
EMMS
SSE -
XMM0-XMM7 registers 64
PREFETCH
SFENCE
FXSAVE
FXRSTOR
MOVNTQ
MOVNTPS
- CVTSI2SS
- CVTSS2SI
- CVTTSS2SI
PSHUFW
PSADBW
PMINUB
PMINSW
PMAXUB
PMAXSW
PMULHUW
PAVGB
PAVGW
PINSRW
PMOVMSKB
SSE2 -
XMM0-XMM7 registers 128
MOVNTI
MOVNTPD
PAUSE
LFENCE
MFENCE
- CVTSD2SI
- CVTSI2SD
- CVTTSD2SI
PADDQ
PSUBQ
PMULUDQ
SSE3 -
FISTTP
LDDQU
MOVDDUP
MOVSHDUP
MOVSLDUP
ADDSUBPS
ADDSUBPD
HADDPS
HADDPD
HSUBPS
HSUBPD
SSSE3 -
PSHUFB
PHADDW
PHADDSW
PHADDD
PMADDUBSW
PHSUBW
PHSUBSW
PHSUBD
PSIGNB
PSIGNW
PSIGND
PMULHRSW
PABSB
PABSW
PABSD
PALIGNR
SSE4A -
- EXTRQ
- INSERTQ
- MOVNTSD
- MOVNTSS
SSE4.1 -
- DPPD
- DPPS
- INSERTPS
- MOVNTDQA
- MPSADBW
- PACKUSDW
- PBLENDW
- BLENDPD
- BLENDPS
- PBLENDVB
- BLENDVPD
- BLENDVPS
- PCMPEQQ
- PEXTRB
- PEXTRW
- PEXTRD
- PEXTRQ
- PHMINPOSUW
- PINSRB
- PINSRD
- PINSRQ
- PMAXSB
- PMAXSD
- PMAXUW
- PMAXUD
- PMINUW
- PMINUD
- PMOVSXBW
- PMOVSXBD
- PMOVSXBQ
- PMOVSXWD
- PMOVSXWQ
- PMOVSXDQ
- PMOVZXBW
- PMOVZXBD
- PMOVZXBQ
- PMOVZXWD
- PMOVZXWQ
- PMOVZXDQ
- PMULDQ
- PMULLD
- PTEST
- ROUNDPD
- ROUNDPS
- ROUNDSD
- ROUNDSS
SSE4.2 -
- CRC32
- PCMPESTRI
- PCMPESTRM
- PCMPGTQ
- PCMPISTRI
- PCMPISTRM
- POPCNT
ABM -
LZCNT
POPCNT
Monitor -
MONITOR
MWAIT
3DNow - AMD only
FEMMS
PAVGUSB
PF2ID
PFACC
PFADD
PFCMPEQ/GT/GE
PFMAX
PFMIN
PFRCP/IT1/IT2
PFRSQRT/IT1
PFSUB
PFSUBR
PI2FD
PMULHRW
PREFETCH
PREFETCH/W
3DNowE - AMD only
PF2IW
PFNACC
PFPNACC
PI2FW
PSWAPD
64bit - extensions not available in 32-bit mode
16 64-bit general registers
16 XMM
8 MMX
8 ST
<--end-->
HTH,
Rod Pemberton