8086 Instruction Encoding

David Macmillan

Jun 24, 1992, 6:33:35 AM

I wonder if some 8086 guru might not be able to convey enlightenment
on the "Instruction Encoding" table given as Table 6-22 (page 6-57ff)
of the Intel _8086/8088 User's Manual, Programmer's and Hardware
Reference_ (Intel order number 240487-001, 1989 edition).

The table claims to give the scheme for encoding the opcodes of an
8086 instruction. Yet this scheme seems to bear little resemblance to
the instructions as encoded elsewhere. For instance:

(case:)
MOV = Move word variable
Memory to register: RRR00AA1 | 100000MM | offset if AA=01 |

RRR, AA, and MM are never defined. (It says table 6-21 defines the
abbreviations used, but it doesn't define these.)

The second byte should be a "mod-reg-r/m" byte (and "mod-reg-r/m" is
used later in this table) but it isn't (the "10" for the mod field is
ok, but "000" for the reg field forces AX as the destination, and the
"0MM" for the r/m field is hard to interpret).

But this conflicts with the instruction as given in table 6-22:
100010dw
RRR00AA1
----A--- <-- conflict at the 'A'

Indeed, the only MOV instructions which fit the "RRR00AA1" format are
those which specify the Accumulator specifically. But these are
3-byte instructions where bytes 2 and 3 are the address, and this
conflicts with the second byte given as "100000MM".

Clearly there must be something I'm missing in the interpretation of
this table, but I have no idea what it is.

Thanks!
David M. MacMillan
dav...@hotcity.com

"It was not an epiphany now I was seeking, but information."
- Umberto Eco, _Foucault's Pendulum_

Danny Halamish

Jun 24, 1992, 8:21:02 AM

The Intel instruction encoding scheme is not supposed to be understood.
In fact, only mindless programs such as TASM, SYMDEB etc. can understand it.

And another thing: why do you WANT to understand them? Isn't using them bad
enough?

-Danny

d...@mossad.cs.huji.ac.il

Mark William Hopkins

Jul 5, 1992, 3:13:26 PM

In article <david...@hotcity.COM> dav...@hotcity.COM (David Macmillan) writes:
>I wonder if some 8086 guru might not be able to convey enlightenment
>on the "Instruction Encoding" table given as Table 6-22 (page 6-57ff)
>of the Intel _8086/8088 User's Manual, Programmer's and Hardware
>Reference_ (Intel order number 240487-001, 1989 edition).

...

The only proper way to understand 80x86 coding is to realize that ALL 80x86
OPCODES ARE CODED IN OCTAL. For some reason absolutely everybody misses this
point, even the Intel people who wrote the reference on the 8086. The 8086
evolved from the 8080, whose coding is also in octal.

NOTE: The 80386 expands on this encoding scheme quite a bit further.

The mov instructions in octal are:

210 xrm mov Eb, Rb
211 xrm mov Ew, Rw
212 xrm mov Rb, Eb
213 xrm mov Rw, Ew
214 xsm mov Ew, S
216 xsm mov S, Ew

The meanings of the octal digits (x, m, r, s) and their correspondence to the
operands (Eb, Ew, Rb, Rw, S) are the following:

REGISTER (r):
Rb = Byte-sized register
Rw = Word-sized register

The digit r (0-7) encodes the register operand according to the following
table:

r Rb Rw
0 AL AX
1 CL CX
2 DL DX
3 BL BX
4 AH SP
5 CH BP
6 DH SI
7 BH DI

SEGMENT REGISTER (s):
S = Segment register

The segment register digit s (0-7) encodes the segment register as follows:

s S
0 ES
1 CS
2 SS
3 DS
4 RESERVED
5 RESERVED
6 RESERVED
7 RESERVED
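
The two tables above translate directly into lookup lists. A minimal sketch in Python, assuming hypothetical helper names `reg_name` and `seg_name`:

```python
# Index = octal digit r; order taken from the register table above.
BYTE_REGS = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
WORD_REGS = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
# Index = octal digit s; digits 4-7 are reserved.
SEG_REGS = ["ES", "CS", "SS", "DS"]

def reg_name(r, word):
    """Name of the general register selected by digit r (0-7)."""
    return (WORD_REGS if word else BYTE_REGS)[r]

def seg_name(s):
    """Name of the segment register selected by digit s (0-7)."""
    return SEG_REGS[s] if s < 4 else "RESERVED"

print(reg_name(3, word=False))  # BL
print(reg_name(4, word=True))   # SP
print(seg_name(3))              # DS
```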

ADDRESS MODE (x, m):
Eb = Effective address for byte-sized quantity
Ew = Effective address for word-sized quantity

The digits x (0-3), and m (0-7) encode the address mode according to
the following table:

c = signed byte ("character")
w = unsigned word

xm    Eb/Ew            xm    Eb/Ew                xm    Eb/Ew                xm  Eb/Ew
00    DS:[BX + SI]     10 c  DS:[BX + SI + c]     20 w  DS:[BX + SI + w]     30  AL/AX
01    DS:[BX + DI]     11 c  DS:[BX + DI + c]     21 w  DS:[BX + DI + w]     31  CL/CX
02    SS:[BP + SI]     12 c  SS:[BP + SI + c]     22 w  SS:[BP + SI + w]     32  DL/DX
03    SS:[BP + DI]     13 c  SS:[BP + DI + c]     23 w  SS:[BP + DI + w]     33  BL/BX
04    DS:[SI]          14 c  DS:[SI + c]          24 w  DS:[SI + w]          34  AH/SP
05    DS:[DI]          15 c  DS:[DI + c]          25 w  DS:[DI + w]          35  CH/BP
06 w  DS:[w]           16 c  SS:[BP + c]          26 w  SS:[BP + w]          36  DH/SI
07    DS:[BX]          17 c  DS:[BX + c]          27 w  DS:[BX + w]          37  BH/DI

Operands where x is 0, 1, or 2 are all pointers. If the instruction is a WORD
instruction (211, 213, 214, and 216 are), the pointer addresses a word-sized
object at the indicated address. Otherwise the instruction is a BYTE
instruction (210, 212) and the pointer addresses a byte-sized object at the
indicated address.

The default segments (DS:, SS:) can be overridden with a segment prefix.

Modes where x = 1 or 2 require displacement bytes (c or w) to follow the
opcode, as does the direct-address mode xm = 06, which takes a word.

When x = 3, WORD sized instructions address the word registers (AX, CX, ...)
and the BYTE size instructions the byte registers (AL, CL, ...).
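
The address-mode rules can be sketched as a small Python function that maps the digits x and m to an operand description and the number of displacement bytes that follow. The name `address_mode` and the return shape are assumptions made for this example. The lists follow the standard 8086 layout, in which m = 2 and m = 3 use BP (and so default to SS), and xm = 06 is the direct-address special case.

```python
BYTE_REGS = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
WORD_REGS = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
# Index = octal digit m. Modes that involve BP default to SS.
BASES = ["BX + SI", "BX + DI", "BP + SI", "BP + DI", "SI", "DI", "BP", "BX"]
SEGS = ["DS", "DS", "SS", "SS", "DS", "DS", "SS", "DS"]

def address_mode(x, m, word):
    """Return (operand text, displacement byte count) for digits x and m."""
    if x == 3:
        # Register operand: the instruction width picks the register bank.
        return (WORD_REGS if word else BYTE_REGS)[m], 0
    if x == 0 and m == 6:
        # Special case: direct address, a word follows.
        return "DS:[w]", 2
    if x == 0:
        return f"{SEGS[m]}:[{BASES[m]}]", 0
    if x == 1:
        return f"{SEGS[m]}:[{BASES[m]} + c]", 1  # signed byte follows
    return f"{SEGS[m]}:[{BASES[m]} + w]", 2      # word follows

print(address_mode(1, 5, word=False))  # ('DS:[DI + c]', 1)
print(address_mode(0, 6, word=True))   # ('DS:[w]', 2)
print(address_mode(3, 2, word=True))   # ('DX', 0)
```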

For example, take the instruction opcode

210 135 375

Here, xm = 15, and r = 3, so the operands are:

mov Eb, Rb
=>
mov byte ptr DS:[DI + c], BL

The displacement is 375 (or fd in hexadecimal), which is the signed byte -3.
So the instruction reads:

mov byte ptr DS:[DI - 3], BL

or just:
mov [DI - 3], BL
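
The displacement arithmetic in this example can be checked with a few lines of Python. The helper `signed_byte` is a name chosen for this sketch; it applies the usual two's-complement interpretation of an 8-bit value.

```python
def signed_byte(value):
    """Interpret an 8-bit value as a two's-complement signed byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value in 0..255")
    return value - 0x100 if value & 0x80 else value

disp = 0o375              # third byte of the instruction, in octal
print(hex(disp))          # 0xfd
print(signed_byte(disp))  # -3
```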

Take the instruction opcode
216 332

Here, xm = 32, and s = 3, so the operands are:

mov S, Ew
=>
mov DS, DX
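
Putting the pieces together, here is a hedged sketch of a decoder for the six mov opcodes listed earlier. It is only an illustration of the octal reading, not a complete disassembler: `decode_mov` is a hypothetical name, segment-override prefixes are ignored, and word displacements are assumed to be stored low byte first, as is usual on the 8086.

```python
BYTE_REGS = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
WORD_REGS = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
SEG_REGS = ["ES", "CS", "SS", "DS"]
BASES = ["BX + SI", "BX + DI", "BP + SI", "BP + DI", "SI", "DI", "BP", "BX"]
SEGS = ["DS", "DS", "SS", "SS", "DS", "DS", "SS", "DS"]

def decode_mov(code):
    """Decode one mov instruction from a list of byte values."""
    opcode, second = code[0], code[1]
    if opcode not in (0o210, 0o211, 0o212, 0o213, 0o214, 0o216):
        raise ValueError("not one of the six mov opcodes handled here")
    # Second byte read as octal digits: x, middle digit (r or s), m.
    x, mid, m = second >> 6, (second >> 3) & 7, second & 7
    word = opcode not in (0o210, 0o212)

    # Build the E operand.
    if x == 3:
        e = (WORD_REGS if word else BYTE_REGS)[m]
    elif x == 0 and m == 6:
        e = f"DS:[{code[2] | (code[3] << 8):#06x}]"   # low byte first
    else:
        e = f"{SEGS[m]}:[{BASES[m]}"
        if x == 1:
            d = code[2] - 0x100 if code[2] & 0x80 else code[2]
            e += f" {'-' if d < 0 else '+'} {abs(d)}"
        elif x == 2:
            e += f" + {code[2] | (code[3] << 8):#06x}"
        e += "]"

    # Build the other operand: segment register for 214/216, else R.
    if opcode in (0o214, 0o216):
        other = SEG_REGS[mid] if mid < 4 else "RESERVED"
    else:
        other = (WORD_REGS if word else BYTE_REGS)[mid]

    # 210, 211 and 214 write to E; 212, 213 and 216 read from it.
    if opcode in (0o210, 0o211, 0o214):
        return f"mov {e}, {other}"
    return f"mov {other}, {e}"

print(decode_mov([0o210, 0o135, 0o375]))  # mov DS:[DI - 3], BL
print(decode_mov([0o216, 0o332]))         # mov DS, DX
```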

I think the manual mentions that a move to CS is not possible (because the far
jump instruction already does that), so that the opcode sequence:

216 x1m

(s = 1 selects CS) is free to be used for encoding something else.

As an illustration of why it's better to think in octal, just look at the
opcodes for the binary arithmetic instructions:

0p0 xrm OP Eb, Rb
0p1 xrm OP Ew, Rw
0p2 xrm OP Rb, Eb
0p3 xrm OP Rw, Ew
0p4 b OP AL, b
0p5 w OP AX, w

b = unsigned byte
w = unsigned word

They all have the same form, with a single digit encoding the operator as
follows:
p  OP     p  OP
0  add    1  or
2  adc    3  sbb
4  and    5  sub
6  xor    7  cmp

That's a good fraction of your reference table right there.
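
To make the pattern concrete, the sketch below classifies a single opcode byte of the form 0pN using the two tables above. The function name `classify_alu_opcode` is invented for this example; it returns None for bytes that do not fit the 0p0 to 0p5 pattern.

```python
# Index = octal digit p.
OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
# Index = low octal digit of the opcode byte (0-5).
FORMS = ["Eb, Rb", "Ew, Rw", "Rb, Eb", "Rw, Ew", "AL, b", "AX, w"]

def classify_alu_opcode(opcode):
    """Describe an opcode of the form 0pN (N = 0..5), or return None."""
    high, p, form = opcode >> 6, (opcode >> 3) & 7, opcode & 7
    if high != 0 or form > 5:
        return None
    return f"{OPS[p]} {FORMS[form]}"

print(classify_alu_opcode(0o000))  # add Eb, Rb
print(classify_alu_opcode(0o053))  # sub Rw, Ew
print(classify_alu_opcode(0o074))  # cmp AL, b
print(classify_alu_opcode(0o210))  # None
```

Eight operations times six forms gives 48 opcodes covered by one 8-entry list and one 6-entry list.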

The same mapping is used in the immediate to memory/register form of these
operations:

200 xpm b OP Eb, b
201 xpm w OP Ew, w
202 xpm c OP Eb, c
203 xpm c OP Ew, c

(202 and 203 are not defined for the logical operations: p = 1 (or), 4 (and),
or 6 (xor)).
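
For the immediate forms, the operation digit p moves into the middle of the second byte. A small Python sketch of that reading follows; `immediate_form` is a hypothetical name, and the None result for the logical operations under 202 and 203 simply mirrors the remark above rather than any independent check of the hardware.

```python
# Index = octal digit p (middle digit of the second byte).
OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
# Opcode byte -> (destination kind, immediate kind).
IMM_FORMS = {
    0o200: ("Eb", "b"),
    0o201: ("Ew", "w"),
    0o202: ("Eb", "c"),
    0o203: ("Ew", "c"),
}

def immediate_form(opcode, second):
    """Describe an immediate-operand instruction from its first two bytes."""
    dest, imm = IMM_FORMS[opcode]
    p = (second >> 3) & 7
    # Per the note above: 202/203 are not defined for or, and, xor.
    if opcode in (0o202, 0o203) and p in (1, 4, 6):
        return None
    return f"{OPS[p]} {dest}, {imm}"

print(immediate_form(0o203, 0o350))  # sub Ew, c
print(immediate_form(0o200, 0o371))  # cmp Eb, b
print(immediate_form(0o203, 0o310))  # None
```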