
Opcode Parsing & Invalid Opcodes


Nimai
Jul 5, 2010, 5:22:58 PM
I'm learning to program in straight machine code, and I just finished
reading the Intel manuals.

I have a burning question that the books haven't answered, maybe I'm
just stupid and I missed it.

If I do a JMP to a bunch of garbled data, how does the prefetching
process know where the "instruction boundaries" are? Where will EIP
be when the inevitable invalid opcode exception is triggered?

In other words, if the instructions are garbage, how much garbage is
taken in? What are the rules?

My guess is, each possible opcode byte has something like a lookup
table entry, and after parsing a byte, the prefetcher either adds
another byte to the instruction, adds a modr/m byte to the instruction
and grabs displacement and immediate bytes, or ends the instruction
and sends it to the pipeline. This is entirely based on inference, I
can't find anything in the manuals to confirm or deny this.

Whatever process it uses, it MUST be entirely deterministic, or code
can't be. So where is it documented?
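
For concreteness, here is that guessed mechanism in code form - a toy C
sketch with an invented, incomplete opcode table (NOT a transcription of
the real opcode map; prefixes, two-byte opcodes, SIB and displacements
are all ignored for brevity):

#include <stdint.h>

#define VALID  0x01   /* byte is a recognized opcode       */
#define MODRM  0x02   /* instruction carries a ModR/M byte */
#define IMM8   0x04   /* ... plus an 8-bit immediate       */
#define IMM32  0x08   /* ... plus a 32-bit immediate       */

/* toy table: a handful of invented entries for illustration only */
static const uint8_t optab[256] = {
    [0x90] = VALID,                 /* nop            */
    [0x53] = VALID,                 /* push ebx       */
    [0xc3] = VALID,                 /* ret            */
    [0xb8] = VALID | IMM32,         /* mov eax, imm32 */
    [0x88] = VALID | MODRM,         /* mov r/m8, r8   */
    [0xeb] = VALID | IMM8,          /* jmp rel8       */
    /* every byte not listed decodes as invalid in this toy */
};

/* Length of the instruction at p, or -1 where a real CPU would
   raise #UD with EIP still pointing at p. */
int insn_len(const uint8_t *p)
{
    uint8_t f = optab[p[0]];
    int len = 1;
    if (!(f & VALID)) return -1;
    if (f & MODRM) len += 1;
    if (f & IMM8)  len += 1;
    if (f & IMM32) len += 4;
    return len;
}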

BGB / cr88192
Jul 5, 2010, 6:00:44 PM

"Nimai" <phl...@nospicedham.gmail.com> wrote in message
news:8059b7b4-07a7-42bc...@i9g2000prn.googlegroups.com...

simple answer:
the bytes either look like an opcode it recognizes, or they don't;
if they look like a recognized instruction, the processor will behave as if
they were that instruction;
if they don't, the processor will raise #UD with EIP left pointing at the
start of the bad instruction (any bytes read while attempting to decode
the opcode don't matter).

now, how many bytes this covers depends solely on the particular
processor and the particular bytes...

in my x86 interpreter, I basically just did a big pattern match over the
opcodes table for decoding each instruction. failing here (no match) was a
#UD. similarly, if there was no logic attached to the opcode in the
interpreter, this was also a #UD (may happen with opcodes which exist in the
ISA, but nothing was written into the interpreter for them to do...).
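
in code form, the decode step is roughly like this (a toy sketch, not
the actual interpreter; match_opcode and raise_exception stand in for
the real table scan and exception plumbing):

#include <stdint.h>
#include <stddef.h>

enum { UD_VECTOR = 6 };

struct cpu { uint32_t eip; uint8_t *mem; };

typedef void (*handler)(struct cpu *);

/* the real work lives behind these two; NULL means "no match" */
handler match_opcode(struct cpu *c);  /* advances c->eip as it consumes bytes */
void raise_exception(struct cpu *c, int vector);

/* one decode/dispatch step: either the bytes match a known opcode
   (with logic attached), or EIP is rolled back and #UD raised, so
   the handler sees EIP pointing at the start of the bad bytes */
void step(struct cpu *c)
{
    uint32_t start_eip = c->eip;
    handler h = match_opcode(c);
    if (h == NULL) {
        c->eip = start_eip;          /* as if nothing was consumed */
        raise_exception(c, UD_VECTOR);
        return;
    }
    h(c);                            /* execute; c->eip already advanced */
}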

note that on newer processors, there is the NX bit, which will generate an
exception (a page fault, rather than #UD) as soon as EIP/RIP lands in a
page with it set, ..., if this matters...

or such...


Nimai
Jul 5, 2010, 6:18:51 PM
On Jul 5, 3:00 pm, "BGB / cr88192" <cr88...@nospicedham.hotmail.com>
wrote:

Theories are nice, but where's the proof? I need word of god on this.

Also, your method works for fixed instruction lengths, but it doesn't
describe what happens to the extra bytes trailing after an invalid
opcode is encountered. What if the modr/m byte is screwed up? Does it
treat immediate/displacement data as additional, invalid opcodes, or
does it throw them away? What if an opcode is corrupted, and its
immediate data or displacement contains a valid opcode?

These are all boundary conditions, sure, and probably useless to most
people, but I need to know!

I want to be able to take any possible stream of bits, no matter how
full of garbage, and be able to know EXACTLY what will happen.

I do NOT want to have to fuzz test my CPU. These things should be
documented.

Again, my question is this: Where is this documented?

Where does Intel specifically say, "instructions are parsed according
to these rules:"?

Bernhard Schornak
Jul 5, 2010, 7:27:37 PM
Nimai wrote:

If you jump to any target, the prefetch mechanism
starts to read data from 00[jmp_target]. If these
data cannot be interpreted as an instruction, the
processor issues an invalid opcode exception. For
conditional jumps, the built-in branch prediction
logic determines if the code at 00[jmp_target] is
prefetched or the branch is not taken, and
execution continues with the next opcode following
the conditional jump. If a prediction fails, the
preloaded code must be flushed and the real code
has to be loaded.

The prefetcher is a stupid machine and reads data
from anywhere if the current instruction tells it
to do that. The required brain has to be provided
by the programmer or compiler (which got its true
brain from a human, as well).

If you write binary code, you should know how
opcodes are assembled. Every processor works like
a disassembler. It reads byte for byte and
compares them against its opcodes. If a match is
found, it determines if additional data are
required - e.g. the address of your jump target -
and gets those, as well. If everything was okay,
this instruction is executed (with the data you
provided). If not, your program crashes.

This probably is not documented anywhere. You can
figure it out with the opcodes listed in AMD's or
iNTEL's manuals on your own.


Greetings from Augsburg

Bernhard Schornak

Nimai
Jul 5, 2010, 8:30:43 PM
On Jul 5, 4:27 pm, Bernhard Schornak <schor...@nospicedham.web.de>
wrote:

Thanks, but I already know all that. It's not that I don't have a
pretty good idea about these answers, but that I need to be able to
confirm them in order to move forward, and at this point my only
options are to fuzz test, or RTFM, and TFM doesn't seem to exist.
Computing is not guesswork. What I really need is official
documentation on what an IA32 processor is "supposed" to do when it
fetches, decodes and executes an instruction, without ignoring all the
possible boundary conditions.

The original 386 manual has a useful tidbit about how invalid opcode
exceptions aren't triggered during prefetch. The current manual
doesn't have this information anywhere, and its sections on pipeline
behavior read more like marketing blurbs than technical reference.
Are these manuals really ALL the information we have about IA32?

Any solid information about the deterministic mechanisms behind
prefetch, pipeline, and execution, from a reliable source, would be
incredibly helpful. The behavior of the opcode interpreter is
arguably the most important part of the entire architecture, and it
seems like the only way to deduce its behavior is reverse-engineering
it from the opcodes, and the descriptions of what they effect.

Any guesses and hypotheses about these things are nice, but if
documentation exists, they're a waste of time.

Rod Pemberton
Jul 6, 2010, 2:37:49 AM
"Nimai" <phl...@nospicedham.gmail.com> wrote in message
news:8059b7b4-07a7-42bc...@i9g2000prn.googlegroups.com...
> I'm learning to program in straight machine code, and I just finished
> reading the Intel manuals.
>
> I have a burning question that the books haven't answered, maybe I'm
> just stupid and I missed it.
>
> If I do a JMP to a bunch of garbled data,

Don't do that... Why would you intentionally jump to "garbled data"?

BTW, the single-byte x86 opcode instruction map is full. So, to generate an
"invalid instruction", you must use multi-byte instruction opcodes... I.e.,
there's no such thing as executing "garbled data" on x86. Any randomly
generated "garbled data" will most likely result in generating one of the
single-byte opcodes. They'll execute. Occasionally, it may be a multi-byte
instruction, which will execute also. And, rarely, it may be an invalid
multi-byte opcode, which will be trapped...

> how does the prefetching
> process know where the "instruction boundaries" are?

Why would it need to "know"?

The instruction decoder(s) decodes the instruction after some bytes have
already been (pre)fetched...

The 8086 has a 6-byte prefetch buffer. Later versions will likely have
much more, but at least 15 bytes, since that's the instruction size limit
on the 386 or later.

> Where will EIP
> be when the inevitable invalid opcode exception is triggered?

Still pointing at the "inevitable invalid opcode", perhaps? If it's invalid
and the microprocessor prevented it from being executed, why would
it have a known size? ...

If an "invalid opcode" had a determinable size, that means the
microprocessor is able to decode an invalid opcode. That implies that the
microprocessor doesn't detect "invalid opcodes". x86 does (from 286?...).
Early micro's used an instruction mask, so they didn't have any invalid
opcodes. On them, an "invalid opcode" actually did *something*, but usually
something not that useful... Those micro's didn't detect invalid opcodes
prior to execution.

> In other words, if the instructions are garbage, how much garbage is
> taken in? What are the rules?
>

What rules?

It probably varies from microprocessor generation to microprocessor
generation, and manufacturer to manufacturer.

(Or, as a programmer, why would you *ever* need to know?)

> My guess is, each possible opcode byte has something like a lookup
> table entry, and after parsing a byte, the prefetcher either adds
> another byte to the instruction, adds a modr/m byte to the instruction
> and grabs displacement and immediate bytes, or ends the instruction
> and sends it to the pipeline. This is entirely based on inference, I
> can't find anything in the manuals to confirm or deny this.
>

That might've been true on an 8086/88. But, on later processors, it's my
guess, that they read a large block of bytes at a time. The prefetched
bytes might be called a cache line in the cache...
http://en.wikipedia.org/wiki/CPU_cache

What's known is the maximum x86 instruction length:

386 or later has 15 byte maximum
286 has a 10 byte maximum
86/88 has no enforced maximum (prefixes can repeat indefinitely); a
non-prefixed instruction is 1 to 6 bytes

Some info on the 8086:

8086 has a 6-byte prefetch buffer
8086 instruction size 1 to 6 bytes (excluding prefixes)
8086 21-bit microinstructions (504)
8086 two stage pipeline

> Whatever process it uses, it MUST be entirely deterministic, or code
> can't be. So where is it documented?

It's probably proprietary. But, there is some public information on how
various x86 microprocessors work, including prefetch. E.g.,

Inside AMD64 Architecture (See page 6)
http://www.hardwaresecrets.com/article/Inside-AMD64-Architecture/324/1

Into the Core: Intel's next-generation microarchitecture (maybe see page 5)
http://arstechnica.com/hardware/news/2006/04/core.ars/

You can find articles like that when searching for "reorder buffer" or
"micro-ops" or "macro-fusion" or "microinstructions" in conjunction with
"x86" or "AMD64" etc.


Rod Pemberton


Alexei A. Frounze
Jul 6, 2010, 3:07:57 AM
On Jul 5, 2:22 pm, Nimai <phl...@nospicedham.gmail.com> wrote:
> I'm learning to program in straight machine code, and I just finished
> reading the Intel manuals.
>
> I have a burning question that the books haven't answered, maybe I'm
> just stupid and I missed it.
>
> If I do a JMP to a bunch of garbled data, how does the prefetching
> process know where the "instruction boundaries" are?  

I'm not sure what exactly you mean here by prefetching and boundaries
together.

> Where will EIP
> be when the inevitable invalid opcode exception is triggered?

At the entry point of the #UD handler, which will have on its stack
the address of that invalid instruction.

> In other words, if the instructions are garbage, how much garbage is
> taken in?  What are the rules?

If that "instruction" causes a #UD, none. #UD is a fault type of
exception. As such, returning from the #UD handler will force the CPU
to try to execute that instruction again. If it doesn't cause a #UD,
chances are it's an (officially) undocumented instruction.
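
(That retry behavior can be watched from user mode. A sketch assuming
Linux, where the #UD reaches the process as SIGILL: the handler
overwrites the offending ud2 with nops and simply returns, so the CPU
re-executes at the same address and now succeeds:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void fixup(int sig, siginfo_t *si, void *uc)
{
    (void)sig; (void)uc;
    memcpy(si->si_addr, "\x90\x90", 2);  /* ud2 -> nop ; nop */
}                                        /* returning = retry */

int main(void)
{
    unsigned char *code = mmap(NULL, 4096,
                               PROT_READ | PROT_WRITE | PROT_EXEC,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memcpy(code, "\x0f\x0b\xc3", 3);     /* ud2 ; ret */

    struct sigaction sa = {0};
    sa.sa_sigaction = fixup;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGILL, &sa, NULL);

    ((void (*)(void))code)();            /* faults once, retries, returns */
    puts("instruction was retried and completed");
    return 0;
}

If the exception were a trap instead of a fault, the return address
would already be past the bad bytes and the patching would be useless.)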

> My guess is, each possible opcode byte has something like a lookup
> table entry, and after parsing a byte, the prefetcher either adds
> another byte to the instruction, adds a modr/m byte to the instruction
> and grabs displacement and immediate bytes, or ends the instruction
> and sends it to the pipeline.  This is entirely based on inference, I
> can't find anything in the manuals to confirm or deny this.
>
> Whatever process it uses, it MUST be entirely deterministic, or code
> can't be.  

Surely, the behavior is deterministic.

> So where is it documented?

Perhaps, Intel/AMD internal documentation that you're not going to get
access to?

There are a number of places in their public documents where things
are either not described in full detail and there're statements like
"should (not)", "undefined" and such. If you carefully study the
explanation of shift/rotate instructions, you'll find "undefined"
there. These instructions, of course, do not generate random results
in the "undefined" cases, they consistently produce the same output
for the same inputs on the same CPU. But this output may be different
on different CPUs. I've found two different implementations of shift/
rotate on Intel CPUs and one on AMD, and all three are different.
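
(You can probe this from user mode. A sketch using GCC inline assembly:
OF after a shift count greater than 1 is one of the documented-undefined
results, so the value printed is stable on any one CPU but is exactly
where CPU models may disagree:

#include <stdio.h>

/* perform SHL with the given count and capture CF and OF */
static void shl_probe(unsigned v, unsigned char cnt,
                      unsigned char *cf, unsigned char *of)
{
    __asm__ volatile("shl %%cl, %0\n\t"
                     "setc %1\n\t"
                     "seto %2"
                     : "+r"(v), "=q"(*cf), "=q"(*of)
                     : "c"(cnt)
                     : "cc");
}

int main(void)
{
    unsigned char cf, of;
    shl_probe(0x80000001u, 2, &cf, &of);  /* count > 1: OF undefined */
    printf("shl by 2: cf=%u of=%u\n", cf, of);
    return 0;
}

Same inputs every run, so any variation you see is between CPUs, not
between runs.)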

You may try contacting Intel/AMD, but be prepared to get ignored or
RTFM'd by them. After all, why should they care about you and others
like you? Seriously? You'd only incur support costs.

Alex

BGB / cr88192
Jul 6, 2010, 4:51:07 AM

"Nimai" <phl...@nospicedham.gmail.com> wrote in message
news:a45472a9-e845-46f8...@q21g2000prm.googlegroups.com...

<--


Theories are nice, but where's the proof? I need word of god on this.

-->

you can't prove anything...

it is like, one can ask themselves how they can be certain they exist.
the best answer would seem to be that one can look at their hand, and
infer that only someone who exists can raise the question as to whether
or not they exist. but, at best, this is a heuristic...


how about this view of the world:
there are no guarantees, and there are no absolutes;
all is, at best, probability, heuristics, and guesswork.

really, it is a world built up in the air, a world built of words and
suppositions.

so we take these guesses and arbitrary statements, and simply pretend as if
they were the truth, and at any moment the specifics may be changed and
revised, and then we have some new "absolute" reality...

one can assert that reality is absolute, but how can one prove it?...
how can one prove what, if anything, then, is absolute?...
for all it matters, all that has been seen thus far could be instead
synthetic behavior, the world we see instead being a piece of machinery
built on top of some other, almost entirely different, universe.

and, one can ask, really, what does it matter?...

the world we live in could just as easily be folded up into a paper crane
for what it matters.


<--


Also, your method works for fixed instruction lengths, but it doesn't
describe what happens to extra bytes trailing after an invalid opcode
is encountered. What if the modr/m byte is screwed up? Does it treat
immediate/displacement data as an additional, invalid opcodes, or does
it throw them away? What if an opcode is corrupted, and its immediate
data or displacement contains a valid opcode?

-->

simple answer:
ModR/M can't be screwed up...
why?... because all possible encodings are valid.
likewise for SIB and displacement...
one gets different results, but nothing can be "wrong" with these bytes as
far as the CPU is concerned (except when part of the opcode goes into "reg",
but then it is a UD as before...).


<--


These are all boundary conditions, sure, and probably useless to most
people, but I need to know!

I want to be able to take any possible stream of bits, no matter how
full of garbage, and be able to know EXACTLY what will happen.

I do NOT want to have to fuzz test my CPU. These things should be
documented.

Again, my question is this: Where is this documented?

Where does Intel specifically say, "instructions are parsed according
to these rules:"?

-->

but, anyways, besides what is documented, the CPU is free to do whatever,
and really this depends a lot on the CPU.

after all, if the CPU had "absolute" behavior, where would there be all the
special edge-cases left to hack over and redefine as new behavior?...

there are almost always little specific details depending on the specific
vendor and model of processor, and before CPUID this was commonly how people
identified which version of which processor was in use...

Bob Masta
Jul 6, 2010, 8:34:17 AM

It seems to me that the only real question here is where EIP
will be when the invalid opcode exception is triggered:
Will it be at the first byte of the garbage, or at the byte
where the decoder decides it is garbage?

Personally, I'd want EIP to point to the start of the bad
instruction, not several bytes along. I can't imagine any
use for the latter approach.

But wouldn't this question be easy enough to solve via
experiment? Make up a known-bad "opcode" and see what
happens. Surely if a few trials with different code show
EIP at the start, that would be convincing evidence of the
behavior... it would be *really* strange if the decoder
didn't use the same method every time.
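
(That experiment is easy to run today. A sketch assuming Linux, where
the #UD is delivered to user code as SIGILL; the leading nop lets you
check that the reported address is the start of the bad instruction,
not the start of the buffer or some later byte:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *buf;   /* the "garbage" we jump into */

static void on_sigill(int sig, siginfo_t *si, void *uc)
{
    (void)sig; (void)uc;
    /* expect offset 1: the ud2, not the buffer start or a later byte.
       printf in a handler isn't async-signal-safe; fine for a test. */
    printf("SIGILL at offset %ld\n",
           (long)((unsigned char *)si->si_addr - buf));
    _exit(0);
}

int main(void)
{
    static const unsigned char code[] = {
        0x90,               /* nop - executes fine       */
        0x0f, 0x0b,         /* ud2 - defined-invalid     */
        0x90, 0x90, 0xc3    /* trailing bytes, never run */
    };
    buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memcpy(buf, code, sizeof code);

    struct sigaction sa = {0};
    sa.sa_sigaction = on_sigill;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGILL, &sa, NULL);

    ((void (*)(void))buf)();   /* jump to the garbage */
    return 1;
}

On the machines I'd expect, this prints offset 1, i.e. EIP/RIP at the
start of the bad instruction.)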

Best regards,


Bob Masta

DAQARTA v5.10
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
Frequency Counter, FREE Signal Generator
Pitch Track, Pitch-to-MIDI
DaqMusic - FREE MUSIC, Forever!
(Some assembly required)
Science (and fun!) with your sound card!

Frank Kotler
Jul 6, 2010, 9:07:13 AM

I haven't a clue. I'm with Bob Masta - try it and see! ("one test is
worth a thousand expert opinions") But I observe that guys who design
the chips hang out on comp.arch so I'll cross-post it there, in hopes
that it may get you a definitive answer (which may be "it's proprietary,
we can't tell ya"). Good luck!

Best,
Frank

Joe Pfeiffer
Jul 6, 2010, 9:25:30 AM
> Nimai wrote:
>> I'm learning to program in straight machine code, and I just finished
>> reading the Intel manuals.
>>
>> I have a burning question that the books haven't answered, maybe I'm
>> just stupid and I missed it.
>>
>> If I do a JMP to a bunch of garbled data, how does the prefetching
>> process know where the "instruction boundaries" are? Where will EIP
>> be when the inevitable invalid opcode exception is triggered?
>>
>> In other words, if the instructions are garbage, how much garbage is
>> taken in? What are the rules?
>>
>> My guess is, each possible opcode byte has something like a lookup
>> table entry, and after parsing a byte, the prefetcher either adds
>> another byte to the instruction, adds a modr/m byte to the instruction
>> and grabs displacement and immediate bytes, or ends the instruction
>> and sends it to the pipeline. This is entirely based on inference, I
>> can't find anything in the manuals to confirm or deny this.
>>
>> Whatever process it uses, it MUST be entirely deterministic, or code
>> can't be. So where is it documented?

Why should it be documented? What you've described is conceptually how
it works; all that's left that matters to the programmer is how many
instructions of what type can be decoded simultaneously (since that can
affect optimization).

As for when you get a fault, that depends on just what the garbling is.
NX bit set? Immediately.

Bad opcode? Immediately.

Ends up trying to read/write data from an invalid address? Immediately,
but it'll be a protection fault on the data address.

Made it past the first "instruction"? On to the second...
--
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)

James Harris
Jul 6, 2010, 10:19:01 AM
On 5 July, 22:22, Nimai <phl...@nospicedham.gmail.com> wrote:

> I'm learning to program in straight machine code, and I just finished
> reading the Intel manuals.
>
> I have a burning question that the books haven't answered, maybe I'm
> just stupid and I missed it.
>
> If I do a JMP to a bunch of garbled data, how does the prefetching
> process know where the "instruction boundaries" are?  Where will EIP
> be when the inevitable invalid opcode exception is triggered?
>
> In other words, if the instructions are garbage, how much garbage is
> taken in?  What are the rules?

At machine level what determines the *meaning* of a byte is how it is
*used*. If you move part of an instruction into a register it is not
treated as an instruction but as a piece of data. It's just a pattern
of ones and zeros. Conversely if you try to execute some of your data
then, for the purposes of the CPU, it is not taken as data but
effectively *is* an instruction. Again, it's just a pattern of ones
and zeros. For example if you move into a register a byte containing
the value 83 (0x53) it will be taken as the number 83. If you try to
execute that byte (e.g. by jumping to it) it will be taken as the
instruction to push the EBX register on to the stack.

If the CPU can make sense of the instruction it will do whatever the
"instruction" tells it to do and then go on to the next instruction.
If it can't make sense of it it will issue an undefined opcode
exception.

So, to answer your question as to how much garbage is taken in: it takes
in and executes as much of it as makes sense (i.e. decodes to valid and
permissible instructions). If and when it comes across bit patterns
which are not valid instructions, it will generate an exception.
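
(The dual reading of 0x53 can be demonstrated directly - a sketch
assuming a POSIX system with an executable mapping:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* push ebx ; pop ebx ; ret - a harmless three-byte routine */
    static const unsigned char bytes[] = { 0x53, 0x5b, 0xc3 };

    printf("as data: %d\n", bytes[0]);       /* prints 83 */

    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memcpy(page, bytes, sizeof bytes);
    ((void (*)(void))page)();                /* as code: push/pop/ret */
    printf("as code: returned without fault\n");
    return 0;
}

The same byte pattern is a number when loaded and an instruction when
jumped to; the CPU never knows or cares which one you "meant".)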

>
> My guess is, each possible opcode byte has something like a lookup
> table entry, and after parsing a byte, the prefetcher either adds
> another byte to the instruction, adds a modr/m byte to the instruction
> and grabs displacement and immediate bytes, or ends the instruction
> and sends it to the pipeline.  This is entirely based on inference, I
> can't find anything in the manuals to confirm or deny this.

I think that's a good working model.

> Whatever process it uses, it MUST be entirely deterministic, or code
> can't be.  So where is it documented?

Modern x86 CPUs have documented behaviour. IIRC the old 8086 didn't
take an exception if it came across garbage that didn't decode to
valid instructions. Who knows what it did but the action possibly
depended on the internals of a given fabrication and wasn't
documented.

For anything modern check the reference manual under Interrupts and
Exceptions. See Exception 6, invalid opcode. It is classified as a
"fault." This means that the CPU will wind back to the beginning of
the faulting instruction before invoking the exception handler.

"Faults — ... When a fault is reported, the processor restores the
machine state to the state prior to the beginning of execution of the
faulting instruction. The return address (saved contents of the CS and
EIP registers) for the fault handler points to the faulting
instruction, rather than to the instruction following the faulting
instruction."

Not all CPUs conveniently wind back. Some stop wherever they got to,
making it virtually impossible to tell the start of the instruction
they are complaining about. x86 is OK though.

James

nedbrek
Jul 6, 2010, 8:07:34 PM
Hello,
Welcome, comp.lang.asm.x86!

"Joe Pfeiffer" <pfei...@nospicedham.cs.nmsu.edu> wrote in message
news:1br5jg2...@snowball.wb.pfeifferfamily.net...


>> Nimai wrote:
>>> If I do a JMP to a bunch of garbled data, how does the prefetching
>>> process know where the "instruction boundaries" are? Where will EIP
>>> be when the inevitable invalid opcode exception is triggered?
>>>
>>> In other words, if the instructions are garbage, how much garbage is
>>> taken in? What are the rules?
>>>
>>> My guess is, each possible opcode byte has something like a lookup
>>> table entry, and after parsing a byte, the prefetcher either adds
>>> another byte to the instruction, adds a modr/m byte to the instruction
>>> and grabs displacement and immediate bytes, or ends the instruction
>>> and sends it to the pipeline. This is entirely based on inference, I
>>> can't find anything in the manuals to confirm or deny this.
>>>
>>> Whatever process it uses, it MUST be entirely deterministic, or code
>>> can't be. So where is it documented?
>

> As for when you get a fault, that depends on just what the garbling is.
> NX bit set? Immediately.
>
> Bad opcode? Immediately.
>
> Ends up trying to read/write data from invalid address? Immediately,
> but it'll be a proetection fault on the data address.
>
> Made it past the first "instruction"? On to the second...

That about sums it up!


Two aspects, architectural (what software sees) and hardware (what actually
happens).

The hardware is just going to shovel bits into the execution engine. An
advanced machine doesn't even look at the bits at first. Hardware further
down the line interprets the bits into instructions.

This part of the machine is very speculative, so it can never be sure
that a mispredicted branch somewhere won't mean the bad bytes are never
really executed at all. The machine won't flag any bad decode until it
is sure that the architectural path actually goes that way.

Any machine has to come to the same result as a simple, one
instruction-at-a-time machine would (maintaining the architectural
illusion). There are all sorts of nifty tricks to make this happen, but
rest assured the fault will be deterministic.

However, architecturally, there is only one defined invalid opcode
instruction (ud2, 0f 0b) so anything else might run for a while. Also,
new instructions get added - so what happens to be invalid today might
be a real instruction tomorrow.

You might even manage to fall into an infinite loop! (jmp byte -2, eb fe)
Hope your environment has preemptive multitasking!

Hope that helps!
Ned


Andy 'Krazy' Glew
Jul 6, 2010, 7:39:15 PM
On 7/6/2010 6:07 AM, Frank Kotler wrote:
> Nimai wrote:
>> I'm learning to program in straight machine code, and I just finished
>> reading the Intel manuals.
>>
>> I have a burning question that the books haven't answered, maybe I'm
>> just stupid and I missed it.
>>
>> If I do a JMP to a bunch of garbled data, how does the prefetching
>> process know where the "instruction boundaries" are? Where will EIP
>> be when the inevitable invalid opcode exception is triggered?
>>
>> In other words, if the instructions are garbage, how much garbage is
>> taken in? What are the rules?
>>
>> My guess is, each possible opcode byte has something like a lookup
>> table entry, and after parsing a byte, the prefetcher either adds
>> another byte to the instruction, adds a modr/m byte to the instruction
>> and grabs displacement and immediate bytes, or ends the instruction
>> and sends it to the pipeline. This is entirely based on inference, I
>> can't find anything in the manuals to confirm or deny this.
>>
>> Whatever process it uses, it MUST be entirely deterministic, or code
>> can't be. So where is it documented?

Nimai's guess is a fairly accurate description of what is treated as the de facto architectural definition.

The actual hardware is more like:
fetch 1 or 2 blocks of instructions (typically 16 byte aligned) containing the branch target
decode in parallel several instructions in those blocks starting at the branch target

i.e. it is done in parallel. Although there have been machines that could
only decode one instruction at a time if it had not been seen before:
typically those machines have instruction predecode bits in the
instruction cache, maybe even the L2, and have rather poor performance on
code they haven't seen before.

But most modern machines can at least decode multiple bytes of a given instruction within a cycle. Typically via

Option 1:
assume first byte is an opcode byte
assume second is a modrm
assume 3rd-6th are an offset
Option 2:
assume first byte is a REX prefix or some other prefix
assume second byte is an opcode byte
assume third is a modrm
assume 4th-7th are an offset
..

and so on, in parallel, using whichever option matches.


But the semantics are as if the bytes were looked at one at a time.
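
In the same toy terms as the length decoder sketched earlier in the
thread (insn_len is that invented helper, not real hardware), the
parallel flavor looks like:

#include <stdint.h>

int insn_len(const uint8_t *p);   /* the toy decoder from earlier */

/* Optimistically decode at every byte of a 16-byte fetch block
   (hardware runs all 16 decoders in the same cycle; the first loop
   is just the software stand-in), then select the chain of decodes
   that starts at the branch target. */
int find_boundaries(const uint8_t block[16], int start, int out[16])
{
    int len_at[16];
    for (int i = 0; i < 16; i++)              /* "in parallel" */
        len_at[i] = insn_len(block + i);

    int n = 0;
    for (int i = start; i < 16 && len_at[i] > 0; i += len_at[i])
        out[n++] = i;                          /* true boundaries */
    return n;                                  /* boundaries found */
}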

MitchAlsup
Jul 6, 2010, 7:48:42 PM
On Jul 6, 8:07 am, Frank Kotler <fbkot...@nospicedham.myfairpoint.net>
wrote:

> Nimai wrote:
> > I'm learning to program in straight machine code, and I just finished
> > reading the Intel manuals.
>
> > I have a burning question that the books haven't answered, maybe I'm
> > just stupid and I missed it.
>
> > If I do a JMP to a bunch of garbled data, how does the prefetching
> > process know where the "instruction boundaries" are?  Where will EIP
> > be when the inevitable invalid opcode exception is triggered?

The EIP will point to the first instruction that has detectable
garbage. The key word, here, is detectable, as so very many byte
sequences are legal (if not very usable) opcodes.

> > In other words, if the instructions are garbage, how much garbage is
> > taken in?  What are the rules?

It is wise to assume that at least 3 cache lines of garbage are
fetched before garbage is decoded.

> > My guess is, each possible opcode byte has something like a lookup
> > table entry, and after parsing a byte, the prefetcher either adds
> > another byte to the instruction, adds a modr/m byte to the instruction
> > and grabs displacement and immediate bytes, or ends the instruction
> > and sends it to the pipeline.  This is entirely based on inference, I
> > can't find anything in the manuals to confirm or deny this.
>
> > Whatever process it uses, it MUST be entirely deterministic, or code
> > can't be.  So where is it documented?

It ends up different on different architectures.

But your logic is sound, you are just not thinking in parallel. What
generally happens is that at least 4 bytes are fully decoded into 256
signals per byte. Then various logic condenses the 256 signals (times
the number of bytes) to 50-ish, especially ferreting out prefixes (with
respect to operating mode). Then another layer of logic identifies the
major opcode byte. And the rest is simply a cascade of multiplexers.
One end result of all this multiplexing is the start pointer for the
next instruction.

The major opcode byte specifies whether there are opcode bits in the
minor opcode byte (if present) and modr/m and SIB encodings. Knowing
if a minor, modr/m, or SIB is present and whether an immediate is
present gives you all that is necessary (prior to SSE4) to determine
the subsequent instruction boundary.

Bad opcodes are generally detected about another whole pipe stage down
the pipe from instruction parsing. There is no reason to clutter up a
hard problem with an intractable problem in a gate-limited and
fanout-limited pipestage. You still have at least 5 pipe stages before
any damage is done to machine state. Plenty of time to stumble across
the myriad of subtle invalid opcodes due to improper use of modr/m or
SIBs or prefix violations. And NO reason to get cute and try to do them
earlier.

{I happen to know how to do this in 12 gate delays from RAW bytes and
3 instructions at a time in 8 gates with end pointer bits.}

All of this is also dependent on some sequencing decisions made in the
pipeline.

Mitch

Andy 'Krazy' Glew
Jul 6, 2010, 7:49:26 PM
On 7/6/2010 5:07 PM, nedbrek wrote:
> Hello,

>>>> In other words, if the instructions are garbage, how much garbage is
>>>> taken in? What are the rules?
>>>>
>>>> My guess is, each possible opcode byte has something like a lookup
>>>> table entry, and after parsing a byte, the prefetcher either adds
>>>> another byte to the instruction, adds a modr/m byte to the instruction
>>>> and grabs displacement and immediate bytes, or ends the instruction
>>>> and sends it to the pipeline. This is entirely based on inference, I
>>>> can't find anything in the manuals to confirm or deny this.
>>>>
>>>> Whatever process it uses, it MUST be entirely deterministic, or code
>>>> can't be. So where is it documented?


By the way, stuff like this was documented, somewhat, for early
processors such as the 8086. The manuals would say how many instruction
bytes had been prefetched, and you could deduce, from the manuals but
also with a lot of experimentation, that if you did stuff like writing
into precisely the next instruction, the change would not be seen
immediately, because that instruction had already been fetched.

People wrote code that depended on such behavior.

People wrote code that used differences in such behavior to distinguish
an 8086 from an 80286, etc.

Intel learned that it was bad to describe such model-specific behavior
in too much detail.

Over time, processors have gotten stricter and stricter, tending to
implement "SMC immediately", etc.

0 new messages