Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> had this to say:
>> You could replace the "push ax" with "sub sp,2" -- 3 bytes vs 1 but
>> executes in 4 clocks as opposed to 11/15 (according to Helppc).
> Don't believe it:
> SUB SP,2 has to take at the very minimum 4 clocks for each of the 3 code
> bytes, so 12 clocks is the actual timing.

This is completely false. The 8086 did not have to wait for an
instruction to complete before it could fetch the next one from
the memory store (fortunately!). Code bytes would be prefetched
and predecoded in advance of execution.

The 8086/8088 was an /advanced/ microprocessor (don't laugh!).
Things ran in parallel, with anticipation. Specifically, there
are 2 internal queues involved in actual instruction
timing/throughput calculations: the prefetch queue, into which
instruction bytes are accumulated whenever the bus is not
occupied by other transfers (such as fetching operands and
storing back results), and the predecode queue, which does some
decoding of the prefetched instructions in advance. Both
activities, prefetching and predecoding, take place in parallel
with the actual execution of instructions.

Because of this, the time taken by one instruction to complete
is, in general, more easily determined by measuring than by
using the tables provided by Intel. In the case of a simple
SUB SP,n, however, there are no data to load from or store to
memory, and as noted above the instruction bytes were prefetched
(and decoded) well in advance, so the time from the table would
probably match the actual measured execution time as part of
your program.
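
To put rough numbers on "prefetched well in advance", here is a toy
model in Python. It is only a sketch under loud assumptions, not
Intel's timing method: a 4-byte queue (the 8088's), one code byte
fetched per 4-clock bus cycle (the figure quoted above), no data
traffic on the bus, and an instruction that starts only once all of
its bytes are in the queue. The function name simulate and its
parameters are made up for this illustration.

```python
def simulate(n_instr, size, exec_clocks,
             queue_cap=4, fetch_clocks=4, start_full=False):
    """Return total clocks to run n_instr identical instructions.

    Toy model (assumptions, not measured behaviour):
      - the bus unit adds one byte to the queue every fetch_clocks
        clocks whenever the queue has room;
      - the execution unit starts an instruction only when all
        `size` bytes are queued, then runs for exec_clocks;
      - fetching and execution overlap freely.
    """
    queue = queue_cap if start_full else 0
    fetch_progress = 0   # clocks spent on the byte being fetched
    busy = 0             # clocks left on the current instruction
    done = 0
    clock = 0
    while done < n_instr:
        clock += 1
        # Bus unit: keep fetching while there is room in the queue.
        if queue < queue_cap:
            fetch_progress += 1
            if fetch_progress == fetch_clocks:
                queue += 1
                fetch_progress = 0
        # Execution unit: start the next instruction if possible.
        if busy == 0 and queue >= size:
            queue -= size
            busy = exec_clocks
        if busy > 0:
            busy -= 1
            if busy == 0:
                done += 1
    return clock


# SUB SP,2: 3 code bytes, 4 clocks per the table (figures from the thread).
one_with_full_queue = simulate(1, size=3, exec_clocks=4, start_full=True)
average_in_long_run = simulate(100, size=3, exec_clocks=4) / 100
print(one_with_full_queue, average_in_long_run)
```

In this model a lone SUB SP,2 that finds its bytes already queued
costs the table value, while a long run of them back to back drains
the queue and settles near the fetch-bound figure. Which case a real
program sees depends on what the surrounding instructions leave in
the queue, which is one more reason to measure.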

HTH. By the way, as part of Intel's struggle against speed
bottlenecks, both prefetch and predecode queues got larger and
larger from 8088 to 8086 to 286 to 386 (starting with the 486,
the architecture became very different).
--
Nimbus