Please correct me if I have got this wrong. But from what I know, when
the x86 processor is seeking ahead and reaches a branch it has a
problem: it doesn't know what the outcome of the condition will be
until the program counter gets there.
So, it emulates both paths from the branch until the program counter
gets there. When the processor is seeking ahead/emulating the future
instructions, what exactly is it doing with the code - p-code or
whatever?
There ain't much documentation on processors these days like there used
to be years back. Even Wikipedia does not have much to say.
I have a registered copy of A386 that I intend to use soon, but my
knowledge of the 386 and later is not much. Can anybody recommend a
book, something like the equivalent of the 8086 book?
TIA
Muscipula
Try the Intel docs, including the optimization guides - there's a lot
of information there. In addition, there are bunches of white papers
online describing internals.
Anyway, processors that speculatively execute instructions usually
just follow one path. At conditional branches, they try to predict,
usually based on history or some static information, which way the
branch will go. If they guess wrong, some mechanism undoes the
incorrectly speculated instructions, and the processor backs itself up
to the point where the incorrect prediction was made and starts again.
That usually involves a hefty performance penalty, so a good (IOW,
rarely incorrect) branch predictor is important.
It's not really that the CPU is emulating any instructions, it's just
trying to execute more of them, and all sorts of things are done to
allow a nominally sequential stream of instructions to execute in
parallel. For example register renaming to eliminate sequential
dependencies, etc. For example, the CPU might detect that it can
issue the next three add instructions in a row all in parallel, since
they don't conflict. Obviously a conditional branch presents a
barrier to that sort of thing, unless you can predict which way the
branch is going to go. And since you can't (usually) predict that
with 100% certainty, the CPU has to be able to back out a bad guess.
The exact mechanisms vary considerably, but most commonly CPUs prevent
committing any results to memory that might need to be undone (many
ISAs, including x86, make that a requirement), so any "false" state is
contained within the processor.
Muscipula
Although, unless my memory has gone completely faulty, the more recent
processors actually use register renaming to execute BOTH paths of the
conditional, and simply discard half of the results when the result of the
jump is finally known.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.
I'm not aware of any mainstream processors that actually speculatively
execute both legs of a branch in a general way. A major problem is
the combinatorial explosion - after all there's a conditional branch
every 5-10 instructions in most cases, and once you get past three or
four of those 90% of the instructions you're issuing are going to be
discarded. And since execution resources are limited, executing
useless instructions will negatively impact performance. This also
impacts power consumption rather severely, which is a big issue these
days.
Several research processors do manage something like that, and in some
cases some CPUs can rewrite short blocks in a predicated fashion (for
example, a forward conditional branch that skips a small number of non-
branching instructions), but it's not really common.
OTOH, processors with predicated ISAs, like IPF, do allow both legs of
a condition to execute, and avoid committing either leg until the
predicate is known. IPF compilers manage a steep reduction in the
number of conditional branches because they can convert most small if
blocks to predicated sequences. A major hazard is overdoing that,
since you again end up with the processor doing a large amount of
useless work.
In any event, executing both legs would be of value only if you didn't
have a pretty good prediction on the direction of the branch, and in
practice, branch predictors are quite good. The tradeoff is
approximately the mispredict rate times the time wasted on the wrong
path plus the mispredict (backup) penalty, vs. the time wasted always
executing instructions from the wrong path. Unless the mispredict
rate is high, or enough extra resources are available to execute the
wrong leg with little penalty (or equivalently, the wrong path is very
short), trying to run down both legs isn't going to help - which
unfortunately is the case much (although certainly not all) of the
time.
[ ... ]
> Please correct me if I have got this wrong. But from what I know, when
> the x86 processor is seeking ahead and reaches a branch it has a
> problem: it doesn't know what the outcome of the condition will be
> until the program counter gets there.
>
> So, it emulates both paths from the branch until the program counter
> gets there. When the processor is seeking ahead/emulating the future
> instructions, what exactly is it doing with the code - p-code or
> whatever?
No -- the whole point of branch prediction is that it does NOT emulate
both paths following a (conditional) branch. Rather, it guesses whether
you're likely to take that branch or not.
To do that, it typically stores some data along with the instruction in
the cache. If the instruction is newly fetched into the cache, it
guesses based on the type of the instruction itself: IIRC, the algorithm
is pretty simple: backward jumps are predicted as taken, since they're
usually forming loops that will usually be taken far more often than not
(and the loop instruction is always predicted as taken, for the same
basic reason). I believe most forward jumps are predicted as not taken,
but I don't remember for sure.
In the original Pentium, the dynamic branch prediction was pretty
simple: it had four states: strongly taken, weakly taken, weakly not
taken and strongly not taken. Each time a branch isn't taken, the state
is moved one step toward strongly not taken. Each time a branch is
taken, the state is moved one step toward strongly taken. When it comes
time to predict the branch, either weakly taken or strongly taken is
predicted as taken, and either weakly not-taken or strongly not taken is
predicted as not taken.
On the more recent processors this has gotten considerably more
sophisticated -- the Pentium IV, especially, had such a long pipeline
that the most accurate possible branch prediction was crucial to getting
decent performance at all. To do that, they added a Branch History
Table, to help it find patterns in the branches -- for example, if you
take a particular branch every third time through the code, it could
find that, and predict it accurately.
Of course, as you mention, in the most recent processors they've gotten
considerably more cagey about what they reveal, so it's a bit harder to
be sure. What little I've seen makes it look like the Core processors
haven't added a lot in the way of new branch prediction. Given their
shorter pipelines, that's little surprise, since a shorter pipeline
greatly reduces the penalty for a mispredicted branch.
--
Later,
Jerry.
The universe is a figment of its own imagination.
> In the original Pentium, the dynamic branch prediction was pretty
> simple: it had four states: strongly taken, weakly taken, weakly not
> taken and strongly not taken. Each time a branch isn't taken, the state
> is moved one step toward strongly not taken. Each time a branch is
> taken, the state is moved one step toward strongly taken. When it comes
> time to predict the branch, either weakly taken or strongly taken is
> predicted as taken, and either weakly not-taken or strongly not taken is
> predicted as not taken.
No. Read some documentation and repost.
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
[ ... ]
> No. Read some documentation and repost.
If you want to make a useful post that corrects whatever you think is
wrong, feel free. Otherwise admit that the Pentium is sufficiently old
that nothing beyond the general idea matters to anybody anymore. For
that matter, even if you DO bother, it'll still be too obsolete for any
more than the basic idea to matter.
Or, if you prefer a post as rude as your own: No. Get a clue of how to
contribute and repost.
James, don't be too harsh!
What Jerry wrote is a pretty good description of how the Pentium did work.
Agner Fog has the definitive treatment.
Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"