Please correct me if I have got this wrong. But from what I know, when
the x86 processor is seeking ahead and reaches a branch it has a
problem: it doesn't know what the outcome of the condition will be
until the program counter gets there.
So, it emulates both paths from the branch until the program counter
gets there. When the processor is seeking ahead/emulating the future
instructions, what exactly is it doing with the code - p-code or
whatever?
There ain't much documentation on processors these days like there used
to be years back. Even Wikipedia does not have much to say.
I have a registered copy of A386 that I intend to use soon, but my
knowledge of the 386 and later is not much. Can anybody recommend a
book, something like the equivalent of the 8086 book?
TIA
Muscipula
Try the Intel docs, including the optimization guides - there's a lot
of information there. In addition, there are bunches of white papers
online describing internals.
Anyway, processors that speculatively execute instructions usually
just follow one path. At conditional branches, they try to predict,
usually based on history or some static information, which way the
branch will go. If they guess wrong, some mechanism undoes the
incorrectly speculated instructions, and the processor backs itself up
to the point where the incorrect prediction was made and starts again.
That usually involves a hefty performance penalty, so a good (IOW,
rarely incorrect) branch predictor is important.
It's not really that the CPU is emulating any instructions, it's just
trying to execute more of them, and all sorts of things are done to
allow a nominally sequential stream of instructions to execute in
parallel. For example register renaming to eliminate sequential
dependencies, etc. For example, the CPU might detect that it can
issue the next three add instructions in a row all in parallel, since
they don't conflict. Obviously a conditional branch presents a
barrier to that sort of thing, unless you can predict which way the
branch is going to go. And since you can't (usually) predict that
with 100% certainty, the CPU has to be able to back out a bad guess.
The exact mechanisms vary considerably, but most commonly CPUs prevent
committing any results to memory that might need to be undone (many
ISAs, including x86, make that a requirement), so any "false" state is
contained within the processor.
Muscipula
Although, unless my memory has gone completely faulty, the more recent
processors actually use register renaming to execute BOTH paths of the
conditional, and simply discard half of the results when the result of the
jump is finally known.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.
I'm not aware of any mainstream processors that actually speculatively
execute both legs of a branch in a general way. A major problem is
the combinatorial explosion - after all there's a conditional branch
every 5-10 instructions in most cases, and once you get past three or
four of those 90% of the instructions you're issuing are going to be
discarded. And since execution resources are limited, executing
useless instructions will negatively impact performance. This also
impacts power consumption rather severely, which is a big issue these
days.
Several research processors do manage something like that, and in some
cases some CPUs can rewrite short blocks in a predicated fashion (for
example, a forward conditional branch that skips a small number of non-
branching instructions), but it's not really common.
OTOH, processors with predicated ISAs, like IPF, do allow both legs of
a condition to execute, and avoid committing either leg until the
predicate is known. IPF compilers manage a steep reduction in the
number of conditional branches because they can convert most small if
blocks to predicated sequences. A major hazard is overdoing that,
since you again end up with the processor doing a large amount of
useless work.
In any event, executing both legs would be of value only if you didn't
have a pretty good prediction on the direction of the branch, and in
practice, branch predictors are quite good. The tradeoff is
approximately the mispredict rate times the time wasted on the wrong
path plus the mispredict (backup) penalty, vs. the time wasted always
executing instructions from the wrong path. Unless the mispredict
rate is high, or enough extra resources are available to execute the
wrong leg with little penalty (or equivalently, the wrong path is very
short), trying to run down both legs isn't going to help - which
unfortunately is the case much (although certainly not all) of the
time.
[ ... ]
> Please correct me if I have got this wrong. But from what I know, when
> the x86 processor is seeking ahead and reaches a branch it has a
> problem: it doesn't know what the outcome of the condition will be
> until the program counter gets there.
>
> So, it emulates both paths from the branch until the program counter
> gets there. When the processor is seeking ahead/emulating the future
> instructions, what exactly is it doing with the code - p-code or
> whatever?
No -- the whole point of branch prediction is that it does NOT emulate
both paths following a (conditional) branch. Rather, it guesses whether
you're likely to take that branch or not.
To do that, it typically stores some data along with the instruction in
the cache. If the instruction is newly fetched into the cache, it
guesses based on the type of the instruction itself: IIRC, the algorithm
is pretty simple: backward jumps are predicted as taken, since they're
usually forming loops that will usually be taken far more often than not
(and the loop instruction is always predicted as taken, for the same
basic reason). I believe most forward jumps are predicted as not taken,
but I don't remember for sure.
In the original Pentium, the dynamic branch prediction was pretty
simple: it had four states: strongly taken, weakly taken, weakly not
taken and strongly not taken. Each time a branch isn't taken, the state
is moved one step toward strongly not taken. Each time a branch is
taken, the state is moved one step toward strongly taken. When it comes
time to predict the branch, either weakly taken or strongly taken is
predicted as taken, and either weakly not-taken or strongly not taken is
predicted as not taken.
On the more recent processors this has gotten considerably more
sophisticated -- the Pentium IV, especially, had such a long pipeline
that the most accurate possible branch prediction was crucial to getting
decent performance at all. To do that, they added a Branch History
Table, to help it find patterns in the branches -- for example, if you
take a particular branch every third time through the code, it could
find that, and predict it accurately.
Of course, as you mention, in the most recent processors they've gotten
considerably more cagey about what they reveal, so it's a bit harder to
be sure. What little I've seen makes it look like the Core processors
haven't added a lot in the way of new branch prediction. Given their
shorter pipelines, that's little surprise, since a shorter pipeline
greatly reduces the penalty for a mispredicted branch.
--
Later,
Jerry.
The universe is a figment of its own imagination.
> In the original Pentium, the dynamic branch prediction was pretty
> simple: it had four states: strongly taken, weakly taken, weakly not
> taken and strongly not taken. Each time a branch isn't taken, the state
> is moved one step toward strongly not taken. Each time a branch is
> taken, the state is moved one step toward strongly taken. When it comes
> time to predict the branch, either weakly taken or strongly taken is
> predicted as taken, and either weakly not-taken or strongly not taken is
> predicted as not taken.
No. Read some documentation and repost.
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
[ ... ]
> No. Read some documentation and repost.
If you want to make a useful post that corrects whatever you think is
wrong, feel free. Otherwise admit that the Pentium is sufficiently old
that nothing beyond the general idea matters to anybody anymore. For
that matter, even if you DO bother, it'll still be too obsolete for any
more than the basic idea to matter.
Or, if you prefer a post as rude as your own: No. Get a clue of how to
contribute and repost.
James, don't be too harsh!
What Jerry wrote is a pretty good description of how the Pentium did work.
Agner Fog has the definitive treatment.
Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"