I find the following paragraph hard to understand.
*The following results are seen from a simulation study of five floating-point benchmarks and two integer benchmarks from the SPEC92 suite. The branch misprediction rate nearly doubles from 5% to 9.1% going from 1 thread to 8 threads in an SMT processor. However, the wrong-path instructions fetched (on a misprediction) drops from 24% on a single-threaded processor to 7% on an 8-thread processor.*

This puzzles me, since I assumed each thread has its own branch predictor anyway. Or is the predictor normally shared across the whole processor?
From the context, a shared branch predictor is assumed. POWER5, for example, uses a shared one (only the RAS is separate). I would assume that an SMT processor normally uses a shared predictor.
Thanks Navy Ant. Two further thoughts & questions:
a. What do you mean by RAS?

b. On the second fact, that the SMT processor fetches fewer wrong-path instructions, I am not sure I understand it correctly. Here is a naive example. If I have 8 threads (assume they are exactly the same), they will fetch 8N wrong-path instructions in total. On an 8-thread processor, however, if I ignore the pipeline setup cost, I only fetch N wrong-path instructions. Is this scenario correct?
> 2008/8/26 yao gang <nobond...@gmail.com>:
> > I found the following paraph is hard to understand. [...]
RAS = Return Address Stack. It is used to predict the targets of return instructions.
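To make the RAS idea concrete, here is a minimal sketch in Python. The class name `ReturnAddressStack`, the depth of 8, and the addresses are all made up for illustration and do not describe any real processor. It only shows the push-on-call, pop-on-return behavior, and why interleaved threads would disturb a single shared stack.

```python
class ReturnAddressStack:
    """Toy model of a return address stack (hypothetical, fixed depth)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        # A call pushes the address of the instruction after the call.
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # stack full: discard the oldest entry
        self.entries.append(return_address)

    def predict_return(self):
        # A return pops the most recent entry as the predicted target.
        return self.entries.pop() if self.entries else None


# One thread, nested calls: returns are predicted in LIFO order.
ras = ReturnAddressStack()
ras.on_call(0x1004)   # outer call
ras.on_call(0x2008)   # inner call
inner = ras.predict_return()
outer = ras.predict_return()
empty = ras.predict_return()

# Two threads sharing one stack (assumed interleaving): thread A calls,
# thread B calls, then thread A returns and is handed thread B's address.
shared = ReturnAddressStack()
shared.on_call(0xA004)  # thread A
shared.on_call(0xB008)  # thread B
a_prediction = shared.predict_return()
```

In this toy model `a_prediction` is thread B's return address, which is the kind of interference a separate per-thread RAS avoids.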
An SMT processor normally fetches instructions in a round-robin way from all ready threads, and those N threads are typically different. (Your assumption does not hold in most cases.) After a misprediction, the mispredicting thread therefore gets only about 1/N of the fetch slots until the branch is resolved, so the wrongly fetched instructions are only about 1/N of those in the single-thread case. With N = 8, 24% would drop to 3%. Considering that the prediction accuracy is also compromised a little, it makes sense that the wrongly fetched instruction rate drops from 24% to 7%, rather than to 3%.
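The arithmetic behind that estimate can be sketched as follows. This is only a back-of-the-envelope model, not how the quoted simulation study computed its numbers: it assumes the wrong-path fraction scales as 1/N under round-robin fetch and then grows in proportion to the misprediction rate. The function name and parameters are hypothetical.

```python
def estimate_wrong_path(single_fraction, threads, mispredict_single, mispredict_smt):
    """Rough estimate of the wrong-path fetch fraction under SMT.

    Assumption 1: round-robin fetch gives each thread 1/threads of the slots.
    Assumption 2: wrong-path fetch grows linearly with the misprediction rate.
    """
    ideal = single_fraction / threads               # pure 1/N scaling
    penalty = mispredict_smt / mispredict_single    # relative rise in mispredictions
    return ideal, ideal * penalty


ideal, adjusted = estimate_wrong_path(0.24, 8, 0.05, 0.091)
print(f"1/N scaling only:        {ideal:.1%}")
print(f"with more mispredictions: {adjusted:.1%}")
```

Under these assumptions the estimate lands between the ideal 3% and the reported 7%. The rest of the gap is not explained by this simple model.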