> On Sunday, April 29, 2018 at 12:18:57 PM UTC-4, EricP wrote:
>> The word scoreboard is quite overloaded, so you need to be more
>> explicit in describing what microarchitecture you are thinking about.
>> When a processor uses a "scoreboard" you really have to look
>> at exactly how it works to see how it is the same or different.
> I was using the term scoreboarding both loosely and specifically. How come both... I was contemplating centralized control circuitry vs the distributed method found in Tomasulo. So there is both the possibility of scoreboarding with register renaming, and there are other possibilities.
>> Adding a renamer in the instruction decode stage:
>> - eliminates WAR and WAR hazard detection and it becomes
>> subsumed by a more general resource availability check
>> for a free physical register.
>> Note that physical register status now tracks pending reads,
>> so a physical register is free if it is:
>> not the architecture register, not busy, and no reads pending.
>> - allows rollback to prior committed architected state
>> implementing precise exceptions.
> First you wrote WAR and WAR, I assume you meant WAR and WAW. Second I thought the renamer would be relevant in the issue and read operands stage since there is no instruction decode stage in a scoreboard.
Yes, that was a typo - should have been WAR and WAW.
Sure there is an instruction decoder for a scoreboarded cpu.
The scoreboard is a kind of scheduler: it tracks dependencies
and decides when to issue to the FU's.
The renamer removes architectural register dependencies
but it also separates the execute completion from state commit.
The renamer contains two sets of maps from architecture to
physical registers, the future set and the committed set.
At state commit, the committed map gets updated with the
latest physical register. If something goes wrong, the committed
set can be copied into the future set effecting a rollback.
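
To make the two-map idea concrete, here is a rough sketch in Python.
It is only a toy model under my own assumptions, not a description of
any real design: the class and method names are made up, the free check
also excludes registers named in the future map (my reading of
"not the architecture register"), and rollback assumes every in-flight
instruction is squashed.

```python
class Renamer:
    """Toy renamer with a future map and a committed map (hypothetical names)."""

    def __init__(self, num_arch, num_phys):
        assert num_phys > num_arch
        # Initially architecture register i lives in physical register i.
        self.future = list(range(num_arch))
        self.committed = list(range(num_arch))
        self.busy = [False] * num_phys       # result not yet written
        self.pending_reads = [0] * num_phys  # renamed readers that have not read yet

    def is_free(self, p):
        # Free if: not an architecture register (assumption: in either map),
        # not busy, and no reads pending.
        return (p not in self.committed and p not in self.future
                and not self.busy[p] and self.pending_reads[p] == 0)

    def allocate(self, arch):
        """Rename a destination. Returns the new physical register,
        or None when no register is free (decode must stall)."""
        for p in range(len(self.busy)):
            if self.is_free(p):
                self.busy[p] = True
                self.future[arch] = p
                return p
        return None

    def source(self, arch):
        """Look up a source operand and record a pending read on it."""
        p = self.future[arch]
        self.pending_reads[p] += 1
        return p

    def read_done(self, p):
        self.pending_reads[p] -= 1

    def complete(self, p):
        # Execution finished: the result is written, but not yet committed.
        self.busy[p] = False

    def commit(self, arch, p):
        # State commit: the committed map now names the latest register.
        self.committed[arch] = p

    def rollback(self):
        # Copy the committed set into the future set.
        # Assumption: all in-flight instructions are squashed, so the
        # busy flags and pending-read counts can simply be cleared.
        self.future = list(self.committed)
        self.busy = [False] * len(self.busy)
        self.pending_reads = [0] * len(self.pending_reads)
```

Note that execute completion (complete) and state commit (commit) are
separate calls, which is the separation described above.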
>> In the end we have:
>> - in order issue, out-of-order execute, out-of-order complete
>> - with RAW hazards handled by stall in the register read stage
>> - stall for free physical register in the instruction decode stage
>> - stall for available FU in issue stage
>> - no forwarding
>> - precise interrupts
> I thought reorder buffers allowed for precise interrupts. Register renaming alone does not. In fact you state out of order completion which implies imprecise interrupts.
To me it is the renamer that records the committed vs future state,
and that is what creates precise interrupts.
Or to put it a different way, one could build a cpu with
a ROB and OoO scheduler and completion but _without_ a renamer.
It would NOT have precise interrupts.
I am trying to look at each potential change in isolation.
What is required to add JUST a renamer?
What is required for JUST OoO issuing and completion?
What is required for JUST forwarding?
A ROB is typically also associated with Out-of-Order issuing
and has other features associated with it,
and I was trying to separate out the changes for just adding
a renamer while retaining the original in-order issue.
But yes, adding the renamer would need something to trigger
the renamer state commit. I wanted something very simple.
I thought about various shift registers but none really worked out.
The simplest I can think of is a circular buffer with just
an architecture register#, a physical register#, and a Done flag.
It didn't seem fair to call that a ROB though that is partly its job.
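
A minimal sketch of that circular buffer, again in Python and again
with made-up names (CommitQueue, push, mark_done, commit_ready are all
hypothetical). It assumes entries are allocated in program order and
that the committed map is a plain list indexed by architecture register#.

```python
class CommitQueue:
    """Circular buffer of (architecture reg#, physical reg#, Done flag)."""

    def __init__(self, size):
        self.entries = [None] * size  # each used slot holds [arch, phys, done]
        self.head = 0                 # oldest entry, next to commit
        self.tail = 0                 # next slot to allocate
        self.count = 0

    def push(self, arch, phys):
        """Allocate an entry in program order.
        Returns its index, or None if the buffer is full (stall)."""
        if self.count == len(self.entries):
            return None
        idx = self.tail
        self.entries[idx] = [arch, phys, False]
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return idx

    def mark_done(self, idx):
        """Called when the instruction completes, possibly out of order."""
        self.entries[idx][2] = True

    def commit_ready(self, committed_map):
        """Commit from the head while the oldest entry is Done, so the
        committed map is only ever updated in program order."""
        retired = []
        while self.count and self.entries[self.head][2]:
            arch, phys, _ = self.entries[self.head]
            committed_map[arch] = phys
            retired.append((arch, phys))
            self.entries[self.head] = None
            self.head = (self.head + 1) % len(self.entries)
            self.count -= 1
        return retired
```

A younger instruction that finishes early just sets its Done flag and
waits; nothing reaches the committed map until everything older is Done.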
>> Now the question becomes, once you have done all this,
>> how did the performance change?
>> Is having precise interrupts sufficient cause to
>> justify adding renaming?
> Again you're not adding precise interrupts with register renaming but you are getting rid of the stalls caused by RAW and WAW hazards. Which are the primary reasons Tomasulo is abstractly better than Scoreboarding.
Yes it does make interrupts precise if you add some minimal
support logic, like that circular buffer I mentioned above.
It is the renamer that allows the state commit vs rollback.
Tomasulo bypasses the in-order issue bottleneck with reservation stations.
It ALSO has a renamer and therefore a future vs committed state.
>> Reservation stations (with renamer) allows distribution of the
>> pending instructions from the decoder stage to the various stations
>> so you don't stall at the issue stage for an available FU.
>> Instead each RS tracks its own FU and data availability,
>> and does its own OoO scheduling and OoO complete.
>> But now you have to figure out how to get all the necessary
>> information and data to each RS.
>> One question, for example, is how many registers should each RS have?
>> Too few and you stall at decode, too many and they sit idle.
> I understand the purpose of reservation stations but I wonder if this distributed design (physically) has a positive impact on clock speed vs a centralized solution. Also the only reason the scoreboard in the 6600 stalled if a FU was busy was that the FU's weren't pipelined. That should be easily remedied. In fact IIRC the successor to the 6600 the 7600(???) implemented pipelined FU's.
From a concurrency tracking point of view you can have
multiple FU's, or pipelined FU's, or multiple pipelined FU's.
All of them have multiple instructions in flight.
But a pipeline can only start or complete one instruction per clock,
while multiple FU's start or complete multiple instructions per clock
and so can have more port and bus resource contentions.
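
As a toy illustration of that last point (a sketch under simplifying
assumptions: fixed latency, and a made-up helper name), count how many
results arrive in each clock for a given set of start clocks:

```python
from collections import Counter

def completions_per_clock(start_clocks, latency):
    """Count how many results arrive in each clock,
    assuming every operation takes the same fixed latency."""
    return Counter(start + latency for start in start_clocks)

# One pipelined FU: at most one start per clock, so starts are staggered.
pipelined = completions_per_clock([0, 1, 2], latency=3)

# Three separate FU's: all three can start in the same clock.
parallel = completions_per_clock([0, 0, 0], latency=3)

# Peak number of results competing for result buses/ports in one clock.
peak_pipelined = max(pipelined.values())
peak_parallel = max(parallel.values())
```

In this model the pipelined FU never delivers more than one result in
a clock, while the three separate FU's can deliver three at once and
so need either three result ports or some arbitration.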