On Wednesday, July 28, 2021 at 4:53:23 PM UTC-5, JimBrakefield wrote:
> Consider a portion of a RISC register file used in a write-once mode per loop execution on a nominally OO design: Register renaming is not needed.
<
OO is Object Oriented
OoO is Out of Order
<
> Loop instructions can be issued in any order, even simultaneously.
>
> Issues: what are the savings from not having register renaming?
<
Probably not enough to make this a key microarchitectural assumption. If you
have resourced the machine to run at peak performance for multiple
milliseconds in a row, you already have enough rename read ports. If not, you
are already up sh!t creek.
<
But note: The VEC-LOOP construct in My 66000 uses a single rename over
the entire loop (body) and concatenates a loop iteration count to this name,
making it loop-unique without eating lots of rename space (and allowing
the front end to be quiescent during loop execution).
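
A toy sketch of the naming idea, not the actual My 66000 hardware: names are
allocated once when the loop is installed, and a tag is that name with the
iteration count appended. The names Tag, install_loop and tag_for are made up
for this illustration, and giving each destination register in the body its
own name is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: int       # rename allocated once, at loop installation
    iteration: int  # loop iteration count concatenated onto the name


def install_loop(body_dests, first_free_name=0):
    """Allocate one name per destination register, once for the whole loop.

    Assumption of this sketch: one name per destination register in the body.
    """
    return {reg: first_free_name + i for i, reg in enumerate(body_dests)}


def tag_for(names, reg, iteration):
    """Form a loop-unique tag without allocating any new rename."""
    return Tag(names[reg], iteration)


names = install_loop(["R5", "R6"])
tags = [tag_for(names, "R5", i) for i in range(4)]
```

Four iterations in flight get four distinct tags for R5 while consuming one
rename name, and nothing is allocated after installation.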
<
> Are there enough registers (e.g. is a larger register file needed)?
<
The general trick is that (logical) register names are used to set up the
data-flow dependencies. I might note that nothing in Livermore loops (*.c)
required anything more than the 32 GPRs in My 66000.
<
> For short loops mechanism needed to identify a register's loop number so multiple loop executions can exist simultaneously?
<
First you need to settle whether the memory accesses are dense and independent;
if they are, rewriting into SIMD is fairly easy. The second problem is to
distinguish values produced and consumed in the same loop iteration from values
produced in a previous iteration and consumed in this one. I call the former
vector data and the latter loop-carried data.
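
The register side of that distinction can be sketched in a few lines. This is
an illustrative classifier only (classify_reads and the register numbers are
hypothetical, and it says nothing about the memory-dependence question): a
source written earlier in the body is vector data, a source written only at or
after the reading instruction is loop-carried, and a source never written in
the loop is a loop-invariant scalar.

```python
def classify_reads(body):
    """body: list of (dest, [srcs]) in program order for one iteration."""
    written_anywhere = {dest for dest, _ in body if dest is not None}
    written_so_far = set()
    result = []
    for index, (dest, srcs) in enumerate(body):
        for src in srcs:
            if src in written_so_far:
                kind = "vector"        # produced earlier in this iteration
            elif src in written_anywhere:
                kind = "loop-carried"  # produced by a previous iteration
            else:
                kind = "scalar"        # never written inside the loop
            result.append((index, src, kind))
        if dest is not None:
            written_so_far.add(dest)
    return result


# Hypothetical body for: sum += a[i] * k
body = [
    ("R4", ["R1", "R2"]),  # R4 = load [R1 + R2]   (R1 base, R2 index)
    ("R5", ["R4", "R3"]),  # R5 = R4 * R3          (R3 holds k)
    ("R6", ["R6", "R5"]),  # R6 = R6 + R5          (running sum)
    ("R2", ["R2"]),        # R2 = R2 + 1           (index update)
]
reads = classify_reads(body)
```

On the first iteration a loop-carried source takes its value from before the
loop; the sketch only labels the steady-state case.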
<
> Loading of loop constant values from elsewhere in the register file into the functional units prior to loop execution?
<
My 66000 loads scalar register values into station entries during loop installation.
Subsequently, these values do not need to be read from the RF on each loop
iteration. Each station will await its vector or loop-carried dependencies just
like any reservation station entry would.
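
A rough software model of that behaviour, under stated assumptions: Station,
CountingRegFile and run_scaled_sum are invented for this sketch and do not
describe the real station logic. Scalars are read from the register file once,
at installation; every other operand arrives by forwarding.

```python
import operator


class CountingRegFile:
    """Register file that counts how often it is read."""

    def __init__(self, values):
        self.values = dict(values)
        self.reads = 0

    def read(self, reg):
        self.reads += 1
        return self.values[reg]


class Station:
    def __init__(self, op, srcs, loop_written, regfile):
        self.op = op
        self.srcs = srcs
        # Scalars (never written in the loop) are captured once, here.
        self.captured = {s: regfile.read(s)
                         for s in srcs if s not in loop_written}

    def fire(self, forwarded):
        # Vector and loop-carried operands come from producing stations.
        args = [self.captured[s] if s in self.captured else forwarded[s]
                for s in self.srcs]
        return self.op(*args)


def run_scaled_sum(data, scale):
    """Model acc += x * scale; return (result, register-file reads)."""
    rf = CountingRegFile({"Rscale": scale})
    loop_written = {"Rx", "Rprod", "Racc"}
    mul = Station(operator.mul, ["Rx", "Rscale"], loop_written, rf)
    add = Station(operator.add, ["Racc", "Rprod"], loop_written, rf)
    acc = 0  # simplification: initial loop-carried value supplied directly
    for x in data:  # x stands in for the loaded vector element
        prod = mul.fire({"Rx": x})
        acc = add.fire({"Racc": acc, "Rprod": prod})
    return acc, rf.reads


result, rf_reads = run_scaled_sum([1, 2, 3, 4], 10)
```

Here the register file is read once for Rscale at installation and never again,
no matter how many iterations run.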
<
> A burst mechanism for initializing the function unit input buffers or reservation stations?
<
I did not find this necessary as it speeds up only the first iteration.
<
> This is a somewhat novel approach?
<
Sounds essentially like what VVM does (or enables)...
<
> to high performance loops based on the type of iterations found in the Livermore Loops. As such the best ISA for
> it is undetermined? There is considerable room for innovation?
<
Having read the code for LL spit out from Brian's compiler, I can't see room
for a lot of improvement within the realm of RISC architectures, except perhaps
in code density.
>
> Jim Brakefield