On May 9, 4:32 pm,
timcaff...@aol.com wrote:
> Andy Glew mentioned explicit operand chaining in another
> thread. That would be one way of giving a hint to the
> microarch, especially in relation to forwarding results (it
> also potentially allows fewer "temporary" registers).
Some time ago I suggested using the zero register for one-
time-use results. After each read, the "zero" register
would be reset to zero, so it could be used for normal
zero register uses without extra overhead. (This would
not be binary compatible with most RISCs since nops and
certain hints [e.g., prefetches] are allowed to use the
zero register as a destination, but binary translation
would be fairly trivial [as would a per-page mode to
support legacy code, though IMO PTE bits are too scarce
to spend on such a minor feature].)
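A toy model of the reset-after-read idea, assuming a 32-entry register file where only r0 gets the new behavior and reads are modeled one at a time (the class and method names are hypothetical). The last lines show the compatibility hazard mentioned above: a legacy nop that targets r0 now leaves a value behind.

```python
class ConsumingZeroRegFile:
    """Toy register file (hypothetical model): r0 holds a one-time-use
    value that is reset to zero as soon as it is read."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def write(self, idx, value):
        # Under the proposal, writing r0 deposits a one-time-use result.
        self.regs[idx] = value

    def read(self, idx):
        value = self.regs[idx]
        if idx == 0:
            # Reset after read, so r0 behaves as a zero register again.
            self.regs[0] = 0
        return value


rf = ConsumingZeroRegFile()
rf.write(0, 42)       # producer leaves a temporary in r0
first = rf.read(0)    # consumer gets 42
second = rf.read(0)   # r0 reads as zero again

# Compatibility hazard: a legacy nop encoded as a write to r0
# now leaves a value behind for the next reader of r0.
rf.write(0, 7)        # "nop" under the old semantics
leaked = rf.read(0)   # 7, not the 0 that legacy code expects
```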
> It seems like the other big hit is letting the microarch
> know when a register's value will no longer be used. You
> can do this, somewhat, by XOR reg,reg on the x86, but it
> seems to me it might be more useful to have an instruction
> that explicitly says a register's value is dead. That can
> be potentially used to cause exceptions if it is used anyway
> and to reduce the state save/restore overhead for interrupts
> (and perhaps call/return).
Zero compression and similar special handling of zero values
might not be that difficult, so using register clear to avoid
save/restore overhead might be practical. In theory, one
could use SUB rX, rX as a signaling register clearing operation,
but preserving the signaling property through memory might be
problematic (at the same granularity as ECC coverage, a chunk
could be given a maximally uncorrectable error, perhaps with
the data value equal to the virtual address to indicate a
software issue rather than an actual detected uncorrectable
error).
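A rough sketch of why clearing dead registers could cut save/restore cost, assuming the save mechanism stores a bitmask of nonzero registers followed by only the nonzero values (the mask layout and function names are illustrative assumptions, not an existing ISA feature):

```python
def save_compressed(regs):
    """Zero-compressed context save (sketch): a bitmask of nonzero
    registers plus only the nonzero values."""
    mask = 0
    values = []
    for i, v in enumerate(regs):
        if v != 0:
            mask |= 1 << i
            values.append(v)
    return mask, values


def restore_compressed(mask, values, num_regs):
    """Inverse of save_compressed: registers absent from the mask come back as zero."""
    it = iter(values)
    return [next(it) if (mask >> i) & 1 else 0 for i in range(num_regs)]


regs = [0, 5, 0, 0, 9, 0, 0, 0]  # dead registers were cleared to zero
mask, values = save_compressed(regs)
# mask == 0b10010, values == [5, 9]: two value stores instead of eight
restored = restore_compressed(mask, values, len(regs))
```

A signaling variant would need a third state (dead, not merely zero), which is where preserving the property through memory becomes the hard part.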
> Both of these should be fairly straightforward to implement
> with current compiler technology.
Both of those are potential uses of specialized register
names, though the latter might have a stronger architectural
component. (Explicit operand chaining might be provided as
a hint, with prediction possibly used to decide when to
ignore the hint, and with design choices about how expensive
an incorrect hint would be [e.g., localized replay, or the
equivalent of an exception or branch misprediction].) Such
uses are also unlikely to greatly hinder register allocation
optimization.
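One way prediction might decide when to ignore a chaining hint, sketched under the assumption of a PC-indexed table of 2-bit saturating counters (a conventional predictor structure chosen here for illustration; the class name and table size are hypothetical):

```python
class HintFilter:
    """Hypothetical per-PC filter deciding whether to honor a chaining hint.
    Each entry is a 2-bit saturating counter; values 2-3 mean 'trust it'."""

    def __init__(self, entries=256):
        self.entries = entries
        self.ctr = [2] * entries  # start out weakly trusting hints

    def _idx(self, pc):
        # Drop the low bits of 4-byte-aligned instruction addresses.
        return (pc >> 2) % self.entries

    def honor(self, pc):
        return self.ctr[self._idx(pc)] >= 2

    def update(self, pc, hint_was_correct):
        i = self._idx(pc)
        if hint_was_correct:
            self.ctr[i] = min(3, self.ctr[i] + 1)
        else:
            self.ctr[i] = max(0, self.ctr[i] - 1)


f = HintFilter()
bad_pc, good_pc = 0x1000, 0x1004
f.update(bad_pc, False)
f.update(bad_pc, False)  # counter 2 -> 1 -> 0: stop honoring this hint
```

The cost of a wrong hint (replay vs. full flush) would determine how quickly the counter should back off.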
Optimizations like grouping addresses by potential aliasing
(e.g., defining four collections of register names that
only allow internal aliasing when used as base addresses
for accessing memory) could facilitate some energy-saving
hardware optimizations (possibly even as hints), but these
would be harder for a compiler to exploit and would
interfere more with register allocation.
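A minimal sketch of the energy-saving side, assuming 32 registers split into four contiguous groups of eight and a hardware rule that a load only needs to be compared against queued stores whose base register is in the same group (the grouping and store-queue layout are hypothetical):

```python
NUM_REGS = 32
GROUP_SIZE = NUM_REGS // 4  # four collections of register names


def alias_group(reg):
    # Hypothetical convention: r0-r7 -> 0, r8-r15 -> 1, r16-r23 -> 2, r24-r31 -> 3.
    return reg // GROUP_SIZE


def stores_to_check(load_base, store_queue):
    """Return only the queued stores whose base register shares the
    load's alias group; the others are architecturally non-aliasing."""
    g = alias_group(load_base)
    return [s for s in store_queue if alias_group(s["base"]) == g]


store_queue = [
    {"base": 9, "addr": 0x100},   # group 1
    {"base": 3, "addr": 0x200},   # group 0
    {"base": 14, "addr": 0x300},  # group 1
]
candidates = stores_to_check(12, store_queue)  # r12 is in group 1
```

Fewer comparisons per load is the potential energy saving; the compiler's burden is keeping every base register in the correct group.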
Using the same architectural register for all jump table
base addresses would potentially allow some optimization of
indirect branch predictors (both by making it easier to
recognize load rX <- [rB + offset] as loading entry "offset"
of a jump table and by potentially getting rB value
information to the front-end early enough to influence
branch prediction). (I think Nick Maclaren mentioned this
possibility some time ago.)
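A sketch of how a front-end might exploit a fixed jump-table base register, assuming r28 is the designated register (a hypothetical choice) and that the rB value is already available when the load is seen. The predictor keys indirect-branch targets on (table base, entry offset):

```python
JT_BASE = 28  # hypothetical register reserved for jump-table base addresses


class JumpTablePredictor:
    def __init__(self):
        self.targets = {}  # (table base, entry offset) -> last observed target

    def observe_load(self, base_reg, base_value, offset):
        """Front-end hook: return a lookup key if this load reads a jump
        table (recognized purely by its base register), else None."""
        if base_reg != JT_BASE:
            return None
        return (base_value, offset)

    def predict(self, key):
        return self.targets.get(key)

    def train(self, key, actual_target):
        self.targets[key] = actual_target


p = JumpTablePredictor()
key = p.observe_load(JT_BASE, 0x8000, 16)   # load rX <- [r28 + 16]
p.train(key, 0x4ABC)                        # branch resolved to 0x4ABC
guess = p.predict(p.observe_load(JT_BASE, 0x8000, 16))
other = p.observe_load(5, 0x8000, 16)       # ordinary load: not recognized
```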
Presumably other pieces of information could also be
useful and be communicated by "redundant" encodings (as
register names are somewhat "redundant"). However, first
the compiler needs to be able to get such information
(and know how useful it is to the hardware) and then it
needs to be able to communicate the information.