--
You received this message because you are subscribed to the Google Groups "Unum Computing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unum-computin...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unum-computing/1192fd51-7c0c-4397-8954-39173dc75f3en%40googlegroups.com.
I understand the motivation behind the quire, and I admire whoever thought it up, but there are a few points I do not understand about a quire:
a) Is there always 0 or 1 quire? One would think that once a feature has risen into source code expression, SW can get and use as many as it desires.
b) Is there (or will there ever be) any quire OP quire arithmetic?
c) If you had a Padé polynomial to evaluate, with only 1 quire you have to evaluate the numerator and then evaluate the denominator separately. This might be a burden on the compiler (writer), since the polynomials are generally expressed in pairs of Horner or Estrin evaluations finishing up with FDIV. (This smells like more of (a) to me.)
numeratorHigh = qToP(quire);
quire -= numeratorHigh;
numeratorLow = qToP(quire);
quotient = numeratorHigh / denominator + numeratorLow / denominator;
quire = numeratorHigh + numeratorLow - quotient * denominator;
d) Swapping quires into these semi-dedicated registers will not be fun for the HW or the compiler, leading to code explosion and poor performance.
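The high/low split shown under (c) can be tried out in ordinary software. The following is only a minimal sketch under stated assumptions: the quire is modeled as an exact Fraction, q_to_p stands in for qToP by rounding to an IEEE double rather than to a posit, and the function names split_high_low and pade_divide_step are hypothetical, not part of any posit library.

```python
from fractions import Fraction


def q_to_p(quire: Fraction) -> float:
    """Stand-in for qToP: round the exact quire value to the nearest double.

    Assumption: a real implementation would round to a posit, not a double.
    """
    return float(quire)


def split_high_low(quire: Fraction):
    """Return (high, low, residual) so that high + low + residual == quire exactly."""
    high = q_to_p(quire)
    quire -= Fraction(high)   # quire -= numeratorHigh
    low = q_to_p(quire)
    return high, low, quire - Fraction(low)


def pade_divide_step(numerator_quire: Fraction, denominator: float):
    """Follow the three statements from the thread and return (quotient, remainder)."""
    high, low, _ = split_high_low(numerator_quire)
    quotient = high / denominator + low / denominator
    # Exact remainder, as the quire would hold it after the last statement.
    remainder = Fraction(high) + Fraction(low) - Fraction(quotient) * Fraction(denominator)
    return quotient, remainder


if __name__ == "__main__":
    numerator = Fraction(1) + Fraction(1, 2**60)  # too wide for a single double
    print(split_high_low(numerator))
    print(pade_divide_step(numerator, 3.0))
```

The point of the split is that two roundings of the quire capture bits that a single rounding would discard, and the remainder left in the quire shows how far the rounded quotient is from the exact one.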
You ask;
Mitch

Best,
John
--
You received this message because you are subscribed to the Google Groups "Unum Computing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unum-computin...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unum-computing/225cf387-21f1-4ea7-9b58-d2f8b784e02bn%40googlegroups.com.
On Sunday, March 26, 2023 at 4:56:51 PM UTC-5 johngustafson wrote:

I have suggested a way to make the quire very inexpensive in silicon, but so far no one has taken me up on it. Use 16 of the general-purpose registers as a quire, with hardware support for the shifting and carry propagation. There's a long history of treating register pairs as double-precision, going back at least as far as IBM's System/360 in the 1960s. Just extend the idea to 16 registers. If the code uses a quire accumulation, just push the 16 registers onto the stack, use them as a quire, and when the quire work is done, pop the saved registers back into place.
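To make the "16 registers plus shifting and carry propagation" idea concrete, here is a rough software model. It is a sketch only, under assumptions that are not from the thread: the binary point sits at bit 512 of a 1024-bit two's-complement accumulator, the operands are IEEE doubles rather than posits, and all names (quire_fma, quire_value, and so on) are hypothetical.

```python
import math
from fractions import Fraction

WORD_BITS = 64
NUM_WORDS = 16                       # the 16 ganged registers
TOTAL_BITS = WORD_BITS * NUM_WORDS   # 1024-bit accumulator
WORD_MASK = (1 << WORD_BITS) - 1
FULL_MASK = (1 << TOTAL_BITS) - 1
FRAC_BITS = 512                      # assumed position of the binary point


def quire_zero():
    return [0] * NUM_WORDS


def to_mant_exp(x: float):
    """Return integers (m, e) with x == m * 2**e exactly."""
    m, e = math.frexp(x)
    return int(m * (1 << 53)), e - 53


def quire_fma(words, a: float, b: float):
    """Add the exact product a*b into the quire, one 64-bit word at a time."""
    ma, ea = to_mant_exp(a)
    mb, eb = to_mant_exp(b)
    shift = ea + eb + FRAC_BITS
    assert shift >= 0, "product underflows this sketch's fixed-point range"
    # Two's-complement image of the shifted product (handles negative products).
    addend = ((ma * mb) << shift) & FULL_MASK
    carry = 0
    for i in range(NUM_WORDS):
        total = words[i] + ((addend >> (WORD_BITS * i)) & WORD_MASK) + carry
        words[i] = total & WORD_MASK
        carry = total >> WORD_BITS   # ripple the carry into the next register


def quire_value(words) -> Fraction:
    """Read the accumulator back as an exact signed rational."""
    n = sum(w << (WORD_BITS * i) for i, w in enumerate(words))
    if n >= 1 << (TOTAL_BITS - 1):
        n -= 1 << TOTAL_BITS
    return Fraction(n, 1 << FRAC_BITS)


if __name__ == "__main__":
    q = quire_zero()
    for a, b in [(1e16, 1.0), (1.0, 1.0), (-1e16, 1.0)]:
        quire_fma(q, a, b)
    print(quire_value(q))            # exact dot product
    print(1e16 * 1.0 + 1.0 * 1.0 + -1e16 * 1.0)  # plain double arithmetic
```

The loop over all 16 words is the software analogue of the carry chain; real hardware would presumably start at the word the shifted product lands in and stop once the carry dies out.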
HW people really hate ganging registers, and compiler writers really hate ganging registers, but for different reasons. Architects have other reasons to choose differently.

On the HW side, we hate instructions that require odd get-started sequences and odd finish-up sequences, which is what your current suggestion would need (I think).

On the compiler side, compiler writers would have to "see" that a quire was required before doing any register allocation (probably not hard), then set up a prologue and epilogue to deal with the swath of registers required (also not very hard). But if you ran into a subroutine where one path used a quire and another path did not, the subroutine would have to pay the quire overhead even for the majority of calls that did not need the quire. Pushing prologue and epilogue sequences into nested blocks is quite hard (doable, but rarely done). Then you have the situation where the compiler needs 20+ FP registers to perform the source algorithm, and by allocating a quire you end up with a bunch of spill/fill instructions in the subroutine. This is likely the biggest enemy of your current train of thought.

Architects generally have a free hand in choosing certain features of their architecture; one of these features is a separate versus a unified register model. My 66000 architecture has chosen the unified register model, while many have chosen the separate register model (more choose separate than unified, by a considerable margin). For those who choose the unified model, mapping a quire into the register file is a high enough hill to climb that it would discourage adopting posits over IEEE 754, something I suspect you don't want at this moment.

I have been studying various numeric benchmarks using the LLVM RISC-V compiler and the My 66000 LLVM compiler (same source code, same compiler flags, trying to make the instruction set comparison as fair as possible). I have run into several subroutines where the RISC-V LLVM compiler had to generate spill/fill instructions even though it has 32 64-bit GPRs and 32 64-bit FPRs, while I have only 32 64-bit GPRs and need no spill/fill instructions. So the choice of unified versus separate is a bit more delicate than one might surmise: it depends on other features of the ISA, and in this case on features RISC-V did not have.
Perhaps a hardware expert in the Unum Computing group can explain to me why it's better to create a separate register for the quire instead of ganging the registers already defined in RISC-V.
To view this discussion on the web visit https://groups.google.com/d/msgid/unum-computing/73f511a0-2b31-43f1-98d0-6eb89d7acf1bn%40googlegroups.com.
For a superscalar, register-renaming microprocessor, abusing integer registers for the quire is a flat-out non-starter. It would severely impact the highly optimized data paths.
Also, even if you architecturally have only a single quire, your execution window is 300 instructions wide and growing, so there will likely be more than one quire active at a time.