--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/bb41761b-4fc4-f636-cf77-c0dd216d41b2%40cadence.com.
might there be more performance value in making it dual-operand to make better use of available read ports, eg:
a/sqrt(b)
or
1/sqrt(a+b)
> might there be more performance value in making it dual-operand to make better use of available read ports, eg:
> a/sqrt(b)
> or
> 1/sqrt(a+b)

The FSQRT instruction in the base F extension is only single-operand. However, from a quick skim I don't see any commentary as to why that was chosen as opposed to something analogous to your suggestion, Guy. Presumably, it's because the extra arithmetic complexity outweighs the wasted read port (and it's not much of a waste, since that read can be gated).
1/sqrt(a) is a single-operand instruction.
might there be more performance value in making it dual-operand to make better use of available read ports, eg:
a/sqrt(b)
or
1/sqrt(a+b)
both are common forms of usage. i suppose these could be formed by chaining, but if that's the case there's little need for rsqrt if you have both div and sqrt.
Apologies if this is slightly off topic but
(1) does support for the D extension imply support for the F extension?
(2) does support for the Q extension imply support for both D and F extensions?
Or can they all be independent - e.g. support for Q but not D or F etc.?
The D extension depends on the base single-precision instruction subset F.
The quad-precision binary floating-point instruction-set extension is named "Q"; it depends on the double-precision floating-point extension D.
Dan, I'm not talking about the original sqrt, i'm talking about rsqrt
(reciprocal).
However, if it is treated as separate
sqrt followed by divide, you can get a/sqrt(b) without doing the extra
1/sqrt(a) lookup table.
1/sqrt(a) has been done as single-operand because it's an easy,
independent table-lookup operation, followed by iteration to get the
desired precision. it converges nicely.
Just because "that's the way it's always been done" is not a good
reason to justify its continuance.
1/sqrt(a) has been done as single-operand because it's an easy,
independent table-lookup operation, followed by iteration to get the
desired precision. it converges nicely.
however, in real software, the function 1/sqrt(a) almost never stands
alone. it is used for normalization, so it is almost always followed
by a multiplication, ie a/sqrt(b), or preceded by an addition, ie
1/sqrt(a+b).
saying it is "subjected to rounding twice" isn't really fair. if done
as separate operations, it is subjected to rounding twice. when done
as an atomic operation, you can arrange extended precision and round
only once.
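
A rough illustration of the "rounded twice" versus "rounded once" point, as a sketch rather than a model of any particular hardware. It emulates binary32 by rounding Python's binary64 results through `struct`, and uses the binary64 value of 1/sqrt(x) as a near-exact stand-in for an extended-precision intermediate (that stand-in is an assumption, as are the helper names):

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rsqrt_chained(x: float) -> float:
    """Emulate a binary32 sqrt followed by a binary32 divide (two roundings).

    Rounding a binary64 sqrt or quotient to binary32 gives the same value as
    a correctly rounded binary32 operation, so each step is a fair emulation.
    """
    root = to_f32(math.sqrt(x))
    return to_f32(1.0 / root)


def rsqrt_single_rounding(x: float) -> float:
    """Round to binary32 only once, from a much wider intermediate."""
    return to_f32(1.0 / math.sqrt(x))


inputs = [float(k) for k in range(2, 2002)]  # small integers are exact in binary32
mismatches = [x for x in inputs if rsqrt_chained(x) != rsqrt_single_rounding(x)]
print(f"{len(mismatches)} of {len(inputs)} inputs give a different binary32 result")
```

The inputs where the two results differ are the cases where the first rounding (of the sqrt) pushes the final quotient across a rounding boundary.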
http://bugs.libre-riscv.org/show_bug.cgi?id=44
Some context, above, apologies using a phone to type, quite awkward to keep thread replies.
Vulkan's accuracy requirements are extreme. Error is only allowed in the last 2 bits of mantissa.
3D GPU requirements are also extreme. One DIV and one ISQRT per pixel, no compromises allowed. This is for normalisation, typically 1/sqrt(x^2 + y^2 + z^2).
Also there are power requirements to meet.
This eliminates Newton Raphson and other iterative methods as there is no guaranteed completion time. Plus, if providing enough engines to do so in a reasonable timeframe (higher radix), the number of multipliers, and in particular their increased size, will kill all chance of meeting the power budget.
We therefore had to research pipelined designs ONLY, and Jacob found the above paper. It uses On the Fly conversion as well as a redundant carry-save format between pipeline stages; this saves hugely on gate count.
The fascinating bit is that the OTFC outputs BOTH sqrt AND isqrt from the SAME hardware. This because it needs the partial results from each to make decisions on what to do within each stage.
Unfortunately the paper is extremely obtuse, like many academic papers, and there is no verilog source. Sigh.
So in the meantime we go with a simpler design, at least we have something, and Jacob has worked out that there are adjustable magic constants so that DIV, SQRT and ISQRT can be covered by at least the same algorithm if not the actual same hardware, with very little extra gate count.
Summary:
1. For 3D we absolutely need isqrt, this is going to go ahead.
2. Lookup tables and Newton Raphson are off the table for us.
3. There exist algorithms that give ISQRT "for free".
4. Love the idea, Guy, of the add, however we may need more than 2 operands, 3 adds would be more useful. Perhaps a separate opcode?
5. We have no problem with a spec requiring less accuracy, however it is something that other implementors may come to regret, particularly when it comes to testing. We use softfloat python bindings on DIV SQRT MUL ADD, perform direct comparisons, and it works extremely well.
L.
On Friday, July 12, 2019 at 4:42:30 AM UTC+8, glemieux wrote:
> might there be more performance value in making it dual-operand to make better use of available read ports, eg:
>
>
> a/sqrt(b)
> or
> 1/sqrt(a+b)
I have not seen any hardware out there that does the hybrid combination of divide and isqrt (or multiply and isqrt). I would be concerned about the increase in gate count: it is 2 complex special-purpose blocks, back to back.
Also I would be concerned about the rounding, just working it out (let alone implementing it).
On Friday, July 12, 2019 at 5:09:47 AM UTC+8, andrew wrote:
> It’s not straightforward to correctly implement sqrt(x) using something like sqrt(x+y), because the addition messes up the sign of zero for sqrt(-0). You’d need to use 2-3 instructions to get the IEEE 754-mandated value (load zero into y; copy the sign of x onto y; then perform sqrt(x+y)).
Assuming this is isqrt rather than sqrt we are discussing.
So what you are saying Andrew is that FP exceptions on the add part would make a hybrid operation much more complex. Two possible exceptions could occur, and I assume the same +/- zero issues arise?
FMUL on the other hand, exceptions etc. have been thought through and the add has not been problematic.
With the possibility of a FP HW Exception extension to be created, it would be even more important to get this right.
> So, in addition to being more complex to implement, it’s also a less useful instruction than plain-old sqrt.
Isqrt. Definitely needed for 3D.
Thank you for this insight Andrew it cuts off a lot of development time potentially expended unnecessarily.
L.
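
Andrew's point about sqrt(-0) can be reproduced in a few lines. This is a small sketch using Python floats (binary64, default round-to-nearest); the variable names are only illustrative:

```python
import math


def sign_char(v: float) -> str:
    """Return '-' or '+' according to the sign bit, so that -0.0 is visible."""
    return "-" if math.copysign(1.0, v) < 0 else "+"


x = -0.0

plain = math.sqrt(x)        # IEEE 754 requires sqrt(-0) = -0
naive = math.sqrt(x + 0.0)  # (-0) + (+0) = +0, so the sign of zero is lost

# The workaround described in the quoted message: load zero into y, copy the
# sign of x onto y, then perform sqrt(x + y).
y = math.copysign(0.0, x)
fixed = math.sqrt(x + y)    # (-0) + (-0) = -0, so the sign survives

print(sign_char(plain), sign_char(naive), sign_char(fixed))
```

For any non-zero x, adding a signed zero leaves x unchanged, so the workaround only matters for the zero inputs.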
The rounding isn't difficult in an N-bit at a time algorithm that doesn't have a redundant result representation. For a Newton-Raphson implementation or a redundant result implementation, rounding is more difficult.
Bill
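
To illustrate why rounding is easy with a non-redundant bit-at-a-time result, here is a minimal integer sketch of a restoring square root (one result bit per step). It is not taken from any implementation discussed in this thread, and the function names are made up. Because the exact remainder is available at the end, the round-to-nearest decision is a single comparison:

```python
import math


def sqrt_bit_serial(n: int, result_bits: int) -> tuple[int, int]:
    """Return (root, rem) with root*root + rem == n and root == floor(sqrt(n)).

    n must fit in 2*result_bits bits. Two bits of n are consumed per step and
    one bit of the root is produced per step.
    """
    root = 0
    rem = 0
    for i in range(result_bits - 1, -1, -1):
        rem = (rem << 2) | ((n >> (2 * i)) & 0b11)  # bring down the next 2 bits
        trial = (root << 2) | 1                      # value to subtract if the new bit is 1
        root <<= 1
        if rem >= trial:
            rem -= trial
            root |= 1
    return root, rem


def round_to_nearest(root: int, rem: int) -> int:
    """Round sqrt(n) to the nearest integer using only the exact remainder.

    sqrt(n) > root + 1/2 exactly when rem > root, and a tie is impossible
    because (root + 1/2)**2 is never an integer.
    """
    return root + 1 if rem > root else root


root, rem = sqrt_bit_serial(10, 2)
print(root, rem, round_to_nearest(root, rem))
```

A non-zero `rem` also serves directly as the sticky bit (the inexact indication).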
On Sat, Jul 13, 2019 at 10:30 AM Aneesh Raveendran <anees...@gmail.com> wrote:
>
> Hi all,
> Myself Aneesh Raveendran. I worked on RISC-V floating point co-processor. I have few doubts regarding floating point reciprocal square-root.
>
> 1. In which application/bench marking suites will infer floating point reciprocal square-root operations?

reciprocal sqrt is used a lot in 3D graphics for normalizing vectors -- the pseudocode for normalizing a 3D vector is:

    fn normalize(x: float, y: float, z: float) -> (float, float, float) {
        let sum_of_squares = x * x + y * y + z * z;
        let factor = rsqrt(sum_of_squares);
        return (factor * x, factor * y, factor * z);
    }

It can also be used in machine learning to normalize 1-hot output vectors, though it would not be particularly performance critical for that particular use case.

> 2. If this instruction is proposing, what could be the possible instruction formats? (opcodes, f7, f5 field values )

The proposed instructions are:
+----------+---------+-------+-----+--------+----+---------+
| Mnemonic | funct7 | rs2 | rs1 | funct3 | rd | opcode |
+==========+=========+=======+=====+========+====+=========+
| frsqrt.s | 0111100 | 00000 | rs1 | rm | rd | 1010011 |
+----------+---------+-------+-----+--------+----+---------+
> 3. Any testsuites are available to verify the functional correctness of the module?

mpfr implements reciprocal sqrt, however it doesn't support all of RISC-V's rounding modes and may be missing support for other features needed for testing.
Softfloat doesn't currently implement rsqrt.
I have not researched other softfloat libraries yet.
On Sun, Jul 14, 2019 at 12:26 AM lkcl <luke.l...@gmail.com> wrote:
> using bigfloat to perform the reciprocal-square-root in a much higher precision will cover the requirement to provide accurate FPSQRT. however the corner-cases (at the extreme limits of the exponent, and when the mantissa's MSB is zero) are going to be a bundle of fun.
>
Note that mpfr has code to emulate fixed-size floating point numbers,
including handling denormal numbers. It also has mostly (see caveats)
the same special-case semantics (+-0, +-Inf, NaN) as IEEE 754, so that
should make it a lot easier to use.
MPFR is used in gcc to evaluate floating-point expressions at compile
time, so it is well-tested and likely to be correct.
mpfr does, however, have some notable caveats: see
https://www.mpfr.org/mpfr-current/mpfr.html#MPFR-and-the-IEEE-754-Standard
> i am still puzzled as to how a *soft* FP rsqrt implementation may be verified and found to be a correct implementation of the IEEE754 standard.
All that's needed is that it provides the correct answers, since the
IEEE 754-2008 standard defines exactly what the answer should be in
all cases (ignoring different NaN encodings).
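
As a sketch of what "provides the correct answers" can mean in a test harness: for positive, finite, normal binary32 values and round-to-nearest-even only, a candidate result r can be checked exactly with rational arithmetic, without an arbitrary-precision sqrt. The helper names are hypothetical, and NaN, zero, infinity, subnormal and largest-finite results are deliberately not handled here:

```python
import math
import struct
from fractions import Fraction


def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def is_correctly_rounded_rsqrt(x: float, r: float) -> bool:
    """True if r is the binary32 value nearest to the exact 1/sqrt(x).

    Assumes x and r are positive, finite, normal binary32 values held in
    Python floats, and that r is not the largest finite binary32 value.
    """
    b = f32_to_bits(r)
    below = Fraction(bits_to_f32(b - 1))  # next binary32 value below r
    above = Fraction(bits_to_f32(b + 1))  # next binary32 value above r
    lo = (below + Fraction(r)) / 2        # lower rounding boundary
    hi = (Fraction(r) + above) / 2        # upper rounding boundary
    xf = Fraction(x)
    # lo < 1/sqrt(x) < hi  is equivalent to  lo*lo*x < 1 < hi*hi*x  (all positive).
    # The exact result never lands on a boundary, so strict comparisons suffice.
    return lo * lo * xf < 1 < hi * hi * xf


candidate = bits_to_f32(f32_to_bits(1.0 / math.sqrt(2.0)))
print(is_correctly_rounded_rsqrt(2.0, candidate))
```

The same boundary test could be run against whatever a hardware model or soft-float routine returns for each input.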
Fascinating. It's a code generator / compiler. With the IEEE754 arithmetic algorithms encoded as python objects, in an Abstract Syntax Tree, that may be handed to any code-generator backend.
Wow.
So there is no reason why, for example, a Chisel3 backend should not be created.
Or a verilog one.
Or a nmigen one.
That would basically AUTOGENERATE the RTL needed to be conformant with the IEEE754 spec in any one of the required algorithms.
It is also quite likely to be able to autogenerate the unit tests as well (might need some work)
Mind you, that recip sqrt is designed to autogenerate a Newton Raphson algorithm, which some people will not be happy with.
Still, it is pretty awesome. Good find Jacob.
L.
The special case values should be (for reciprocal sqrt):
NaN -> NaN (ignoring signaling/quiet)
-Inf -> NaN
-finite -> NaN
-0 -> -Inf (div-by-zero; weird, but this is how ieee 754 defines it)
+0 -> +Inf (div-by-zero)
+finite -> rsqrt
+Inf -> +0
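
The table above can be written out as a small reference function. This is only a sketch on Python floats (binary64); exception flags and signaling/quiet NaN handling are not modelled, and the function name is made up. Note that the zero test is placed before the negative test so that -0 is not swallowed by the "negative -> NaN" case:

```python
import math


def rsqrt_special(x: float) -> float:
    """Reciprocal square root with the special-case results listed above."""
    if math.isnan(x):
        return math.nan                     # NaN -> NaN
    if x == 0.0:                            # true for both +0 and -0
        return math.copysign(math.inf, x)   # -0 -> -Inf, +0 -> +Inf
    if x < 0.0:
        return math.nan                     # -finite and -Inf -> NaN
    if math.isinf(x):
        return 0.0                          # +Inf -> +0
    return 1.0 / math.sqrt(x)               # +finite (rounded twice; illustration only)


for value in (math.nan, -math.inf, -1.0, -0.0, 0.0, 4.0, math.inf):
    print(value, "->", rsqrt_special(value))
```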
do you happen to know if that's the exact order in which those tests have to be actioned? the reason i ask is because i got caught out when doing fpsqrt special cases: i'd placed zero-testing later in the list, tested -ve numbers (all -ve numbers) first to return canonical-NaN, and of course sqrt(-ve zero) is -ve zero.
Those cases are more like a switch statement in C with breaks in that
each case is independent:
> this one "-0 -> -Inf" kiiinda makes sense if the 1/ is considered to take precedence over sqrt() part.
yeah, but it makes it more annoying since otherwise the sign of the
result (ignoring NaNs) is always positive, like you would expect from
the mathematical limits at 0 and infinity.
If I recall correctly the actual definition of reciprocal sqrt in IEEE
754 is (expanded):
fn frsqrt(v) {
    // IEEE 754 sqrt, except that it returns the exact
    // mathematical result without rounding
    let temp = sqrt(v);
    // IEEE 754 division; rounds the result
    return 1.0 / temp;
}
the reason frsqrt(-0) returns -Inf is that IEEE 754 sqrt(-0) returns
-0 which then gets converted to -Infinity.
Just some notes I have on RSQRT::
RSQRT( ±0 ) = ±∞ , and signals the DIV ZERO exception
RSQRT( +∞ ) = +0
RSQRT( -Â ) = Quiet NaN, and signals the Operation exception.
than you realize. The best-case scenario would be for your computed
results to be correctly rounded most of the time, but not always.
Getting to "always" requires so much more effort that I'd expect it's
easier to build the reciprocal square-root function from scratch.
Just some notes I have on RSQRT::
RSQRT( ±0 ) = ±∞ , and signals the DIV ZERO exception
RSQRT( +∞ ) = +0
RSQRT( -Â ) = Quiet NaN, and signals the Operation exception.
RSQRT( NaN ) = Quiet NaN
On the other hand, the thread is about a reciprocal sqrt approximation.
Why approximate when the real result is easily calculated in HW, where all the boundary cases can be done with <essentially> no overhead?
There is technology that enables an FMAC unit to calculate RSQRT and produce its IEEE 754-2008 correctly rounded result in low 20's clock cycles.
One can get this down to 14 cycles if one can accept faithfully rounded results.
If you want details, e-mail me.
On Friday, July 26, 2019 at 6:07:04 AM UTC+8, MitchAlsup wrote:
>
> But I started interacting with this thread thinking people wanted to get this done fast in DOUBLE PRECISION.
<snip>
64 bit is still needed, however the priorities are different: efficiency is the (only) priority.
There is an abstruse hack to compute 1/sqrt(x) with low precision using only integer manipulations that may start your Newton Raphson.
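
The integer hack is presumably the widely circulated "fast inverse square root" bit trick; that identification is an assumption, since the message does not name it. A sketch for positive normal binary32 inputs, using the commonly quoted magic constant 0x5F3759DF and one Newton-Raphson refinement step (carried out in binary64 here for simplicity):

```python
import math
import struct


def rsqrt_seed(x: float) -> float:
    """Low-precision 1/sqrt(x) estimate using only integer operations on the bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = 0x5F3759DF - (bits >> 1)  # halve and negate the exponent, with a bias fix-up
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def newton_step(x: float, y: float) -> float:
    """One Newton-Raphson iteration for y ~ 1/sqrt(x): y' = y * (1.5 - 0.5 * x * y * y)."""
    return y * (1.5 - 0.5 * x * y * y)


for x in (0.25, 1.0, 2.0, 100.0):
    exact = 1.0 / math.sqrt(x)
    seed = rsqrt_seed(x)
    refined = newton_step(x, seed)
    print(x, abs(seed - exact) / exact, abs(refined - exact) / exact)
```

The seed is good to a few percent and each step roughly squares the relative error, so this gives a starting point, not a correctly rounded result.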
For the encoding, I think using an encoding similar to both the
fsqrt.* and fdiv.* encodings is a good idea, since frsqrt is similar
to both fdiv and fsqrt; Therefore, as an initial proposal, I think
using a funct7 value of 0111100 and the rest of the instruction
identical to fsqrt is a good idea, since, as far as I'm aware, that
doesn't conflict with anything currently.
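
For concreteness, the proposed word layout can be assembled as below. This is a sketch: the frsqrt.s funct7 value is the proposal from this thread and is not a ratified encoding, the fsqrt.s value is included only for comparison, and the function and constant names are made up:

```python
OPCODE_OP_FP = 0b1010011
FUNCT7_FSQRT_S = 0b0101100   # existing fsqrt.s in the F extension
FUNCT7_FRSQRT_S = 0b0111100  # proposed frsqrt.s (this thread's proposal only)


def encode_unary_fp(funct7: int, rd: int, rs1: int, rm: int) -> int:
    """Build a 32-bit R-type word with the rs2 field fixed at 00000."""
    for name, value, width in (("funct7", funct7, 7), ("rd", rd, 5),
                               ("rs1", rs1, 5), ("rm", rm, 3)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    rs2 = 0b00000
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (rm << 12) | (rd << 7) | OPCODE_OP_FP)


# frsqrt.s f1, f2 with dynamic rounding mode (rm = 0b111)
print(hex(encode_unary_fp(FUNCT7_FRSQRT_S, rd=1, rs1=2, rm=0b111)))
```

The two encodings differ only in bit 4 of funct7, which is what "identical to fsqrt apart from funct7" amounts to.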
* The Libre RISCV FPDIV pipeline also computes SQRT and RSQRT
* Some OTFC pipeline designs do RSQRT "for free"
* However, if doing RSQRT chances are high it will be part of a design that needs EXP, LOG etc anyway.
* CORDIC can do a ton of algorithms including SQRT, RSQRT, LOG, SIN etc.
Thoughts?
L.