Removing implementation-dependent FP behavior


Krste Asanovic

Mar 28, 2017, 6:09:21 AM
to isa...@groups.riscv.org

In reflecting on the discussion of wider transfer of narrower FP
types, we realized that RISC-V still has undefined behavior when
narrower operations take wider values. For example, a FLD followed by
a FADD.S using the value loaded into the register, or a FLD followed
by a FSW of the value. We believe this is the last part of the
current user spec that has implementation-defined behavior, and would
like to eliminate this if possible.

The proposal is to take the earlier proposal a step further and define
the model for RISC-V floating-point to always maintain FP values in
registers as the set of FLEN-wide IEEE FP values. Operations that
produce <FLEN-bit results will correctly produce values from the
narrower set (including handling rounding, subnormals, Infs, NaNs)
then store them internally as the equivalent FLEN-wide IEEE value.

Any wider operation (not a transfer) can directly use the result from
a narrower previous operation as input without explicit conversion,
leading to a minor performance improvement. As RISC-V operations are
defined to produce the canonical NaN, differences in narrower versus
wide NaN encoding do not affect results.

A wider transfer (store, FMV) operation will use the wide internal
encoding, where NaNs are expanded with additional zero bits on the
right side.

A narrower operation or transfer will use the internal value with
round-to-zero (truncation) of the significand, and values with
out-of-range exponents converted to appropriately signed Infs. This
direction can also give a minor performance improvement by avoiding an
explicit conversion when the truncation is appropriate.
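
To make the narrowing rule concrete, here is a minimal Python sketch (not taken from the proposal) of how a value held at double width might be read by a single-precision operation: the significand is truncated toward zero, and magnitudes beyond the single range become signed Infs. The name `narrow_to_single_bits` is hypothetical. Two choices are assumptions the post does not spell out: NaN inputs return the canonical single NaN 0x7FC00000, and magnitudes below the smallest single subnormal truncate to signed zero.

```python
import math


def narrow_to_single_bits(x: float) -> int:
    """Return the binary32 bit pattern of x after round-to-zero narrowing."""
    sign = 0x80000000 if math.copysign(1.0, x) < 0 else 0
    if math.isnan(x):
        return 0x7FC00000              # assumption: canonical NaN, payload ignored
    mag = abs(x)
    if mag >= 2.0 ** 128:              # exponent out of range (or Inf) -> signed Inf
        return sign | 0x7F800000
    if mag == 0.0:
        return sign                    # signed zero
    _, e = math.frexp(mag)             # mag = m * 2**e with 0.5 <= m < 1
    unbiased = max(e - 1, -126)        # clamp into the single subnormal range
    q_exp = unbiased - 23              # weight of the last single significand bit
    n = int(math.ldexp(mag, -q_exp))   # exact scaling, then truncate toward zero
    # Normal values have n in [2**23, 2**24), so adding n carries the hidden bit
    # into the exponent field; subnormal values have n < 2**23 and exponent 0.
    return sign | (((unbiased + 126) << 23) + n)


# 1 + 2**-24 + 2**-30 lies just above the midpoint between two singles;
# truncation keeps the lower neighbour, 1.0.
print(hex(narrow_to_single_bits(1.0 + 2.0**-24 + 2.0**-30)))
```

The call at the end prints 0x3f800000, the pattern for 1.0; a round-to-nearest conversion would instead pick the next single up.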

This proposal follows naturally from an internally recoded format and,
apart from the primary goal of removing ambiguity in RISC-V outputs,
also provides a small performance improvement for some cases. It does
effectively mandate the use of a recoded format for scalar FP in the
presence of multiple FP widths. The big advantage of recoded formats
is that they provide low-overhead deterministic-time handling of
subnormals. Handling subnormals with register I/O in IEEE memory
format requires either larger arithmetic datapaths to handle subnormal
shifts inline or SW/HW trap and fixup.

Krste

Roger Espasa

Mar 28, 2017, 6:48:29 AM
to Krste Asanovic, RISC-V ISA Dev
I think *mandating* a recoded internal implementation is a bad idea for an open-source ISA. The ISA should allow both recoded and non-recoded choices to be made by different implementations. Each implementation could then evaluate the merits and de-merits of recoding in the context of its target markets and make the most appropriate choice. The fact that Rocket started with a recoded implementation should not bias the ISA decision.

roger.



--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/m2k27990he.fsf%40berkeley.edu.

Krste Asanovic

Mar 28, 2017, 7:34:30 AM
to Roger Espasa, RISC-V ISA Dev
1) We want to define the behavior of all operations, which means picking some encoding that favors one style or the other. (For the record, I would strongly oppose an ATM-style solution that crippled both.)

2) For pure RISC-V implementations, the scalar FP ISA was defined to support internal recoding.
I believe (as do many earlier FPU designers) that a recoded FPU has many benefits, particularly if you worry about area/power and predictable-latency operations. What are the arguments in favor of a non-recoded FPU for a pure RISC-V design?

Now, if you had to support other non-RISC-V ISAs using the same FP datapath, including, for example, other ISAs' packed-SIMD operations, you can use the techniques I outlined in the last proposal. Other ISAs' choices here shouldn't affect the decision for RISC-V, particularly if it hurts the RISC-V spec.

Krste

Bruce Hoult

Mar 28, 2017, 7:34:43 AM
to Krste Asanovic, RISC-V ISA Dev
I think it's good, but I don't understand one part.

What does a single precision denorm look like when it's been widened into a 64 bit register?

Will it (and SP denorm results from arithmetic) end up stored in the register in normalized form, but with increasingly more zeros on the right hand side of the significand as the exponent gets smaller?



Krste Asanovic

Mar 28, 2017, 7:57:30 AM
to Bruce Hoult, RISC-V ISA Dev
Adding list back (my mistake).

The recoded format uses an extra exponent bit internally, so all representable numbers are stored normalized.
But this is never visible externally.

Krste

On Mar 28, 2017, at 4:55 AM, Bruce Hoult <br...@hoult.org> wrote:

Oh! I didn't realize even the widest format would be stored differently to the IEEE format when in registers. Thanks.

(did you mean to omit the list address)


On Tue, Mar 28, 2017 at 2:43 PM, Krste Asanovic <kr...@berkeley.edu> wrote:

On Mar 28, 2017, at 4:34 AM, Bruce Hoult <br...@hoult.org> wrote:

I think it's good, but I don't understand one part.

What does a single precision denorm look like when it's been widened into a 64 bit register?

Like a 64-bit normal number.


Will it (and SP denorm results from arithmetic) end up stored in the register in normalized form, but with increasingly more zeros on the right hand side of the significand as the exponent gets smaller?

Yes. The tradeoff between recoded and non-recoded is that recoded arithmetic operations have to round at any position to support subnormals (which are all held as normalized numbers including for the widest format), whereas non-recoded implementations have to either shift values by a very large amount (and calculate more output bits of intermediate results) or use a HW/SW fixup when subnormals are encountered.   Variable-position rounding is much cheaper than large hardware arithmetic blocks when fixed latency is required, and the fixup trap adds non-deterministic timing to FP number handling (which is why some FP units flush denorms to zero if predictable timing is required).
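
The "more zeros on the right" effect can be seen with a short Python sketch. It uses ordinary IEEE binary64 as a stand-in for the wide register, which is an assumption: the recoded format discussed here carries an extra exponent bit and is never visible to software. The helper names are hypothetical.

```python
import struct


def single_bits_to_double_bits(b32: int) -> int:
    """Reinterpret a binary32 pattern as the binary64 pattern of the same value."""
    value = struct.unpack("<f", struct.pack("<I", b32))[0]  # exact widening
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def fields(b64: int):
    """Split a binary64 pattern into (biased exponent, 52-bit fraction)."""
    return (b64 >> 52) & 0x7FF, b64 & ((1 << 52) - 1)


def trailing_zero_bits(frac: int) -> int:
    """Count the zero bits at the right end of a 52-bit fraction field."""
    return 52 if frac == 0 else (frac & -frac).bit_length() - 1


# Single-precision subnormals with 23, 8, and 1 significant bits.
for b32 in (0x007FFFFF, 0x000000FF, 0x00000001):
    exp, frac = fields(single_bits_to_double_bits(b32))
    print(f"{b32:#010x}: biased exp {exp}, trailing zeros {trailing_zero_bits(frac)}")
```

Every biased exponent printed is nonzero, so each single subnormal is a normal number at double width, and the smaller the value, the longer the run of zeros at the right of the fraction.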

Krste





David Horner

Mar 28, 2017, 11:23:07 AM
to RISC-V ISA Dev, br...@hoult.org
Considering logarithmic number system encoding raised for me the extent of the "lesser precision" encoding guarantee.

Given that transforming the internal representation back to IEEE format is very expensive, using the "store single in double" afforded a means to NaN wrapper it.

And given that a RISCV design intent is to support novel and experimental implementation, I was quite interested to see how this would play out.
Not as well for LNS it seems, but fortunately RISCV is open to non-standard extensions that can accommodate.

With the "precision folding" and redefinition of the float operations to relax (and define) "mixed operations" there is no need for precision tracking, and no illegal behaviour.

It may still be desirable to warn when "mixed operations" occur, but the motivation is lessened for RISCV hardware targets.
Implementation of such warning shifts to compiler (et al) and debug support.

As with LNS, this decision is not ideal for all possibilities, but it is now internally consistent and practical.

Even more so, this development process worked to discover and reconcile disparate views.
I expect that the two prevalent worldviews apparent in these discussions "ISA formalizing a conceptual model" and "ISA defining machine operation" were bridged mostly without a conscious awareness, and in some cases not yet satisfactorily. 

I appreciate being able to be a part of this process, and anticipate involvement in the document revision.

kr...@berkeley.edu

Mar 28, 2017, 12:00:06 PM
to David Horner, RISC-V ISA Dev, br...@hoult.org

Hi David,

>>>>> On Tue, 28 Mar 2017 08:23:07 -0700 (PDT), David Horner <ds2h...@gmail.com> said:

| Considering logarithmic number system encoding raised for me the extent of the
| "lesser precision" encoding guarantee.

| Given that transforming the internal representation back to IEEE format is very
| expensive, using the "store single in double" afforded a means to NaN wrapper
| it.

The real problem with the NaN wrapper scheme was not knowing whether
the value was a NaN-double or a wrapped-single until first use, and
there being nothing to stop software doing something different on
second use (e.g., FLD x; FADD.S z,x,y; FADD.D w,x,z).
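
A small Python sketch of that ambiguity follows. It assumes a wrapper that sets the upper 32 bits of the register to all ones; the thread does not spell out the exact wrapper encoding, so treat that layout and the helper names as hypothetical.

```python
import math
import struct


def wrap_single(b32: int) -> int:
    """Assumed wrapper: a single's bits in the low half, all ones above."""
    return 0xFFFFFFFF00000000 | b32


def as_double(reg: int) -> float:
    """Interpret all 64 register bits as a binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", reg))[0]


def as_single(reg: int) -> float:
    """Interpret the low 32 register bits as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", reg & 0xFFFFFFFF))[0]


reg = wrap_single(0x3F800000)        # the single-precision value 1.0
print(math.isnan(as_double(reg)))    # a double-width use sees a NaN
print(as_single(reg))                # a single-width use sees 1.0
```

The same 64 bits are a valid double NaN and a valid wrapped single, so nothing in the register itself says which reading software intended until an instruction consumes it.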

| And given that a RISCV design intent is to support novel and experimental
| implementation, I was quite interested to see how this would play out.
| Not as well for LNS it seems, but fortunately RISCV is open to non-standard
| extensions that can accommodate.

The proposed encoding scheme still has the property that a sequence of
pure transfer instructions of at least the desired width must preserve
the original bit pattern, so LNS numbers can be loaded/stored/FMV-ed
etc without corruption. Similarly, any <=FLEN-bit wide value can be
losslessly encoded/decoded using the same logic that a regular FP
load/store uses, so LNS operations are also supportable in the
microarchitecture, albeit not that efficiently.
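
The bit-preservation property can be sketched in Python by modelling a single-width load as an exact widening and a single-width store as a truncating narrowing. This is only an illustration under stated assumptions: plain binary64 stands in for the FLEN-wide internal value, `widen_bits` and `narrow_bits` are hypothetical names, and a wide NaN whose payload would vanish under truncation is mapped to the canonical quiet NaN.

```python
FRAC64_MASK = (1 << 52) - 1


def widen_bits(b32: int) -> int:
    """binary32 pattern -> binary64 pattern of the same value (NaN payload zero-extended)."""
    sign = (b32 >> 31) << 63
    exp = (b32 >> 23) & 0xFF
    frac = b32 & 0x7FFFFF
    if exp == 0xFF:                        # Inf or NaN: zeros appended on the right
        return sign | (0x7FF << 52) | (frac << 29)
    if exp == 0:
        if frac == 0:
            return sign                    # signed zero
        top = frac.bit_length() - 1        # leading 1 position; value = 1.x * 2**(top-149)
        return sign | ((1023 - 149 + top) << 52) | ((frac << (52 - top)) & FRAC64_MASK)
    return sign | ((exp - 127 + 1023) << 52) | (frac << 29)


def narrow_bits(b64: int) -> int:
    """binary64 pattern -> binary32 pattern using round-to-zero truncation."""
    sign = (b64 >> 63) << 31
    exp = (b64 >> 52) & 0x7FF
    frac = b64 & FRAC64_MASK
    if exp == 0x7FF:                       # Inf or NaN: keep the top 23 payload bits
        payload = frac >> 29
        if frac and not payload:
            payload = 0x400000             # assumption: avoid turning a NaN into Inf
        return sign | (0xFF << 23) | payload
    e = exp - 1023
    if e > 127:
        return sign | (0xFF << 23)         # out-of-range exponent -> signed Inf
    if e >= -126:
        return sign | ((e + 127) << 23) | (frac >> 29)
    if e < -149:
        return sign                        # too small for any single subnormal
    return sign | (((1 << 52) | frac) >> (-97 - e))   # truncate into a subnormal


patterns = [0x00000000, 0x00000001, 0x007FFFFF, 0x00800000, 0x3F800000,
            0x7F7FFFFF, 0x7F800000, 0x7F800001, 0x7FC00000, 0xDEADBEEF, 0xFFFFFFFF]
print(all(narrow_bits(widen_bits(p)) == p for p in patterns))
```

In this model every 32-bit pattern tried, including signaling-NaN and subnormal encodings, survives the load/store round trip, which is the property a non-IEEE type such as LNS would rely on.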

This does show how using internal recoding works against using
existing load/store/transfer instructions on FP registers when holding
other datatypes - but I'd argue choosing IEEE memory representation
internally biases against floating-point types. I trace the problem
to the decision in the standard to not handle gradual underflow by
using variable-position rounding but instead the subnormal encoding we
have now.

Other extensions can always define additional architectural registers,
which can share the physical rename pool with FP registers (e.g., BOOM
shares one pool of physical registers between int and fp regs). With
appropriate config/disable instructions, even the "architecturally
committed" set of physical registers can be freed.

The vector extension gets around these issues by being explicit about
the type held, and would be where I'd assume most novel data types get
implemented, with FP scalar arguments coming from f registers and
other "arbitrary/uninterpreted-bit-pattern" scalar arguments coming
from x registers.

| With the "precision folding" and redefinition of the float operations to relax
| (and define) "mixed operations" there is no need for precision tracking, and no
| illegal behaviour.

| It may still be desirable to warn when "mixed operations" occur, but the
| motivation is lessened for RISCV hardware targets.
| Implementation of such warning shifts to compiler (et al) and debug support.

| As with LNS, this decision is not ideal for all possibilities, but it is now
| internally consistent and practical.

| Even more so, this development process worked to discover and reconcile
| disparate views.
| I expect that the two prevalent worldviews apparent in these discussions "ISA
| formalizing a conceptual model" and "ISA defining machine operation" were
| bridged mostly without a conscious awareness, and in some cases not yet
| satisfactorily. 

| I appreciate being able to be a part of this process, and anticipate
| involvement in the document revision.


Thanks for all your help!
Krste

| On Tuesday, 28 March 2017 07:57:30 UTC-4, krste wrote:
| Adding list back (my mistake).
| The recoded format uses an extra exponent bit internally, so all
| representable numbers are stored normalized.
| But this is never visible externally.
| Krste

Alex Bradbury

Mar 28, 2017, 12:20:43 PM
to Roger Espasa, Krste Asanovic, RISC-V ISA Dev
On 28 March 2017 at 11:48, Roger Espasa <roger....@esperantotech.com> wrote:
> I think *mandating* a recoded internal implementation is a bad idea for an
> open-source ISA. The ISA should allow both recoded and non-recoded choices
> to be made by different implementations. Each implementation could then
> evaluate the merits and de-merits of recoding in the context of its target
> markets and make the most appropriate choice. The fact that Rocket started
> with a recoded implementation should not bias the ISA decision.

Hi Roger, by my understanding nothing is _mandated_ regarding the
internal representation - an implementation could always use a
different internal representation (such as the appropriate IEEE
zero-padded binary encoding), and use internal tracking bits to
indicate whether a register is in the 'internal' or 'external' format.
It can then convert from external->internal when an FPU calculation
takes place and internal->external upon fmv or fs[dwq].

There are obviously going to be advantages to using the mandated
external representation internally, so the way I see it this boils
down to a value judgement about which internal encoding is going to be
most common / highest performance, and whether the choice of external
encoding might make reasonable designs infeasible.

Best,

Alex

Stefan O'Rear

Mar 28, 2017, 1:42:41 PM
to Krste Asanovic, David Horner, RISC-V ISA Dev, Bruce Hoult
On Tue, Mar 28, 2017 at 9:00 AM, <kr...@berkeley.edu> wrote:
> The real problem with the NaN wrapper scheme was not knowing whether
> the value was a NaN-double or a wrapped-single until first use, and
> there being nothing to stop software doing something different on
> second use (e.g., FLD x; FADD.S z,x,y; FADD.D w,x,z).

At the time I was advocating this the idea was that an implementation
would be able to have an extra bit in the recoded form for "this is a
wrapped single, treat as NaN in all DP operations". This made it seem
preferable to zero-padding, which creates a DP subnormal and
potentially requires two entirely parallel recodings.

-s

Victor Moya

Mar 28, 2017, 2:03:09 PM
to RISC-V ISA Dev

> Handling subnormals with register I/O in IEEE memory
> format requires either larger arithmetic datapaths to handle subnormal
> shifts inline or SW/HW trap and fixup.

Apart from finding it really dangerous to basically mandate a specific hardware implementation (it reminds me of many other very bad decisions in the 80s and 90s that became nasty ISA legacy because someone thought at the time that a specific hardware implementation was the best choice), I don't really follow how the extra hardware required to support subnormals forces us, today in 2017 and not in the 70s or 80s, to make an ISA decision with such future impact.

If an implementation needs to be so low in power or area that the marginal cost of subnormals becomes expensive, it probably doesn't even need support for subnormals and shouldn't implement them.

Victor
 
