[RFC] Addressing the problems with opaque floating point encodings in F/D/Q


Alex Bradbury

Mar 22, 2017, 7:10:09 PM
to RISC-V ISA Dev
Hi all,

I've written up a short document explaining an issue that's had some
discussion on the riscv-isa-manual issue tracker. See below for raw
markdown, or here
<https://gist.github.com/asb/a3a54c57281447fc7eac1eec3a0763fa> for an
easier to read rendered version. I'd really welcome any feedback.

# [RFC] Addressing the problems with opaque floating point encodings in F/D/Q

__Alex Bradbury, lowRISC CIC__

## Brief problem summary
The current version of the RISC-V ISA specification explicitly leaves the
encoding of a single precision value undefined when it is either moved to
a wider integer register or written to a wider memory location (e.g. a float
written with fsd or fmv.x.d). The motivation for this is to allow a low
overhead internal recoding. Unfortunately, the freedom to keep this
encoding opaque is illusory, as the chosen encoding is architecturally
visible in a way that can cause real compatibility issues unless it is
standardised.

See [issue #30 on
riscv-isa-manual](https://github.com/riscv/riscv-isa-manual/issues/30)
for further discussion. Thank you to all contributors to the discussion there:
Krste Asanovic, David Horner, Andrew Waterman and especially Stefan
O'Rear for helping
me to understand the scope of the problem.

## Background and more detailed problem description
The RISC-V ISA specification describes floating point support in the 'F'
(single precision), 'D' (double precision), and 'Q' (quad precision)
extensions. The proposed vector extension additionally introduces scalar
instructions for half precision floats. These extensions each build upon each
other, and introduce 32 floating point registers with length ('flen')
dependent on the maximum extension implemented. e.g. RV32IF has 32x32-bit
FPRs, while RV32IFD has 32x64-bit FPRs. This document will focus on floats and
doubles, although analogous issues exist for quad and half precision values.
This document does not (yet) consider potential interaction with the future 'L'
decimal floating point extension, which has yet to be written.

A single precision value stored to memory using fsw will be encoded according
to the IEEE-754-2008 standard. Similarly for a double stored using fsd.
However, the encoding for a single precision value written using fsd is
currently implementation defined. This can occur when a callee-saved register
is spilled (saved on the stack), or when register state is saved while context
switching. At the point the state is stored, it won't be known what
type of value resides in the FPR. The problem is most obviously apparent if
you consider a process running on a core that uses one encoding, which is then
migrated to a core that uses a different one - for example
switching cores in a
heterogeneous cluster.
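
To make the hazard concrete, here is a small illustrative C model (both
encodings below are hypothetical, but each appears permissible under the
current wording) of two cores spilling and restoring the same single
precision value:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Core A spills the raw binary32 pattern, zero-extended to 64 bits. */
    static uint64_t spill_core_a(float f) {
        uint32_t b;
        memcpy(&b, &f, sizeof b);
        return b;
    }

    /* Core B internally recodes singles as binary64 and spills that pattern. */
    static uint64_t spill_core_b(float f) {
        double d = (double)f;
        uint64_t b;
        memcpy(&b, &d, sizeof b);
        return b;
    }

    /* Core A restores by taking the low 32 bits as the binary32 pattern. */
    static float reload_core_a(uint64_t spilled) {
        uint32_t b = (uint32_t)spilled;
        float f;
        memcpy(&f, &b, sizeof f);
        return f;
    }

    int main(void) {
        float x = 1.5f;
        printf("%f\n", reload_core_a(spill_core_a(x))); /* 1.500000 */
        printf("%f\n", reload_core_a(spill_core_b(x))); /* 0.000000 - state corrupted after migration */
        return 0;
    }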

To clarify the discussion, I should also point out where this is _not_ a
concern. It will not pose a problem for shared data structures containing
doubles, as for correctness the compiler must insert a cast for cases like
below (i.e. an fcvt.d.s instruction):

    double d;

    void callee(float f) {
        d = f;
    }


## Impact
Fundamentally, this issue affects any case where registers are spilled using
one encoding, and potentially read back and interpreted by an FPU that expects
a different one:
* Task migration across heterogeneous cores within the same SoC.
* Task migration across different devices, either using virtual machine
migration or something like CRIU in Linux
* Debug and validation, e.g. it's useful to verify one implementation against
another by checking that the same instruction produces the same results. (See
[here](https://groups.google.com/a/groups.riscv.org/d/msg/hw-dev/j7yXuuHbSCE/N7cPsC0tGQAJ)).
* Performing design space exploration, performance analysis, or testing in a
way that might involve switching between different models (e.g. a "fast" and a
"slow" model).

Being able to safely transfer state between cores that claim to implement the
same ISA string (within reason) is a useful property, and this kind of ability
is something that is actively used with other architectures. For instance,
provided I ensure a minimal cpuid is exposed I am free to transfer virtual
machines between Intel and AMD implementations of x86, without worrying that
spilled floating point state will be corrupted (interpreted differently).

## Solutions
The first possibility is to leave things as they are, and hope this isn't an
issue in practice. This would harm the ability to mix cores from different
vendors (or even cores by different design teams in the same company, unless
they properly synchronise on this issue). I think the only reasonable path is
to decide upon a standardised encoding/serialisation. In fact, to avoid issues
such as this in the future we should consider any state directly accessible to
user-level code to be architecturally visible and subject to a
standard encoding.

First of all, a refresher on IEEE FP:
* Single precision (32 bits): 1 bit sign, 8 bit exponent, 23 bit significand
* Double precision (64 bits): 1 bit sign, 11 bit exponent, 52 bit significand

There are a number of choices for how you may arrange to store and
encode single and
double precision values in your register file:

* Have a single floating point register file of fixed width (e.g. 32 32-bit
registers), and use register pairs to support double precision floating point.
* MIPS, among others, use this approach
* Gives the advantage of allowing ABI compatibility when adding
support for higher precision. As it stands the 'Q'
extension introduces a completely new ABI requiring a complete
rebuild (RV32E, RV32I, RV32IF, RV64IF, RV64IFD, RV64IFDQ are all
different ABIs).
* The RISC-V spec doesn't allow this approach
* Downside: using a register pair means two single precision registers are
now unavailable for use

* Pack floats tightly inside wider double registers. Logically 'f2' refers to
a different register depending on whether it's used in an F or a D instruction (the
lower 32 bits of the 2nd 64-bit register in an F context, and the whole of the
3rd 64-bit register in a D context).
* ARMv7 (and below?) used this approach
* Downside: writing to any of the first 16 64-bit registers using a double
operation clobbers two registers for single precision use
* The RISC-V spec doesn't allow this approach

* Store a single precision float in the lower half of a 64-bit register. This
is perhaps the most obvious solution.
* Allowed by the current RISC-V spec
* Simple and easy to understand. AArch64 and presumably a number of other
architectures use this approach.

* Standardise on the UCB recoded format used in Rocket's current FPU
* Is this documented anywhere?

* When performing an flw on a system with 64-bit FPRs, unpack the exponent and
significand into the appropriate locations in the 64-bit register. Perform
appropriate masking/rounding when executing single-precision operations
* Andrew
[indicates](https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/ARDSfeAoRW4/dNKfMP90BgAJ)
this approach was used by POWER6 and Alpha, but may hurt single precision
latency

* Come up with a new NaN-based encoding.
* A double-precision NaN is represented as a value where the exponent is all
1s and the 52-bit significand is non-zero. This is a huge encoding space,
and a standard encoding could easily be chosen
* There is an advantage in that debug tools could determine with a high
degree of certainty whether the dumped state from a floating point register
is holding a value that is meant to be interpreted as a single-precision
float
* Similar encodings could be used to represent a half precision value in a
float register, or a double in a quad register
* There's perhaps more flexibility for eagerly recoding what seems to be a
single-precision float to a different internal representation upon an fld
(rather than on-demand when a single precision operation is performed).
However, for IEEE compliance any such value would still need to act as a NaN
when used in a double precision operation.
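
For concreteness, a purely illustrative sketch in C of how such a NaN-based
encoding might serialise and recognise a single precision value, assuming the
upper 32 bits are filled with ones (the exact bit pattern would be for the
specification to choose):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Box a binary32 value inside a binary64 NaN: upper 32 bits all ones
     * (exponent all ones, significand non-zero, so a quiet NaN), lower 32
     * bits hold the IEEE-754 single precision encoding verbatim. */
    static uint64_t box_f32_in_f64(float f) {
        uint32_t b;
        memcpy(&b, &f, sizeof b);
        return UINT64_C(0xFFFFFFFF00000000) | b;
    }

    /* A debugger or checkpoint tool can guess (though not prove) that a
     * dumped register holds a boxed single by checking for this pattern. */
    static bool looks_like_boxed_f32(uint64_t bits) {
        return (bits >> 32) == UINT32_MAX;
    }

    static float unbox_f32(uint64_t bits) {
        uint32_t b = (uint32_t)bits;
        float f;
        memcpy(&f, &b, sizeof f);
        return f;
    }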

I argue that what matters above all else is that one of these options is
chosen and used consistently. It's worth noting that an implementation is
still free to use a different internal recoding; it would just need to support
serialising and deserialising to the standard encoding that is chosen.

## Backwards compatibility impact
I believe this change can be made in a backwards compatible way (i.e. all
standards-compliant RISC-V software would continue to work on a newer
revision). It also seems likely there is still time to specify this change
and have it adopted before any RISC-V FPU implementations are available in
shipping systems.

## Other related issues
* Substantially more minor, but at least a recommendation for encoding quiet vs
signalling NaNs would be useful. A similar issue does exist here, in that a
signalling NaN might be interpreted as non-signalling after a context switch
to a different RISC-V implementation. I expect the potential impact of this
issue is far, far lower than what is described above
* An RV64IFD system hoping to support Q would, as it stands, have to break ABI
compatibility when Q is in use. Is the cost of adding yet another ABI worth
it, or can it be avoided?

## Conclusion/summary
* Leaving the encoding of lower precision values in higher precision FP
registers unspecified appears to give more microarchitectural freedom, but in
reality this is an architecturally visible property that 'leaks' and causes
potential issues in use cases that the RISC-V community should care about
(e.g. migration on a heterogeneous cluster)
* The RISC-V community would benefit from standardising on a single externally
visible encoding ('serialisation'), and doing so quickly

Jacob Bachmeyer

Mar 22, 2017, 11:52:06 PM
to Alex Bradbury, RISC-V ISA Dev
Alex Bradbury wrote:
> * Come up with a new NaN-based encoding.
> * A double-precision NaN is represented as a value where the exponent is all
> 1s and the 52-bit significand is non-zero. This is a huge encoding space,
> and a standard encoding could easily be chosen
> * There is an advantage in that debug tools could determine with a high
> degree of certainty whether the dumped state from a floating point register
> is holding a value that is meant to be interpreted as a single-precision
> float
> * Similar encodings could be used to represent a half precision value in a
> float register, or a double in a quad register
> * There's perhaps more flexibility for eagerly recoding what seems to be a
> single-precision float to a different internal representation upon an fld
> (rather than on-demand when a single precision operation is performed).
> However, for IEEE compliance any such value would still need to act as a NaN
> when used in a double precision operation.
>

I favor this option, especially since the double-precision NaN space is
enough to simply store a single-precision value with 20 bits left over.
Does this also extend to hiding double-precision values in
quad-precision NaNs?

> ## Backwards compatibility impact
> I believe this change can be made in a backwards compatible way (i.e. all
> standards-compliant RISC-V software would continue to work on a newer
> revision). It also seems likely there is still time to specify this change
> and have it adopted before any RISC-V FPU implementations are available in
> shipping systems.
>

Standards-compliant software is forbidden to assume anything about the
format of narrower floats stored in wider modes other than that they can
be restored. (Section 8.2 in the user ISA spec) We would have to go
out of our way to break backwards compatibility and I am quite sure that
that will not happen.

> ## Other related issues
> * Substantially more minor, but at least a recommendation for encoding quiet vs
> signalling NaNs would be useful. A similar issue does exist here, in that a
> signalling NaN might be interpreted as non-signalling after a context switch
> to a different RISC-V implementation. I expect the potential impact of this
> issue is far, far lower than what is described above
>

Assign one of those 20 bits left over after hiding a single-precision
float in a double-precision NaN as a RISC-V standard "signaling NaN" flag?

> * An RV64IFD system hoping to support Q would, as it stands, have to break ABI
> compatibility when Q is in use. Is the cost of adding yet another ABI worth
> it, or can it be avoided?
>

I do not see a good way to avoid this, since Q fundamentally makes the
FP registers wider.


-- Jacob

Allen J. Baum

Mar 23, 2017, 1:43:35 AM
to jcb6...@gmail.com, Alex Bradbury, RISC-V ISA Dev
At 10:52 PM -0500 3/22/17, Jacob Bachmeyer wrote:
>Alex Bradbury wrote:
>>* Come up with a new NaN-based encoding.
>> * A double-precision NaN is represented as a value where the exponent is all
>> 1s and the 52-bit significand is non-zero. This is a huge encoding space,
>> and a standard encoding could easily be chosen
>> * There is an advantage in that debug tools could determine with a high
>> degree of certainty whether the dumped state from a floating point register
>> is holding a value that is meant to be interpreted as a single-precision
>> float
>> * Similar encodings could be used to represent a half precision value in a
>> float register, or a double in a quad register
>> * There's perhaps more flexibility for eagerly recoding what seems to be a
>> single-precision float to a different internal representation upon an fld
>> (rather than on-demand when a single precision operation is performed).
>> However, for IEEE compliance any such value would still need to act as a NaN
>> when used in a double precision operation.
>>
>
>I favor this option, especially since the double-precision NaN space is enough to simply store a single-precision value with 20 bits left over. Does this also extend to hiding double-precision values in quad-precision NaNs?

The ideal format from a HW perspective is to right-justify the exponent (converting from excess 127 to/from excess 1023, which is a 3 bit decrement) and to left-justify the mantissa (with trailing zeroes) in the wider format - which is effectively converting it to the wider format. That's cheap, easy, and allows the FPU to handle either single or double with little extra logic.
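
A rough C model of that widening, for normal values only (subnormals, zeros,
infinities and NaNs would need separate handling):

    #include <stdint.h>

    /* Widen an IEEE-754 binary32 bit pattern to binary64, normal values only:
     * rebias the exponent from excess-127 to excess-1023 and left-justify the
     * 23-bit significand in the 52-bit field with trailing zeroes. */
    static uint64_t widen_f32_bits(uint32_t s) {
        uint64_t sign = s >> 31;
        uint64_t exp  = (s >> 23) & 0xFF;      /* excess-127 */
        uint64_t frac = s & 0x7FFFFF;          /* 23 stored significand bits */
        return (sign << 63)
             | ((exp + (1023 - 127)) << 52)    /* now excess-1023 */
             | (frac << (52 - 23));
    }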

As was said, the most important thing is to standardize, and I'm getting the impression that it may be too late for that, although it's only a real issue if current non-conforming chips are put into systems that can migrate to chips with other formats.

Otherwise I don't see any way that we could be backwards compatible (from a migration point of view; otherwise, things work fine regardless).

If explicit spill/fill code is the only issue (and my reading of this is that migration spill/fill is really the only issue), then another possibility is a width tag CSR, with a bit per FPReg that is set on every write to that FPReg to indicate the width of the value written. Spill/Fill code could interrogate the CSR, and convert to/from the standardized format (or to double!) as needed. That's not backward compatible either...


>
>>## Backwards compatibility impact
>>I believe this change can be made in a backwards compatible way (i.e. all
>>standards-compliant RISC-V software would continue to work on a newer
>>revision). It also seems likely there is still time to specify this change
>>and have it adopted before any RISC-V FPU implementations are available in
>>shipping systems.
>>
>
>Standards-compliant software is forbidden to assume anything about the format of narrower floats stored in wider modes other than that they can be restored. (Section 8.2 in the user ISA spec) We would have to go out of our way to break backwards compatibility and I am quite sure that that will not happen.
>
>>## Other related issues
>>* Substantially more minor, but at least a recommendation for encoding quiet vs
>>signalling NaNs would be useful. A similar issue does exist here, in that a
>>signalling NaN might be interpreted as non-signalling after a context switch
>>to a different RISC-V implementation. I expect the potential impact of this
>>issue is far, far lower than what is described above
>>
>
>Assign one of those 20 bits left over after hiding a single-precision float in a double-precision NaN as a RISC-V standard "signaling NaN" flag?
>
>>* An RV64IFD system hoping to support Q would, as it stands, have to break ABI
>>compatibility when Q is in use. Is the cost of adding yet another ABI worth
>>it, or can it be avoided?
>>
>
>I do not see a good way to avoid this, since Q fundamentally makes the FP registers wider.
>
>
>-- Jacob
>


--
**************************************************
* Allen Baum tel. (908)BIT-BAUM *
* 248-2286 *
**************************************************

Alex Bradbury

Mar 23, 2017, 4:18:01 AM
to jcb6...@gmail.com, RISC-V ISA Dev
> Does this also extend to hiding double-precision values in
> quad-precision NaNs?

Yes, and for hiding half-precision values in a single precision NaN.

>> ## Other related issues
>> * Substantially more minor, but at least a recommendation for encoding
>> quiet vs
>> signalling NaNs would be useful. A similar issue does exist here, in that
>> a
>> signalling NaN might be interpreted as non-signalling after a context
>> switch
>> to a different RISC-V implementation. I expect the potential impact of
>> this
>> issue is far, far lower than what is described above
>>
>
>
> Assign one of those 20 bits left over after hiding a single-precision float
> in a double-precision NaN as a RISC-V standard "signaling NaN" flag?

Stefan O'Rear pointed out I'd missed the reference to the 'quiet bit'
(I'd been searching for signaling). The specification should still
make it crystal clear whether the bit indicates is_quiet or
is_signalling. IEEE 754-2008 makes a recommendation, but it has
been interpreted either way in the past, as I understand it
<https://en.wikipedia.org/wiki/NaN#Encoding>.

>> * An RV64IFD system hoping to support Q would, as it stands, have to break
>> ABI
>> compatibility when Q is in use. Is the cost of adding yet another ABI
>> worth
>> it, or can it be avoided?
>>
>
>
> I do not see a good way to avoid this, since Q fundamentally makes the FP
> registers wider.

I have to admit I didn't expect to see Q marked as 'frozen' alongside
IMAFD - I wasn't aware of any implementations or compiler support thus
far.

Q could be used while maintaining the RV64IFD ABI but could not treat
any registers as callee-saved (plus obviously kernel context switching
would need to be modified). I mention it because the choice of
encoding/register packing has an effect here. e.g. if D values were
explicitly documented as being stored tightly packed in Q registers,
code using Q within the RV64IFD ABI could still rely on some of its
state being callee-saved (callee-saved registers in the range
fa0-fa15). Implementers might also consider the pros/cons of a Q' (Q
prime) extension that stored quads in pairs of double registers, again
allowing greater ABI compatibility without forcing quads to always be
spilled on function calls.

Best,

Alex

Alex Bradbury

Mar 23, 2017, 4:24:21 AM
to Allen J. Baum, jcb6...@gmail.com, RISC-V ISA Dev
On 23 March 2017 at 06:43, Allen J. Baum <allen...@esperantotech.com> wrote:
> As was said, the most important thing is to standardize, and I'm getting the impression that it may be too late for that, although it's only a real issue if current non-conforming chips are put into systems that can migrate to chips with other formats.

Yes, I think if there were loads of chips/designs already out there
making different encoding choices it might be too late, but if
something can quickly be chosen and RISC-V implementers can be
persuaded to implement the 2.1 version of D with this change, rather
than 2.0 then we're all good.

> Otherwise I don't see any way that we could be backwards compatible (from a migration point of view; otherwise, things work fine regardless).

It will never be safe in general to migrate state in this way between
systems implementing the 2.0 D spec, unless you know the two systems
use the same encoding. If there's never really any 2.0 hardware, then
this isn't an issue anyone will have to worry about going forwards. My
point about backwards compatibility was primarily that no software
would have to change. Not all 2.0 hardware would be 2.1 compliant, but
all 2.1 hardware would also be 2.0 compliant.

> If explicit spill/fill code is the only issue (and my reading of this is that migration spill/fill is really the only issue), then another possibility is a width tag CSR where a bit/FPReg is set on every write into the CSR which indicates the width of the value written to the FPReg. Spill/Fill code could interrogate the CSR, and convert to/from the standardized format (or two double!) as needed. That's not backward compatible either...

Yes, that's certainly possible and I considered detailing such a
scheme - but as you say it has some serious disadvantages.

Best,

Alex

Bruce Hoult

Mar 23, 2017, 4:52:03 AM
to Alex Bradbury, RISC-V ISA Dev
I'm not a hardware guy, but it feels to me as if always storing FP in the widest supported format in registers, converting on load/store, and just having the ALU round a bit differently is the most natural.

I can't see why you'd do anything else, unless you were already constrained by backward compatibility -- or maybe if you expect to mostly use SP, have DP mostly as a courtesy, and want to make full use of expensive register bits to pack as many SP values as possible.

Does it have *any* downsides? (other than making fcvt.d.s into a no-op ... fcvt.s.d would still need to round)

Do the people who store 32 bit FP in the low half of a 64 bit register have a whole separate ALU for single precision? Or are they unpacking and packing the format on every operation, not only on load/store?

If you're going to use the bottom half of a DP register for SP and not use the upper half for anything useful then, yeah, sticking a NaN bit pattern in the top half is kinda cute and has some real advantages over just zeroing it and is barely more expensive. So that would be my 2nd choice. It also scales to half inside a float NaN, inside a double NaN, inside a quad NaN, as all IEEE formats have more significand bits than the whole of the next smaller format.
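
A minimal sketch of that nesting (illustrative only, assuming all-ones fill at each level):

    #include <stdint.h>

    /* Each step fills the upper half with ones, making the wider pattern a
     * quiet NaN in the wider format while the narrower pattern sits verbatim
     * in the low bits. */
    static uint32_t box_f16_in_f32(uint16_t h) {
        return 0xFFFF0000u | h;                   /* binary32 NaN holding a half */
    }

    static uint64_t box_f32_in_f64(uint32_t s) {
        return UINT64_C(0xFFFFFFFF00000000) | s;  /* binary64 NaN holding a single */
    }

    /* A half boxed twice: the binary64 pattern is a NaN, its low 32 bits are a
     * binary32 NaN, and the low 16 bits are the original half. */
    static uint64_t box_f16_in_f64(uint16_t h) {
        return box_f32_in_f64(box_f16_in_f32(h));
    }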



kr...@berkeley.edu

Mar 23, 2017, 8:28:18 AM
to Alex Bradbury, Allen J. Baum, jcb6...@gmail.com, RISC-V ISA Dev

>>>>> On Thu, 23 Mar 2017 08:24:17 +0000, Alex Bradbury <a...@asbradbury.org> said:
| On 23 March 2017 at 06:43, Allen J. Baum <allen...@esperantotech.com> wrote:
|| As was said, the most important thing is to standardize, and I'm
|| getting the impression that it may be too late for that, although
|| it's only a real issue if current non-conforming chips are put into
|| systems that can migrate to chips with other formats.
| Yes, I think if there were loads of chips/designs already out there
| making different encoding choices it might be too late, but if
| something can quickly be chosen and RISC-V implementers can be
| persuaded to implement the 2.1 version of D with this change, rather
| than 2.0 then we're all good.
|| Otherwise I don't see any way that we could be backwards compatible
|| (from a migration point of view; otherwise, things work fine
|| regardless).

I think this closing of an "undefined" hole would be appropriate as
part of the move to ratifying the spec for the Foundation. The degree
of backwards incompatibility would be very minor, both in actual
effect and in number of shipped/taped-out commercial systems.

Krste

Roger Espasa

Mar 23, 2017, 11:20:13 AM
to kr...@berkeley.edu, Alex Bradbury, Allen J. Baum, jcb6...@gmail.com, RISC-V ISA Dev

Encoding SP in more than 32b might cause trouble for folks wishing to build cheap SIMD on top of the existing Fregs. And the issue described by Alex will get worse when those SIMD extensions also include integer data types. And this will pop up again in the definition of the vector extension. So closing the hole in a way that an SP value stored with FSD ends up in the low 32b of the memory location in exactly IEEE format seems a good choice to me.

roger. 


M Farkas-Dyck

Mar 23, 2017, 12:46:46 PM
to Alex Bradbury, RISC-V ISA Dev
On 22/03/2017, Allen J. Baum <allen...@esperantotech.com> wrote:
> The ideal format from a HW perspective is to right-justify the exponent (and
> converting from excess 127 to/from excess 1023, which is a 3 bit decrement)
> and to left-justify the mantissa (with trailing zeroes) to the wider format
> - which is effectively converting it to the wider format. That's cheap,
> easy, and allows the FPU to handle either single or double with little extra
> logic.

On 23/03/2017, Roger Espasa <roger....@esperantotech.com> wrote:
> Encoding SP in more than 32b might cause troubles to folks wishing to build
> cheap SIMD on top of the existing Fregs. And the issue described by Alex
> will get worse when those SIMD extensions also include integer data types.
> And this will pop up again in the definition of the vector extension. So
> closing the hole in a way that an SP value stored with FSD ends up in the
> low 32b of the memory location in exactly IEEE format seems a good choice
> to me.

Well, it seems we have a dilemma. But I have an idea:

If we unpack the values, the exponent in the more-precise format is
never all 0 or all 1.

If we choose a zero-padding or NaN-encoding scheme, the exponent in the
more-precise format is exactly all 0 or all 1.

When migrating a process between cores/devices, the precision of the values
would be specified.

We could specify these as the canonical formats; a processor would be
free to choose its native format for stores, but would be obligated to
recognize all of them for loads.
To save area, a processor could load its native format verbatim into a
register, but trap to machine mode for other formats. Thus in the
common case of a homogeneous multiprocessor system, the extra cost of
migrating the values would merely be testing a few bits, and the
processors could use whichever format is most architecturally
convenient.
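
A sketch of that test on a spilled 64-bit register image (my reading of the
scheme; corner cases where the value itself is zero, infinity or NaN would
blur the distinction):

    #include <stdint.h>

    enum spill_format { SPILL_UNPACKED, SPILL_PADDED_OR_BOXED };

    /* Classify a spilled 64-bit FPR image by its binary64 exponent field:
     * an unpacked (widened) narrow value has an exponent that is neither all
     * zeros nor all ones, whereas zero-padding or NaN-boxing always produces
     * one of those two patterns. */
    static enum spill_format classify_spill(uint64_t bits) {
        unsigned exp = (unsigned)(bits >> 52) & 0x7FF;
        return (exp == 0 || exp == 0x7FF) ? SPILL_PADDED_OR_BOXED : SPILL_UNPACKED;
    }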

Thoughts?

Allen J. Baum

Mar 23, 2017, 1:51:24 PM
to M Farkas-Dyck, Alex Bradbury, RISC-V ISA Dev
Frankly, at this point solving the problem of migrating between different formats doesn't seem to be a problem worth solving. The number of real systems that will encounter this can probably be counted on the fingers of one foot...

Standardizing on something avoids the problem in all future implementations, so all we need to do is pick something, and to a large extent it doesn't really matter what the something is.

Zero fill, NaN fill, and upconverting/unpacking will all work, but will have differing implementation costs. As Roger points out, once we get SIMD sharing HW with the FPU, then upconvert/unpacking doesn't save anything. So, we may as well think about eating that cost now.

As I see it, the obvious choice is to put lower precision formats right justified in an FPReg. The choice is then to figure out what to do with the higher bits. Filling them with zeroes works for any precision; filling with NaNs is a bit more problematic if there is more than one lower precision (e.g. half and single).

Andrew Waterman

Mar 23, 2017, 1:56:55 PM
to Allen J. Baum, Jacob Bachmeyer, Alex Bradbury, RISC-V ISA Dev
On Wed, Mar 22, 2017 at 11:43 PM, Allen J. Baum
<allen...@esperantotech.com> wrote:
> At 10:52 PM -0500 3/22/17, Jacob Bachmeyer wrote:
>>Alex Bradbury wrote:
>>>* Come up with a new NaN-based encoding.
>>> * A double-precision NaN is represented as a value where the exponent is all
>>> 1s and the 52-bit significand is non-zero. This is a huge encoding space,
>>> and a standard encoding could easily be chosen
>>> * There is an advantage in that debug tools could determine with a high
>>> degree of certainty whether the dumped state from a floating point register
>>> is holding a value that is meant to be interpreted as a single-precision
>>> float
>>> * Similar encodings could be used to represent a half precision value in a
>>> float register, or a double in a quad register
>>> * There's perhaps more flexibility for eagerly recoding what seems to be a
>>> single-precision float to a different internal representation upon an fld
>>> (rather than on-demand when a single precision operation is performed).
>>> However, for IEEE compliance any such value would still need to act as a NaN
>>> when used in a double precision operation.
>>>
>>
>>I favor this option, especially since the double-precision NaN space is enough to simply store a single-precision value with 20 bits left over. Does this also extend to hiding double-precision values in quad-precision NaNs?
>
> The ideal format from a HW perspective is to right-justify the exponent (and converting from excess 127 to/from excess 1023, which is a 3 bit decrement) and to left-justify the mantissa (with trailing zeroes) to the wider format - which is effectively converting it to the wider format. That's cheap, easy, and allows the FPU to handle either single or double with little extra logic.

Agreed, Allen - this seems to me to be the most natural choice that
preserves flexibility for the hardware to employ a recoding scheme.

>
> As was said, the most important thing is to standardize, and I'm getting the impression that it may be too late for that, although it's only a real issue if current non-conforming chips are put into systems that can migrate to chips with other formats.
>
> Otherwise I don't see any way that we could be backwards compatible (from a migration point of view; otherwise, things work fine regardless).
>
> If explicit spill/fill code is the only issue (and my reading of this is that migration spill/fill is really the only issue), then another possibility is a width tag CSR where a bit/FPReg is set on every write into the CSR which indicates the width of the value written to the FPReg. Spill/Fill code could interrogate the CSR, and convert to/from the standardized format (or two double!) as needed. That's not backward compatible either...
>
>
>>
>>>## Backwards compatibility impact
>>>I believe this change can be made in a backwards compatible way (i.e. all
>>>standards-compliant RISC-V software would continue to work on a newer
>>>revision). It also seems likely there is still time to specify this change
>>>and have it adopted before any RISC-V FPU implementations are available in
>>>shipping systems.
>>>
>>
>>Standards-compliant software is forbidden to assume anything about the format of narrower floats stored in wider modes other than that they can be restored. (Section 8.2 in the user ISA spec) We would have to go out of our way to break backwards compatibility and I am quite sure that that will not happen.
>>
>>>## Other related issues
>>>* Substantially more minor, but at least a recommendation for encoding quiet vs
>>>signalling NaNs would be useful. A similar issue does exist here, in that a
>>>signalling NaN might be interpreted as non-signalling after a context switch
>>>to a different RISC-V implementation. I expect the potential impact of this
>>>issue is far, far lower than what is described above
>>>
>>
>>Assign one of those 20 bits left over after hiding a single-precision float in a double-precision NaN as a RISC-V standard "signaling NaN" flag?
>>
>>>* An RV64IFD system hoping to support Q would, as it stands, have to break ABI
>>>compatibility when Q is in use. Is the cost of adding yet another ABI worth
>>>it, or can it be avoided?
>>>
>>
>>I do not see a good way to avoid this, since Q fundamentally makes the FP registers wider.
>>
>>
>>-- Jacob
>>
>
>
> --
> **************************************************
> * Allen Baum tel. (908)BIT-BAUM *
> * 248-2286 *
> **************************************************
>

David Horner

Mar 23, 2017, 3:50:08 PM
to RISC-V ISA Dev, a...@asbradbury.org, m.fark...@gmail.com


On Thursday, 23 March 2017 12:46:46 UTC-4, M Farkas-Dyck wrote:
On 22/03/2017, Allen J. Baum <allen...@esperantotech.com> wrote:
> The ideal format from a HW perspective is to right-justify the exponent (and
> converting from excess 127 to/from excess 1023, which is a 3 bit decrement)
> and to left-justify the mantissa (with trailing zeroes) to the wider format
> - which is effectively converting it to the wider format. That's cheap,
> easy, and allows the FPU to handle either single or double with little extra
> logic.

On 23/03/2017, Roger Espasa <roger....@esperantotech.com> wrote:
> Encoding SP in more than 32b might cause troubles to folks wishing to build
> cheap SIMD on top of the existing Fregs. And the issue described by Alex
> will get worse when those SIMD extensions also include integer data types.
> And this will pop up again in the definition of the vector extension.  So
> closing the hole in a way that an SP value stored with FSD ends up in the
> low 32b of the memory location in exactly IEEE format seems a good choice
> to me.

Well, it seems we have a dilemma. But i have an idea:

If we unpack the values, the exponent in the more-precise format is
never all 0 or all 1.

If we choose zero-padding or NaN-encoding scheme, the exponent in the
more-precise format is exactly all 0 or all 1.

Zero fill of the upper 32 bits yields a subnormal IEEE D value.
However, if a specific sNaN header is used and reserved, then that range of NaN values can be definitively identified as F in D encoding and no ambiguity occurs.

Stefan O'Rear

Mar 23, 2017, 4:02:35 PM
to Roger Espasa, Krste Asanovic, Alex Bradbury, Allen J. Baum, Jacob Bachmeyer, RISC-V ISA Dev
On Thu, Mar 23, 2017 at 8:20 AM, Roger Espasa
<roger....@esperantotech.com> wrote:
>
> Encoding SP in more than 32b might cause troubles to folks wishing to build
> cheap SIMD on top of the existing Fregs. And the issue described by Alex
> will get worse when those SIMD extensions also include integer data types.
> And this will pop up again in the definition of the vector extension. So
> closing the hole in a way that an SP value stored with FSD ends up in the
> low 32b of the memory location in exactly IEEE format seems a good choice to
> me.

The vector extension as presented by Krste at W5 does not have this
specific problem because vector registers know the type of data they
contain, which can be F4 or F8 but not both at the same time. (Also,
vector registers are never callee-saved).

The vector extension has a somewhat different problem in that the
context-switch and mstatus.XS mechanism has not been specified in
detail, and the saved-state blobs may or may not be portable between
implementations.

-s

Jacob Bachmeyer

Mar 23, 2017, 10:15:28 PM
to Bruce Hoult, Alex Bradbury, RISC-V ISA Dev
Bruce Hoult wrote:
> I'm not a hardware guy, but it feels to me as if always storing FP in
> the widest supported format in registers, converting on load/store,
> and just having the ALU round a bit differently is the most natural.
>
> I can't see why you'd do anything else, unless you were already
> constrained by backward compatibility -- or maybe if you expect to
> mostly use SP, have DP mostly as a courtesy, and want to make full use
> of expensive register bits to pack as many SP values as possible.
>
> Does it have *any* downsides? (other than making fcvt.d.s into a no-op
> ... fcvt.s.d would still need to round)
>
> Do the people who store 32 bit FP in the low half of a 64 bit register
> have a whole separate ALU for single precision? Or are they unpacking
> and packing the format on every operation, not only on load/store?
>
> If you're going to use the bottom half of a DP register for SP and not
> use the upper half for anything useful then, yeah, sticking a NaN bit
> pattern in the top half is kinda cute and has some real advantages
> over just zeroing it and is barely more expensive. So that would be my
> 2nd choice. It also scales to half inside a float Nan, inside a double
> NaN, inside a Quad Nan, as all IEEE formats have more significand bits
> than the whole of the next smaller format.

I would expect FP registers to always hold "unpacked" values in the
widest format with a few tag bits to distinguish
half/single/double/quad. Since the internal FP register format is not
actually visible to software, all that this requires is permuting bits
and "filling in" a NaN when accessing the "integer bit pattern value" of
an FP register that is tagged as holding a narrower value than requested.
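
For illustration only, one possible shape for such a tagged internal register
(hypothetical, and in any case not architecturally visible):

    #include <stdint.h>

    enum fp_width { W_HALF, W_SINGLE, W_DOUBLE, W_QUAD };

    /* Hypothetical internal register: value held unpacked in the widest
     * supported precision, plus a tag recording the width it was written as. */
    struct fp_reg {
        enum fp_width width;     /* precision of the last write */
        uint8_t       sign;
        int32_t       exponent;  /* unbiased */
        uint64_t      frac_hi;   /* unpacked significand, widest format */
        uint64_t      frac_lo;
    };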

-- Jacob

Jacob Bachmeyer

Mar 23, 2017, 10:18:11 PM
to Allen J. Baum, M Farkas-Dyck, Alex Bradbury, RISC-V ISA Dev
Allen J. Baum wrote:
> Frankly, at this point solving the problem of migrating between different formats doesn't seem to be a problem worth solving. The number of real systems that will encounter this can probably counting on the fingers of one foot...
>
> Standardizing on something avoids the problem in all future implementations, so all we need to do is pick something, and to a large extent it doesn't really matter what the something is.
>
> Zero fill, NaN fill, and upconverting/unpacking will all work, but will have differing implementation costs. As Roger points out, once we get SIMD sharing HW with the FPU, then upconvert/unpacking doesn't save anything. So, we may as well think about eating that cost now.
>
> As I see it, the obvious choice is to but lower precision formats right justified in an FPReg. The choice is then to figure out what to do with the higher bits. Filling them with zeroes works for any precision; filling with NaNs is a bit more problematic if there is more than one lower precision (e.g. half and single).
>

But "hide-it-in-NaN" is recursive: a half-precision value can be stored
as a single-precision NaN inside a double-precision NaN inside a
quad-precision NaN if need be.

-- Jacob

Jacob Bachmeyer

Mar 23, 2017, 10:31:20 PM
to Andrew Waterman, Allen J. Baum, Alex Bradbury, RISC-V ISA Dev
Andrew Waterman wrote:
> On Wed, Mar 22, 2017 at 11:43 PM, Allen J. Baum
> <allen...@esperantotech.com> wrote:
>
>> At 10:52 PM -0500 3/22/17, Jacob Bachmeyer wrote:
>>
>>> Alex Bradbury wrote:
>>>
>>>> * Come up with a new NaN-based encoding.
>>>> * A double-precision NaN is represented as a value where the exponent is all
>>>> 1s and the 52-bit significand is non-zero. This is a huge encoding space,
>>>> and a standard encoding could easily be chosen
>>>> * There is an advantage in that debug tools could determine with a high
>>>> degree of certainty whether the dumped state from a floating point register
>>>> is holding a value that is meant to be interpreted as a single-precision
>>>> float
>>>> * Similar encodings could be used to represent a half precision value in a
>>>> float register, or a double in a quad register
>>>> * There's perhaps more flexibility for eagerly recoding what seems to be a
>>>> single-precision float to a different internal representation upon an fld
>>>> (rather than on-demand when a single precision operation is performed).
>>>> However, for IEEE compliance any such value would still need to act as a NaN
>>>> when used in a double precision operation.
>>> I favor this option, especially since the double-precision NaN space is enough to simply store a single-precision value with 20 bits left over. Does this also extend to hiding double-precision values in quad-precision NaNs?
>>>
>> The ideal format from a HW perspective is to right-justify the exponent (and converting from excess 127 to/from excess 1023, which is a 3 bit decrement) and to left-justify the mantissa (with trailing zeroes) to the wider format - which is effectively converting it to the wider format. That's cheap, easy, and allows the FPU to handle either single or double with little extra logic.
>>
>
> Agreed, Allen - this seems to me to be the most natural choice that
> preserves flexibility for the hardware to employ a recoding scheme.
>

I agree that widening narrower values may be the best way to store them
in the FP register file, but the issue here is standardizing what to do
when FSQ is executed on a register holding a single-precision value if
the processor distinguishes that case. I seem to recall previous
discussions on this list pointing out that IEEE-compliant FPUs must make
that distinction, so we are left with standardizing a way to encode a
width tag and a narrower value into a wider value.

I favor Alex Bradbury's proposal to encode the narrower value inside a
wider NaN and offer a suggestion to extend that recursively if the
widths are not adjacent. A half-precision value can be encoded into a
single-precision NaN encoded into a double-precision NaN encoded into a
quad-precision NaN and the whole bundle unpacked into the half-precision
value, converted to internal quad-precision with a tag indicating
half-precision format, upon FLQ if the implementation so chooses.

-- Jacob

Allen Baum

Mar 24, 2017, 12:29:56 AM
to jcb6...@gmail.com, Andrew Waterman, Alex Bradbury, RISC-V ISA Dev
If you're using a single FPU to implement double, single, and packed single, then unpacking has 3 formats instead of two.
I don't think I understand the requirement that 
  IEEE-compliant FPUs must make a distinction when storing single precision values with a store double.

What does that mean? That someone examining the bits in memory can determine whether the value stored was originally a single rather than a double? Strictly speaking, the only way to do that is by reserving NaN values, and I'd have to read the spec carefully to see if that was legal.
I also don't know what the spec says about loading a single and adding it to a double or vice versa - a similar issue.

At first glance the HW cost doesn't seem to be much worse than zero filling, though the recursive encoding makes me nervous.

-Allen

Andrew Waterman

Mar 24, 2017, 12:58:21 AM
to Jacob Bachmeyer, Allen J. Baum, Alex Bradbury, RISC-V ISA Dev
Yeah. If the values are stored in the regfile in the widest supported
format, then it is clear what happens when a float32 or float64 is
FSQ'd: it's stored to memory as a float128 that represents the same
value. Easy to specify, and cheap to implement if employing recoding.

David Horner

Mar 24, 2017, 9:16:03 AM
to RISC-V ISA Dev, jcb6...@gmail.com, allen...@esperantotech.com, a...@asbradbury.org


On Friday, 24 March 2017 00:58:21 UTC-4, andrew wrote:
On Thu, Mar 23, 2017 at 7:31 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Andrew Waterman wrote:
>>
>> On Wed, Mar 22, 2017 at 11:43 PM, Allen J. Baum
>> <allen...@esperantotech.com> wrote:

> I agree that widening narrower values may be the best way to store them in
> the FP register file, but the issue here is standardizing what to do when
> FSQ is executed on a register holding a single-precision value if the
> processor distinguishes that case.  I seem to recall previous discussions on

Yeah.  If the values are stored in the regfile in the widest supported
format, then it is clear what happens when a float32 or float64 is
FSQ'd: it's stored to memory as a float128 that represents the same
value.  Easy to specify, and cheap to implement if employing recoding.

Some hardware implementations that employ this approach internally (one max-size format fits all)
add a post-operation step to readjust to the lower precision, and therefore execute lower precision
formats _slower_ than the full higher precision and at a higher cost in active circuits (e.g. adding across 52 bits vs 23).
RISC-V has distinct instructions for each precision, so an implementation can optimize for lower precision.
 
Whereas, an implementation that is lower precision aware can have lower precision values already positioned for optimizing the expected precision operation.

The NaN encoding provides the precision information at load time.
Whereas, with encoding as lower_precision_value in higher_precision_format, the intended precision is not known until a subsequent float instruction is decoded.

 

David Horner

Mar 24, 2017, 12:18:18 PM
to RISC-V ISA Dev, jcb6...@gmail.com, and...@sifive.com, a...@asbradbury.org


On Friday, 24 March 2017 00:29:56 UTC-4, Allen Baum wrote:
If you're using a single FPU to implement double, single, and packed single, then unpacking has 3 formats instead of two.
I don't think I understand the requirement that 
  IEEE-compliant FPUs must make a distinction when storing single precision values with a store double.

What does that mean? That someone examining the bits in memory can determine whether the value stores was originally a single rather than a double? Strictly speaking, the only way to do that is by reserving NaN values, and I'd have to read the spec carefully to see if that was legal.
I also don't know what the spec says about loading a single and adding it to a double or vice Versace- a similar issue.

At first glance the HW cost doesn't seem to be much worse than zero filling, though the recursive encoding makes me nervous.

Although the encoding format is recursive, the inserted bit pattern is fixed for each sub-format, and the decoding can be performed in parallel with nominal gate delay.
 

Stefan O'Rear

Mar 24, 2017, 1:12:05 PM
to David Horner, RISC-V ISA Dev, Jacob Bachmeyer, Andrew Waterman, Alex Bradbury
On Fri, Mar 24, 2017 at 9:18 AM, David Horner <ds2h...@gmail.com> wrote:
>
> Although the encoding format is recursive, the inserted bit pattern is fixed
> for each sub-format, and the decoding can be performed in parallel with
> nominal gate delay.

Have you considered "fill on the left with 1 bits"? That produces
negative NaNs (rare!), and the recursive aspects are invisible in the
resulting encodings.

-s

Andrew Waterman

Mar 24, 2017, 1:50:15 PM
to David Horner, RISC-V ISA Dev, allen...@esperantotech.com, a...@asbradbury.org, jcb6...@gmail.com
On Fri, Mar 24, 2017 at 6:29 AM David Horner <ds2h...@gmail.com> wrote:


On Friday, 24 March 2017 00:58:21 UTC-4, andrew wrote:
On Thu, Mar 23, 2017 at 7:31 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Andrew Waterman wrote:
>>
>> On Wed, Mar 22, 2017 at 11:43 PM, Allen J. Baum
>> <allen...@esperantotech.com> wrote:

> I agree that widening narrower values may be the best way to store them in
> the FP register file, but the issue here is standardizing what to do when
> FSQ is executed on a register holding a single-precision value if the
> processor distinguishes that case.  I seem to recall previous discussions on

Yeah.  If the values are stored in the regfile in the widest supported
format, then it is clear what happens when a float32 or float64 is
FSQ'd: it's stored to memory as a float128 that represents the same
value.  Easy to specify, and cheap to implement if employing recoding.

Some hardware implementations that employ this approach internally (of max size fits all)
provide a post operation of readjusting to the lower precision and therefore execute lower precision
formats _slower_ than the full higher precision and at a higher cost in active circuits (e.g. adding across 52 bits vs 23.)
RISC-V has distinct instructions for each precision, so an implementation can optimize for lower precision.

No, it is not inherently slower (or even equal in latency). You've presupposed one particular implementation. You can still have dedicated functional units for lower precision. They simply ignore e.g. the lower 29 significand bits.


 
Whereas, an implementation that is lower precision aware can have lower precision values already positioned for optimizing the expected precision operation.

The NaN encoding provides the information of the precision at load time.
Whereas, encoding as lower_precision_value in higher_precision_format does not know the intended precision until a subsequent float instruction is decoded.

 

> this list pointing out that IEEE-compliant FPUs must make that distinction,
> so we are left with standardizing a way to encode a width tag and a narrower
> value into a wider value.
>
> I favor Alex Bradbury's proposal to encode the narrower value inside a wider
> NaN and offer a suggestion to extend that recursively if the widths are not
> adjacent.  A half-precision value can be encoded into a single-precision NaN
> encoded into a double-precision NaN encoded into a quad-precision NaN and
> the whole bundle unpacked into the half-precision value, converted to
> internal quad-precision with a tag indicating half-precision format, upon
> FLQ if the implementation so chooses.
>
> -- Jacob
>


Alex Bradbury

Mar 24, 2017, 4:17:15 PM
to David Horner, RISC-V ISA Dev, jcb6...@gmail.com, Allen J. Baum
Hi David - just wanted to double-check we're thinking along the same
lines here. You'd still expect that D operations on an FLDed NaN that
appears to hold a single-precision float would still act exactly the
same, treating the value as a NaN? As I see it, recognising the NaN
gives a very strong hint that a lower precision value is encoded, but
it still could be a normal double precision NaN from an external
source and would have to be treated as such for IEEE compliance.

Best,

Alex

Allen J. Baum

Mar 24, 2017, 6:04:56 PM
to Andrew Waterman, David Horner, RISC-V ISA Dev, a...@asbradbury.org, jcb6...@gmail.com
You beat me to it. I would go further.
An implementation that is slower for single precision than double is simply a bad implementation.
Having separate single/double hardware is probably slower still, since the wire delay to cross over one of them will likely outweigh the extra gate delay or two.

I am getting rather confused about what the goals of this discussion are.
Obviously, there is one goal to standardize on the memory format of a narrow FP variable stored using a wider format.

But, beyond that, is it to:
 - ensure that we can migrate running code from one format to another in heterogeneous systems that use different formats?
  * I would argue that we shouldn't bother; it won't be an issue once the standardization takes place,
    and it's only an issue when someone builds a system that mixes existing chips and compliant systems.
    Not worth the bother, probably won't happen

 - ensure IEEE Standards compliance?
  * I'm a little unsure of which rule we may currently be violating, besides perhaps not having a standard.
    There was a suggestion last night that (if I interpreted it correctly, far from a sure thing) it was a requirement for someone who examined the bits stored by a wider format store to be able to determine that the value stored was actually a narrow format.
    If that is indeed the case, then right justify, fill with NaN is the only legal option, as far as I can tell.

 - reduce the HW cost or maximize the performance of the (new) standard
  * This gets a bit trickier, and it depends on at least two assumptions
        A. the standards compliance question above
        B. whether you envision using the same FPU for packed SIMD ops
 if A, then you must right justify and fill with NaN (note that packed SIMD will be the wider width)
        cost: every FPU op needs to position the bits appropriately before and after the op, depending on the op.
        additional SIMD cost: negligible (above the cost of making sure HW can perform 2 singles)
 if ~A& B, then you should right justify and fill with...something. Could be all zero, could be all ones, could be NaN
        cost: as above
        additional SIMD cost: as above
 if ~A&~B, then I'd argue you convert to the same format as the wider one.
 Note that the FPU will not blindly execute in the wider format, but will execute
        cost: negligible
        additional SIMD cost: N/A

(this is in reference to the statement:

The NaN encoding provides the information of the precision at load time.
Whereas, encoding as lower_precision_value in higher_precision_format does not know the intended precision until a subsequent float instruction is decoded.

The above looks only at the opcodes; decoding values are unnecessary.
I don't see the use case for wanting to determine the precision (except for debuggers, perhaps), and those values are accessible later in the pipe than opcode decoding in any case. Yes, to be standards compliant you may have that as a requirement, but I don't see anywhere that the HW actually needs to be aware for any reason.
If you want to implement a tagged architecture, that's a different story, but this is RISC-V we're talking about.

Comments/criticisms, and corrections gratefully accepted.

Jacob Bachmeyer

Mar 24, 2017, 6:11:55 PM
to Alex Bradbury, David Horner, RISC-V ISA Dev, Allen J. Baum
I think that this is a non-issue. The double-precision computational
instructions are defined (user ISA section 8.3) to "operate on double-precision
operands and produce double-precision results". Strictly, the current
user ISA spec leaves the result of mixing precision in FP operations
undefined and conformant programs cannot assume that mixed inputs will
produce defined results. Treating a lower-precision input as NaN seems
like a good solution to this problem to me.


-- Jacob

Alex Bradbury

Mar 24, 2017, 6:21:42 PM
to jcb6...@gmail.com, David Horner, RISC-V ISA Dev, Allen J. Baum
I was thinking specifically in terms of IEEE conformance. If I FLD a
double from somewhere (for argument's sake a memory mapped file), and
the result is a NaN matching the single FP embedded NaN encoding, D
operations would still need to treat it as a NaN to be compliant,
right? [it could have been generated from some other source and
actually not be an embedded float] I agree performing D operations on
an F value is undefined.

Best,

Alex

Michael Clark

unread,
Mar 24, 2017, 6:22:57 PM3/24/17
to Alex Bradbury, RISC-V ISA Dev
This seems like the most robust “implementation or otherwise defined” mechanism such that FSW/FSD/FSQ expand the internal format (exponent and significand) into the respective SP/DP/QP IEEE-754 binary formats no matter what type was last used in the floating point register. The question is whether it can be mandated. 

This approach is ideal from a behavioural perspective but mandating it might reduce implementation flexibility.

I can see this approach being problematic for some implementations. It is similar to the 8087, which used an 80-bit internal representation and converted to/from external representations on load/store; however, it differs from the packed representation of 64-bit DP and 32-bit SP floating point values held in SSE SIMD registers. The issue may also come up with the Packed SIMD extension, where a 128-bit quadruple precision register can be used as 4 packed single precision values. At present the type invariant is held as state in the compiler, such that the compiler implicitly doesn’t access two SP values as a DP value without performing a load, store, move, or appropriate conversion, making it a software problem.

* Come up with a new NaN-based encoding.
 * A double-precision NaN is represented as a value where the exponent is all
 1s and the 52-bit significand is non-zero. This is a huge encoding space,
 and a standard encoding could easily be chosen
 * There is an advantage in that debug tools could determine with a high
 degree of certainty whether the dumped state from a floating point register
 is holding a value that is meant to be interpreted as a single-precision
 float
 * Similar encodings could be used to represent a half precision value in a
 float register, or a double in a quad register
 * There's perhaps more flexibility for eagerly recoding what seems to be a
 single-precision float to a different internal representation upon an fld
 (rather than on-demand when a single precision operation is performed).
 However, for IEEE compliance any such value would still need to act as a NaN
 when used in a double precision operation.

I argue that what matters above all else is that one of these options is
chosen and used consistently. It's worth noting that an implementation is
still free to use a different internal recoding; it would just need to support
serialising and deserialising to the standard encoding that is chosen.

If the FLEN width operations are used in context switching, and the FLEN width operations are defined to restore the type in the register, is this not sufficient?

I believe there is a minimal option to uphold the statement with respect to the widest FLEN type loads, stores and moves correctly saving and restoring the state for whatever type was last held in the floating point register. Andrew’s first comment:

"FSD and FMV.X.D should be defined to create the same implementation-defined values as each other, and FLD and FMV.D.X should restore them equivalently. In particular, FSD followed by LD and FMV.D.X should properly recreate the single-precision value, as should FMV.X.D followed by SD and FLD.”

1. It’s not possible to context switch between ABIs (e.g. lp64f to lp64d).
2. The widest FLEN type store and load should save and restore the floating point register state.

A binary restore such that the smaller width floating point type is held right justified within the mantissa bits of the larger type (irrespective of the higher width exponent field indicating NaN) should technically work, as it should be “undefined behaviour” to treat an SP floating point register as a DP floating point register without performing an appropriate conversion; otherwise there would be no explicit conversion operations in the ISA.

I can see the major issue is with portability of VM images that save single precision values using FSD, or that save double precision values using FSQ, and having implementations that serialise using different approaches; however, mandating a special NaN may make things more difficult for some implementations.

There will be a somewhat similar issue for the Vector extension when accessing register state after reconfiguration of the vector register topology, if VM images that use the vector extension are also to be portable between heterogeneous implementations. i.e. “opaque” implicitly must become defined for the binary image to become portable and have a known external representation for these corner cases.

## Backwards compatibility impact
I believe this change can be made in a backwards compatible way (i.e. all
standards-compliant RISC-V software would continue to work on a newer
revision). It also seems likely there is still time to specify this change
and have it adopted before any RISC-V FPU implementations are available in
shipping systems.

## Other related issues
* Substantially more minor, but at least a recommendation for encoding quiet vs
signalling NaNs would be useful. A similar issue does exist here, in that a
signalling NaN might be interpreted as non-signalling after a context switch
to a different RISC-V implementation. I expect the potential impact of this
issue is far, far lower than what is described above
* An RV64IFD system hoping to support Q would, as it stands, have to break ABI
compatibility when Q is in use. Is the cost of adding yet another ABI worth
it, or can it be avoided?

## Conclusion/summary
* Leaving the encoding of lower precision values in higher precision fp
registers undefined appears to give more microarchitectural freedom, but in
reality this is an architecturally visible property that 'leaks' and causes
potential issues in use cases that the RISC-V community should care about
(e.g. migration on a heterogeneous cluster)
* The RISC-V community would benefit from standardising on a single externally
visible encoding ('serialisation'), and doing so quickly


Alex Bradbury

unread,
Mar 24, 2017, 6:54:43 PM3/24/17
to Allen J. Baum, Andrew Waterman, David Horner, RISC-V ISA Dev, jcb6...@gmail.com
>On 24 March 2017 at 23:04, Allen J. Baum <allen...@esperantotech.com> wrote:
> I am getting rather confused about what the goals of this discussion are.
> Obviously, there is one goal to standardize on the memory format of a narrow
> FP variable stored using a wider format.

That is exactly the issue I think should be addressed.

> But, beyond that, is it to:
> - ensure that we can migrate running code from one format to another in
> heterogeneous systems that use different formats?
> * I would argue that we shouldn't bother; it won't be an issue once the
> standardizations takes places,
> and its only an issue when someone builds a system that mixes existing
> chips and compliant systems.
> Not worth the bother, probably won't happen

I agree with you completely, picking a standard and sticking to it
will be the easiest solution for everyone.

> - ensure IEEE Standards compliance?
> * I'm a little unsure of which rule we may currently be violating, beside
> perhaps not having a standard.

I don't think we're currently breaking a standard as such; there's
just the possibility of "corruption" upon context migration,
potentially leading to issues far more embarrassing and difficult to
work around than
<https://lists.linaro.org/pipermail/linaro-toolchain/2016-September/005900.html>.

> There was a suggestion last night that ( if I interpreted it correctly,
> far from a sure thing) it was a requirement for someone who examined the
> bits stored by a wider format store to be able to determine that the value
> stored was actually a narrow format.
> If that is indeed the case, then right justify, fill with NaN is the
> only legal option, as far as I can tell.

I don't believe that is a requirement, although the fact that you can know
that a NaN in the appropriate format is actually a lower precision
value is cute. The only invariant that needs to be maintained is that
I can blindly FSD a register (e.g. in spilling) on one core, and then
later FLD it on another after some kind of process migration, and have
it work regardless of whether that register held a lower precision
value. The current spec quite rightly guarantees this to work on any
single RISC-V implementation; we just have this "hole" which means
different implementations may make different encoding choices, which
makes context migration worrisome.

> - reduce the HW cost or maximize the performance of the (new) standard

This is the direction the thread is going - which I suppose is the
natural thing to wonder about given a number of perfectly workable
alternatives. How much performance is really on the table here though?

> * This gets a bit trickier, and it depends on at least two assumptions
> A. the standards compliance question above
> B. whether you envision using the same FPU for packed SIMD ops
> if A, then you must right justify and fill with NaN (not that packed SIMD
> will be the wider width)
> cost: every FPU op needs to position the bits appropriately before
> and after the op, depending on the op.
> additional SIMD cost: negligible (above the cost of making sure HW
> can perform 2 singles)
> if ~A& B, then you should right justify and fill with...something. Could be
> all zero, could be all ones, could be NaN
> cost: as above
> additional SIMD cost: as above
> if ~A&~B, then I'd argue you convert to the same format as the wider one.
> Note that the FPU will not blindly execute in the wider format, but will
> execute
> cost: negligible
> additional SIMD cost: N/A
>
> (this is in reference to the statement:
>
> The NaN encoding provides the information of the precision at load time.
>
> Whereas, encoding as lower_precision_value in higher_precision_format does
> not know the intended precision until a subsequent float instruction is
> decoded.
>
>
> The above looks only at the opcodes; decoding values are unnecessary.
> I don't see the use case for wanting to determine the precision, (except for
> debuggers, perhaps), and those values are accessible later in the pipe than
> opcode decoding in any case. Yes, to be standards compliant you may haved
> that as a requirement, but I don't see anywhere that the HW actually needs
> to be aware for any reason.
> If you want to implement a tagged architecture, that's a different story,
> but this is RISC-V we're talking about.
>
> Comments/criticisms, and corrections gratefully accepted.

Thanks for your summary, Allen. I think the thought regarding being
able to 'detect' the encoding was that there may be some different
internal recoding decisions made for single vs double, and being able
to detect that the value must be either a NaN or an embedded float may
allow such recoding to be done optimistically as soon as the load is
executed. It's not clear how valuable that would be.

Best,

Alex

Alex Bradbury

unread,
Mar 24, 2017, 6:59:27 PM3/24/17
to Michael Clark, RISC-V ISA Dev
On 24 March 2017 at 22:22, Michael Clark <michae...@mac.com> wrote:
> I can see the major issue is with portability of VM images that save single
> precision values using FSD, or that save double precision values using FSQ,
> and having implementations that serialise using different approaches,
> however mandating a special NaN may make things more difficult for some
> implementations.

Hi Michael - VM migration is one case, but really it's any situation
where context/state might migrate from one core to another. A cluster
of cores in a single SoC, potentially with different FPUs making
different implementation-dependent choices, would be another common
case and perhaps the more immediate concern for most current RISC-V
implementers. See my list
https://gist.github.com/asb/a3a54c57281447fc7eac1eec3a0763fa#impact

Best,

Alex

Jacob Bachmeyer

unread,
Mar 24, 2017, 8:16:19 PM3/24/17
to Allen J. Baum, Andrew Waterman, David Horner, RISC-V ISA Dev, a...@asbradbury.org
Allen J. Baum wrote:
> - ensure IEEE Standards compliance?
> * I'm a little unsure of which rule we may currently be violating,
> beside perhaps not having a standard.
> There was a suggestion last night that ( if I interpreted it
> correctly, far from a sure thing) it was a requirement for someone who
> examined the bits stored by a wider format store to be able to
> determine that the value stored was actually a narrow format.
> If that is indeed the case, then right justify, fill with NaN is
> the only legal option, as far as I can tell.

That was me misremembering a response to a suggestion I had made that an
implementation could implement only FLEN-width floats, "unpack" narrower
floats to FLEN upon LOAD-FP, and "pack" them again for STORE-FP.
Masking the "excess" bits in arithmetic is then required, but can make
single-precision latency greater than double-precision latency due to an
implicit FCVT.S.D after every operation. See message-id
<CA++6G0AvAWSOcuCOenHecUMR...@mail.gmail.com> for
the response I misremembered.

Time to put that concern to bed unless someone knows a valid reason to
raise it.

-- Jacob

Andrew Waterman

unread,
Mar 24, 2017, 8:24:26 PM3/24/17
to Jacob Bachmeyer, Allen J. Baum, David Horner, RISC-V ISA Dev, Alex Bradbury
If you employ some form of recoding (which, if you care about latency,
you really must!), the FCVT.S.D is just a masking operation, not a
renormalization. It's basically free.

The combination of recoding and storing the numbers in the highest
implemented precision is good for latency and HW cost, and lends
itself naturally to a simple-to-specify encoding for wide stores on
narrow floats. The value stored to memory is the float with the same
value in the wider format.

Jacob Bachmeyer

unread,
Mar 24, 2017, 8:30:28 PM3/24/17
to Alex Bradbury, David Horner, RISC-V ISA Dev, Allen J. Baum
Correct, a single-precision value embedded in a double-precision NaN is
a double-precision NaN. I suggest that FP operations be defined to
consider *all* narrower operands as NaN, with FCVT required to expand a
value. Tracking the widths of FP values has a very low cost, requiring
only two hardware tag bits per FP register for an RVFDQ implementation.
Using the NaN encoding for STORE-FP/LOAD-FP would allow the tag bits to
be implied when the register is spilled and restored, while also
providing consistency--the register holds a single-precision value if
used in an F operation and a NaN if used in a D operation.

In your scenario, if you FLD a double from a file and that value looks
like an embedded single-precision value, the hardware could extract the
single-precision value and tag the FP register as holding a
single-precision value. If you then operate on it using
single-precision operations, it is a single-precision value. If you use
double-precision operations, it is NaN. This means that FCLASS.D would
always return NaN if the operand register was last loaded with a
single-precision value, while FCLASS.S would return a correct result.

Which reminds me: should the NaN encoding for a narrower float be a
"quiet NaN" or a "signaling NaN"?


-- Jacob

Michael Clark

unread,
Mar 24, 2017, 8:32:39 PM3/24/17
to Alex Bradbury, RISC-V ISA Dev
Agree.

Standardisation of externalisation and type size reflection via extended NaNs are orthogonal but related depending on externalisation format.

a). expand into exponent and mantissa of larger type
b). right justify in mantissa of larger type with unspecified encoding for the remaining bits
c). right justify in mantissa of larger type with specified recursive type encoding for the larger type MSB mantissa NaN (all 1’s)

One could be compatible across cores/implementations with b). but not have the extended ability to reflect on the type, and b). presents issues for recoded internal representations, which naturally suit a).

If it is “implementation defined” and the implementation choices are clear, it could be a case of “don’t do that” e.g. like mixing cores with different bit widths.

Recursive NaN type encodings sound like an interesting idea, e.g. a special bit pattern in the higher type’s most significant mantissa bits. A recoded FPU register with externalisation using right justification needs to identify the type to know where to find the exponent and mantissa for expansion into its internal representation.

All 1’s in the higher type’s left bits seems like the natural encoding for c). e.g. -NaN(0b111111…)

0xffffffffffff0000 Half Precision Zero in Single and Double Precision NaN
0xffffffff00000000 Single Precision Zero in Double Precision NaN
0x0000000000000000 Double Precision Zero

c). seems a little complex, but possible, for recoded implementations where a) might otherwise be the natural choice.
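
As a minimal sketch of what option c). could look like at the bit level (the helper names and the all-ones fill are assumptions for illustration, not part of any proposal):

    #include <stdint.h>
    #include <string.h>

    /* Option c) as a software model: the single is held right-justified and
     * the upper 32 bits are all ones, so the whole 64-bit pattern reads as a
     * negative double-precision NaN. */
    static uint64_t box_single(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);            /* raw IEEE-754 binary32 bits */
        return 0xffffffff00000000ull | u;    /* fill the upper half with 1s */
    }

    static int is_boxed_single(uint64_t bits)
    {
        return (bits >> 32) == 0xffffffffull;
    }

    static float unbox_single(uint64_t bits)
    {
        uint32_t u = (uint32_t)bits;         /* recover the low 32 bits */
        float f;
        memcpy(&f, &u, sizeof f);
        return f;
    }

With this layout a double-precision consumer of the register image always sees a NaN, which is what makes the width recoverable after a blind FSD/FLD round trip.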

Jacob Bachmeyer

unread,
Mar 24, 2017, 8:41:38 PM3/24/17
to Andrew Waterman, Allen J. Baum, David Horner, RISC-V ISA Dev, Alex Bradbury
Huh? This is confusing. (from the ISA spec) "FCVT.S.D rounds according
to the RM field; FCVT.D.S will never round." The response I had
misremembered said:
> It's worse than just masking off the extra precision for
> single-precision arithmetic, though, as single-precision rounding may
> increment the upper bits of the significand.
Given that issue, how does recoding make rounding free?


-- Jacob

Andrew Waterman

unread,
Mar 24, 2017, 9:25:33 PM3/24/17
to Jacob Bachmeyer, Allen J. Baum, David Horner, RISC-V ISA Dev, Alex Bradbury
This discussion's conflating two issues. Encoding float32 as float64
in the regfile doesn't mean you need to use float64 functional units
for everything. The statement you quoted is about the POWER6 using
float64 functional units to implement float32. You can encode float32
as float64, and still use lower-latency functional units for the
float32 operations.

>
>
> -- Jacob

Allen Baum

unread,
Mar 24, 2017, 9:29:08 PM3/24/17
to Andrew Waterman, Jacob Bachmeyer, David Horner, RISC-V ISA Dev, Alex Bradbury
yea, but FCVT.S.D still won't take less than a cycle, so I don't think that should be a consideration.
Again, IMHO, the reason to recode vs right justify & fill with 0/1/NaN is primarily whether you want to take advantage of using the same FPU with packed SIMD or not.
Even then, the more I think about it, I'm not sure how much difference it really makes.
In the recode case you need format conversion between single/double and packed, and in the right-justify case you need it between double and single/packed single.
It seems like a wash from that perspective.
There are still some tradeoff subtleties:
 - recode shifts the cost from the FPU to the load/store, so for performance what really matters is which of those (if either) are in a critical path
 - fill with NaN enables the ability to catch some errors - a nice to have


On Fri, Mar 24, 2017 at 5:24 PM, Andrew Waterman <wate...@eecs.berkeley.edu> wrote:
On Fri, Mar 24, 2017 at 5:16 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Allen J. Baum wrote:
>>
>>  - ensure IEEE Standards compliance?
>>   * I'm a little unsure of which rule we may currently be violating,
>> beside perhaps not having a standard.
>>     There was a suggestion last night that ( if I interpreted it
>> correctly, far from a sure thing) it was a requirement for someone who
>> examined the bits stored by a wider format store to be able to determine
>> that the value stored was actually a narrow format.
>>     If that is indeed the case, then right justify, fill with NaN is the
>> only legal option, as far as I can tell.
>
>
> That was me misremembering a response to a suggestion I had made that an
> implementation could implement only FLEN-width floats, "unpack" narrower
> floats to FLEN upon LOAD-FP, and "pack" them again for STORE-FP.  Masking
> the "excess" bits in arithmetic is then required, but can make
> single-precision latency greater than double-precision latency due to an
> implicit FCVT.S.D after every operation.  See message-id
> <CA++6G0AvAWSOcuCOenHecUMRKqAozrWPmqvGrt8ex...@mail.gmail.com> for the
> response I misremembered.

Jacob Bachmeyer

unread,
Mar 24, 2017, 9:44:07 PM3/24/17
to Allen Baum, Andrew Waterman, David Horner, RISC-V ISA Dev, Alex Bradbury
Allen Baum wrote:
> yea, but FCVT.S.D still won't take less than a cycle, so I don't think
> that should be a consideration.
> Again, IMHO, the reason to recode vs right justify&fill with 0/1/NaN
> is primarily if you want to take advantage of using the same FPU with
> packed SIMD or not.
> Even then, the more i think about it, I'm not sure how much difference
> it really makes.
> In the recode case you need format conversion between single/ double
> and packed, and in the right-justify case you need it between double
> and single/packed single.
> It seems like a wash from that perspective.
> There are still some tradeoff subtleties:
> - recode shifts the cost from the FPU to the Load/Store, so for
> performance what really mattersis which of those (if either) are in a
> critical path
> - fill with NaN enables the ability to catch some errors - a nice to have

It might be worth pointing out that I am specifically suggesting the NaN
encoding only be specified for use with STORE-FP/LOAD-FP.
Implementations are free to use any representation they want in the FP
register file, as long as they can remember what width of value was last
written to each register. FP width mismatch (F value with D operation
or D value with F operation) would be treated as a NaN. Perhaps
narrower values could be seen as quiet NaNs by wider operations, while
wider values would be seen as signaling NaNs by narrower operations?


-- Jacob

Jacob Bachmeyer

unread,
Mar 24, 2017, 9:47:50 PM3/24/17
to Andrew Waterman, Allen J. Baum, David Horner, RISC-V ISA Dev, Alex Bradbury
I think I get it now: rounding can be avoided at the expense of
duplicating hardware. An implementation with only a single full-width
FP pipeline will need to round after narrower operations, but an
implementation with multiple FP pipelines can simply choose one of the
appropriate width and feed a subset of the FP register to it/update that
subset with the result. The ISA spec should not constrain this choice,
since both of these approaches can easily interoperate.


-- Jacob

Stefan O'Rear

unread,
Mar 24, 2017, 9:52:04 PM3/24/17
to Jacob Bachmeyer, Allen Baum, Andrew Waterman, David Horner, RISC-V ISA Dev, Alex Bradbury
On Fri, Mar 24, 2017 at 6:44 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Allen Baum wrote:
>>
>> yea, but FCVT.S.D still won't take less than a cycle, so I don't think
>> that should be a consideration.
>> Again, IMHO, the reason to recode vs right justify&fill with 0/1/NaN is
>> primarily if you want to take advantage of using the same FPU with packed
>> SIMD or not.
>> Even then, the more i think about it, I'm not sure how much difference it
>> really makes.
>> In the recode case you need format conversion between single/ double and
>> packed, and in the right-justify case you need it between double and
>> single/packed single.
>> It seems like a wash from that perspective.
>> There are still some tradeoff subtleties:
>> - recode shifts the cost from the FPU to the Load/Store, so for
>> performance what really mattersis which of those (if either) are in a
>> critical path
>> - fill with NaN enables the ability to catch some errors - a nice to have
>
>
> It might be worth pointing out that I am specifically suggesting the NaN
> encoding only be specified for use with STORE-FP/LOAD-FP. Implementations

No, FMV.X.D et al. need to work in a compatible way; this isn't just
about loads and stores.

> are free to use any representation they want in the FP register file, as
> long they can remember what width of value was last written to each
> register. FP width mismatch (F value with D operation or D value with F

It is important to me that context switches be unobservable by D instructions

That means that a D register cannot be allowed to have more than 2^64
observable values

A register value FMV.X.S must behave in all regards exactly the same
as the float64 which would be produced by spilling and then reloading
it

> operation) would be treated as a NaN. Perhaps narrower values could be seen
> as quiet NaNs by wider operations, while wider values would be seen as
> signaling NaNs by narrower operations?

I do not see the purpose of complicating it in that way.

I see only two viable possibilities:

* Embed float32 in a float64 of the same numeric value (but we still
need to decide what happens to NaN payloads; left-aligning them with
zero bits on the right probably requires the fewest additional wires)

* Embed float32 in float64 by adding 1 bits on the left.

-s

Jacob Bachmeyer

unread,
Mar 24, 2017, 10:48:08 PM3/24/17
to Stefan O'Rear, Allen Baum, Andrew Waterman, David Horner, RISC-V ISA Dev, Alex Bradbury
Stefan O'Rear wrote:
> On Fri, Mar 24, 2017 at 6:44 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Allen Baum wrote:
>>
>>> yea, but FCVT.S.D still won't take less than a cycle, so I don't think
>>> that should be a consideration.
>>> Again, IMHO, the reason to recode vs right justify&fill with 0/1/NaN is
>>> primarily if you want to take advantage of using the same FPU with packed
>>> SIMD or not.
>>> Even then, the more i think about it, I'm not sure how much difference it
>>> really makes.
>>> In the recode case you need format conversion between single/ double and
>>> packed, and in the right-justify case you need it between double and
>>> single/packed single.
>>> It seems like a wash from that perspective.
>>> There are still some tradeoff subtleties:
>>> - recode shifts the cost from the FPU to the Load/Store, so for
>>> performance what really mattersis which of those (if either) are in a
>>> critical path
>>> - fill with NaN enables the ability to catch some errors - a nice to have
>>>
>> It might be worth pointing out that I am specifically suggesting the NaN
>> encoding only be specified for use with STORE-FP/LOAD-FP. Implementations
>>
>
> No, FMV.X.D et al need to work in a compatible way, this isn't just
> about loads and stores
>

FMV.X.D and FMV.D.X can be specified to use the same format as
STORE-FP/LOAD-FP. The current spec imposes the same constraint:
> If the last value written to
> the source floating-point register was a single-precision floating-point value, then the value returned
> by FMV.X.D is undefined beyond having the property that moving the value back to a floating-
> point register will recreate the original single-precision value.
>

An implementation can use the same recoding hardware for memory access
and FMV.X.

>> are free to use any representation they want in the FP register file, as
>> long they can remember what width of value was last written to each
>> register. FP width mismatch (F value with D operation or D value with F
>>
>
> It is important to me that context switches be unobservable by D instructions
>

Context switches are unobservable--FSD either stores the
double-precision value as-is if the register holds a float64, or a
float32 encoded as a float64 NaN if the register holds a float32. FLD
loads a float64 as-is and interprets a float32 encoded as a float64 NaN
as a float32. *Any* use of a float32 value in a double-precision
operation is treated as NaN.

After:
FCVT.S.W f0, x0
FCVT.D.W f1, x0
FADD.D f2, f1, f0

The f2 register will contain the canonical NaN.
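
A rough C model of the externally visible behaviour described above (the struct, the helper names, and the choice to keep only the low 32 bits for single-tagged registers are illustrative assumptions, not spec text):

    #include <stdint.h>

    /* One FP register in this model: a 64-bit image plus a width tag. */
    typedef enum { WIDTH_S, WIDTH_D } fp_width_t;
    typedef struct { uint64_t bits; fp_width_t tag; } fp_reg_t;

    #define NANBOX_HI 0xffffffff00000000ull  /* all-ones upper half */

    /* FSD: store the double as-is, or the NaN-boxed single if the register
     * was last written at single precision. */
    static uint64_t model_fsd(const fp_reg_t *r)
    {
        return (r->tag == WIDTH_D) ? r->bits
                                   : (NANBOX_HI | (uint32_t)r->bits);
    }

    /* FLD: a value matching the box pattern is re-tagged as single, so the
     * width information survives a spill/reload or a context switch. */
    static void model_fld(fp_reg_t *r, uint64_t mem)
    {
        if ((mem >> 32) == 0xffffffffull) {
            r->bits = (uint32_t)mem;   /* keep just the binary32 image */
            r->tag  = WIDTH_S;
        } else {
            r->bits = mem;
            r->tag  = WIDTH_D;
        }
    }

    /* Any double-precision read of a single-tagged register observes the
     * boxed NaN, so D arithmetic on it propagates NaN, as in the FADD.D
     * example above. */
    static uint64_t read_as_double(const fp_reg_t *r)
    {
        return (r->tag == WIDTH_D) ? r->bits
                                   : (NANBOX_HI | (uint32_t)r->bits);
    }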

> That means that a D register cannot be allowed to have more than 2^64
> observable values
>

And it does not have more than 2^64 observable values--the tag bits are
invisible and the register appears to hold the NaN-encoded
single-precision value if it actually stores a single-precision value
but is accessed by a double-precision operation--*any* double-precision
operation, not just FSD or FMV.

> A register value FMV.X.S must behave in all regards exactly the same
> as the float64 which would be produced by spilling and then reloading
> it
>

Huh? FMV.X.S transfers a float32 from the FP register file. The result
of FMV.X.S if the source FP register contains a double-precision value
is undefined.

>> operation) would be treated as a NaN. Perhaps narrower values could be seen
>> as quiet NaNs by wider operations, while wider values would be seen as
>> signaling NaNs by narrower operations?
>>
>
> I do not see the purpose of complicating it in that way.
>

I see it as a means for software to use the FCLASS.x instructions in an
RVFDQ implementation to quickly determine what width of value is in an
FP register using at most two executions of FCLASS.x.


-- Jacob

David Horner

unread,
Mar 24, 2017, 11:52:56 PM3/24/17
to RISC-V ISA Dev, jcb6...@gmail.com, allen...@esperantotech.com, wate...@eecs.berkeley.edu, ds2h...@gmail.com, a...@asbradbury.org


I have not digested all the comments and their implications yet (and I may never fully)
but:

On Friday, 24 March 2017 21:52:04 UTC-4, Stefan O'Rear wrote:
On Fri, Mar 24, 2017 at 6:44 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Allen Baum wrote:

I see only two viable possibilities:


 I agree.

* Embed float32 in a float64 of the same numeric value

This causes the loss of internally tracked format information (_size_ especially and likely specifically).
And I see no way that it is recoverable (certainly not for the following float operation).

Implementations that internally contain float32 in float64 using the same "value",
  arguably may not need that information (although some might benefit from it).
[as Andrew stated one should not assume internal representations and implementations]

One checking facility, not mandated by RISC-V, that I'd like to see is raising the "illegal operation flag" when operations and data-size mismatch.
It is a legitimate outcome of the standard as it now stands, which implies such mismatched operations are undefined.
A context switch and back (that dumps and restores using larger formats) loses the information for the check of the next float operation.
 

 
(but we still
need to decide what happens to NaN payloads; left-aligning them with
zero bits on the right probably requires the fewest additional wires)

* Embed float32 in float64 by adding 1 bits on the left.


As mentioned elsewhere, this mandates tracking size in the implementation.
Ideally we would not wish to impose this on an implementation that otherwise would not care to do so.

The question is whether we mandate a relatively minor obligation on F32-in-F64 to retain the internal state of other implementations.

By the way, I believe Stefan O'Rear's suggestion of all 1 -NaN is not only cute, but elegant; I cannot think of a better value for a specific NaN.


-s

Some additional thoughts:
(The float load and store and fmv (pseudo op) of course will not raise the mismatch flag because they explicitly handle this use case.
As I read the fmv it propagates the size information as well as the existing format's data).

I believe the specs should be explicit when operations and data-size mismatch; I thought I saw it stated at one time, but it is certainly implied.

 I cannot think of a better value for a specific NaN than all 1 bits...
However, we might want to allow for variants that encode additional info in, say, the last few bits of the upper half of the encoding, for additional state information that is not portable but might help the highest occurrence case of re-load on the same machine.
 It would have to be of the nature of a "hint", and that in itself may be reason to dismiss this alternative:
   as an implementation that considers it valuable enough to encode in those bits may overly rely upon it when the values are supplied by another implementation.

Please forgive me if  these thoughts were in previous submissions.

-dsh

Alex Elsayed

unread,
Mar 25, 2017, 1:54:19 PM3/25/17
to isa...@groups.riscv.org

I'm not convinced this "internally tracked format information" is a meaningful thing to talk about this way, and you seem to be misapprehending the meaning of "undefined behavior."

 

Undefined behavior is there _to allow for implementation freedom in handling it_, when a valid program would never encounter it. As a result:

 

- If it's undefined behavior, that "internally tracked format information" is actually entirely unnecessary

- If you define "mismatched operations" as "raises Illegal Operation" it is no longer undefined behavior, and actually _significantly constrains implementations_

 

For example, if mismatched operations (aside from FLD and FMV) are UB, then using the wrong-precision "fadd" may result in incorrect rounding - which is fine, so long as using the _correct-precision_ "fadd" results in the correct answer.

 

Such an implementation would _never_ need "internally tracked format information" - it might use a rounding step after performing a double computation, or it might use a separate single-precision unit, or any other implementation under the sun. It'd simply select behavior based on the format given _in the instruction_.

 

By comparison, if mismatched operations are _erroneous_, then it _must_ store that information - forcing complexity on implementations - in order to _compare_ it against the information in the instruction.

 

I personally prefer the "store float32 as a float64 of the same value" representation. It's friendlier to the outside world, and doesn't invent a quirky new format. In an implementation with the area to expend on separate single/double units it has no performance downsides (and area is sufficiently available that it's being used on GPUs). In area-constrained implementations, it's going to have the same performance impact as _any_ approach to supporting both float and double with the same unit.

 

> > (but we still
> > need to decide what happens to NaN payloads; left-aligning them with
> > zero bits on the right probably requires the fewest additional wires)
> >
> > * Embed float32 in float64 by adding 1 bits on the left.
>
> As mentioned elsewhere, this mandates tracking size in the implementation.
> Ideally we would not wish to impose this is on an implementation that
> otherwise would not care to do so.

As above, no it does not.

> The question is if we mandate a relatively minor obligation on F32-in-F64
> to retain the internal state of other implementations.
>
> By the way, I believe Stefan O'Rear's suggestion of all 1 -NaN is not only
> cute, but elegant, I cannot think of a better value for a specific NaN.

Er, the "all 1 -NaN" _is_ "embed float32 in float64 by adding 1 bits on the left".

It just so happens that "the left" includes the sign bits, mantissa, etc, and thus results in a negative NaN.

> -s
>
> Some additional thoughts:
> (The float load and store and fmv (pseudo op) of course will not raise the
> mismatch flag because they explicitly handle this use case.
> As I read the fmv it propagates the size information as well as the
> existing format's data).
>
> I believe the specs should be explicit when operations and data-size
> mismatch, thought I saw it stated at one time but it is certainly implied.

Clearly marking it as undefined behavior would be good. Making it _defined erroneous_ behavior is a very different beast.


Alex Elsayed

unread,
Mar 25, 2017, 1:56:31 PM3/25/17
to RISC-V ISA Dev

On Saturday, 25 March 2017 10:54:09 PDT Alex Elsayed wrote:
> On Friday, 24 March 2017 20:52:56 PDT David Horner wrote:
<snip>
> > By the way, I believe Stefan O'Rear's suggestion of all 1 -NaN is not only
> > cute, but elegant, I cannot think of a better value for a specific NaN.
>
> Er, the "all 1 -NaN" _is_ "embed float32 in float64 by adding 1 bits on the
> left".
>
> It just so happens that "the left" includes the sign bits, mantissa, etc,
> and thus results in a negative NaN.

Gah, s/bits/bit/ and s/mantissa/exponent/. That'll teach me to send email right after waking up :/


David Horner

unread,
Mar 25, 2017, 4:35:16 PM3/25/17
to RISC-V ISA Dev, sor...@gmail.com, allen...@esperantotech.com, wate...@eecs.berkeley.edu, ds2h...@gmail.com, a...@asbradbury.org, jcb6...@gmail.com

The current spec does not mandate this.
It only dictates what must occur when FADD.D is operating on double values.
An implementation can instead "perform" an implicit FCVT.D.S f0,f0 (which might require no change of internal state) and perform the FADD.D.
Alternatively, the implementation is free to produce a garbage result.
The onus is on applications to ensure they match ops and formats.

There is wisdom in ensuring implementations are not over-constrained, especially by not imposing behaviour that only occurs when apps misbehave.



<clip>
 
>> operation) would be treated as a NaN.  Perhaps narrower values could be seen
>> as quiet NaNs by wider operations, while wider values would be seen as
>> signaling NaNs by narrower operations?
>>    
>
> I do not see the purpose of complicating it in that way.
>  

I see it as a means for software to use the FCLASS.x instructions in an
RVFDQ implementation to quickly determine what width of value is in an
FP register using at most two executions of FCLASS.x.

 
If the actual value in a float32 is NaN then how is the "actual" size determined when both FCLASS.D and FCLASS.S return NaN?

If the double NaN wrapper encoding for Single is used, then an FMV.X.D can be checked directly.

If float32-value-in-float64 is the encoding, then FCLASS needs to be extended to definitively determine the size (as determined by the last legal float operation).
However, if FCLASS is expanded to return size information, then internally the implementation must track the current float size, which would moot one of the arguments for using the F32-in-F64 encoding.

 

-- Jacob

Andrew Waterman

unread,
Mar 25, 2017, 4:41:36 PM3/25/17
to Alex Elsayed, RISC-V ISA Dev
+1

Tommy Thorn

unread,
Mar 25, 2017, 5:52:15 PM3/25/17
to David Horner, RISC-V ISA Dev, sor...@gmail.com, allen...@esperantotech.com, wate...@eecs.berkeley.edu, a...@asbradbury.org, jcb6...@gmail.com

> There is wisdom in ensuring implementations are not over constrained; especially not to impose behaviour that only occurs when apps misbehave.

Not wisdom, naivety. All undefined behavior is defined by hardware. Having it undefined by spec means both that we can observe undefined behavior on different implementations and that we can't predict behavior on real hardware.

History is rife with examples of undefined behavior being fertile ground for security loopholes.

Please close the hole.

Tommy

David Horner

unread,
Mar 25, 2017, 6:06:45 PM3/25/17
to RISC-V ISA Dev
Tracking float size (F32, F64, F16 or F128) is necessary if the NaN wrapper approach is standardized.
 It determines, for a Single, Double or Quad float store, whether the current precision matches the operation's precision and the full IEEE representation is stored, or whether a NaN wrapper is required for the inserted smaller actual-precision data value.

Note, if the RISC-V specification did not impose the guarantee on FSD for "a floating-point register holds a single-precision value" that the single-precision value would be restored, then the onus would be on the user program to ensure the operating system and libraries were aware of each register's current "precision"/size.

However, RISC-V has made the stipulation and there appears to be only the two approaches to encode.

 

- If you define "mismatched operations" as "raises Illegal Operation" it is no longer undefined behavior, and actually _significantly constrains implementations_


I agree. Note, I did not recommend this as a specification, but only as an implementation that I would quite like to see (preferably universally, but that is just my bent).
And as I mentioned, it is fully within the current specifications, which are quite lenient on what can be implemented in the "error" case.
But, as I also mentioned, it cannot work consistently with the Save Lower Precision Value In Higher Precision Format implementation of the guarantee.
 

Andrew Waterman

unread,
Mar 25, 2017, 6:25:26 PM3/25/17
to David Horner, RISC-V ISA Dev
Arithmetic instructions in RISC-V do not cause traps based upon
argument values. Adding value-dependent traps for this one case is an
undue burden. And I'd argue that making the trapping behavior
optional defeats the purpose of closing this specification hole.

So, I conclude that trapping on mismatched values should not factor
into this discussion.

Michael Clark

unread,
Mar 25, 2017, 7:26:57 PM3/25/17
to David Horner, RISC-V ISA Dev
It depends on the internal representation. Tracking float size explicitly is required for a recoded / expanded internal representation, as the exponent can’t hold a valid value and record a NaN concurrently. A packed internal representation, where the exponent is read out from different bit locations for the different size operations, could implicitly track size based on the enclosing NaN coded in the alternate exponent position. Implementation details.

The type can be implicitly tracked in a packed internal representation without any extra bits given the reserved NaN values to indicate the type (due to the alternate exponent locations):

-NaN(0xFFFFFFFFFFFFxxxx) // HP in DP NaN
-NaN(0xFFFFFFFFxxxxxxxx) // SP in DP NaN

On an implementation that uses a packed internal representation, FLW and FMV.S.X would need to set the most significant bits to 1 (essentially sign extend the integer representation).

For the alternative behaviour of expanding into the larger type, e.g. expanding an SP into a DP on FSD and FMV.X.D, an implementation with a packed internal representation has no information to reconstruct its internal representation, so mandating that behaviour would essentially prevent packed internal representations.

The NaN idea is very neat. Whether it should be “specified behaviour” or “implementation defined behaviour” is the issue.

Portable VM images (in the broader sense, i.e. including core dumps, and thread migration as VM images) are the pivoting issue. If there is to be implementation flexibility, i.e. the possibility of implementations with both recoded and packed internal representations, then I think the NaN encoding makes the most sense, as I can’t see how a packed internal representation could reconstruct its internal state without the size hint, which is the whole purpose of the NaN encoding as “external representation”. External representation as the widest type as a “specified behaviour” essentially excludes packed internal representations.

Portability of VM images is quite a big issue. Imagine an app that uses floats being migrated or restored between expanded and NaN encodings: glitches and crashes. The Linux kernel would be fine, but many apps would be dead after migration.

I think Stefan’s NaN idea is a very good one. A recoded internal implementation could with effort support the NaN external representation, but a packed representation cannot support reconstitution of an SP internal state from an externalised valid DP.

So it is really just the NaN approach if we want to allow both packed and expanded recoded implementations. i.e. offer the most implementation flexibility with a novel width coding in the external representation.

Note, if the RISC-V specification did not impose the guarantee on FSD for "a floating-point register holds a single-precision value" that the single-precision value would be restored, then the onus would be on the user program to ensure the operating system and libraries were aware of each registers current "precision"/size.

However, RISC-V has made the stipulation and there appears to be only the two approaches to encode.

The guarantee has to be made so that context switching code works using FLEN width loads and stores (as it implicitly doesn’t know the type).

David Horner

unread,
Mar 25, 2017, 7:43:35 PM3/25/17
to RISC-V ISA Dev, ds2h...@gmail.com


On Saturday, 25 March 2017 18:25:26 UTC-4, andrew wrote:
On Sat, Mar 25, 2017 at 3:06 PM, David Horner <ds2h...@gmail.com> wrote:
>
>

>
> However, RISC-V has made the stipulation and there appears to be only the
> two approaches to encode.
>


Arithmetic instructions in RISC-V do not cause traps based upon
argument values.  Adding value-dependent traps for this one case is an
undue burden.  And I'd argue that making the trapping behavior
optional defeats the purpose of closing this specification hole.

So, I conclude that trapping on mismatched values should not factor
into this discussion.


I fully agree, and apologize that " I'd like to see is raising the "illegal operation flag" " did not clearly specify the intent of setting NV (Invalid Operation) in FFLAGS, and not a trap.
 

Jacob Bachmeyer

unread,
Mar 25, 2017, 8:19:39 PM3/25/17
to David Horner, RISC-V ISA Dev, sor...@gmail.com, allen...@esperantotech.com, wate...@eecs.berkeley.edu, a...@asbradbury.org
David Horner wrote:
> On Friday, 24 March 2017 22:48:08 UTC-4, Jacob Bachmeyer wrote:
>
>
> Context switches are unobservable--FSD either stores the
> double-precision value as-is if the register holds a float64, or a
> float32 encoded as a float64 NaN if the register holds a float32.
> FLD
> loads a float64 as-is and interprets a float32 encoded as a
> float64 NaN
> as a float32. *Any* use of a float32 value in a double-precision
> operation is treated as NaN.
>
> After:
> FCVT.S.W f0, x0
> FCVT.D.W f1, x0
> FADD.D f2, f1, f0
>
> The f2 register will contain the canonical NaN.
>
>
> The current spec does not mandate this.
> It only dictates what must occur when FADD.D is operating on double
> values.
> An implementation can instead "perform" a implicit FCVT.D.S f0,f0
> (which might require no change of internal state) and perform the FADD.D.
> Alternatively, the implementation is free to produce a garbage result.

You are correct that the current spec does not require this. I was
explaining that the proposal to encode narrower floats into wider NaNs
can be consistent. The assembly snippet that I provided has defined
behavior if the proposal to use NaN encoding is adopted and was written
to show that the proposed behavior is consistent.

> The onus is on applications to ensure they match ops and formats.
>
> There is wisdom in ensuring implementations are not over constrained;
> especially not to impose behaviour that only occurs when apps misbehave.

On the one hand, every "don't care" in hardware is probably at least one
fewer gate somewhere (and sometimes many fewer gates). On the other
hand, assuming that apps never misbehave is precisely how security
exploits happen and one of the reasons I have been persistently asking
that supervisor instruction fetch from user memory be categorically
prohibited in RISC-V.

> >> operation) would be treated as a NaN. Perhaps narrower values
> could be seen
> >> as quiet NaNs by wider operations, while wider values would be
> seen as
> >> signaling NaNs by narrower operations?
> >>
> >
> > I do not see the purpose of complicating it in that way.
> >
>
> I see it as a means for software to use the FCLASS.x instructions
> in an
> RVFDQ implementation to quickly determine what width of value is
> in an
> FP register using at most two executions of FCLASS.x.
>
>
> If the actual value in a float32 is NaN then how is the "actual" size
> determined when both FCLASS.D and FCLASS.S return NaN?

That is a potential problem. OK, a better option: FCLASS.x currently
returns a 10-bit one-hot value; expand that to 12 bits with the
additional two explicitly indicating "size mismatch larger/smaller". Or
to 11 bits with an additional "size mismatch" bit, but that eliminates
the option to find the actual size by binary search. (Then again, how
many float sizes will RISC-V ever support? And the 12th bit might be
more useful as "radix mismatch".)
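
For reference, the FCLASS.x one-hot result bits currently defined in the user-level ISA, with the two suggested mismatch bits tacked on, might be sketched as below (the two new bit positions are an illustrative assumption only):

    enum fclass_bits {
        FCLASS_NEG_INF       = 1 << 0,
        FCLASS_NEG_NORMAL    = 1 << 1,
        FCLASS_NEG_SUBNORMAL = 1 << 2,
        FCLASS_NEG_ZERO      = 1 << 3,
        FCLASS_POS_ZERO      = 1 << 4,
        FCLASS_POS_SUBNORMAL = 1 << 5,
        FCLASS_POS_NORMAL    = 1 << 6,
        FCLASS_POS_INF       = 1 << 7,
        FCLASS_SIGNALING_NAN = 1 << 8,
        FCLASS_QUIET_NAN     = 1 << 9,
        /* suggested additions -- not part of any current spec */
        FCLASS_SIZE_SMALLER  = 1 << 10,  /* register holds a narrower value */
        FCLASS_SIZE_LARGER   = 1 << 11,  /* register holds a wider value */
    };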

> If the double NaN wrapper encoding for Single is used, then an FMV.X.D
> can be checked directly.

Only on RV64; FMV.X.D does not exist on RV32.

> If float32 value in float64 is the encoding, then FCLASS needs to be
> extended to definitively determine the size (as determined by the last
> legal float operation).
> However, if FCLASS is expanded to return size information, then
> internally the implementation must track current float size, which
> would moot one of the argument to use the F32-in-F64 encoding.

The problem is that, after a context switch on RVG, the last legal float
operation will have been FLD on all registers. The use of a NaN
encoding to store narrower floats in wider cells allows that length
information to be preserved without requiring software to explicitly
track it. The additional hardware burden on RVG is at most a single tag
bit per FP register to indicate if that register currently holds a
single-precision value. One additional column in a 32-entry register
file is a negligible incremental burden compared to the rest of the FPU.


-- Jacob

kr...@berkeley.edu

unread,
Mar 26, 2017, 6:30:27 AM3/26/17
to RISC-V ISA Dev

Thanks all for a long email discussion.

I'm presenting a concrete proposal here to provide a fresh starting
point for discussion.

The proposal is that when an FP register holds a value of type n
narrower than FLEN-width type m, the value is converted to the widest
type m holding the numerically equivalent value when used as the
source value for any widest transfer instruction (FSm/FMV.X.m).
n-type NaN values are left-aligned with zero bits added on the right
to make an m-type NaN value.

Pros:

- This is natural for implementations with internal recoding.

- (minor pro) Software can rely on this behavior to combine transfer
with conversion to wider type (as this would be part of the spec,
there is no way to prevent software relying on this). Note that FSD
of a single-precision value is not equivalent to FCVT.D.S followed by an
FSD, as the FCVT.D.S will not propagate a NaN payload (unless a
non-standard NaN payload propagation extension is implemented).

Cons:

- Unrecoded implementations (which operate on values internally in
IEEE standard format) have to expand n-type values in registers into
m-type sources for transfer instructions, but this logic must
already be present to support FCVT.m.n.

- Unrecoded implementations have to convert m values in registers into
n values before use. This cannot be done as part of a transfer into
the register file (FLm/FMV.m.X), as it is not known at that time
whether the contents will be used as an n-type or m-type value.
Obtaining n-type subnormals from m-type normalized numbers is one
particular complication, as the significand bits are not in the
"correct" position relative to the narrower in-memory format. However,
the bits are actually in a better position to simplify functional
unit design and the first stage of an unrecoded implementation would
either take a trap or move them back to this position anyway. This
is the big motivation to use recoding in the first place.

Discussion:

* Why not leave undefined?

Defining the external representation removes a source of
incompatibility between implementations. This simplifies design
verification, security verification, debug, task/VM migration, etc.

The benefits of defining this outweigh the very minor backwards
incompatibility introduced (no existing RISC-V-compliant software can
be sensitive to this change, few (no?) commercial hardware
implementations will be rendered obsolete by this change).

* Why not NaN encoding?

The alternative proposal was to encode n values within m NaN encoding
space. This is undesirable as it encroaches upon NaN encoding space,
which could affect/interact with other uses of NaN encoding space (NaN
payload propagation, JIT boxing, etc.). In addition, if the
conversion to wider type is mandated, it can be used by software
during conversion between types, whereas NaN encoding requires
separate explicit conversion instruction. Also, for many optimized
FPU implementations the wider encoding is natural.

* Catching illegal use of single as double?

While there was discussion about the desire to catch illegal use of
the external value holding type n as a value of type m, the proposal
makes this a moot point, as the external value is a legal value of
type m. This is now similar to how integer registers can hold program
variables of any type, including signed or unsigned of different
widths, and it is up to software to interpret values correctly, and in
some cases the intrinsic conversions are what is anyway required.

* What about other uses of FP registers, e.g., decimal?

If additional types of narrower width are later added for the FP
registers, they would also have to have defined external
representations. For example, if decimal FP is added, the natural
extension would be that narrower decimal FP values are expanded to the
widest decimal format (though decimal FP has a complication that
numbers can be non-normalized and hence have non-canonical
representation).

* What about packed-SIMD?

Another concern was the interaction with packed-SIMD
implementations. Packed-SIMD implementations have to cope with a
variety of instructions interpreting the same register bit pattern as
different types. With this proposal, a packed-SIMD implementation
would require that scalar FP values are always represented as an
FLEN-wide scalar FP value if used as input to a non-scalar-FP
instruction. For example, in a machine with FLEN=64, a scalar
single-precision 32-bit floating-point value in an FP register would
appear as the equivalent 64-bit double-precision IEEE bit pattern when
fed to a 4x16-bit packed-SIMD integer instruction.

Another concern in a packed-SIMD implementation is sharing FPU
hardware between scalar and vector execution datapaths. We note that
a scalar FLEN-wide FPU can be modified to support narrower scalar FP
datatypes at low cost, particularly if the internal representation is
per the proposal, though there will be some penalty in latency and
energy. Narrower FPUs dedicated to the packed-SIMD format can then be
used for the vector operations. This design can also be used for
longer packed-SIMD vectors that are composed of N*FLEN-bit elements.

Further implementation optimizations/tradeoffs are possible, including
dedicated lower-latency scalar FPUs for narrower scalar operations
operating off the widened format, and repurposing the FLEN-wide FPU to
operate as one of the narrower FPUs in a vector operation by expanding
the packed FP format to the wider format.

Given these mitigations, there does not seem to be a large problem
with packed-SIMD implementations as currently sketched in the spec.

Also, at the last (5th) workshop there was some discussion in a
working group about the interaction of the vector proposal and the
packed-SIMD proposal. The direction from that group was to drop the
current form of the packed-SIMD proposal as it was redundant with the
vector proposal. However, there was interest in packed-SIMD being
provided for low-end implementations without an FPU, where the
packed-SIMD extension would provide fixed-point arithmetic
instructions operating out of the integer register file to accelerate
DSP functions. This is obviously different than the current P sketch,
and has not been fleshed out.

Suggestions/Comments/Corrections welcome,

Krste

Alex Bradbury

unread,
Mar 26, 2017, 8:51:23 AM3/26/17
to Krste Asanovic, RISC-V ISA Dev
On 26 March 2017 at 11:30, <kr...@berkeley.edu> wrote:
>
> Thanks all for a long email discussion.
>
> I'm presenting a concrete proposal here to provide a fresh starting
> point for discussion.
>
> The proposal is that when an FP register holds a value of type n
> narrower than FLEN-width type m, the value is converted to the widest
> type m holding the numerically equivalent value when used as the
> source value for any widest transfer instruction (FSm/FMV.X.m).
> n-type NaN values are left-aligned with zero bits added on the right
> to make an m-type NaN value.

Hi Krste. I'm a bit concerned that this leaves undefined behaviour for
transfer instructions narrower than FLEN. Consider an RV64IFDQ
system (so FLEN=128). It's very likely I'd want to stick to the RV64G
ABI and avoid recompiling all my libraries, only making use of Q in a
small number of libraries. However, I'd need the result of FSD and
FMV.X.D to be well-defined, at least when performed on registers that
contained values of type narrower than FLEN. I'd have to ensure the
kernel context switching code used FSQ, but compiler-inserted spills
in non-Q userspace code would still be using FSD and FLD.

Best,

Alex

kr...@berkeley.edu

unread,
Mar 26, 2017, 10:38:26 AM3/26/17
to Alex Bradbury, Krste Asanovic, RISC-V ISA Dev

Yes, the proposal should be amended to say that wider transfers (not just
widest) also transfer the equivalent value of a narrower type.

Krste

Michael Clark

unread,
Mar 26, 2017, 5:23:17 PM3/26/17
to kr...@berkeley.edu, RISC-V ISA Dev

On 26 Mar 2017, at 11:30 PM, kr...@berkeley.edu wrote:

* Why not NaN encoding?

The alternative proposal was to encode n values within m NaN encoding
space.  This is undesirable as it encroaches upon NaN encoding space,
which could affect/interact with other uses of NaN encoding space (NaN
payload propagation, JIT boxing, etc.).  In addition, if the
conversion to wider type is mandated, it can be used by software
during conversion between types, whereas NaN encoding requires
a separate explicit conversion instruction.  Also, for many optimized
FPU implementations the wider encoding is natural.

I was thinking about JIT boxing and whether these NaN values would ever escape their specific context and collide. The NaN values are primarily used by context-switching code to preserve width information when saving a narrower type in a wider type, so they would usually only appear in the context-switch save area (the context-switch code being the only code that doesn't know what width is stored in each of the FPU registers). Compiler-generated code would of course use the correct width-specific operation. A JIT boxing implementation will (in RV64 mode) be storing doubles, so it would never see these NaNs; its boxed NaNs would be used in the integer path prior to deciding whether to move a value into an FPU register, so in the case of a collision on the boxed integer side, the values would never actually make it from the integer path to the FPU register file, and there is no "effective" collision. JIT boxing would also need to be redesigned for RV128 or Q.

Nevertheless, I understand the rationale for externalising to the wider type; however, it is not a width-information-preserving operation, so it excludes some packed implementation approaches (which may after all be rare given the design of the RISC-V FPU register file). Alternative implementations may need to treat all floats as doubles or just implement IMAD.

I note that packed SIMD obviously can't preserve width information, as the full width of the register would be used to store multiple values; however, packed SIMD does of course lend itself to a binary save of the packed registers rather than performing implicit conversions.

The NaN width encapsulation is quite novel. It may crop up somewhere else where storing a packed representation while preserving width information makes more sense. It remains a potentially useful encoding.

Jacob Bachmeyer

unread,
Mar 26, 2017, 7:37:24 PM3/26/17
to Michael Clark, kr...@berkeley.edu, RISC-V ISA Dev
Michael Clark wrote:
>> On 26 Mar 2017, at 11:30 PM, kr...@berkeley.edu
As I understand it, IMAD is not allowed--the presence of Q implies the
presence of D implies the presence of F.

> I note that Packed SIMD obviously can’t preserve width information as
> the full width of the register would be used to store multiple values,
> however Packed SIMD does of course lend itself to a binary save of the
> packed registers versus performing implicit conversions.

As I understand it, Packed SIMD is expected to be a DSP-oriented
extension for RISC-V and focus on fixed-point arithmetic. Since context
switch saves FLEN-width values, packed SIMD tuples would be saved and
restored correctly.

> The NaN width encapsulation is quite novel. It may crop up somewhere
> else where storing a packed representation while preserving width
> information makes more sense. It remains a potentially useful encoding.

I jumped on Alex Bradbury's proposal for the NaN encoding because it
looked interesting and allowed consistent, defined behavior for FP
operations with incorrect inputs (treat bogus input as NaN). Always
storing FP values as FLEN-width internally and doing conversions to and
from that is also consistent (implied conversion on bogus input) but
more-or-less mandates recoding in the FPU, while NaN encoding
more-or-less mandates tag bits if recoding is used, but does not mandate
recoding. Either of these will have an impact on hardware; the
questions then are: are recoding FPUs simpler than non-recoding FPUs?
If not, are tag bits a significant increase in area/power/wiring/etc.
for a recoding FPU beyond the inherent difference between a recoding and
non-recoding FPU?

My concern is that implicit conversions to wider types may favor
performance-optimized FPUs over area-optimized FPUs and I believe that
we should try to keep the minimal-area RISC-V FPU as small as we
reasonably can. (A standard external encoding is needed, for good
reasons that others have explained.)


-- Jacob

Andrew Waterman

unread,
Mar 26, 2017, 7:59:05 PM3/26/17
to Jacob Bachmeyer, Michael Clark, Krste Asanovic, RISC-V ISA Dev
Happily, the most area-conscious FPUs, those in RV32F designs, are
agnostic to this discussion.

Otherwise, our experience has been that, while recoding is obviously a
performance optimization, it doesn't have a first-order impact on
area. Most of the impact is to move the normalization hardware into
the load/store pipe, and out of the functional units. Depending on
the FPU design, this might either slightly reduce or slightly increase
area. The tiniest FPUs will already be designed to share the
normalization hardware between the various functional units, anyway,
so it's probably very minor.


Michael Clark

unread,
Mar 26, 2017, 8:27:11 PM3/26/17
to jcb6...@gmail.com, kr...@berkeley.edu, RISC-V ISA Dev
That was my conclusion too. The NaN encoding allows recoding (albeit with internal width tags) as well as a packed internal representation. It seemed to be a logical superset as the external representation is width preserving.

I can understand why one would want to avoid the complexity of the NaN encoding even if it does sacrifice implementation flexibility, depending on what is the prevailing “implementation defined” behaviour. It makes sense for obvious reasons that a prevalent “implementation defined” behaviour becomes the standardised behaviour.

Allen Baum

unread,
Mar 27, 2017, 12:32:24 AM3/27/17
to Andrew Waterman, Jacob Bachmeyer, Michael Clark, Krste Asanovic, RISC-V ISA Dev
I think the main issue is performance, but it isn't FP performance. If putting the reformat logic in the load/store pipe extends the load/store cycle, that most likely affects the entire machine cycle, as opposed to adding a stage to the FPU pipe. I'm not saying that will happen (and given the byte-level alignment logic for regular loads, it's unlikely), but that's what you need to watch out for.

-Allen

Andrew Waterman

unread,
Mar 27, 2017, 1:06:18 AM3/27/17
to Allen Baum, Jacob Bachmeyer, Michael Clark, Krste Asanovic, RISC-V ISA Dev
In any case, it can be pipelined to avoid the cycle time increase, and
without affecting integer load latency.

Alex Bradbury

unread,
Mar 27, 2017, 4:17:46 AM3/27/17
to jcb6...@gmail.com, Michael Clark, Krste Asanovic, RISC-V ISA Dev
On 27 March 2017 at 00:37, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> I jumped on Alex Bradbury's proposal for the NaN encoding because it looked
> interesting and allowed consistent, defined behavior for FP operations with
> incorrect inputs (treat bogus input as NaN). Always storing FP values as
> FLEN-width internally and doing conversions to and from that is also
> consistent (implied conversion on bogus input) but more-or-less mandates
> recoding in the FPU, while NaN encoding more-or-less mandates tag bits if
> recoding is used, but does not mandate recoding. Either of these will have
> an impact on hardware; the questions then are: are recoding FPUs simpler
> than non-recoding FPUs? If not, are tag bits a significant increase in
> area/power/wiring/etc. for a recoding FPU beyond the inherent difference
> between a recoding and non-recoding FPU?

Hi Jacob, the NaN encoding idea was actually first floated by Krste -
I've updated <https://gist.github.com/asb/a3a54c57281447fc7eac1eec3a0763fa>
to clarify its origin.

Best,

Alex

Victor Moya

unread,
Mar 28, 2017, 6:28:40 AM3/28/17
to RISC-V ISA Dev, kr...@berkeley.edu

I don't completely follow the original reason for allowing hardware implementations of RISC-V to expose their internal representations of IEEE754 values. Programmer-exposed registers should store bits from memory or bits from operations, IEEE754 operations in this case, not funny implementation-dependent bits that are not defined anywhere.

But once that has, unsurprisingly, been discovered to be a big issue, I'm not sure why it would be a good idea to double down and decide on a new 'standard' that departs from established industry practice just to keep exposing the fact that some hardware implementations may do funny things with IEEE754 values.

When a n bit value is loaded into a m (m > n) register the usual industry options for the upper bits are:

  a) keep the existing values
  b) force the bits to a fixed known value (0s)
  c) undefined

For me, a) and b) are clearly the sanest options from a programmer's point of view and from a hardware-implementation-agnostic one.

The new option being proposed here, in keeping with the initial approach that some specific hardware implementations of IEEE754 need to be 'protected', basically means that the operation is not a load, but rather "load a 32-bit IEEE754 value and up-convert it to 64-bit IEEE754". That looks completely unnecessary to me when the programmer and the hardware just want to do a load. The same goes for the corresponding store, which should be a store of bits, not another data conversion operation.

Victor

kr...@berkeley.edu

unread,
Apr 3, 2017, 9:35:43 AM4/3/17
to RISC-V ISA Dev

After several mails on list and in private, we decided to take another
look at the proposal for handling different width floating-point
operations in the f registers. We now believe we have a better
solution that does not excessively penalize any particular
implementation style.

This new proposal goes back to using NaN-boxing of narrower width
results in wider f registers. Proposal first, then commentary on how
it affects implementations, followed by pros/cons:

----------------------------------------------------------------------
Proposal:

The f registers are FLEN bits wide. An n-bit floating-point
operation, where n<FLEN, reads input operands from the n
least-significant bits of the source f registers and writes the result
to the n least-significant bits of the destination f register, but
also places all 1s in the uppermost FLEN-n bits of the destination f
register. This 1-extended value represents a negative quiet NaN when
interpreted as any m-bit floating-point value, for n < m <= FLEN.
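
As a rough, non-normative illustration (assuming FLEN=64 and a 32-bit
operation; the helper name is hypothetical), the following C sketch models
what would be left in an f register and checks that the 1-extended pattern
reads as a negative quiet NaN when viewed as a double:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Model of an FLEN=64 f register after an n=32-bit operation: the single
       result occupies the low 32 bits and the upper FLEN-n bits are all 1s. */
    static uint64_t write_single_result(float f) {
        uint32_t lo;
        memcpy(&lo, &f, sizeof lo);
        return 0xFFFFFFFF00000000ull | lo;
    }

    int main(void) {
        uint64_t reg = write_single_result(1.5f);
        double as_double;
        memcpy(&as_double, &reg, sizeof as_double);
        printf("reg=0x%016llx sign=%d isnan(double view)=%d\n",
               (unsigned long long)reg, (int)(reg >> 63),
               isnan(as_double) ? 1 : 0);
        return 0;
    }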

If an n-bit operation reads register operands previously generated by
a wider m-bit operation (m>n), then the implementation might be very
slow, and so software should avoid generating this code. If this
operation is required (unlikely), the m-bit value should be explicitly
extracted using a wider FSm or an FMV.X.m transfer operation and then
explicitly truncated before being returned to the f registers using a
narrower LDn or FMV.n.X.

----------------------------------------------------------------------
- Commentary for non-recoded implementation

The internal f registers store the literal bit patterns given in the
specification. The floating-point unit unpacks and repacks the
floating-point values into and out of the external representation on
every operation. Transfer operations (FL*,FMV.X.*) use the value in
the internal register.

- Commentary for recoded implementation

For implementations supporting t>1 different FP types, the internal
registers are tagged with the width of the operation that produced the
result in the register. The tag need not require additional
microarchitectural state bits by making use of a different internal
NaN encoding.

Transfers in to f registers:

When an m-bit value, m<=FLEN, is transferred into an f register
(either with a FLm or a FMV.m.X), the value is scanned to determine if
it is a negative NaN containing an embedded narrower precision value,
n-bits wide (n<m). If t>2, the NaN value is interpreted as the
narrowest type n such that all upper FLEN-n bits are 1.

If this value matches a NaN-encoded narrower type, the internal
recoding scheme generates an internal recoded value based on the lower
n bits, and sets the width tag to n. Otherwise, the value is
interpreted as an m-bit wide value and the width tag is set to m.
Note, there is a possibility the value was actually an m-bit negative
qNaN produced external to the FPU, but this does not cause incorrect
operation of the scheme as an n-bit-tagged value used as input to any
wider w-bit (w>n) operation will be treated as a qNaN.
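
As a purely illustrative software sketch of that scan (assuming FLEN=64, a
full 64-bit transfer, and supported types H/F/D so t=3; the function name is
hypothetical):

    #include <stdint.h>

    /* Width tag inferred on transfer into the f register file: the narrowest
       type n such that the upper FLEN-n bits of the incoming value are all 1s. */
    static int infer_width_tag(uint64_t v) {
        if ((v >> 16) == 0xFFFFFFFFFFFFull) return 16;  /* boxed half   */
        if ((v >> 32) == 0xFFFFFFFFull)     return 32;  /* boxed single */
        return 64;                                      /* plain double */
    }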

Transfers out of f registers:

When the transfer size is equal to the internal tag width, the
internal value is packed back to IEEE format. All roundings will have
occurred during the operation producing the result - the primary work
is in creating subnormals by shifting the internal normalized value by
the appropriate number of bits and recreating the leading 1 bit.

When the transfer size is narrower than the internal tag width, the
external value produced is a portion of the significand of the larger
internal value, so just requires removing the high bits of the value
produced for the correct internal tag width.

When the transfer size is wider than the internal tag width, the n-bit
external representation is created as above, then a fixed pattern of 1
bits is produced in the upper bits of the wider result.

Operations on matching types:

If the tag fields on the input operands match the type of the operation,
the results are calculated and rounded as expected, and the result is
created with an n-bit tag field.

Operations on narrower types:

If the tag field of an input operand is narrower than required for the
requested operation, then the input operand is treated as a quiet NaN.

Operations on wider types:

If a register containing a wider m-bit operand is used as input to a
narrower n-bit operation, the result should be as if the lower n-bits
of the external m-bit representation were used. This operation should
never occur in correct code, so can be handled by taking a slow M-mode
trap to fix up the results. The trap handler can use an FSm to create
the correct external representation, then use FLn to recreate the
equivalent n-bit internal value, and should obliviously perform this
for all input operands regardless of whether they caused the trap,
then execute the n-bit FP instruction (directly or in emulation),
before restoring the original bit patterns in any non-result registers
and returning.
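
To make the intended result concrete, here is a minimal, non-normative C
sketch (assuming m=64 and n=32; the function name is hypothetical) of the
operand a single-precision operation would see when the register actually
holds a double under this version of the proposal:

    #include <stdint.h>
    #include <string.h>

    /* The n=32-bit operand is taken from the low 32 bits of the external
       m=64-bit representation; for a genuine double this is usually garbage,
       which is fine because correct code never does this. */
    static float single_view_of_double(double d) {
        uint64_t bits;
        uint32_t lo;
        float f;
        memcpy(&bits, &d, sizeof bits);
        lo = (uint32_t)bits;
        memcpy(&f, &lo, sizeof f);
        return f;
    }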

----------------------------------------------------------------------

Pros/Cons/Discussion

The advantages of this new proposal are that the f registers have
defined bit encodings regardless of datatype, simplifying the use of the f
registers to hold other data types in future standard or non-standard
extensions.

The cost to the recoded implementation is primarily the extra tagging
needed to track the internal types, but this can be done without
adding new state bits by recoding NaNs internally. Small
modifications are needed to the pipelines used to transfer values in
and out of the recoded format, but the datapath and latency costs are
minimal. The recoding process has to handle shifting of input
subnormal values for wide operands in any case, and extracting the
NaN-boxed value is a similar process to normalization except for
skipping over leading-1 bits instead of skipping over leading-0 bits,
allowing the datapath muxing to be shared.

There were some worries about polluting the NaN encoding space, but
this is not a real concern as the new NaN values are only produced as
the result of requested floating-point operations, and the
specification performs as expected if the negative qNaN values were
produced in some other way.

Comments/feedback welcome,

Krste

Allen J. Baum

unread,
Apr 3, 2017, 11:11:29 AM4/3/17
to kr...@berkeley.edu, RISC-V ISA Dev
At 6:35 AM -0700 4/3/17, kr...@berkeley.edu wrote:
>
>Operations on wider types:
>
>If a register containing a wider m-bit operand is used as input to a
>narrower n-bit operation, the result should be as if the lower n-bits
>of the external m-bit representation were used. This operation should
>never occur in correct code, so can be handled by taking a slow M-mode
>trap to fix up the results.

(assuming a 32-bit floating op on registers with 64-bit floats in them)
So, does this imply that 32-bit operations should enforce that the upper 32 bits of all input operands are all 1s, and trap if they are not, as well as setting the upper 32 bits to all 1s in the result?

If so, I don't see where we need to do any tagging at all (though tagging might be simpler than checking that the upper 32 bits are all ones).
--
**************************************************
* Allen Baum tel. (908)BIT-BAUM *
* 248-2286 *
**************************************************

Jacob Bachmeyer

unread,
Apr 3, 2017, 6:13:18 PM4/3/17
to kr...@berkeley.edu, RISC-V ISA Dev
kr...@berkeley.edu wrote:
> ----------------------------------------------------------------------
> Proposal:
>
> The f registers are FLEN bits wide. An n-bit floating-point
> operation, where n<FLEN, reads input operands from the n
> least-significant bits of the source f registers and writes the result
> to the n least-significant bits of the destination f register, but
> also places all 1s in the uppermost FLEN-n bits of the destination f
> register. This 1-extended value represents a negative quiet NaN when
> interpreted as any m-bit floating-point value, for n < m <= FLEN.
>
> If an n-bit operation reads register operands previously generated by
> a wider m-bit operation (m>n), then the implementation might be very
> slow, and so software should avoid generating this code. If this
> operation is required (unlikely), the m-bit value should be explicitly
> extracted using a wider FSm or an FMV.X.m transfer operation and then
> explicitly truncated before being returned to the f registers using a
> narrower LDn or FMV.n.X.
>

FCVT does exactly the operation needed here (converting floating-point
values between different widths). Actually type-punning floating-point
bit patterns is extremely unlikely to be useful in any application.

> ----------------------------------------------------------------------
>
> [...]
>
> Operations on wider types:
>
> If a register containing a wider m-bit operand is used as input to a
> narrower n-bit operation, the result should be as if the lower n-bits
> of the external m-bit representation were used. This operation should
> never occur in correct code, so can be handled by taking a slow M-mode
> trap to fix up the results. The trap handler can use an FSm to create
> the correct external representation, then use FLn to recreate the
> equivalent n-bit internal value, and should obliviously perform this
> for all input operands regardless of whether they caused the trap,
> then execute the n-bit FP instruction (directly or in emulation),
> before restoring the original bit patterns in any non-result registers
> and returning.
>

Stefan O'Rear warned that using traps to handle subnormals is a security
risk (in message-id
<CADJ6UvOoeYcwLVPKeGSbTHOF...@mail.gmail.com>);
are traps to handle invalid operations any safer? This would also be
the first synchronous FPU trap in RISC-V; all other FPU errors merely
accrue exceptions in fcsr.

I agree that blindly extracting bits from a wider value for a narrower
operation is a good choice, since it should reliably produce garbage
results to indicate that the code is faulty. Hardware is however
uniquely positioned to handle this conversion by performing an implicit
fused "FMV.X.D; FMV.S.X" using invisible FLEN-bit temporaries (i.e.
pipeline wiring rather than actual registers). This may be slightly
slower in a recoding FPU, since it is reasonable to optimize assuming
that invalid operations like this are extremely rare and use a few
internal scratchpad registers for this operation.

I do however argue that this is an invalid operation and should set the
FPU "Invalid Operation" flag. Does IEEE754 prohibit that? If so, could
we add an "invalid operand" flag to fcsr for this case?

Going further, could we simply declare "narrow operation on wider
operands" to be an invalid operation that produces a NaN result?
(Admittedly, this would require non-recoding FPU implementations to
ensure that narrower operations are actually using NaN-boxed operands,
but the "is-narrow-value-in-boxed-NaN" outputs are combinatorial
functions of the register contents and can be generated when a value is
written to the register. A small bit of additional logic on each
register could generate these signals asynchronously, effectively hiding
the gate-delay costs albeit with an area tradeoff. The additional gates
switch at most once when the register is loaded with a new value, so the
decrease in logic density is also a decrease in power density, and the
overall effects could go either way for an implementation.)


-- Jacob

kr...@berkeley.edu

unread,
Apr 3, 2017, 8:20:06 PM4/3/17
to jcb6...@gmail.com, kr...@berkeley.edu, RISC-V ISA Dev

>>>>> On Mon, 03 Apr 2017 17:13:15 -0500, Jacob Bachmeyer <jcb6...@gmail.com> said:
| kr...@berkeley.edu wrote:
[...]
|| If an n-bit operation reads register operands previously generated by
|| a wider m-bit operation (m>n), then the implementation might be very
|| slow, and so software should avoid generating this code. If this
|| operation is required (unlikely), the m-bit value should be explicitly
|| extracted using a wider FSm or an FMV.X.m transfer operation and then
|| explicitly truncated before being returned to the f registers using a
|| narrower LDn or FMV.n.X.
| FCVT does exactly the operation needed here (converting floating-point
| values between different widths). Actually type-punning floating-point
| bit patterns is extremely unlikely to be useful in any application.

The purpose of defining this operation is to fill in a missing part of
the specification, not to suggest this should/would be used in
practice.

|| Operations on wider types:
||
|| If a register containing a wider m-bit operand is used as input to a
|| narrower n-bit operation, the result should be as if the lower n-bits
|| of the external m-bit representation were used. This operation should
|| never occur in correct code, so can be handled by taking a slow M-mode
|| trap to fix up the results. The trap handler can use an FSm to create
|| the correct external representation, then use FLn to recreate the
|| equivalent n-bit internal value, and should obliviously perform this
|| for all input operands regardless of whether they caused the trap,
|| then execute the n-bit FP instruction (directly or in emulation),
|| before restoring the original bit patterns in any non-result registers
|| and returning.

| Stefan O'Rear warned that using traps to handle subnormals is a security
| risk (in message-id
| <CADJ6UvOoeYcwLVPKeGSbTHOF...@mail.gmail.com>);

The RISC-V spec does not prohibit implementations from using traps into
M-mode to handle subnormals (or anything else for that matter), though
these shouldn't be visible to lower privilege levels.

| are traps to handle invalid operations any safer? This would also be
| the first synchronous FPU trap in RISC-V; all other FPU errors merely
| accrue exceptions in fcsr.

Accrued exceptions are part of the IEEE standard and are not errors
per se.

The trap here is only suggested to help certain implementations
complete the specified behavior - and implementations need not trap.
It is similar in nature to the misaligned load/store trap.

| I agree that blindly extracting bits from a wider value for a
| narrower operation is a good choice, since it should reliably
| produce garbage results to indicate that the code is faulty.
| Hardware is however uniquely positioned to handle this conversion by
| performing an implicit fused "FMV.X.D; FMV.S.X" using invisible
| FLEN-bit temporaries (i.e. pipeline wiring rather than actual
| registers). This may be slightly slower in a recoding FPU, since it
| is reasonable to optimize assuming that invalid operations like this
| are extremely rare and use a few internal scratchpad registers for
| this operation.

This is a valid implementation option for a recoded FPU, but hardware
support is not really justified for an operation that would only occur
due to software error (or a compliance test program). Note, the
hardware approach has to cope with all three inputs being converted in
this way for an FMADD.

| I do however argue that this is an invalid operation and should set the
| FPU "Invalid Operation" flag. Does IEEE754 prohibit that?

It is not necessarily an IEEE invalid operation - the bit pattern in
the lower n bits will decide what the IEEE interpretation should be.

| If so, could we add an "invalid operand" flag to fcsr for this case?

No. For one thing, we don't want to force non-recoded implementations
to have to check the upper bits of every operand on every operation to
set this flag. And no one would check it anyway.

| Going further, could we simply declare "narrow operation on wider
| operands" to be an invalid operation that produces a NaN result?

No, see above.

| (Admittedly, this would require non-recoding FPU implementations to
| ensure that narrower operations are actually using NaN-boxed operands,
| but the "is-narrow-value-in-boxed-NaN" outputs are combinatorial
| functions of the register contents and can be generated when a value is
| written to the register. A small bit of additional logic on each
| register could generate these signals asynchronously, effectively hiding
| the gate-delay costs albeit with an area tradeoff. The additional gates
| switch at most once when the register is loaded with a new value, so the
| decrease in logic density is also a decrease in power density, and the
| overall effects could go either way for an implementation.)


Krste



| -- Jacob

Jacob Bachmeyer

unread,
Apr 3, 2017, 11:33:29 PM4/3/17
to kr...@berkeley.edu, RISC-V ISA Dev
kr...@berkeley.edu wrote:
> On Mon, 03 Apr 2017 17:13:15 -0500, Jacob Bachmeyer <jcb6...@gmail.com> said:
>
> | kr...@berkeley.edu wrote:
> [...]
> || If an n-bit operation reads register operands previously generated by
> || a wider m-bit operation (m>n), then the implementation might be very
> || slow, and so software should avoid generating this code. If this
> || operation is required (unlikely), the m-bit value should be explicitly
> || extracted using a wider FSm or an FMV.X.m transfer operation and then
> || explicitly truncated before being returned to the f registers using a
> || narrower LDn or FMV.n.X.
> | FCVT does exactly the operation needed here (converting floating-point
> | values between different widths). Actually type-punning floating-point
> | bit patterns is extremely unlikely to be useful in any application.
>
> The purpose of defining this operation is to fill in a missing part of
> the specification, not to suggest this should/would be used in
> practice.
>

Fair enough, although I saw possible ambiguity in that text. We should
be clear that type-punning bit patterns and actual width conversion are
two different things. Width conversion is common enough to have its own
opcodes, while type-punning is unlikely to ever be useful.

> || Operations on wider types:
> ||
> || If a register containing a wider m-bit operand is used as input to a
> || narrower n-bit operation, the result should be as if the lower n-bits
> || of the external m-bit representation were used. This operation should
> || never occur in correct code, so can be handled by taking a slow M-mode
> || trap to fix up the results. The trap handler can use an FSm to create
> || the correct external representation, then use FLn to recreate the
> || equivalent n-bit internal value, and should obliviously perform this
> || for all input operands regardless of whether they caused the trap,
> || then execute the n-bit FP instruction (directly or in emulation),
> || before restoring the original bit patterns in any non-result registers
> || and returning.
>
> | Stefan O'Rear warned that using traps to handle subnormals is a security
> | risk (in message-id
> | <CADJ6UvOoeYcwLVPKeGSbTHOF...@mail.gmail.com>);
>
> RISC-V spec does not prohibit implementations from using traps into
> M-mode to handle subnormals (or anything else for that matter), though
> these shouldn't be visible to lower privilege levels
>

The problem is that taking a trap opens a timing side channel. Stefan
O'Rear gave a link
(<URL:https://cseweb.ucsd.edu/~dkohlbre/papers/subnormal.pdf>) to a
paper that described using timing channels related to subnormal handling
to leak pixels from a cross-site iframe in Firefox and information about
the contents of a differentially-private database.

> | are traps to handle invalid operations any safer? This would also be
> | the first synchronous FPU trap in RISC-V; all other FPU errors merely
> | accrue exceptions in fcsr.
>
> Accrued exceptions are part of the IEEE standard and are not errors
> per se.
>
> The trap here is only suggested to help certain implementations
> complete the specified behavior - and implementations need not trap.
> It is similar in nature to the misaligned load/store trap.
>
> | I agree that blindly extracting bits from a wider value for a
> | narrower operation is a good choice, since it should reliably
> | produce garbage results to indicate that the code is faulty.
> | Hardware is however uniquely positioned to handle this conversion by
> | performing an implicit fused "FMV.X.D; FMV.S.X" using invisible
> | FLEN-bit temporaries (i.e. pipeline wiring rather than actual
> | registers). This may be slightly slower in a recoding FPU, since it
> | is reasonable to optimize assuming that invalid operations like this
> | are extremely rare and use a few internal scratchpad registers for
> | this operation.
>
> This is a valid implementation option for a recoded FPU, but hardware
> support is not really justified for an operation that would only occur
> due to software error (or a compliance test program). Note, the
> hardware approach has to cope with all three inputs being converted in
> this way for an FMADD.
>

Easy enough: the control logic stalls the pipeline and uses a single
conversion block on each of the three inputs in turn. 3 cycles extra
latency is far less of a side channel than a monitor trap. I am not
saying that traps to handle this should be prohibited, since some
implementations may not be concerned about timing attacks, but it is a
known risk that should be acknowledged.

> | I do however argue that this is an invalid operation and should set the
> | FPU "Invalid Operation" flag. Does IEEE754 prohibit that?
>
> It is not necessarily an IEEE invalid operation - the bit pattern in
> the lower n bits will decide what the IEEE interpretation should be.
>

Ok then, so IEEE754 does not allow "operand length mismatch" to be an
"Invalid Operation".

> | If so, could we add an "invalid operand" flag to fcsr for this case?
>
> No. For one thing, we don't want to force non-recoded implementations
> to have to check the upper bits of every operand on every operation to
> set this flag. And noone would check it anyway.
>

Checking (fflags == 0) is easy, but you are right about software failing
to check a flag that is only set by incorrect software.

> | Going further, could we simply declare "narrow operation on wider
> | operands" to be an invalid operation that produces a NaN result?
>
> No, see above.
>
> | (Admittedly, this would require non-recoding FPU implementations to
> | ensure that narrower operations are actually using NaN-boxed operands,
> | but the "is-narrow-value-in-boxed-NaN" outputs are combinatorial
> | functions of the register contents and can be generated when a value is
> | written to the register. A small bit of additional logic on each
> | register could generate these signals asynchronously, effectively hiding
> | the gate-delay costs albeit with an area tradeoff. The additional gates
> | switch at most once when the register is loaded with a new value, so the
> | decrease in logic density is also a decrease in power density, and the
> | overall effects could go either way for an implementation.)
>

This was my argument that the incremental cost for a non-recoding
implementation to recognize too-wide operands is small--the test for a
NaN-boxed single-precision value in RVFD is effectively a 32-input AND.
Since this is strictly a function of the register contents, it could be
generated inside the FP register file, producing one additional bit on
each read port for
"is-single-precision-value-boxed-in-double-precision-NaN". Importantly,
these tag bits can be generated asynchronously, taking advantage of
other delays in the implementation to avoid affecting timing. A simple
equality check for these values on all operands and the FP opcode
detects operand size mismatch. Generalizing this leads to T-1 computed
tag bits, where T is the number of FP widths supported. The equality check
on the computed tag bits expands similarly: if all tags are equal and
match the FP opcode, the operand widths are consistent. These checks
can be done in parallel with the actual FP operation and a NaN-result
substituted if needed.
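
In C terms, the check above for RVFD (FLEN=64, T=2) amounts to something
like the following purely illustrative sketch, with software standing in for
what would really be combinational logic (helper names are hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    /* The "32-input AND": is this register value a single boxed in a double NaN? */
    static bool boxed_single(uint64_t r) {
        return (r >> 32) == 0xFFFFFFFFull;
    }

    /* Widths are consistent when every operand's computed tag matches the
       width demanded by the FP opcode; three operands cover the FMADD case. */
    static bool widths_consistent(bool op_is_single,
                                  uint64_t rs1, uint64_t rs2, uint64_t rs3) {
        return boxed_single(rs1) == op_is_single &&
               boxed_single(rs2) == op_is_single &&
               boxed_single(rs3) == op_is_single;
    }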


-- Jacob

Stefan O'Rear

unread,
Apr 4, 2017, 12:34:45 AM4/4/17
to Jacob Bachmeyer, Krste Asanovic, RISC-V ISA Dev
There's a crucial difference between trapping to handle misuse and
trapping to handle valid but rare data. Denormal traps are insidious
because they allow _correct_ and _idiomatic_ code to be manipulated
into leaking data over observable side channels; the traps proposed
here would never affect code generated by a correct compiler, so they
do not pose a real hazard. (Compiler bugs and adversarially-written
machine code have plenty of other ways to leak data and IMO there is
no meaningful loss in security posture here.)

For the record I am fine with both Krste's most recent proposal and
his previous proposal.

Krste: I take it implementations that choose to trap would generate an
illegal-instruction trap?

> Easy enough: the control logic stalls the pipeline and uses a single
> conversion block on each of the three inputs in turn. 3 cycles extra
> latency is far less of a side channel than a monitor trap. I am not saying
> that traps to handle this should be prohibited, since some implementations
> may not be concerned about timing attacks, but it is a known risk that
> should be acknowledged.

Modern statistical analyses can easily detect 3 cycles over a LAN (see
also: memcmp of passwords considered harmful). But per above, I don't
care about timing channels that only affect erroneous code and
conformance testers.

You are also underestimating the complexity of "stalls the pipeline".

-s

kr...@berkeley.edu

unread,
Apr 4, 2017, 8:53:31 AM4/4/17
to Stefan O'Rear, Jacob Bachmeyer, Krste Asanovic, RISC-V ISA Dev

>>>>> On Mon, 3 Apr 2017 21:34:42 -0700, "Stefan O'Rear" <sor...@gmail.com> said:
| Krste: I take it implementations that choose to trap would generate an
| illegal-instruction trap?

Possibly - this path doesn't have to be fast, so we could maybe add it
there, but we have to think through the interaction with delegation. This
is more like the misaligned trap: only some implementations will use it.

Krste


Stefan O'Rear

unread,
Apr 4, 2017, 12:01:45 PM4/4/17
to Krste Asanovic, RISC-V ISA Dev
On Mon, Apr 3, 2017 at 6:35 AM, <kr...@berkeley.edu> wrote:
> ----------------------------------------------------------------------
> Proposal:
>
> The f registers are FLEN bits wide. An n-bit floating-point
> operation, where n<FLEN, reads input operands from the n
> least-significant bits of the source f registers and writes the result

Proposed tweak: Computational instructions (i.e. not FMV or FSD) check
the upper FLEN-n bits of the register and treat any register value
which is not properly 1-extended as a qNaN.

This has the effect that a register value is interpreted as a non-NaN
for at most one supported length.

> to the n least-significant bits of the destination f register, but
> also places all 1s in the uppermost FLEN-n bits of the destination f
> register. This 1-extended value represents a negative quiet NaN when
> interpreted as any m-bit floating-point value, for n < m <= FLEN.
>
> If an n-bit operation reads register operands previously generated by
> a wider m-bit operation (m>n), then the implementation might be very
> slow, and so software should avoid generating this code. If this
> operation is required (unlikely), the m-bit value should be explicitly
> extracted using a wider FSm or an FMV.X.m transfer operation and then
> explicitly truncated before being returned to the f registers using a
> narrower LDn or FMV.n.X.
>
> ----------------------------------------------------------------------
> - Commentary for non-recoded implementation
>
> The internal f registers store the literal bit patterns given in the
> specification. The floating-point unit unpacks and repacks the
> floating-point values into and out of the external representation on
> every operation. Transfer operations (FL*,FMV.X.*) use the value in
> the internal register.

Implementations that use the IEEE format directly need to do
marginally more work under the tweak to generate the "is any input a
NaN" signal. Since "is any input a NaN" is only used at the end it
should have plenty of slack though.
The advantage of the tweak is that this trap is no longer needed.
Operations on wider types can *also* treat the input operand as a
quiet NaN.

> ----------------------------------------------------------------------
>
> Pros/Cons/Discussion
>
> The advantages of this new proposal are that the f registers have
> defined bit encodings regardless of datatype, simplifying the use of the f
> registers to hold other data types in future standard or non-standard
> extensions.
>
> The cost to the recoded implementation is primarily the extra tagging
> needed to track the internal types, but this can be done without
> adding new state bits by recoding NaNs internally. Small
> modifications are needed to the pipelines used to transfer values in
> and out of the recoded format, but the datapath and latency costs are
> minimal. The recoding process has to handle shifting of input
> subnormal values for wide operands in any case, and extracting the
> NaN-boxed value is a similar process to normalization except for
> skipping over leading-1 bits instead of skipping over leading-0 bits,
> allowing the datapath muxing to be shared.
>
> There were some worries about polluting the NaN encoding space, but
> this is not a real concern as the new NaN values are only produced as
> the result of requested floating-point operations, and the
> specification performs as expected if the negative qNaN values were
> produced in some other way.
>
> Comments/feedback welcome,
>
> Krste

-s

Victor Moya

unread,
Apr 10, 2017, 8:51:16 AM4/10/17
to RISC-V ISA Dev

As far as I understand it, the trap would only be required for recoded implementations that don't perform the conversion from float64 to float32 in hardware for every floating-point operation. So I'm not sure if making it a requirement for all implementations is really needed, but I'm not really against it either.

Overall this looks like a good solution for both recoded and non-recoded implementations.

Victor



kr...@berkeley.edu

unread,
Apr 10, 2017, 9:14:16 AM4/10/17
to Victor Moya, RISC-V ISA Dev

We're revising the proposal to drop the need for a trap anywhere.

We're working out a minor detail or two, but the basic idea is to
incorporate (I think Jacob/Stefan's) proposal to treat illegally boxed
values as a canonical qNaN.

Krste

Jacob Bachmeyer

unread,
Apr 10, 2017, 6:13:02 PM4/10/17
to kr...@berkeley.edu, Victor Moya, RISC-V ISA Dev
kr...@berkeley.edu wrote:
> We're rev-ing proposal to drop need for a trap anywhere.
>
> We're working out a minor detail or two but basic idea is to
> incorporate (I think Jacob/Stefan's) proposal to treat illegally boxed
> values as a canonical qNaN.
>

I can take only half credit for that--the RISC-V spec already requires
that all NaNs produced be the canonical NaN and the proposed NaN-boxing
leads to a natural interpretation of a boxed narrower value as a wider
NaN. I think that I may have been the first to suggest that _wider_
operands also be treated as NaN, initially (message-id
<58D5CB65...@gmail.com>) suggesting that narrower values be seen as
their containing quiet NaNs by wider operations and that wider operands
be seen as signaling NaNs by narrower operations. The idea was that
FCLASS.? could then be used to quickly determine the actual width of the
value in an FP register, but directly examining the bit patterns is
probably a better option for that use case.

I still suggest adding a "width mismatch" bit to the FCLASS result, but
no longer see a reason to distinguish wider or narrower. RISC-V already
specifies that NaN payloads do not propagate, so the result of any
calculation with a "wrong-width" operand would be the canonical NaN.


-- Jacob

kr...@berkeley.edu

unread,
Apr 10, 2017, 8:27:43 PM4/10/17
to RISC-V ISA Dev

Here's the latest proposal. I've dropped the extensive commentary on
how to implement the recoded version as it became clear during
discussions that there are a few different possible recodings, the
choice of which depends on other factors such as whether functional
units are shared between precisions or not. Nonetheless, we think the
following proposal is not too onerous on any particular implementation
style and seems to close the implementation-dependent hole in the
spec. We seem to have converged, so will add this to next draft of
spec shortly.

Krste

----------------------------------------------------------------------
Another iteration on the FPU proposal....

This version requires that all implementations check for valid
NaN-boxed types when executing narrower-width operations to remove the
need for value-based traps in recoded implementations. It also
describes interactions with the sign-injection instructions, which are
the only instructions in the standard extensions that operate on the
values in f registers as bit vectors.

----------------------------------------------------------------------
Proposal with NaN boxing

The f registers are FLEN bits wide. FLEN can be 32, 64, or 128
depending on which of the F, D, and Q extensions are supported. There
can be up to four different floating-point precisions supported,
including H, F, D, and Q. Half-precision H scalar values are only
supported if the V vector extension is supported.

If multiple floating-point types t>1 are supported, then valid values
of narrower n-bit types, n<FLEN, are represented in the lower n bits
of an FLEN-bit NaN value, in a process termed NaN-boxing. The upper
bits of a valid NaN-boxed value must be all 1s. Valid NaN-boxed n-bit
values therefore appear as negative quiet NaNs (qNaNs) when viewed as
any wider m-bit value, n < m <=FLEN.

Floating-point n-bit transfer operations move values held in IEEE
standard formats into and out of the floating-point unit, and comprise
floating-point loads and stores (FLn/FSn) and floating-point move
instructions (FMV.n.X/FMV.X.n).

Floating-point compute operations calculate results based on the
FLEN-bit values held in the f registers. A narrow n-bit
floating-point compute operation, where n<FLEN, checks that input
operands are correctly NaN-boxed. If so, the n least-significant bits
of the input are used as the input value, otherwise the input value is
treated as a canonical qNaN. An n-bit floating-point result is
written to the n least-significant bits of the destination register,
with all 1s written to the uppermost FLEN-n bits to yield a legal
NaN-boxed value.
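
For illustration only, here is a non-normative C model of the rule above for
a single-precision add on an FLEN=64 machine (the names are hypothetical);
it deliberately ignores rounding-mode selection, exception flags, and the
separate requirement that NaN results be the canonical NaN:

    #include <stdint.h>
    #include <string.h>

    #define BOX_MASK 0xFFFFFFFF00000000ull
    #define QNAN_S   0x7FC00000u      /* canonical single-precision qNaN */

    /* A narrow compute operation checks the box; a badly boxed operand is
       treated as the canonical qNaN. */
    static float unbox_single(uint64_t reg) {
        uint32_t bits = ((reg & BOX_MASK) == BOX_MASK) ? (uint32_t)reg : QNAN_S;
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    /* The n-bit result is written to the low bits with all 1s above it. */
    static uint64_t box_single(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        return BOX_MASK | bits;
    }

    /* fadd.s rd, rs1, rs2 */
    static uint64_t model_fadd_s(uint64_t rs1, uint64_t rs2) {
        return box_single(unbox_single(rs1) + unbox_single(rs2));
    }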

Floating-point sign-injection instructions operate on the raw
bit-vector representation of the FP values in registers, and calculate
the value for the sign bit of the result based on the sign bit of the
two source operands, and copy over the non-sign bits from the first
source operand. A narrower n-bit sign-injection instruction, n<FLEN,
will treat any non-NaN-boxed-n-bit input value as an n-bit canonical
qNaN.

----------------------------------------------------------------------
Pros/Cons/Discussion

The advantages of this new proposal are that the f registers have
defined bit encodings regardless of datatype, simplifying using the f
registers to hold other data types in future standard or non-standard
extensions, and not particularly favoring non-recoded or recoded
implementations.

The cost to a recoded implementation is primarily in checking if the
upper bits of a narrower operation represent a legal NaN-boxed value.

The cost to the recoded implementation is primarily the extra tagging
needed to track the internal types and sign bits, but this can be done
without adding new state bits by recoding NaNs internally in the
exponent field. Small modifications are needed to the pipelines used
to transfer values in and out of the recoded format, but the datapath
and latency costs are minimal. The recoding process has to handle
shifting of input subnormal values for wide operands in any case, and
extracting the NaN-boxed value is a similar process to normalization
except for skipping over leading-1 bits instead of skipping over
leading-0 bits, allowing the datapath muxing to be shared.


Krste

Bruce Hoult

unread,
Apr 11, 2017, 9:17:22 AM4/11/17
to Krste Asanovic, RISC-V ISA Dev
"The cost to a recoded implementation is primarily in checking if the
upper bits of a narrower operation represent a legal NaN-boxed value."

NON-recoded, Shirley?


Krste Asanovic

unread,
Apr 11, 2017, 9:21:24 AM4/11/17
to Bruce Hoult, RISC-V ISA Dev
On Apr 11, 2017, at 6:17 AM, Bruce Hoult <br...@hoult.org> wrote:

"The cost to a recoded implementation is primarily in checking if the
upper bits of a narrower operation represent a legal NaN-boxed value."

NON-recoded, Shirley?


Yes -  thanks for catching the mistake,

Krste

Bruce Hoult

unread,
Apr 11, 2017, 9:36:04 AM4/11/17
to Krste Asanovic, RISC-V ISA Dev
Maybe worth explicitly stating:

ANY use of an n bit value by a non-n bit operation (whether wider or narrower) will result in that operand being treated as a qNaN.

?


Krste Asanovic

unread,
Apr 11, 2017, 10:13:01 AM4/11/17
to Bruce Hoult, RISC-V ISA Dev
On Apr 11, 2017, at 6:36 AM, Bruce Hoult <br...@hoult.org> wrote:

Maybe worth explicitly stating:

ANY use of an n bit value by a non-n bit operation (whether wider or narrower) will result in that operand being treated as a qNaN.

?

As written, anything <=n bits is treated as expected - I think it actually confuses the issue to call out narrower boxed values as being treated as qNaN (their representation will ensure that).

Also, there is the detail that sign-injection instructions must copy over the actual bit pattern in the non-sign bits, so we don't want to mislead by implying the _canonical_ qNaN is used as the value when the operand is a boxed narrow type.

Krste


David Horner

unread,
Apr 11, 2017, 4:26:03 PM4/11/17
to RISC-V ISA Dev, br...@hoult.org


On Tuesday, 11 April 2017 10:13:01 UTC-4, krste wrote:
On Apr 11, 2017, at 6:36 AM, Bruce Hoult <br...@hoult.org> wrote:

Maybe worth explicitly stating:

ANY use of an n bit value by a non-n bit operation (whether wider or narrower) will result in that operand being treated as a qNaN.

?

As written, anything <=n bits is treated as expected - I think it actually confuses the issue to call out narrower boxed values as being treated as qNaN (their representation will ensure that).


I am quite satisfied with this result.
It resolves to the behaviour that I originally thought was in effect:
     That mixed precision operations do not automatically convert higher/lower values.
     That explicit conversions instructions are required to convert to a common precision.
     In essence, each floating precision stands alone.
     This allows unimplemented precision opcodes (e.g. Half precision) in well behaved code to be trapped and emulated.
     It avoids a precedent that such automated behaviours can be expected;
     This certainly will help when designing other overloading uses of the float registers (and emulation of them).

 So, perhaps something could be written to explicitly state the requirement for explicit conversions, given that the meaning of "treated as expected" has changed.

 

Jacob Bachmeyer

unread,
Apr 11, 2017, 7:17:05 PM4/11/17
to Krste Asanovic, Bruce Hoult, RISC-V ISA Dev
Krste Asanovic wrote:
> On Apr 11, 2017, at 6:36 AM, Bruce Hoult <br...@hoult.org
> <mailto:br...@hoult.org>> wrote:
>>
>> Maybe worth explicitly stating:
>>
>> ANY use of an n bit value by a non-n bit operation (whether wider or
>> narrower) will result in that operand being treated as a qNaN.
>>
>> ?
>
> As written, anything <=n bits is treated as expected - I think it
> actually confuses the issue to call out narrower boxed values as being
> treated as qNaN (their representation will ensure that).
>
> Also, there is the detail that sign-injection instructions must copy
> over the actual bit pattern in the non-sign bits and so don’t want to
> mislead by implying the _canonical_ qNaN is used as value when operand
> is a boxed narrow type.

Bruce raises a valid point, and I think that a better way to handle it
is to separate FP operations into two groups: FP calculation and FP
bit-manipulation.

Operations in the FP calculation group see FP values as numbers, require
operation and operand widths to match, and produce the canonical NaN if
the widths do not match (and in general only produce canonical NaNs).
FCVT between FP registers is a bit of a special case, since its operand
and result have different widths, but it is still in the FP calculation
group because its operand must match the operand width in the instruction.

Operations in the FP bit-manipulation group see FP values as
bit-patterns, preserve NaN payloads, and never produce NaN unless their
operand is NaN. These operations include LOAD-FP, STORE-FP, FMV, and
FSGNJ. These operations always produce FLEN-wide boxed FP values if the
operation is narrower than FLEN, with the exceptions of STORE-FP and FMV
to the integer register file. Narrower STORE-FP or FMV.X simply extract
whatever bits would hold the narrower value if the FP register held a
value of the correct width. This produces (possibly
implementation-defined) garbage, but should never happen in a correct
program. (Rationale for permitting an implementation defined result: I
think that the most efficient result differs between recoded and
non-recoded implementations and I want to minimize the burden on
implementations for a "should never happen" case, unlike the general
case of width mismatch on LOAD-FP/STORE-FP, which occurs on context switch.)

I am uncertain which group best fits FCVT between the FP and integer units.



On another note, this change seems to define the result of FSGNJ[NX].D
on single-precision inputs--the sign bit read from a single-precision
input is always 1. The sign bit of a NaN value is officially
meaningless, and FSGNJ*.D can easily produce results that have positive
sign bits but are otherwise boxed narrower values. Should these also be
valid boxed values or should they be considered NaN in any width?



-- Jacob

Michael Clark

unread,
Apr 13, 2017, 9:20:02 PM4/13/17
to Jacob Bachmeyer, Krste Asanovic, Bruce Hoult, RISC-V ISA Dev

On 12 Apr 2017, at 11:17 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

On another note, this change seems to define the result of FSGNJ[NX].D on single-precision inputs--the sign bit read from a single-precision input is always 1.  The sign bit of a NaN value is officially meaningless, and FSGNJ*.D can easily produce results that have positive sign bits but are otherwise boxed narrower values.  Should these also be valid boxed values or should they be considered NaN in any width?

If FSGNJ[NX].D is used to clear the sign bit from a NaN boxed single, then I believe the behaviour of subsequent single precision operations should be implementation defined.

This is indeed an interesting corner case, but it only arises from an abnormal, type-punned sequence of instructions. I believe, for the purpose of simplicity, there should be one set of specification-defined NaN-boxing prefixes:

- (FLEN-16) 1 bits for a boxed half
- (FLEN-32) 1 bits for a boxed single
- (FLEN-64) 1 bits for a boxed double

An implementation could choose to ignore the sign bit of a NaN-boxed value and still regard the enclosed type as single precision, or it may promote the value to a NaN of the larger type. I believe this should remain implementation defined, as specifying more than one NaN mask for each width may unnecessarily complicate some implementations. The irregular type-punned sequence of instructions does not arise normally, and it does not affect the interoperability of the external representation the way FLD/FSD or FMV.D.X/FMV.X.D of single-precision values do, so it should not be necessary to define the behaviour; if the behaviour were defined, it would be best to have only one prefix per boxed type.
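
Written out as masks, a minimal C sketch of the prefixes above (assuming FLEN = 64; names are illustrative only), together with the corner case in question:

    #include <stdint.h>

    /* The boxing prefixes above as masks, for FLEN = 64.  A value held in
     * an f-register is a valid boxed narrower value exactly when every
     * prefix bit is 1.  (For FLEN = 64 a boxed double is the double itself,
     * with an empty prefix.) */
    #define BOX_PREFIX_H 0xffffffffffff0000ull  /* (FLEN-16) ones above a half   */
    #define BOX_PREFIX_S 0xffffffff00000000ull  /* (FLEN-32) ones above a single */

    static uint64_t box_half(uint16_t h)   { return BOX_PREFIX_H | h; }
    static uint64_t box_single(uint32_t s) { return BOX_PREFIX_S | s; }

    /* The corner case under discussion: FSGNJ.D injecting a 0 sign bit into
     * a boxed single clears bit 63, so the result no longer matches
     * BOX_PREFIX_S and a later single-precision operation will not accept
     * it as a valid boxed single. */
    static int looks_like_boxed_single(uint64_t reg) {
        return (reg & BOX_PREFIX_S) == BOX_PREFIX_S;
    }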

Andrew Waterman

unread,
Apr 14, 2017, 2:44:32 AM4/14/17
to Michael Clark, Jacob Bachmeyer, Krste Asanovic, Bruce Hoult, RISC-V ISA Dev
We reasoned that eliminating this implementation-defined behavior is
more beneficial than the slight area cost for some implementations.
(It isn't particularly onerous for implementations that use the 754
encoding, and we have already released a proof-of-concept recoded
implementation that isn't unduly complicated by this behavior.)

I agree this sort of code shouldn't show up in practice, but nailing
down the behavior will simplify ISA conformance testing, among other
things.


Jacob Bachmeyer

unread,
Apr 14, 2017, 7:45:44 PM4/14/17
to Andrew Waterman, Michael Clark, Krste Asanovic, Bruce Hoult, RISC-V ISA Dev
FSGNJ* is now defined (and can produce a NaN-boxed value with the wrong
sign bit) in this case, but my question is how will a NaN-boxed value
with the wrong sign bit be interpreted? This is a "should never
happen", but this entire discussion has been about filling a hole in the
spec and I do not want to leave any smaller holes in the spec. (Whether
it is still a NaN-boxed value or actually a NaN is not a concern for me,
as long as it is defined to be one or the other.)


-- Jacob

Stefan O'Rear

unread,
Apr 14, 2017, 7:49:53 PM4/14/17
to Jacob Bachmeyer, Andrew Waterman, Michael Clark, Krste Asanovic, Bruce Hoult, RISC-V ISA Dev
On Fri, Apr 14, 2017 at 4:45 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> FSGNJ* is now defined (and can produce a NaN-boxed value with the wrong sign
> bit) in this case, but my question is how will a NaN-boxed value with the
> wrong sign bit be interpreted? This is a "should never happen", but this

https://github.com/riscv/riscv-isa-manual/blob/master/src/d.tex#L31-L32

> The upper bits of a valid NaN-boxed value must be all 1s. Valid NaN-boxed $n$-bit values therefore appear as negative quiet NaNs

A NaN-boxed value is defined to be syntactically negative, so a
"NaN-boxed value with the wrong sign bit" is a defintional
impossibility, going beyond mere "should never happen".

-s

Michael Clark

unread,
Apr 14, 2017, 7:50:07 PM4/14/17
to jcb6...@gmail.com, Andrew Waterman, Krste Asanovic, Bruce Hoult, RISC-V ISA Dev
I think it should lose the NaN boxing and revert to a NaN of the widest supported type.

That’s if we want one NaN-boxed encoding per width.

Otherwise it could be specified that the sign is ignored, and only the exponent and prefix of the NaN payload are significant for type boxing.

I prefer the former.

Krste Asanovic

unread,
Apr 14, 2017, 8:13:50 PM4/14/17
to Michael Clark, jcb6...@gmail.com, Andrew Waterman, Bruce Hoult, RISC-V ISA Dev
That is the spec.

Only all 1s in the upper bits is a valid NaN boxing.

Krste

Jacob Bachmeyer

unread,
Apr 14, 2017, 9:50:18 PM4/14/17
to Krste Asanovic, Michael Clark, Andrew Waterman, Bruce Hoult, RISC-V ISA Dev
Krste Asanovic wrote:
>
>> On Apr 14, 2017, at 4:50 PM, Michael Clark <michae...@mac.com
>> <mailto:michae...@mac.com>> wrote:
>>
>>> On 15 Apr 2017, at 11:45 AM, Jacob Bachmeyer <jcb6...@gmail.com
>>> <mailto:jcb6...@gmail.com>> wrote:
>>>>
>>> FSGNJ* is now defined (and can produce a NaN-boxed value with the
>>> wrong sign bit) in this case, but my question is how will a
>>> NaN-boxed value with the wrong sign bit be interpreted? This is a
>>> "should never happen", but this entire discussion has been about
>>> filling a hole in the spec and I do not want to leave any smaller
>>> holes in the spec. (Whether it is still a NaN-boxed value or
>>> actually a NaN is not a concern for me, as long as it is defined to
>>> be one or the other.)
>>
>> I think it should lose the NaN boxing and revert to a NaN of the
>> widest supported type.
>
> That is the spec.
>
> Only all 1s in the upper bits is a valid NaN boxing.

I request a clarification that FSGNJ* propagates NaN payloads. (FMV is
a pseudo-op using FSGNJ, and FMV.<FLEN> can transfer any floating-point
value within the FP register file.) A careful reading suggests that it
does, but I ask for a specific statement to that effect.


-- Jacob

Krste Asanovic

unread,
Apr 14, 2017, 9:56:08 PM4/14/17
to jcb6...@gmail.com, Michael Clark, Andrew Waterman, Bruce Hoult, RISC-V ISA Dev
Yes. FSGNJ* copy the lower bits of the result from their first operand. Narrower n-bit FSGNJ* will treat input values that are not a legal n-bit NaN-boxed value as an n-bit canonical NaN; otherwise the lower n-1 bits of the first input operand are copied over, regardless of FP interpretation. FMV.<FLEN> will always copy the bit pattern in registers, preserving any narrower NaN-boxed values.

Krste
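
A hypothetical C sketch of that rule for FSGNJ.S with FLEN = 64; the macro and constant names are illustrative only, not taken from any implementation:

    #include <stdint.h>

    /* FSGNJ.S over 64-bit register images, following the rule above: an
     * input that is not a legal NaN-boxed single is treated as the 32-bit
     * canonical NaN; otherwise its bits are used as-is, whatever FP value
     * they encode.  The lower 31 bits come from rs1, the sign from rs2,
     * and the result is re-boxed to FLEN bits. */
    #define IS_BOXED_S(r)  (((r) >> 32) == 0xffffffffu)
    #define BOX_S(x)       (0xffffffff00000000ull | (uint64_t)(x))
    #define CANON_QNAN_S   0x7fc00000u

    static uint64_t fsgnj_s(uint64_t rs1, uint64_t rs2) {
        uint32_t a = IS_BOXED_S(rs1) ? (uint32_t)rs1 : CANON_QNAN_S;
        uint32_t b = IS_BOXED_S(rs2) ? (uint32_t)rs2 : CANON_QNAN_S;
        return BOX_S((a & 0x7fffffffu) | (b & 0x80000000u));
    }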

Allen J. Baum

unread,
Jul 4, 2017, 11:56:04 AM7/4/17
to RISC-V ISA Dev
This isn't a problem yet, but in the security meeting a statement was
made that this will be an extension "on top of" the vector spec.
This kind of implies that a security extension requires some other
extension - which is not true of any other extension (that I am aware
of).

However, that isn't what was meant exactly. The expectation is that
the security spec will require implementation of some subset of the
vector spec - again, not true of any other extension I'm aware of.

So, we have two specs that overlap (in a compatible way - the overlap
would have identical opcodes, etc).

That leads into a bunch of interesting
architectural/procedural/nit-picking corners about what it means to
be compatible that probably need to be discussed.

Leading to another question: if someone wants to implement, say, the
Vector extension, except for "that-one-op"...
- does the "V" bit get set in the ISA register, and we just get an
illegal instruction trap when "that-one-op" gets executed?
- by extension, could we do that for all ops? (I suspect the answer
is technically yes but it would be stupid as it may cause code to be
suboptimal)
- or we just call the incomplete extension something else?
- or you can't do that?

kr...@berkeley.edu

unread,
Jul 4, 2017, 12:19:58 PM7/4/17
to Allen J. Baum, RISC-V ISA Dev

>>>>> On Tue, 4 Jul 2017 08:55:58 -0700, "Allen J. Baum" <allen...@esperantotech.com> said:

| This isn't a problem yet, but in the security meeting a statement was
| made that this will be an extension "on top of" the vector spec.
| This kind of implies that a security extension requires some other
| extension - which is not true of any other extension (that I am aware
| of).

D requires F. These dependency chains will be common.

| However, that isn't what was meant exactly. The expectation is that
| the security spec will require implementation of some subset of the
| vector spec - again, not true of any other extension I'm aware of.

Specifically, the crypto portion of the security spec - the security
work covers several different things.

| So, we have two specs that overlap (in a compatible way - the overlap
| would have identical opcodes, etc).

No - the crypto should refer back to the vector spec, and include the
pieces needed.

| That leads into a bunch of interesting
| architectural/procedural/nit-picking corners about what it means to
| be compatible that probably need to be discussed.

| Leading to another question: if someone wants to implement, say, the
| Vector extension, except for "that-one-op"...
| - does the "V" bit get set in the ISA register, and we just get an
| illegal instruction trap when "that-one-op" gets executed?
| - by extension, could we do that for all ops? (I suspect the answer
| is technically yes but it would be stupid as it may cause code to be
| suboptimal)

The compliance group decided it was too-difficult/not-useful to try to
separate out hardware versus software versions of an extension.
To add more complexity - some implementations might implement some
instructions in hardware except to trap for some data values (e.g., FP
subnormals). Performance benchmarks/microbenchmarks can be used to
ascertain the performance of each feature of an implementation, but
are susceptible to gaming.

In practice, each profile will define the ISA that software will
assume and compile to, and implementers supporting that profile will
have to provide all those instructions somehow. Whether implementers
use hardware, software, microcode, JIT compilation, etc., shouldn't
change compliance, but will obviously affect customer choice based on
observed performance on real-life workloads.

| - or we just call the incomplete extension something else?
| - or you can't do that?

I'm working through a proposal to provide more and finer-grain
instruction set module names. The single letter names were good to
get things off the ground and will continue to be used, but it's clear
we need more and finer-grain names for collections of instructions.

Profiles are the orthogonal axis that says what should be supported
for a given software platform. The current spec does encode some
restrictions inside ISA modules, some of which really should be
lifted out into the profile specification. A given profile will have much
less freedom than the cross-product of choices in the included ISA
modules.

Software is king and hardware the servant. Projects won't be
successful if they require crazy customization of the software stack
for each implementation.

Krste


| --
| **************************************************
| * Allen Baum tel. (908)BIT-BAUM *
| * 248-2286 *
| **************************************************


Allen J. Baum

unread,
Jul 4, 2017, 1:09:48 PM7/4/17
to kr...@berkeley.edu, RISC-V ISA Dev
OK, good answers, and I'm glad to see that more than a little thought has been given to this issue. And I'm slapping myself for not remembering the D/F dependency.


