Feedback from IAR of the Zfa RISC-V extension


Anders Lindgren

May 26, 2023, 7:26:41 AM
to isa...@groups.riscv.org

Hi!

This is feedback from IAR on the proposed Zfa extension.

Summary:

 - Overall, this is a good proposal.

 - It is unclear what the "quiet" comparison instructions
   correspond to in C.

 - The constants selected for FLI.i don't seem to match real-world
   use, based on a statistical analysis we have conducted.

 - Finally, a new instruction to scale floating-point values is
   proposed.


--------------------

* Overall

We give a "thumbs up" for this extension, as it fixes a number of
problems with the current FPU instructions, and it improves code size.
In particular, accessing the upper bits of a 64-bit floating-point
value on RV32 will simplify library functions.

--------------------

* No assembler syntax

The assembler syntax of the new instructions is omitted.

For most instructions, this is not a problem. However, for FLI.S, it
is. There is a footnote that states that it should accept "min",
"inf", and "nan" and the rest as decimal constants.

However, it is unclear whether it should accept something like "FLI.S
fa0, 16" (i.e. index number 16) or "FLI.S fa0, 1.0". The latter is
easier to read, but it might be hard to ensure that the tools handle
this case properly.

--------------------

* Unclear encodings

Unlike the unpriv RISC-V specification, this specification doesn't
provide tables for the encoding. Instead, encoding is expressed like
"These instructions are encoded like their FMIN and FMAX counterparts,
but with instruction bit 13 set to 1."

On one hand, this provides enough information to implement hardware
and tools. On the other hand, it makes the specification hard to read
and thus makes any implementation process more error prone.

--------------------

* When should FLEQ.S be used?

It is not clear when a compiler should choose the "quiet" comparison
instructions over the original comparison instructions.

I suggest that the specification explain which C language constructs
are suited for these instructions.

If the new instructions are expected to be used through builtin
functions, those functions should be specified either in this
specification or in a companion specification. In the latter case,
that specification should be ratified alongside the Zfa specification.
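
To illustrate the distinction in question, here is a small Python model
of the flag behaviour. It is a sketch under assumptions, not text from
the specification: the function names are invented, signaling NaN inputs
are not modelled, and each function returns the comparison result
together with whether the "invalid" flag would be raised. One plausible
reading, which the specification would need to confirm, is that the C
operator `a <= b` maps to FLE and the C99 macro islessequal(a, b) maps
to FLEQ.

```python
import math


def fle_signaling(a: float, b: float) -> tuple[int, bool]:
    """Model of a signaling 'less or equal': any NaN operand raises invalid."""
    unordered = math.isnan(a) or math.isnan(b)
    result = 0 if unordered else int(a <= b)
    return result, unordered


def fleq_quiet(a: float, b: float) -> tuple[int, bool]:
    """Model of a quiet 'less or equal': quiet NaN operands do not raise invalid.

    Assumption: every NaN here is a quiet NaN, so the flag is never set.
    """
    unordered = math.isnan(a) or math.isnan(b)
    result = 0 if unordered else int(a <= b)
    return result, False
```

Both models return the same comparison result. Only the flag differs,
which is why the choice between them depends on the source-level
construct rather than on the values being compared.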

--------------------

* The constants of the FLI.i instructions

In my experience, the constants in table 25.1 don't represent
the most commonly used floating-point constants.

However, just relying on gut feeling isn't a good way to design
processor instructions.

Instead, I instrumented the IAR compiler and built a large body of
code, counting every floating-point constant it contained. The table
in the appendix below is the head of the resulting list (it combines
both 32 and 64 bit types).

Clearly, I see a different pattern compared to the constants provided
in the Zfa suggestion. (Of course, it could be possible to refine this
analysis further by investigating the context of the constants -- e.g.
if the "-1.0" is used for addition, in which case it could be
rewritten as a subtraction.)

On the other hand, constructing a non-fractional floating-point value
is easy in RISC-V assembly, for example:

    addi      a0, zero, 123
    fcvt.d.w  fa0, a0

Creating a more complex value is a lot harder, especially for types
larger than 32 bits.
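
As a rough illustration of that split, the Python sketch below checks
whether a constant could be built with a single addi followed by
fcvt.d.w, i.e. whether it is an integer that fits the 12-bit signed
immediate of addi (-2048 to 2047). The function name is made up for
this example, and larger integers (which need more than one instruction
to materialize) are simply reported as not fitting.

```python
ADDI_MIN, ADDI_MAX = -2048, 2047  # range of the 12-bit signed addi immediate


def fits_addi_fcvt(value: float) -> bool:
    """True if value is an integer that one addi can place in a register."""
    value = float(value)
    # is_integer() is False for infinities and NaN, so those are rejected too.
    return value.is_integer() and ADDI_MIN <= value <= ADDI_MAX
```

For example, fits_addi_fcvt(123.0) and fits_addi_fcvt(10.0) are true,
while fits_addi_fcvt(3600.0) and fits_addi_fcvt(1.234) are false.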

Of the proposed constants, some have very little or no presence in my
statistical material:

    256      -- 4
    2^15     -- 2 (-2^15 occurs 9 times)
    2^-8     -- 2
    1.25     -- 1 (-1.25 occurs 7 times)
    2^16     -- 0
    2^-7     -- 0
    2^-15    -- 0
    2^-16    -- 0
    0.3125   -- 0
    0.375    -- 0
    0.4375   -- 0
    0.625    -- 0
    0.875    -- 0
    1.75     -- 0
    0.0625   -- 0

Based on the statistics, I suggest that we add the following constants:

     5.0
    10.0
     6.0
    60.0
   120.0
    pi/2  (or -pi/2) (which seems to be used a lot more than pi)

Note: There is no need for 0.0, as this can be created using:

    fcvt.d.w fa0, zero

Reservation: The material used might not be representative of real
world applications as it contains a lot of code specifically used to
test compilers. However, I believe that this provides a better
selection than the GCC standard library, which I understand was used
as the base for the proposed Zfa constants.


Appendix: Statistical analysis of floating-point constants.
("*" = Part of the proposed Zfa extension.)

1.0                                      | 1466   *
2.0                                      | 615    *
3.0                                      | 530    *
4.0                                      | 413    *
-1.0                                     | 391    *
10.0                                     | 333
5.0                                      | 296
0.5                                      | 238    *
6.0                                      | 208
120.0                                    | 201
60.0                                     | 164
70.0                                     | 147
7.0                                      | 143
9.0                                      | 134
150.0                                    | 132
Infinity                                 | 130    *
-2.0                                     | 128
16.0                                     | 115    *
17.0                                     | 111
130.0                                    | 110
12.0                                     | 109
18.0                                     | 107
1280.0                                   | 102
3600.0                                   | 99
11.0                                     | 91
90.0                                     | 83
NaN                                      | 83     *
-3.0                                     | 81
8.0                                      | 81     *
-90.0                                    | 79
2700.0                                   | 77
-4.0                                     | 75
14.0                                     | 73
-5.0                                     | 67
25.0                                     | 67
30.0                                     | 64
1000.0                                   | 64
1.234                                    | 63
20.0                                     | 63
26.0                                     | 62
2.2                                      | 59
1.100000023841858                        | 59
-1.7363228039172371                      | 56
10000000000.0                            | 56
-60.0                                    | 56
1.1                                      | 53
5.0e-324                                 | 52
1.0e+35                                  | 49
99.0                                     | 48
100.0                                    | 48
1.0000000000000001e-35                   | 48
5.4324                                   | 47
13.0                                     | 46
-0.5                                     | 45
-6.0                                     | 44
110.0                                    | 44
-Infinity                                | 41
0.25                                     | 41     *
28.0                                     | 41     *
21.0                                     | 40
162.0                                    | 38
3.3                                      | 38
1.5                                      | 37
15.0                                     | 37
0.0                                      | 36
123.0                                    | 35
22.0                                     | 35
80.0                                     | 34
180.0                                    | 34
23.0                                     | 33
31.0                                     | 28

--------------------

* Suggestion: Add a "scaling" instruction (mul/div by power-of-two)

One operation that I often see in floating-point based code is when a
value is scaled up or down by a power of two, where the scale value
is provided as an immediate.

For example, to multiply fa0 by 8.0, we could use:

    SCALEUP.D    fa0, fa0, 3

From an implementation point of view, this is almost trivial to
implement in hardware, as scaling corresponds to an adjustment of
the "exp" field, with some logic to handle overflow (when scaling up)
and underflow (when scaling down).
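
To make the "adjust the exp field" point concrete, here is a minimal
Python model for IEEE 754 double precision. It is only a sketch: the
function name is invented, and every case other than "normal number in,
normal number out" is delegated to math.ldexp, which stands in for the
extra overflow and underflow logic mentioned above.

```python
import math
import struct

EXP_SHIFT = 52     # position of the exponent field in a double
EXP_MASK = 0x7FF   # 11-bit exponent field


def scale_by_pow2(x: float, n: int) -> float:
    """Return x * 2**n, using a plain exponent-field add when possible."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exp = (bits >> EXP_SHIFT) & EXP_MASK
    new_exp = exp + n
    if exp == 0 or exp == EXP_MASK or not (0 < new_exp < EXP_MASK):
        # Zero, subnormal, infinity, NaN, or a result outside the normal
        # range: this is where the additional logic is needed.
        try:
            return math.ldexp(x, n)
        except OverflowError:
            return math.copysign(math.inf, x)
    # Fast path: replace the exponent field, keep sign and fraction.
    bits = (bits & ~(EXP_MASK << EXP_SHIFT)) | (new_exp << EXP_SHIFT)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

With this model, scale_by_pow2(1.0, 3) gives 8.0 and
scale_by_pow2(1.0, -14) gives 2 ^ -14, both through the fast path.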

In addition, this is useful when constructing constants.

    fli.d       fa0, 1.0        # Note: Unsure about the syntax.
    scaledown.d fa0, fa0, 14    # Creates 2 ^ -14

This could reduce the need for powers of two in the FLI.i
instruction.

--

Anders Lindgren
Lead engineer of the IAR compiler for RISC-V
E-mail: anders....@iar.com

IAR Systems AB
Box 23051, Strandbodgatan 1
SE-750 23 Uppsala, Sweden
www.iar.com

Bruce Hoult

May 26, 2023, 9:14:01 AM
to Anders Lindgren, isa...@groups.riscv.org
> * Suggestion: Add a "scaling" instruction (mul/div by power-of-two)

> One operation that I often see in floating-point based code is when a
> value is scaled up or down by a power of two, where the scale value
> is provided as an immediate.

> For example, to multiply fa0 by 8.0, we could use:

>     SCALEUP.D    fa0, fa0, 3

> From an implementation point of view, this is almost trivial to
> implement in hardware, as scaling corresponds to an adjustment of
> the "exp" field, with some logic to handle overflow (when scaling up)
> and underflow (when scaling down).

I agree and already suggested this instruction in this thread on May 4, under the name FADDEXP, plus two equally cheap and useful partners:

- FEXP. Extracts the exponent from a double precision operand register, debiases the exponent, and delivers an integer result in the range [-1077..+1023]

- FFRAC. Normalises (if necessary) an FP value, then sets the exponent to the bias, thus returning a value in [1.0, 2.0).

- FADDEXP. Adds an integer to the exponent of a double precision operand giving a double precision result (possibly newly INF, 0, or denormalised).

Along with FCLASS (which we already, thankfully, have), these instructions are very useful for accelerating the implementation of transcendental functions.
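
As a rough software model of the three operations described above,
the following Python sketch uses math.frexp and math.ldexp. The function
names mirror the proposed mnemonics, but the details are assumptions:
only finite non-zero inputs are handled by fexp and ffrac, the sign of
the input is kept by ffrac, and overflow in faddexp is mapped to a
signed infinity.

```python
import math


def fexp(x: float) -> int:
    """Unbiased exponent e such that |x| = f * 2**e with f in [1.0, 2.0)."""
    _, e = math.frexp(x)       # frexp returns a mantissa in [0.5, 1.0)
    return e - 1


def ffrac(x: float) -> float:
    """Significand of x rescaled into [1.0, 2.0) (sign kept, an assumption)."""
    m, _ = math.frexp(x)
    return m * 2.0


def faddexp(x: float, n: int) -> float:
    """Add n to the exponent of x, i.e. compute x * 2**n."""
    try:
        return math.ldexp(x, n)
    except OverflowError:
        return math.copysign(math.inf, x)
```

For a finite non-zero x, faddexp(ffrac(x), fexp(x)) reconstructs x,
which is the decompose/recompose pattern that range reduction in
transcendental functions typically relies on.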




--
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/b34b5ed5-cd47-aaf8-edd9-1daf69fb75bb%40iar.com.

Kito Cheng

May 26, 2023, 12:49:27 PM
to Anders Lindgren, isa...@groups.riscv.org


On Fri, May 26, 2023 at 19:26, Anders Lindgren <anders....@iar.com> wrote:
--

Earl Killian

May 27, 2023, 10:08:18 AM
to Anders Lindgren, isa...@groups.riscv.org

On 5/26/23 07:26, Anders Lindgren wrote:

> - The constants selected for FLI.i don't seem to match real-world
>   use, based on a statistical analysis we have conducted.

The idea was to encode values that would require no more than a few gates to synthesize, rather than requiring a lookup table (e.g. pi/2 requires that). That meant a few bits of exponent and significand come directly from the instruction. Given that constraint, I believe the proposal was based on static statistics gathered from libc and some other sources. Since FLI is primarily for code size rather than performance, static statistics were most appropriate.
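
One way to see which candidates are cheap under that constraint is to count how many fraction bits a value needs after its leading 1. The Python sketch below does that for a double. It is an illustration only, not the actual FLI encoding, and the function name is invented.

```python
import math


def fraction_bits(x: float) -> int:
    """Fraction bits after the leading 1 needed to represent |x| exactly."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("only finite non-zero values are handled")
    m, _ = math.frexp(abs(x))      # m is in [0.5, 1.0)
    mant = int(m * (1 << 53))      # exact integer significand
    while mant % 2 == 0:           # drop trailing zero bits
        mant //= 2
    return mant.bit_length() - 1   # exclude the leading 1
```

With this measure, 1.0 needs 0 bits, 1.25 and 5.0 need 2, and 60.0 needs 3, whereas fraction_bits(math.pi / 2) uses nearly the whole significand, which is why it needs a table.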


> - Finally, a new instruction to scale floating-point values is
>   proposed.

Please consider implementing this in a compiler and gathering some statistics. Another place that exponent scaling is useful is in conversion from integer to floating-point (e.g. for switching from integer DSP to FP).

-Earl

