May 26, 2023, 7:26:41 AM

to isa...@groups.riscv.org

Hi!

This is feedback from IAR on the proposed Zfa extension.

Summary:

- Overall, this is a good proposal.

- It is unclear what the "quiet" comparison instructions correspond
  to in C.

- The constants selected for FLI.i don't seem to match real-world
  use, based on a statistical analysis we have conducted.

- Finally, a new instruction to scale floating-point values is
  proposed.

--------------------

* Overall

We give a "thumbs up" for this extension, as it fixes a number of
problems with the current FPU instructions, and it improves code size.

In particular, direct access to the upper bits of a 64-bit
floating-point value on RV32 will simplify library functions.

--------------------

* No assembler syntax

The assembler syntax of the new instructions is omitted.

For most instructions, this is not a problem. However, for FLI.S, it
is. There is a footnote that states that it should accept "min",
"inf", and "nan", and the rest as decimal constants.

However, it is unclear whether it should accept something like
"FLI.S fa0, 16" (i.e. index number 16) or "FLI.S fa0, 1.0". The
latter is easier to read, but it might be hard to ensure that the
tools handle this case properly.
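To illustrate why accepting both spellings could be awkward for tools, here is a minimal Python sketch of an operand parser. The table is deliberately partial: index 16 = 1.0 comes from the example above, while the index used for 16.0 is an assumption made for this sketch, and the function name `interpretations` is hypothetical.

```python
# Partial, illustrative table (index -> value). Index 16 = 1.0 is taken from
# the message; the index chosen for 16.0 is an assumption for this sketch.
FLI_TABLE = {16: 1.0, 25: 16.0}


def interpretations(operand_text):
    """Return every reading a bare numeric FLI operand could plausibly have."""
    readings = []
    value = float(operand_text)  # only numeric operands are handled here
    # Reading 1: the operand is a table index.
    if operand_text.isdigit() and int(operand_text) in FLI_TABLE:
        index = int(operand_text)
        readings.append(("index", index, FLI_TABLE[index]))
    # Reading 2: the operand is the floating-point value itself.
    for index, table_value in FLI_TABLE.items():
        if table_value == value:
            readings.append(("value", index, table_value))
    return readings


# "16" has two readings (index 16, or the value 16.0); "1.0" has only one.
ambiguous = interpretations("16")
unambiguous = interpretations("1.0")
```

If an assembler accepted both forms, an operand written as a bare integer would need an explicit rule to pick one reading, which is one reason the specification could usefully fix the syntax.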

--------------------

* Unclear encodings

Unlike the unprivileged RISC-V specification, this specification
doesn't provide tables for the encodings. Instead, encodings are
expressed like "These instructions are encoded like their FMIN and
FMAX counterparts, but with instruction bit 13 set to 1."

On one hand, this provides enough information to implement hardware
and tools. On the other hand, it makes the specification hard to
read and thus makes any implementation process more error-prone.
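As a rough illustration of what a reader currently has to work out by hand, the following Python sketch builds the standard R-type encodings of FMIN.S and FMAX.S and then applies the "bit 13 set to 1" rule quoted above. The helper names are hypothetical; the field layout and the FMIN.S/FMAX.S field values are those of the base F extension.

```python
OP_FP = 0b1010011  # major opcode of the floating-point computational instructions


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=OP_FP):
    """Pack the fields of an R-type instruction into a 32-bit word."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def fmin_s(rd, rs1, rs2):
    return encode_r_type(0b0010100, rs2, rs1, 0b000, rd)


def fmax_s(rd, rs1, rs2):
    return encode_r_type(0b0010100, rs2, rs1, 0b001, rd)


def with_bit13(word):
    """Apply the rule from the proposal text: same encoding, bit 13 set."""
    return word | (1 << 13)


FA0, FA1, FA2 = 10, 11, 12  # fa0..fa2 are f10..f12 in the standard ABI

base = fmin_s(FA0, FA1, FA2)
derived = with_bit13(base)
funct3_of_derived = (derived >> 12) & 0b111  # bit 13 is the middle funct3 bit
```

Since bit 13 lies inside the funct3 field, the rule amounts to funct3 = 010 and 011 for the two derived instructions, which is exactly what an encoding table would show at a glance.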

--------------------

* When should FLEQ.S be used?

It is not clear when a compiler should select the "quiet" comparison
instructions over the original comparison instructions.

I suggest that the specification explain which C language constructs
are suited for these instructions.

If the new instructions are expected to be accessed through builtin
functions, those functions should be specified either in this
specification or in a companion specification. In the latter case,
that specification should be ratified along with the Zfa
specification.
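One candidate mapping, offered here only as an assumption, is the C99 comparison macros isless() and islessequal(), which are specified not to raise the invalid exception for quiet NaN operands, whereas the built-in < and <= operators may. The small Python model below sketches that difference in flag behaviour. The function names are hypothetical, and signaling NaNs (which would still raise invalid in a quiet comparison) are not modelled.

```python
import math


def flt(a, b):
    """Model of a signaling less-than: any NaN operand sets the invalid flag.

    Returns (result, invalid_flag).
    """
    unordered = math.isnan(a) or math.isnan(b)
    return (0 if unordered else int(a < b)), unordered


def fltq(a, b):
    """Model of a quiet less-than: a quiet NaN gives 0 and leaves the flag clear.

    Returns (result, invalid_flag).
    """
    unordered = math.isnan(a) or math.isnan(b)
    return (0 if unordered else int(a < b)), False


nan = float("nan")
signaling_case = flt(1.0, nan)   # result 0, invalid flag set
quiet_case = fltq(1.0, nan)      # result 0, invalid flag clear
```

Under this reading, a compiler would pick the quiet form for isless()-style macros and keep the original instructions for the relational operators, but the specification would need to confirm that.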

--------------------

* The constants of the FLI.i instructions

In my experience, the constants in table 25.1 don't represent the
most commonly used floating-point constants.

However, just relying on gut feeling isn't a good way to design
processor instructions.

Instead, I instrumented the IAR compiler and built a large body of
code. The table in the appendix below is the head of the resulting
list (it combines both 32- and 64-bit types).

Clearly, I see a different pattern compared to the constants provided
in the Zfa proposal. (Of course, it would be possible to refine this
analysis further by investigating the context of the constants, e.g.
whether the "-1.0" is used for addition, in which case it could be
rewritten as a subtraction.)

On the other hand, constructing a non-fractional floating-point value
is easy in RISC-V assembly, for example:

addi a0, zero, 123

fcvt.d.w fa0, a0

By contrast, it is a lot harder to create a more complex value,
especially for types larger than 32 bits.
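The distinction can be sketched as a simple predicate: a constant is cheap with the two-instruction sequence above when it is an integer that fits the 12-bit signed immediate of addi. The function name and the sample values (picked from the appendix) are choices made for this illustration only.

```python
import math


def fits_addi_fcvt(value):
    """True if `value` can be built with one addi (12-bit signed immediate)
    followed by an integer-to-float conversion such as fcvt.d.w."""
    if not math.isfinite(value) or not value.is_integer():
        return False
    if value == 0.0 and math.copysign(1.0, value) < 0:
        return False  # -0.0 cannot come out of an integer conversion
    return -2048 <= value <= 2047


samples = [123.0, 10.0, 1280.0, 3600.0, 0.5, 1.234]
cheap = [v for v in samples if fits_addi_fcvt(v)]
expensive = [v for v in samples if not fits_addi_fcvt(v)]
```

Constants in the second group, such as fractions or integers outside the immediate range, are the ones that need a longer sequence or a load from memory.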

Of the proposed constants, some have very little or no presence in
my statistical material (constant -- number of occurrences):

256 -- 4

2^15 -- 2 (-2^15 occurs 9 times)

2^-8 -- 2

1.25 -- 1 (-1.25 occurs 7 times)

2^16 -- 0

2^-7 -- 0

2^-15 -- 0

2^-16 -- 0

0.3125 -- 0

0.375 -- 0

0.4375 -- 0

0.625 -- 0

0.875 -- 0

1.75 -- 0

0.0625 -- 0

Based on the statistics, I suggest that we add the following constants:

5.0

10.0

6.0

60.0

120.0

pi/2 (or -pi/2) (which seems to be used a lot more than pi)

Note: There is no need for 0.0, as this can be created using:

fcvt.d.w fa0, zero

Reservation: The material used might not be representative of
real-world applications, as it contains a lot of code specifically
written to test compilers. However, I believe that it provides a
better selection than the GCC standard library, which I understand
was used as the basis for the proposed Zfa constants.

Appendix: Statistical analysis of floating-point constants.

("*" = Part of the proposed Zfa extension.)

1.0 | 1466 *

2.0 | 615 *

3.0 | 530 *

4.0 | 413 *

-1.0 | 391 *

10.0 | 333

5.0 | 296

0.5 | 238 *

6.0 | 208

120.0 | 201

60.0 | 164

70.0 | 147

7.0 | 143

9.0 | 134

150.0 | 132

Infinity | 130 *

-2.0 | 128

16.0 | 115 *

17.0 | 111

130.0 | 110

12.0 | 109

18.0 | 107

1280.0 | 102

3600.0 | 99

11.0 | 91

90.0 | 83

NaN | 83 *

-3.0 | 81

8.0 | 81 *

-90.0 | 79

2700.0 | 77

-4.0 | 75

14.0 | 73

-5.0 | 67

25.0 | 67

30.0 | 64

1000.0 | 64

1.234 | 63

20.0 | 63

26.0 | 62

2.2 | 59

1.100000023841858 | 59

-1.7363228039172371 | 56

10000000000.0 | 56

-60.0 | 56

1.1 | 53

5.0e-324 | 52

1.0e+35 | 49

99.0 | 48

100.0 | 48

1.0000000000000001e-35 | 48

5.4324 | 47

13.0 | 46

-0.5 | 45

-6.0 | 44

110.0 | 44

-Infinity | 41

0.25 | 41 *

28.0 | 41 *

21.0 | 40

162.0 | 38

3.3 | 38

1.5 | 37

15.0 | 37

0.0 | 36

123.0 | 35

22.0 | 35

80.0 | 34

180.0 | 34

23.0 | 33

31.0 | 28

--------------------

* Suggestion: Add a "scaling" instruction (mul/div by power-of-two)

One operation that I often see in floating-point code is a value
being scaled up or down by a power of two, where the scale value is
provided as an immediate.

For example, to multiply fa0 by 8.0, we could use:

SCALEUP.D fa0, fa0, 3

From an implementation point of view, this is almost trivial to
implement in hardware, as scaling corresponds to an adjustment of
the "exp" field, with some logic to handle overflow (when scaling
up) and underflow (when scaling down).

In addition, this is useful when constructing constants:

fli.d fa0, 1.0 # Note: Unsure about the syntax.

scaledown.d fa0, fa0, 14 # Creates 2 ^ -14

This could reduce the need for powers of two in the FLI.i
instruction.
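The intended behaviour can be modelled in a few lines of Python. This is only a sketch: `scale` and `exponent_field` are hypothetical names, and returning infinity on overflow is an assumption about what the hardware would do (Python's own math.ldexp raises OverflowError instead).

```python
import math
import struct


def exponent_field(x):
    """Return the 11-bit biased exponent field of a binary64 value."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return (bits >> 52) & 0x7FF


def scale(x, n):
    """Multiply x by 2**n, modelling the proposed scale-up/scale-down."""
    try:
        return math.ldexp(x, n)
    except OverflowError:
        return math.copysign(math.inf, x)  # assumed hardware result on overflow


scaled_up = scale(1.5, 3)       # 1.5 * 8.0
built_const = scale(1.0, -14)   # 2 ^ -14, as in the fli.d example above
# For normal inputs and results, only the exponent field changes.
field_delta = exponent_field(scaled_up) - exponent_field(1.5)
```

For normal values the operation is a plain addition on the exponent field, which is why the hardware cost should be low; the extra logic is confined to the overflow and underflow cases.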

--

Anders Lindgren

Lead engineer of the IAR compiler for RISC-V

E-mail: anders....@iar.com


May 26, 2023, 9:14:01 AM

to Anders Lindgren, isa...@groups.riscv.org

> * Suggestion: Add a "scaling" instruction (mul/div by power-of-two)
>
> One operation that I often see in floating-point based code is when a
> value is scaled up or down with a power of two, where the scale value
> is provided as an immediate.
>
> For example, to multiply fa0 by 8.0, we could use:
>
> SCALEUP.D fa0, fa0, 3
>
> From an implementation point of view, this is almost trivial to
> implement in hardware, as a scaling corresponds to an adjustment of
> the "exp" field, with some logic associated with overflow (when
> scaling up) and underflow (when scaling down).

I agree and already suggested this instruction in this thread on May 4, under the name FADDEXP, plus two equally cheap and useful partners:

- FEXP. Extracts the exponent from a double precision operand register, debiases the exponent, and delivers an integer result in the range [-1074..+1023]

- FFRAC. Normalises (if necessary) an FP value, then sets the exponent to the bias, thus returning a value in [1.0, 2.0).

- FADDEXP. Adds an integer to the exponent of a double precision operand giving a double precision result (possibly newly INF, 0, or denormalised).

Along with FCLASS (which we already, thankfully, have), these instructions are very useful for accelerating the implementation of transcendental functions.
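As a rough model of the three operations described above, the Python sketch below expresses them through math.frexp and math.ldexp. The function names mirror the proposed mnemonics but are otherwise hypothetical, only finite non-zero inputs are handled by fexp and ffrac, and returning infinity on overflow in faddexp is an assumption.

```python
import math


def _check(x):
    if x == 0.0 or not math.isfinite(x):
        raise ValueError("this sketch only models finite, non-zero inputs")


def fexp(x):
    """Unbiased exponent e such that |x| = f * 2**e with f in [1.0, 2.0)."""
    _check(x)
    _, e = math.frexp(x)  # frexp yields a mantissa in [0.5, 1), so shift by one
    return e - 1


def ffrac(x):
    """x with its exponent replaced by the bias: magnitude in [1.0, 2.0)."""
    _check(x)
    m, _ = math.frexp(x)  # frexp also normalises subnormal inputs
    return m * 2.0


def faddexp(x, n):
    """Add n to the exponent of x, i.e. compute x * 2**n."""
    try:
        return math.ldexp(x, n)
    except OverflowError:
        return math.copysign(math.inf, x)  # assumed result on overflow


x = 6.0
rebuilt = faddexp(ffrac(x), fexp(x))  # splitting and recombining gives x back
```

The split-and-recombine pattern is the usual first step of range reduction in functions such as log or sqrt, which is where these instructions would be expected to help.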

May 27, 2023, 10:08:18 AM

to Anders Lindgren, isa...@groups.riscv.org

On 5/26/23 07:26, Anders Lindgren wrote:

> - The constants selected for FLI.i don't seem to match real-world
> uses based on a statistical analysis we have conducted.

The idea was to encode values that would require no more than a few gates to synthesize, rather than a lookup table (which pi/2, for example, would need). That meant a few bits of exponent and significand come directly from the instruction. Given that constraint, I believe the proposal was based on static statistics gathered from libc and some other sources. Since FLI is primarily for code size rather than performance, static statistics were most appropriate.
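One way to get a feel for that constraint is to count how many fraction bits below the implicit leading 1 a constant needs. The Python sketch below does only that; it does not model the actual FLI encoding, and the function name is hypothetical.

```python
import math


def significand_bits_needed(x):
    """Number of fraction bits (below the implicit leading 1) needed for x."""
    m, _ = math.frexp(x)            # mantissa with magnitude in [0.5, 1)
    frac = abs(m) * 2.0 - 1.0       # fraction part of the 1.f form, in [0, 1)
    bits = 0
    while frac != 0.0:
        frac *= 2.0                 # shift the next fraction bit into view
        frac -= int(frac)           # and drop it (all steps are exact)
        bits += 1
    return bits


few_bits = {v: significand_bits_needed(v) for v in (1.0, 1.25, 0.875, 10.0, 60.0)}
many_bits = significand_bits_needed(math.pi / 2)
```

Values such as 10.0 or 60.0 need only two or three fraction bits, so their cost lies mostly in the extra exponent values, whereas pi/2 uses nearly the full significand and could not be produced from a handful of instruction bits.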

> - Finally, a new instruction to scale floating-point values is
> proposed.

Please consider implementing this in a compiler and gathering some statistics. Another place that exponent scaling is useful is in conversion from integer to floating-point (e.g. for switching from integer DSP to FP).

-Earl
