Experimenting with a floating point


Anton Krug

Dec 8, 2017, 10:08:08 AM
to RISC-V SW Dev

Hi Everybody,


I'm experimenting with an rv32iamf core and I'm not sure whether what I'm seeing is by design. In essence, I'm fine with the limited precision of single-precision floats and want to avoid the software implementation of double precision.

The whole codebase uses floats, but when I call functions such as powf or sqrtf, they still convert the float to double internally and work on the double for a few moments before returning a float again, which I don't want.


80002620 <powf>:
powf():
80002620: fa010113          addi sp,sp,-96
80002624: 04912a23          sw s1,84(sp)
80002628: 00007497          auipc s1,0x7
8000262c: 02c48493          addi s1,s1,44 # 80009654 <__fdlib_version>
80002630: 04812c23          sw s0,88(sp)
80002634: 02812e27          fsw fs0,60(sp)
80002638: 02912c27          fsw fs1,56(sp)
8000263c: 03212a27          fsw fs2,52(sp)
80002640: 04112e23          sw ra,92(sp)
80002644: 05212823          sw s2,80(sp)
80002648: 05312623          sw s3,76(sp)
8000264c: 03312827          fsw fs3,48(sp)
80002650: 20a504d3          fmv.s fs1,fa0
80002654: 20b58453          fmv.s fs0,fa1
80002658: 564000ef          jal ra,80002bbc <__ieee754_powf>
8000265c: 0004a403          lw s0,0(s1)
80002660: fff00793          li a5,-1
80002664: 20a50953          fmv.s fs2,fa0
80002668: 06f40a63          beq s0,a5,800026dc <powf+0xbc>
8000266c: a08427d3          feq.s a5,fs0,fs0
80002670: 06078663          beqz a5,800026dc <powf+0xbc>
80002674: a094a7d3          feq.s a5,fs1,fs1
80002678: 1e078463          beqz a5,80002860 <powf+0x240>
8000267c: f00009d3          fmv.s.x fs3,zero
80002680: a134a7d3          feq.s a5,fs1,fs3
80002684: 08078463          beqz a5,8000270c <powf+0xec>
80002688: a13427d3          feq.s a5,fs0,fs3
8000268c: 14078463          beqz a5,800027d4 <powf+0x1b4>
80002690: 20948553          fmv.s fa0,fs1
80002694: 00100793          li a5,1
80002698: 00f12423          sw a5,8(sp)
8000269c: 00007797          auipc a5,0x7
800026a0: 8ec78793          addi a5,a5,-1812 # 80008f88 <_read_r+0x110>
800026a4: 00f12623          sw a5,12(sp)
800026a8: 02012423          sw zero,40(sp)
800026ac: 564040ef          jal ra,80006c10 <__extendsfdf2>
800026b0: 20840553          fmv.s fa0,fs0
800026b4: 00a12823          sw a0,16(sp)
800026b8: 00b12a23          sw a1,20(sp)
800026bc: 554040ef          jal ra,80006c10 <__extendsfdf2>
800026c0: 00a12c23          sw a0,24(sp)
800026c4: 00b12e23          sw a1,28(sp)
800026c8: 02012023          sw zero,32(sp)
800026cc: 02012223          sw zero,36(sp)
800026d0: 00007797          auipc a5,0x7
800026d4: 8887a907          flw fs2,-1912(a5) # 80008f58 <_read_r+0xe0>
800026d8: 0c040463          beqz s0,800027a0 <powf+0x180>
800026dc: 05c12083          lw ra,92(sp)
800026e0: 05812403          lw s0,88(sp)
800026e4: 21290553          fmv.s fa0,fs2
800026e8: 05412483          lw s1,84(sp)
800026ec: 05012903          lw s2,80(sp)
800026f0: 04c12983          lw s3,76(sp)
800026f4: 03c12407          flw fs0,60(sp)
800026f8: 03812487          flw fs1,56(sp)
800026fc: 03412907          flw fs2,52(sp)
80002700: 03012987          flw fs3,48(sp)
80002704: 06010113          addi sp,sp,96
80002708: 00008067          ret
8000270c: 55d010ef          jal ra,80004468 <finitef>
80002710: 1c050063          beqz a0,800028d0 <powf+0x2b0>
80002714: f00007d3          fmv.s.x fa5,zero
80002718: a0f927d3          feq.s a5,fs2,fa5
8000271c: fc0780e3          beqz a5,800026dc <powf+0xbc>
80002720: 20948553          fmv.s fa0,fs1
80002724: 545010ef          jal ra,80004468 <finitef>
80002728: fa050ae3          beqz a0,800026dc <powf+0xbc>
8000272c: 20840553          fmv.s fa0,fs0
80002730: 539010ef          jal ra,80004468 <finitef>
80002734: fa0504e3          beqz a0,800026dc <powf+0xbc>
80002738: 20948553          fmv.s fa0,fs1
8000273c: 00400793          li a5,4
80002740: 00f12423          sw a5,8(sp)
80002744: 00007797          auipc a5,0x7
80002748: 84478793          addi a5,a5,-1980 # 80008f88 <_read_r+0x110>
8000274c: 00f12623          sw a5,12(sp)
80002750: 02012423          sw zero,40(sp)
80002754: 4bc040ef          jal ra,80006c10 <__extendsfdf2>

and then it uses the software implementation to work on doubles.

I think it's using the correct multilib (with the correct ABI):

riscv-unknown-elf-gcc/bin/../lib/gcc/riscv64-unknown-elf/7.1.1/../../../../riscv64-unknown-elf/lib/rv32imaf/ilp32f/libm.a(lib_a-wf_pow.o) (__ieee754_powf)
Shouldn't the purpose of powf/sqrtf, compared to pow/sqrt, be to avoid doubles?

Or is this on purpose and by design?

Best regards,
Anton


Anton Krug

Dec 8, 2017, 11:13:42 AM
to RISC-V SW Dev

I forgot to add that powf often calls __extendsfdf2, the float-to-double converter, just as it would be used with the msoft-float approach (which I am trying to avoid).

 

https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html

 

Of course, the other soft-float add/sub/div implementations then get pulled in as well.

 

 

Regards,

Anton

 


--
You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+un...@groups.riscv.org.
To post to this group, send email to sw-...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/00396c6ea14a4be39d39d5d43306c074%40microsemi.com.

Jim Wilson

Dec 8, 2017, 2:47:23 PM
to Anton Krug, RISC-V SW Dev
On Fri, Dec 8, 2017 at 8:12 AM, Anton Krug <anton...@microsemi.com> wrote:
> Forgot to add that the powf is calling the __extendsfdf2 often which is the
> float to double convertor which is used as if you would do msoft-float
> approach (which I try to avoid)

I assume you are using newlib. Looking at the code, it appears that
everything is done in float or in float-sized integers, except when
there is an error. If there is a domain error, an overflow, or an
underflow, the code fills in an exception structure and then depending
on which standard newlib is configured for, it will force a particular
return value and/or call matherr with the exception structure, and/or
set errno. The call to matherr with the exception structure is the
problem as the structure is defined as

struct exception
{
int type;
char *name;
double arg1;
double arg2;
double retval;
int err;
};

so setting the arg1/arg2/retval fields requires a float to double
conversion. The structure definition comes from the SVID standard, so
we can't change it without problems. It might be possible to
reorganize the code a little to avoid setting the struct exception
fields if we aren't calling matherr. Or maybe give you the option of
configuring newlib without the SVID support. Any change here would
take some time to develop and test.
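
As a rough illustration of why the structure forces conversions, the sketch below fills a struct with the same layout from float values. It is not newlib's code: struct exception_sketch and fill_exception_sketch are made-up names, and the type value 1 is simply the constant visible in the disassembly earlier in the thread. On a target without double-precision hardware, each assignment to a double field would be expected to become a call to a widening routine such as __extendsfdf2.

```c
/* Sketch only: same layout as the SVID struct quoted above, but under a
   different (hypothetical) name so it cannot clash with a real <math.h>. */
struct exception_sketch {
    int type;
    const char *name;
    double arg1;
    double arg2;
    double retval;
    int err;
};

/* Hypothetical helper showing what a float function's error path has to do
   before it can hand the structure to a matherr-style hook. */
void fill_exception_sketch(struct exception_sketch *exc,
                           float x, float y, float result)
{
    exc->type = 1;                /* value seen in the disassembly (li a5,1) */
    exc->name = "powf";
    exc->arg1 = (double)x;        /* float -> double widening */
    exc->arg2 = (double)y;        /* float -> double widening */
    exc->retval = (double)result; /* float -> double widening */
    exc->err = 0;
}
```

The three widening assignments are the only double-precision work here, which matches the observation that the conversions only appear on the error paths.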

It isn't clear if this is a serious problem, because if you have
domain/overflow/underflow errors, then the performance of the code may
not be your biggest problem. Most calls to powf should not cause
errors, and hence should not be doing float to double conversions.

There is a global variable _LIB_VERSION that you can set to choose
which standard the math library will conform to. This can be set to
_IEEE_, _POSIX_, _SVID_, or _XOPEN_. If set to _SVID_ or _XOPEN_,
then matherr will be called. If set to _IEEE_ or _POSIX_, then
matherr will not be called. The struct exception fields are still set
though, so you still get the float to double conversions you don't
want. The default is _XOPEN_, so it is calling matherr by default.

Jim

Anton Krug

Dec 8, 2017, 11:32:43 PM
to Jim Wilson, RISC-V SW Dev

Thank you, Jim, for the detailed explanation. Yes, in the case of an exception, performance is the least of my worries. And yes, it is newlib.

I'm worried that something else is happening as well: I have seen that the software add, subtract and divide routines were included:

__divdf3

__adddf3

__subdf3

Are these required for the exception handling, or is something else happening?


I tried the following:


_LIB_VERSION_TYPE _LIB_VERSION = _POSIX_;  // and tried _IEEE_


But divdf3 is still present. If I understand correctly, divdf3 is pulled in at build time, while the global variable only affects behaviour at run time.


Anton




Jim Wilson

Dec 9, 2017, 7:09:17 PM
to Anton Krug, RISC-V SW Dev
On Fri, Dec 8, 2017 at 8:32 PM, Anton Krug <anton...@microsemi.com> wrote:
> I'm worried if there is something else happening as well, I have seen that
> the software add, subtraction and division were included:
>
> __divdf3

For pow (x, y), if you have a negative x, a non-integral y, and the
result is a NaN, then you get a domain error. We need a double NaN
for the struct exception retval field, which is generated by doing a
double 0.0/0.0 operation. There may be a better way to do this for a
single-float-only target, but this is exception code so probably not
critical.

> __adddf3
> __subdf3

These are both called from rint. For pow (x, y), if you have a
negative x, and the result is an infinity, then we need a double
HUGE_VAL for the struct exception retval field, which is provided by a
macro that calls a compiler builtin, except that this needs to be
positive if y*0.5 is an integer, and negative if y*0.5 is a
non-integer, so rint is called to check to see if y*0.5 is an integral
value or not. But we do have an rintf function, so this one is
fixable by calling rintf instead of rint. It appears that powf is the
only place where this mistake is made with rint/rintf. The other
float functions that use rint appear to be correctly calling rintf.
Fixing this gets rid of the adddf3 and subdf3 calls. This is
exception handling code though, so probably not performance critical.

And that reminds me that we could replace the double 0.0/0.0 with
another compiler builtin to generate a NaN. Fixing this gets rid of
the divdf3 calls.
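
A minimal sketch of the two ways to get a double NaN, assuming GCC or a compatible compiler that provides __builtin_nan. The function names are hypothetical, and the volatile is only there so that the division in this sketch is not folded away at compile time; newlib's source may be written differently.

```c
#include <math.h>

/* Run-time division: on a target without double hardware this needs a
   software double divide routine. */
double nan_by_division_sketch(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* Compiler builtin: the NaN is a constant, no arithmetic is performed. */
double nan_by_builtin_sketch(void)
{
    return __builtin_nan("");
}
```

Both functions return a NaN, but only the first one needs a double division at run time.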

However, we will still be left with the extendsfdf2 calls, because of
the struct exception fields. That is much more work to get rid of.

> I tried the following:
> _LIB_VERSION_TYPE _LIB_VERSION = _POSIX_; // and tried _IEEE_
>
> But still the divdf3 are still present, If I understand it correctly the
> divdf3 is added there on compile time, while the global variable affects
> functionality at runtime only.

This is run-time only, so yes, the divdf3 and friends will still be
there in the binary.
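
A sketch of why changing the variable cannot shrink the binary, using hypothetical stand-ins (lib_version_now and the *_SKETCH constants are not newlib names). The check is an ordinary run-time comparison, so the code for both outcomes has to be present in the library object and gets linked either way.

```c
/* Hypothetical stand-ins for the four conformance modes named in the thread. */
enum lib_version_sketch { IEEE_SKETCH, POSIX_SKETCH, SVID_SKETCH, XOPEN_SKETCH };

/* The default follows the thread: XOPEN, so the hook is called by default. */
enum lib_version_sketch lib_version_now = XOPEN_SKETCH;

/* Decided at run time: SVID and XOPEN call the hook, IEEE and POSIX do not.
   The compiler cannot drop either path, because the variable may change. */
int would_call_matherr_sketch(void)
{
    return lib_version_now == SVID_SKETCH || lib_version_now == XOPEN_SKETCH;
}
```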

Since I was looking at disassembled code, I noticed that matherr is
just two instructions, one to load 0 into the return value register,
and one to return. So the default version does nothing. This is just
a hook provided so that programmers can override it if they want to do
something more interesting on error, but it is unlikely that many
newlib users are defining this hook. It looks like we are doing a lot
of work for very little benefit. If someone cared enough about this,
they could add a configure option to newlib to disable the matherr
support. This would get rid of the extendsfdf2 calls, at the expense
of losing some SVID/XOPEN compatibility. This would be a moderate
size project though, as there are an awful lot of matherr calls in
newlib, and every single one would have to be tested if someone
changed this. It might be reasonable to try adding ISO C 99 fenv
support as a replacement, but that makes it an even bigger project.

Jim

Anton Krug

Dec 9, 2017, 8:47:31 PM
to Jim Wilson, Anton Krug, RISC-V SW Dev
On Sun, Dec 10, 2017 at 12:09 AM, Jim Wilson <ji...@sifive.com> wrote:
> On Fri, Dec 8, 2017 at 8:32 PM, Anton Krug <anton...@microsemi.com> wrote:
>> I'm worried if there is something else happening as well, I have seen that
>> the software add, subtraction and division were included:
>>
>> __divdf3
>
> For pow (x, y), if you have a negative x, a non-integral y, and the
> result is a NaN, then you get a domain error.  We need a double NaN
> for the struct exception retval field, which is generated by doing a
> double 0.0/0.0 operation.  There may be a better way to do this for a
> single-float-only target, but this is exception code so probably not
> critical.

Do you mean powf? pow itself takes doubles as inputs. Could the division be replaced by a macro or builtin function that hand-crafts the bits and then casts them to a double? When I tried this out it worked pretty well, and I had the option of creating silent/quiet NaNs with or without payloads.
 

>> __adddf3
>> __subdf3
>
> These are both called from rint.  For pow (x, y), if you have a
> negative x, and the result is an infinity, then we need a double
> HUGE_VAL for the struct exception retval field, which is provided by a
> macro that calls a compiler builtin, except that this needs to be
> positive if y*0.5 is an integer, and negative if y*0.5 is a
> non-integer, so rint is called to check to see if y*0.5 is an integral
> value or not.  But we do have an rintf function, so this one is
> fixable by calling rintf instead of rint.  It appears that powf is the
> only place where this mistake is made with rint/rintf.  The other
> float functions that use rint appear to be correctly calling rintf.
> Fixing this gets rid of the adddf3 and subdf3 calls.  This is
> exception handling code though, so probably not performance critical.

I noticed the rint vs rintf issue as well, but didn't want to raise too many questions at the same time.
 

> And that reminds me that we could replace the double 0.0/0.0 with
> another compiler builtin to generate a NaN.  Fixing this gets rid of
> the divdf3 calls.

Should I send you the code I had for generating these without using 0.0/0.0?
 

> However, we will still be left with the extendsfdf2 calls, because of
> the struct exception fields.  That is much more work to get rid of.

Yes, that would be great. Getting rid of rint, adddf3, subdf3 and divdf3 shouldn't affect performance, but the binary will be smaller, and small cores with small memories might benefit even from small savings. Keeping extendsfdf2 is fine, because it has benefits and writing an alternative doesn't sound worth it. RISC-V was designed to cover such a huge domain that there can be anything from extremely tiny to extremely large implementations.
 

>> I tried the following:
>> _LIB_VERSION_TYPE _LIB_VERSION = _POSIX_;  // and tried _IEEE_
>>
>> But still the divdf3 are still present, If I understand it correctly the
>> divdf3 is added there on compile time, while the global variable affects
>> functionality at runtime only.
>
> This is run-time only, so yes, the divdf3 and friends will still be
> there in the binary.
>
> Since I was looking at disassembled code, I noticed that matherr is
> just two instructions, one to load 0 into the return value register,
> and one to return.  So the default version does nothing.  This is just
> a hook provided so that programmers can override it if they want to do
> something more interesting on error, but it is unlikely that many
> newlib users are defining this hook.  It looks like we are doing a lot
> of work for very little benefit.  If someone cared enough about this,
> they could add a configure option to newlib to disable the matherr
> support.  This would get rid of the extendsfdf2 calls, at the expense
> of losing some SVID/XOPEN compatibility.  This would be a moderate
> size project though, as there are an awful lot of matherr calls in
> newlib, and every single one would have to be tested if someone
> changed this.  It might be reasonable to try adding ISO C 99 fenv
> support as a replacement, but that makes it an even bigger project.

Yes, the hook has interesting benefits, and the alternative sounds like a large project with small returns.

> Jim



Jim Wilson

Dec 9, 2017, 9:09:31 PM
to Anton Krug, Anton Krug, RISC-V SW Dev
On Sat, Dec 9, 2017 at 5:47 PM, Anton Krug <anton...@gmail.com> wrote:
>> And that reminds me that we could replace the double 0.0/0.0 with
>> another compiler builtin to generate a NaN. Fixing this gets rid of
>> the divdf3 calls.
>
> Should I provide you the code I had for generation of these without using
> 0.0/0.0?

I don't need code to compute NaNs, I can just use the compiler
builtins. The compiler builtins load NaN patterns from memory. This
is target dependent, in that different targets can specify different
NaN representations as the default. If these are not the right NaN
patterns for RISC-V, we can fix that. This is using 0x7ff80000 for
the upper word of a double quiet NaN, and 0x7ff40000 for a signalling
NaN. It is a weekend, I can check later if these are right or not.

gamma02:2109$ cat tmp3.c
double sub1 () { return __builtin_nan (""); }
double sub2 () { return __builtin_nans (""); }
gamma02:2110$ cat tmp3.s
.file "tmp3.c"
.option nopic
.text
.align 1
.globl sub1
.type sub1, @function
sub1:
lui a5,%hi(.LC0)
lw a0,%lo(.LC0)(a5)
lw a1,%lo(.LC0+4)(a5)
ret
.size sub1, .-sub1
.align 1
.globl sub2
.type sub2, @function
sub2:
lui a5,%hi(.LC1)
lw a0,%lo(.LC1)(a5)
lw a1,%lo(.LC1+4)(a5)
ret
.size sub2, .-sub2
.section .srodata.cst8,"aM",@progbits,8
.align 3
.LC0:
.word 0
.word 2146959360
.LC1:
.word 0
.word 2146697216
.ident "GCC: (GNU) 7.2.0"
gamma02:2111$
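
The two .word pairs above can be checked against the hex patterns mentioned earlier. The sketch below is a helper under stated assumptions (hypothetical name double_from_words, a 64-bit IEEE 754 double, and the same byte order for integers and doubles) that builds a double from an upper and a lower 32-bit word, which is also roughly what hand-crafting the NaN bits would look like.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble a double from its upper and lower 32-bit
   words. memcpy is used to reinterpret the bits without aliasing issues. */
double double_from_words(uint32_t hi, uint32_t lo)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;

    memcpy(&d, &bits, sizeof d);
    return d;
}
```

2146959360 is 0x7ff80000 and 2146697216 is 0x7ff40000, so double_from_words(0x7ff80000u, 0u) reproduces the constant that sub1 loads from .LC0, and double_from_words(0x7ff40000u, 0u) the one that sub2 loads from .LC1.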

Jim

Tommy Murphy

Dec 10, 2017, 10:59:44 AM
to RISC-V SW Dev
> It isn't clear if this is a serious
> problem, because if you have
> domain/overflow/underflow errors,
> then the performance of the code may
> not be your biggest problem.

It's a problem in a resource (e.g. memory) constrained embedded environment because of the need to carry double overhead when only doing float arithmetic thus bloating the program size arguably unnecessarily.

Anton Krug

Dec 11, 2017, 5:40:15 AM
to Tommy Murphy, RISC-V SW Dev
On small projects these software implementations of double arithmetic become pretty large compared to the rest of the project. If the target has only 32KB of resources and divdf3, adddf3 and subdf3 alone take over 9KB of that, it is pretty significant, especially when they are not really needed.



Jim Wilson

Dec 13, 2017, 4:20:27 PM
to Anton Krug, Tommy Murphy, RISC-V SW Dev
On Mon, Dec 11, 2017 at 2:39 AM, Anton Krug <anton...@microsemi.com> wrote:
> On small projects these software implementations of double arithmetic become pretty large compared to the rest of the project. If the target would have only 32KB of resources and only the divdf3, adddf3 and subdf3 would take over 9KB of it then it's pretty significant. Especially when it's not really needed.

Patches to fix this have been committed to the upstream sourceware.org
newlib project.

https://sourceware.org/ml/newlib-cvs/2017-q4/msg00081.html
https://sourceware.org/ml/newlib-cvs/2017-q4/msg00082.html

Jim

Anton Krug

Dec 14, 2017, 5:11:16 AM
to Jim Wilson, Tommy Murphy, RISC-V SW Dev

Thank you very much, that was pretty fast.

I will try to do a build today.



Anton Krug

Dec 14, 2017, 7:30:32 AM
to Jim Wilson, Tommy Murphy, RISC-V SW Dev

I was applying the patch by hand:


if (_LIB_VERSION == _SVID_) {
  exc.retval = HUGE;
  y *= 0.5;
  if(x<0.0f&&rintf(y)!=y) exc.retval = -HUGE;
} else {
  exc.retval = HUGE_VAL;
  y *= 0.5;
  if(x<0.0f&&rintf(y)!=y) exc.retval = -HUGE_VAL;
}

And I'm wondering whether it should be y *= 0.5f; instead? y is used only for the condition.






Jim Wilson

Dec 14, 2017, 12:44:58 PM
to Anton Krug, Tommy Murphy, RISC-V SW Dev
On Thu, Dec 14, 2017 at 4:30 AM, Anton Krug <anton...@microsemi.com> wrote:
> y *= 0.5;
>
> And I'm wondering should be there y*=0.5f; instead? The y is used only for
> the condition.

Any reasonable compiler will get this right, and perform a
single-float multiply. GCC does this, even if you don't request
optimization.
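
To see why the two spellings give the same result, the sketch below writes out what y *= 0.5 formally means for a float y (widen, multiply in double, narrow) next to the plain float multiply. The function names are hypothetical. Multiplying a float by 0.5 is exact in double, so rounding the product back to float once gives the same value as a single-precision multiply, which is what allows a compiler to use the latter.

```c
/* What the C language specifies for `y *= 0.5` with a float y:
   convert to double, multiply by the double constant, convert back. */
float half_via_double_sketch(float y)
{
    return (float)((double)y * 0.5);
}

/* What `y *= 0.5f` means: a single-precision multiply. */
float half_via_float_sketch(float y)
{
    return y * 0.5f;
}
```

Both functions return the same float for the same input, so the choice of constant only matters for readability, not for the result.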

I had to change the "x<0.0" to "x<0.0f" because the code reviewer
insisted that it be changed, even though there was nothing wrong with
the first form. The code reviewer didn't notice the multiply by 0.5,
and I didn't think to volunteer it, so it didn't get "fixed".

Jim

Anton Krug

Dec 15, 2017, 7:57:52 AM
to Jim Wilson, Tommy Murphy, RISC-V SW Dev
That sounds good. Thank you very much again.

I will look through a few other calls to see whether there are similar issues (like rintf vs rint).

