Exp() function reloaded

hopcode

unread,

Apr 14, 2012, 10:31:40 PM4/14/12

to

Hi,
here my method for the exponentianal function e^x .
it is a mix of 2 fundamentals:
- Taylor series running at a 10 steps for decimals -1.0 < x < +1.0
- a couple of basic rules of the logarithms.

Explanation.
say we want to calculate e^20.3. well,

20.3 float double is 40344CCCCCCCCCCDh hexadecimal

we convert it to base2 by multiplying it by lg2(e)

20.3 x 1.4426950408889632824453128103079f
result 20.3 base2 403D4965C85C0166h

it is needed to know later how much we should shift left
our partial result in its integer part.

now, for the fundamental of powers we know that

n^20.3 can be rewritten as n^20 * n^0.3

also we round down 20.3 base2 to get

20.0 and 0.3 float double remainder

20.0 = 403D000000000000h
0.3 = 3FD2597217005980h

we store the remainder 0.3 now for later use.

then we convert the integral part 20.0 base2 to an integer value,

20.0 float double -> integer, s
after conversion
s = 1Dh 29 decimal

we shift 1 by s times on the left

i = 1 << s

thus, i = 1 * 2^29 = 20000000h ( 536'870'912 decimal)

3 additional checks are needed here. to avoid
overflow after 63 shifts; in the case s > 63 and
checking wether s push i out of the float double capacity.

we reconvert i to float for later use
20000000h integer = 41C0000000000000h float double

we convert the float decimals we stored above in base2
to float decimals in baseE. this is because we want to use
Taylor series from the e^n. i choosed 10 steps max, and having
-1.0 < decimals < +1.0 works enough good.
NOTE: you can extend the range of action of the
Taylor series up to +-8, stepping it ~20 or more times.

recall Taylor now on e^x :
e^x = 1 + x + x^2 / 2! + x^3 / 3! .... + x^n / n!

now, according fundamentals of logarithms

a) if e^x = 2^q
b) and generally, lgBASE(x)^n = n * lgBASE(x)
then we can extend a) this way,

x * ln(e) = q * ln(2)

c) and thus x = q * ln(2) should verify the a) as true identity.

now, because ln(2) = lg10(2) / lg10(e)
lg10(2) = 0.30102999566398119521373889472449f
lg10(e) = 0.43429448190325179004808384911378f
ln(2) = 0.69314718055994536943283387715543f

we apply c)

x = q * ln(2) where q = r our remainder

x = 0.3 base2 * 0.69314718055994536943283387715543f
x = 3FC9700ADD042628 baseE

we give this x to the Taylor expansion routine.
to get back

3FF3848660139D52 as result of exp() on the remainder 0.3

finally for the fundamental of powers above
we multiply

41C0000000000000h * 3FF3848660139D52h
n^20 base2 * n^0.3 base2 = n^20.3 base2
to get back

41C3848660139D52h that corresponds exactly

to our decimal 654904512.1532385

The resulting assmbly code can be found on my website at

http://sites.google.com/site/x64lab/home/reloaded-algorithms/my-exp-function-jexp

it is ~40 lines of code (my Taylors's exp() code + main routine)
it accepts only +numbers for now, and makes no exaustive check
on the floats. for those and negative values i leave it to the
reader's creativity ( being e^-x essentially 1 / e^x ).
this is the fastest method i know, it should time ~80 cycles totally,
i didnt check it yet. it's not so important.

of some relevance to me was

1) avoid the Intel Approx Math library license
2) avoid things like the cmath library
3) avoid the 2 FPU slow instructions
FYL2X to compute y * log2(x)
F2XM1 to compute 2^x - 1
because 250/300 cycles for 2 instro
it's the insanity, 100%, pure :-) especially
on tests i am doing from huge RND-data outputs.

4) using Taylor series the right way,
because we would need lot of steps to get
the right values on plugging in large x, as in the example e^20.3

but the true-truth is that i am not yet ready
for Chebyshev; simply because i need some time
to understand something more of his genial calculus.
if you have simplified references about him,
please share it.

and not much time to write a full assembly
math library. if someone is interested to contribute
the library is open source, under MPL license, but assembly
required, please. the library lies under the name "amrt",
in the same way as my other one, "art"
a-ssembly-run-t-ime. i will write/update it from time to time
only on my needs.
you find me on clax86.

Thank you all,

hopcode aka Marc Rainer Kranz

--
.:x64lab:.
group http://groups.google.com/group/x64lab
site http://sites.google.com/site/x64lab

pete

unread,

Apr 15, 2012, 1:26:10 AM4/15/12

to

hopcode wrote:
>
> Hi,
> here my method for the exponentianal function e^x .
> it is a mix of 2 fundamentals:
> - Taylor series running at a 10 steps for decimals -1.0 < x < +1.0
> - a couple of basic rules of the logarithms.

> it is ~40 lines of code (my Taylors's exp() code + main routine)
> it accepts only +numbers for now, and makes no exaustive check
> on the floats. for those and negative values i leave it to the
> reader's creativity ( being e^-x essentially 1 / e^x ).

I've code Taylors's exp(x) for both positive and negative x in C,
using -1.0 < x < +1.0,
but without making use of e^-x being essentially 1 / e^x.

/* BEGIN new.c output */

fs_expl(20.3) is 654904512.153239
fs_expl(20.3) - 654904512.153230 is 8.583069e-006

fs_expl(-20.3) is 1.526940e-009
fs_expl(-20.3) - 1.526940e-009 is 1.591266e-016

/* END new.c output */

/* BEGIN new.c */

#include <stdio.h>

long double fs_expl(long double x)
{
long unsigned n, square;
long double b, e, old;

for (square = 0; x > 1; x /= 2) {
++square;
}
while (-1 > x) {
++square;
x /= 2;
}
e = b = n = 1;
do {
b /= n++;
b *= x;
e += b;
b /= n++;
b *= x;
old = e;
e += b;
} while (e > old);
while (square-- != 0) {
e *= e;
}
return e;
}

int
main(void)
{
puts("/* BEGIN new.c output */\n");
printf("fs_expl(20.3) is %lf\n",
fs_expl(20.3));
printf("fs_expl(20.3) - 654904512.153230 is %le\n\n",
fs_expl(20.3) - 654904512.153230);

printf("fs_expl(-20.3) is %le\n",
fs_expl(-20.3));
printf("fs_expl(-20.3) - 1.526940e-009 is %le\n\n",
fs_expl(-20.3) - 1.526940e-009);

puts("/* END new.c output */");
return 0;
}

/* END new.c */

--
pete

Dave Dodson

unread,

Apr 15, 2012, 12:47:09 AM4/15/12

to

Rather than inventing these yourself, which is quite difficult to do with both the ultimate in accuracy and speed, I think you should take a look at a book such as

Elementary Functions: Algorithms and Implementation, by Jean-Michael Muller, available from Amazon.com, or

the somewhat dated Software Manual for the Elementary Functions, by Cody and Waite, available for free download from http://www.ebook3000.com/William-J--Cody---Software-Manual-for-the-Elementary-Functions_43370.html.

You probably would learn quite a bit from these authors, who have years of experience doing this kind of thing.

Dave

hopcode

unread,

Apr 15, 2012, 6:18:06 AM4/15/12

to

Il 15.04.2012 07:26, pete ha scritto:
>
> I've code Taylors's exp(x) for both positive and negative x in C,
> using -1.0< x< +1.0,
> but without making use of e^-x being essentially 1 / e^x.
>

> ...

> for (square = 0; x> 1; x /= 2) {

> while (-1> x) {
> x /= 2;
> do {
> b /= n++;
> b /= n++;
> ...

that is a waster divisional toy, pete.
replace it with something better. ;-)

Cheers,

--
.:mrk[hopcode]

hopcode

unread,

Apr 15, 2012, 8:16:47 AM4/15/12

to

Il 15.04.2012 06:47, Dave Dodson ha scritto:
> Rather than inventing these yourself, which is quite difficult to do with both...
>
Hi Dave,
rather than quoting my explanation entirely again
you could have outlined how Chebyshev's polynomials
deserve the "invention" appelative; the Ruffini's rule
for example or the Taylor expansion too.

in fact what i did is an "application" because i assemble
and apply 2 or more things together taking
advantage-of and trying-to preserve their inner native accuracy etc.

> Elementary Functions: Algorithms and Implementation, by Jean-Michael Muller, available from Amazon.com, or
>

from the first page preview on Amazon, it seems very interesting and
accessible to simple-minded-squirrel like i am.
i will give a try in my town library.
very thanks for the reference.

> You probably would learn quite a bit from these authors, who have years of experience doing this kind of thing.

probability here doesnt exceed 1 standard deviation from the mean ;-)

Cheers,

--
.:mrk[hopcode]

pete

unread,

Apr 15, 2012, 9:30:05 AM4/15/12

to

hopcode wrote:
>
> Il 15.04.2012 07:26, pete ha scritto:
> >
> > I've code Taylors's exp(x) for both positive and negative x in C,
> > using -1.0< x< +1.0,
> > but without making use of e^-x being essentially 1 / e^x.
> >
> > ...
> > for (square = 0; x> 1; x /= 2) {
> > while (-1> x) {
> > x /= 2;
> > do {
> > b /= n++;
> > b /= n++;
> > ...
>
> that is a waster divisional toy, pete.
> replace it with something better. ;-)

The point that I was trying make,
is that if you can do Taylor's for (-1.0 < x),
then it may not be necessary to consider

e^-x being essentially 1 / e^x.

--
pete

io_x

unread,

Apr 16, 2012, 3:01:10 AM4/16/12

to

"pete" <pfi...@mindspring.com> ha scritto nel messaggio
news:4F8A5B...@mindspring.com...

> hopcode wrote:
>>
>> Hi,
>> here my method for the exponentianal function e^x .
>> it is a mix of 2 fundamentals:
>> - Taylor series running at a 10 steps for decimals -1.0 < x < +1.0
>> - a couple of basic rules of the logarithms.
>
>> it is ~40 lines of code (my Taylors's exp() code + main routine)
>> it accepts only +numbers for now, and makes no exaustive check
>> on the floats. for those and negative values i leave it to the
>> reader's creativity ( being e^-x essentially 1 / e^x ).
>
> I've code Taylors's exp(x) for both positive and negative x in C,
> using -1.0 < x < +1.0,
> but without making use of e^-x being essentially 1 / e^x.
>
> /* BEGIN new.c output */
>
> fs_expl(20.3) is 654904512.153239
> fs_expl(20.3) - 654904512.153230 is 8.583069e-006
>
> fs_expl(-20.3) is 1.526940e-009
> fs_expl(-20.3) - 1.526940e-009 is 1.591266e-016
>
> /* END new.c output */
>

Pet's one is better and fast of what i use...

#include <stdio.h>
#include <math.h>

#define u32 unsigned
#define u64 unsigned __int64
#define i32 int
#define u16 unsigned short
#define u8 unsigned char
#define i8 char
#define i16 short
#define sdC __stdcall

#define F for
#define R return
#define W while
#define G goto
#define P printf

double ke=2.7182818284590452353602874713527;
int con;

long double fs_expl(long double x)
{
long unsigned n, square;
long double b, e, old;

con=0;
for (square = 0; x > 1; x /= 2) {++con;
++square;
}
while (-1 > x) {++con;

++square;
x /= 2;
}
e = b = n = 1;

do {++con;

b /= n++;
b *= x;
e += b;
b /= n++;
b *= x;
old = e;
e += b;
} while (e > old);

while (square-- != 0) {++con;

e *= e;
}
return e;
}

// alcune cose [quelle giuste]
// provengono da "Satoshi Tomabechi"
// per il resto non so 100% come funziona...
// 0 per errore altrimenti ritorna il numero di cicli
u32 mexp(double* r, double espon)
{double ip, fp, t, x;
u32 v, w, i;

i=0;
if(r==0) R 0;
*r=0.0;
if(espon==0.0) {*r=1.0;
R 1;}
else if(espon<0) x=-espon;
else x= espon;
fp=modf(x, &ip); t=ke;
if(ip>0xFFFFFFF) R 0;
F(w=1, v=ip; w<v; ++w)
{++i; t=t*ke;}
ip=t;
v=2; t=1+fp; x=fp;
F(;;){++i;
x= x*(fp/v);
if(x==0.0) break;
t+=x; ++v;
}
t*=ip;
if(espon>=0) *r= t;
else *r=1.0/t;
R i;
}

int main(void)
{long double xx, dd;
double d, r, x, y;
u32 i;
dd=12.1929292992; d=dd;
x=exp(d); i=mexp(&y,d);
P("exp(%.20f)=%.20f, %.20f i=%d\n", d, x, y, i);
xx=fs_expl(dd);
x=xx;
P("exp(%.20f)=%.20f i=%d\n", d, x, con);
R 0;
}

exp(12.19292929919999935000)=197388.53005457221300000000,
197388.53005457215480000000 i=144
exp(12.19292929919999935000)=197388.53005457221300000000 i=18

io_x

unread,

Apr 16, 2012, 5:50:22 AM4/16/12

to

"io_x" <a...@b.c.invalid> ha scritto nel messaggio
news:4f8bc299$0$1377$4faf...@reader2.news.tin.it...

>
> "pete" <pfi...@mindspring.com> ha scritto nel messaggio
> news:4F8A5B...@mindspring.com...

fs_expl(1000000) here would go to seg fault...
it seems sloppy

#include <stdio.h>

#define u32 unsigned
#define u64 unsigned __int64
#define i32 int
#define u16 unsigned short
#define u8 unsigned char
#define i8 char
#define i16 short
#define sdC __stdcall

#define F for
#define R return
#define W while
#define G goto
#define P printf

int con;

long double fs_expl(long double x)
{
long unsigned n, square;
long double b, e, old;

con=0;
for (square = 0; x > 1; x /= 2) {++con;
++square;
}
while (-1 > x) {++con;
++square;
x /= 2;
}
e = b = n = 1;
do {++con;
b /= n++;
b *= x;
e += b;
b /= n++;
b *= x;
old = e;
e += b;
} while (e > old);
while (square-- != 0) {++con;
e *= e;
}
return e;
}

#include <math.h>
#include <float.h>

double ke =2.7182818284590452353602874713527;
double maxv=DBL_MAX/ke;

// alcune cose [quelle giuste]
// provengono da "Satoshi Tomabechi"
// per il resto non so 100% come funziona...
// 0 per errore altrimenti ritorna il numero di cicli
u32 mexp(double* r, double espon)

{double ip, fp, t, x, y;
u32 v, w, i;

if(r==0) R 0;
*r=0.0; i=0;
if(espon==0.0) {*r=1.0; R 1;}
else if(espon< 0.0) x=-espon;

else x= espon;
fp=modf(x, &ip); t=ke;
if(ip>0xFFFFFFF) R 0;
F(w=1, v=ip; w<v; ++w)

{++i; // integer part
if(t>maxv)
{if(espon<0)
{*r=0.0; R i;}
else {*r=DBL_MAX; R 0;} // where is +INF?

}
t=t*ke;
}
ip=t;
v=2; t=1+fp; x=fp;

F(;;){++i; // fractional part
x= x*(fp/v);
if(x<=DBL_EPSILON) break;

t+=x; ++v;
}
t*=ip;
if(espon>=0) *r= t;
else *r=1.0/t;
R i;
}

int main(void)

{long double xx=3.0, dd;

double d, r, x, y;
u32 i;

dd=120.1929292992; d=dd;

x=exp(d); i=mexp(&y,d);
P("exp(%.20f)=%.20f, %.20f i=%d\n", d, x, y, i);
xx=fs_expl(dd);
x=xx;
P("exp(%.20f)=%.20f i=%d\n", d, x, con);
R 0;
}

---
exp(120.19292929920000290000)=1.581706715156910220000000000000000000000e+52,
1.581706715156901713000000000000000000000e+52
i=130
exp(120.19292929920000290000)=1.581706715156910220000000000000000000000e+52 i=25

my seems slower

io_x

unread,

Apr 16, 2012, 9:46:45 AM4/16/12

to

"io_x" <a...@b.c.invalid> ha scritto nel messaggio

news:4f8bea40$0$1381$4faf...@reader1.news.tin.it...

Do you like the exp function bithread threadExp()? :)

#include <stdio.h>
#include <stdint.h>

#define u64 uint64_t
#define u32 uint32_t
#define u16 uint16_t
#define u8 uint8_t
#define i64 int64_t
#define i32 int32_t
#define i16 int16_t
#define i8 int8_t

// macro for keywords
#define R return
#define P printf
#define F for
#define G goto

u32 maxei=0;
double maxef=0.0;

// alcune cose [quelle giuste]
// provengono da "Satoshi Tomabechi"
// per il resto non so 100% come funziona...
// 0 per errore altrimenti ritorna il numero di cicli
u32 mexp(double* r, double espon)
{double ip, fp, t, x, y;
u32 v, w, i;

// una volta sola...
if(maxei==0) // calcola esponente intero massimo+1
{F(w=1, t=ke; w<0xFFFFFFFF; ++w)
{if(t>=maxv) break;
t=t*ke;
}
if(w==0xFFFFFFFF) R 0;
maxei=w; maxef=t;
}
if(r==0) R 0;
*r=0.0;

if(espon==0.0) {*r=1.0; R 1;}
else if(espon< 0.0) x=-espon;
else x= espon;
fp=modf(x, &ip); t=ke;

if(ip>0xFFFFFFF||ip>=maxef)
{if(espon<0) {*r=0.0; R 1;}
else {*r=DBL_MAX; R 0;}
}
F(w=1, v=ip, i=0; w<v; ++w)
{++i; t=t*ke; }

ip=t;
v=2; t=1+fp; x=fp;
F(;;){++i;

x= x*(fp/v);
if(x<=DBL_EPSILON) break;
t+=x; ++v;
}
t*=ip;
if(espon>=0) *r= t;
else *r=1.0/t;
R i;
}

#include <windows.h>

unsigned long __stdcall thrInteger(void* a)
{double t, ip;
u32 w, v, i, *pu;
t=ke; ip=*(double*)a; i=0;

F(w=1, v=ip; w<v; ++w)
{++i;

t=t*ke;
}
pu=(u32*)((u8*) a + sizeof(double));
*(double*)a=t; *pu=i;
R 0;
}

unsigned long __stdcall thrFractional(void* a)
{double x, t, fp;
u32 w, v, i, *pu;

fp=*(double*)a; i=0;

v=2; t=1+fp; x=fp;
F(;;){++i;

x= x*(fp/v);
if(x<=DBL_EPSILON) break;
t+=x; ++v;
}

pu=(u32*)((u8*) a + sizeof(double));
*(double*)a=t; *pu=i;
R 0;
}

u32 threadExp(double* r, double espon)
{double ip[2], fp[2], t;
void *hThr[4];
unsigned long IDThr;
u32 w, i, *wi, *wf;

// una volta sola...
if(maxei==0) // calcola esponente intero massimo+1
{F(w=1, t=ke; w<0xFFFFFFFF; ++w)
{if(t>=maxv) break;
t=t*ke;
}
if(w==0xFFFFFFFF) R 0;
maxei=w; maxef=t;
}
if(r==0) R 0;
*r=0.0;
if(espon==0.0) {*r=1.0; R 1;}
else if(espon< 0.0) t=-espon;
else t= espon;
fp[0]=modf(t, &ip[0]);
if(ip[0]>0xFFFFFFF||ip[0]>=maxef)
{if(espon<0) {*r=0.0; R 1;}
else {*r=DBL_MAX; R 0;}
}
// un thread calcola la parte intera
hThr[0]=CreateThread(NULL,0,thrInteger, ip,0,&IDThr);
if(hThr[0]==0) R 0;
// un thread calcola la parte frazionaria
hThr[1]=CreateThread(NULL,0,thrFractional, fp, 0,&IDThr);
if(hThr[1]==0) {CloseHandle(hThr[0]); R 0;}
// aspetta la fine di entrambi...
F(i=0; i<=1; ++i)
WaitForSingleObject(hThr[i], INFINITE);
CloseHandle(hThr[1]); CloseHandle(hThr[0]);
wi=(u32*)((u8*)&ip+sizeof(double)); // are there aligned u32 problems?
wf=(u32*)((u8*)&fp+sizeof(double));
i=*wi+*wf; t=ip[0]*fp[0];

if(espon>=0) *r= t;
else *r=1.0/t;
R i;
}

int main(void)
{long double xx=3.0, dd;
double d, r, x, y;
u32 i;

dd=120.1929292992; d=dd;

x=exp(d); i=threadExp(&y,d);

io_x

unread,

Apr 16, 2012, 10:07:38 AM4/16/12

to

"io_x" <a...@b.c.invalid> ha scritto nel messaggio

news:4f8c21ad$0$1375$4faf...@reader1.news.tin.it...

>
> "io_x" <a...@b.c.invalid> ha scritto nel messaggio
> news:4f8bea40$0$1381$4faf...@reader1.news.tin.it...
>
> Do you like the exp function bithread threadExp()? :)

i find one couple of error in the past post.. excuse me...
than i think is not sure the aligned of u32 type in
"wi=(u32*)((u8*)&ip+sizeof(double)); etc"
here is ok but there??

but because it is easier for me i chose the risk in this machine
at last...

if(ip>0xFFFFFFF||ip>=maxei)

if(ip[0]>0xFFFFFFF||ip[0]>=maxei)

{if(espon<0) {*r=0.0; R 1;}
else {*r=DBL_MAX; R 0;}
}
// un thread calcola la parte intera
hThr[0]=CreateThread(NULL,0,thrInteger, ip,0,&IDThr);
if(hThr[0]==0) R 0;
// un thread calcola la parte frazionaria
hThr[1]=CreateThread(NULL,0,thrFractional, fp, 0,&IDThr);
if(hThr[1]==0) {CloseHandle(hThr[0]); R 0;}

F(i=0; i<=1; ++i)
WaitForSingleObject(hThr[i], INFINITE);
CloseHandle(hThr[1]); CloseHandle(hThr[0]);
wi=(u32*)((u8*)&ip+sizeof(double)); // are there aligned u32 problems?
wf=(u32*)((u8*)&fp+sizeof(double));
i=*wi+*wf; t=ip[0]*fp[0];
if(espon>=0) *r= t;
else *r=1.0/t;
R i;
}

int main(void)
{long double xx=3.0, dd;
double d, r, x, y;
u32 i;

dd=700.1929292992; d=dd;

x=exp(d); i=threadExp(&y,d);
P("exp(%.20f)=%.20f, %.20f i=%d\n", d, x, y, i);

//xx=fs_expl(dd);

pete

unread,

Apr 16, 2012, 6:12:34 PM4/16/12

to

io_x wrote:
>
> "io_x" <a...@b.c.invalid> ha scritto nel messaggio
> news:4f8bc299$0$1377$4faf...@reader2.news.tin.it...
> >
> > "pete" <pfi...@mindspring.com> ha scritto nel messaggio
> > news:4F8A5B...@mindspring.com...
>
> fs_expl(1000000) here would go to seg fault...
> it seems sloppy

It *is* sloppy.
I posted it only to try to make a point about the algorithm
concerning the case when there were negative arguments
in the function call,
and I'm not sure whether or not the point applies
to the way that hopcode is implementing the exp function.
I wrote it to be portable
to all conforming freestanding C implementations,
as an academic exercise in C programming.

The real function is here:

http://www.mindspring.com/~pfilandr/C/fs_math/fs_math.c

#include <float.h>

long double fs_expl(long double x);

long double fs_logl(long double x);

long double fs_expl(long double x)
{
long unsigned n, square;

long double b, e;
static long double x_max, x_min, epsilon;
static int initialized;

if (!initialized) {
initialized = 1;
x_max = fs_logl(LDBL_MAX);
x_min = fs_logl(LDBL_MIN);
epsilon = LDBL_EPSILON / 4;
}
if (x_max >= x && x >= x_min) {

for (square = 0; x > 1; x /= 2) {

++square;
}
while (-1 > x) {

++square;
x /= 2;
}
e = b = n = 1;
do {

b /= n++;
b *= x;
e += b;
b /= n++;
b *= x;

e += b;
} while (b > epsilon);

while (square-- != 0) {
e *= e;
}

} else {
e = x > 0 ? LDBL_MAX : 0;
}
return e;
}

long double fs_logl(long double x)
{
long int n;
long double a, b, c, epsilon;
static long double A, B, C;
static int initialized;

if (x > 0 && LDBL_MAX >= x) {
if (!initialized) {
initialized = 1;
B = 1.5;
do {
A = B;
B = 1 / A + A / 2;
} while (A > B);
B /= 2;
C = fs_logl(A);
}
for (n = 0; x > A; x /= 2) {
++n;
}
while (B > x) {
--n;
x *= 2;
}
a = (x - 1) / (x + 1);
x = C * n + a;
c = a * a;
n = 1;
epsilon = LDBL_EPSILON * x;
if (0 > a) {
if (epsilon > 0) {
epsilon = -epsilon;
}
do {
n += 2;
a *= c;
b = a / n;
x += b;
} while (epsilon > b);
} else {
if (0 > epsilon) {
epsilon = -epsilon;
}
do {
n += 2;
a *= c;
b = a / n;
x += b;
} while (b > epsilon);
}
x *= 2;
} else {
x = -LDBL_MAX;
}
return x;
}

--
pete

io_x

unread,

Apr 17, 2012, 3:36:15 AM4/17/12

to

"io_x" <a...@b.c.invalid> ha scritto nel messaggio

news:4f8c268c$0$1378$4faf...@reader2.news.tin.it...

>
> "io_x" <a...@b.c.invalid> ha scritto nel messaggio
> news:4f8c21ad$0$1375$4faf...@reader1.news.tin.it...
>>
>> "io_x" <a...@b.c.invalid> ha scritto nel messaggio
>> news:4f8bea40$0$1381$4faf...@reader1.news.tin.it...
>>
>> Do you like the exp function bithread threadExp()? :)
>
> i find one couple of error in the past post.. excuse me...
> than i think is not sure the aligned of u32 type in
> "wi=(u32*)((u8*)&ip+sizeof(double)); etc"
> here is ok but there??
>
> but because it is easier for me i chose the risk in this machine
> at last...

it is not ok, the thread one is the slower one, than
if x>33 can be abs(mexp(x)-exp(x))> DBL_EPSILON and this is not good...

possible i prefer fixed point because the error in float point
seems depend from magnitude of the number in a way i not like
and not understand and not find a way to precise calculate it...

long double ke =2.7182818284590452353602874713527;
long double maxv =LDBL_MAX/ke;
u32 maxei=0;
long double maxef=0.0;

// alcune cose [quelle giuste]
// provengono da "Satoshi Tomabechi"
// per il resto non so 100% come funziona...
// 0 per errore altrimenti ritorna il numero di cicli

u32 mexp(long double* r, long double espon)
{long double iip, ffp, x, t, y;
double ip, fp, xx;

u32 v, w, i;
// una volta sola...
if(maxei==0) // calcola esponente intero massimo+1

{F(w=0, t=1.0; w<0xFFFFFFFF; ++w)

{if(t>=maxv) break;
t=t*ke;
}
if(w==0xFFFFFFFF) R 0;
maxei=w; maxef=t;
}
if(r==0) R 0;
*r=0.0;
if(espon==0.0) {*r=1.0; R 1;}
else if(espon< 0.0) x=-espon;
else x= espon;

if(x>maxei)

{if(espon<0) {*r=0.0; R 1;}

else {*r=LDBL_MAX; R 0;}
}
if(x>DBL_MAX) R 0;
xx=x; fp=modf(xx, &ip);
F(w=0, t=1.0, v=ip, i=0; w<v; ++w)
{++i; t=t*ke; }
ffp=fp; iip=t;
v=2; t=1+ffp; x=ffp;
F(;;){++i;
x= x*(ffp/v);
if(x<=LDBL_EPSILON) break;
t+=x; ++v;
}
t*=iip;

if(espon>=0) *r= t;
else *r=1.0/t;
R i;
}

#include <windows.h>

#include <time.h>

unsigned long __stdcall thrInteger(void* a)

{long double t, ip;

u32 w, v, i, *pu;

t=1.0; ip=*(long double*)a; i=0;
F(w=0, v=ip; w<v; ++w)
{++i;
t=t*ke;
}
pu=(u32*)((u8*) a + 4*sizeof(u32));
*(long double*)a=t; *pu=i;

R 0;
}

unsigned long __stdcall thrFractional(void* a)

{long double x, t, fp;

u32 w, v, i, *pu;

fp=*(long double*)a; i=0;

v=2; t=1+fp; x=fp;
F(;;){++i;
x= x*(fp/v);

if(x<=LDBL_EPSILON) break;
t+=x; ++v;
}
pu=(u32*)((u8*) a + 4*sizeof(u32));
*(long double*)a=t; *pu=i;
R 0;
}

u32 threadExp(long double* r, long double espon)
{long double ip[8], fp[8], t;
double iip, ffp, tt;

void *hThr[4];
unsigned long IDThr;
u32 w, i, *wi, *wf;

// una volta sola...
if(maxei==0) // calcola esponente intero massimo+1

{F(w=0, t=1.0; w<0xFFFFFFFF; ++w)

{if(t>=maxv) break;
t=t*ke;
}
if(w==0xFFFFFFFF) R 0;
maxei=w; maxef=t;
}
if(r==0) R 0;
*r=0.0;
if(espon==0.0) {*r=1.0; R 1;}
else if(espon< 0.0) t=-espon;
else t= espon;

if(t>DBL_MAX) R 0;
tt=t;
ffp=modf(tt, &iip);
ip[0]=iip; fp[0]=ffp;
if(iip>0xFFFFFFF||ip[0]>=maxei)

{if(espon<0) {*r=0.0; R 1;}

else {*r=LDBL_MAX; R 0;}

}
// un thread calcola la parte intera
hThr[0]=CreateThread(NULL,0,thrInteger, ip,0,&IDThr);
if(hThr[0]==0) R 0;
// un thread calcola la parte frazionaria
hThr[1]=CreateThread(NULL,0,thrFractional, fp, 0,&IDThr);
if(hThr[1]==0) {CloseHandle(hThr[0]); R 0;}
F(i=0; i<=1; ++i)
WaitForSingleObject(hThr[i], INFINITE);
CloseHandle(hThr[1]); CloseHandle(hThr[0]);

wi=(u32*)((u8*)&ip+4*sizeof(u32)); // are there aligned u32 problems?
wf=(u32*)((u8*)&fp+4*sizeof(u32));
t=ip[0]*fp[0];
i=*wi+*wf;

if(espon>=0) *r= t;
else *r=1.0/t;
R i;
}

int main(void)

{long double xx, dd;

double d, r, x, y;

u32 i, z, w, c;

dd=700.1929292992; d=dd;
x=exp(d); i=mexp(&xx,d);
P("exp(%.20f)=%.20f, %.20Lf i=%d\n", d, x, xx, i);
//xx=fs_expl(dd);
//x=xx;
//P("exp(%.20f)=%.20f i=%d\n", d, x, con);
srand(0);
F(i=0, c=0; i<100000; ++i)
{ne:
z=rand(); w=rand();
d=z? (double)w/z: (double) w;
if(d>33){++c; G ne;}
dd=d; mexp(&xx, dd);
y=exp(d);
if(abs(y- (double) xx)>DBL_EPSILON)
{P("Errore: exp(%.20f)=%.20f!=%.20Lf c=%u\n", d, y, xx, c);
break;
}
}
if(i==100000) P("all ok c=%u\n", c);
R 0;
}

phreda

unread,

Apr 19, 2012, 10:43:13 PM4/19/12

to

perhaps this help

http://www.quinapalus.com/efunc.html

I implement in 16.16 fixed point arithmetic in r4

http://code.google.com/p/reda4/source/browse/trunk/r4/Lib/math.txt

pablo

hopcode

unread,

Apr 24, 2012, 8:48:00 AM4/24/12

to

Hi pablo,

Il 20.04.2012 04:43, phreda ha scritto:
> perhaps this help
>
> http://www.quinapalus.com/efunc.html

thanks for the link, i knew it already. that's a
simple and clever method on 4 decimals; but it requires
some practical proof to get ~8 decimals precision.
btw, by using a couple of lookup tables
it is possible to make it completely branchless,
just because those magnitudes (of 2's factors) depend
on positions of a single bit.

>
> I implement in 16.16 fixed point arithmetic in r4
>
> http://code.google.com/p/reda4/source/browse/trunk/r4/Lib/math.txt

nice project! no 64bit release ?

Cheers,

--
.:mrk[hopcode]

phreda

unread,

Apr 25, 2012, 3:36:34 PM4/25/12

to

Hi hopcode

yes, when finish the first optimice compile, I hope soon, I try to
write the x64 version. your site is in my list.. :)

About a table lookup, my previous implementation of SIN and COS use
table lookup, I not measure the speed up but are noticiable in a 3d
program..

I don't know how add more bits to fractional part in this algorithms
but my plan in 64bit arena is keep the 16 and get 48.16 fixed point
rutines.

Pablo