intel x86 floating point power funtion in assembly?

Matt C.

unread,

Dec 20, 2004, 1:55:05 PM12/20/04

to

I've been trying to find a good power function (x^y) that is
written in 100% intel x86 fpu assembly language. Does anybody
have one, or do you know where I can find one?
Thanks,
Matt

Terje Mathisen

unread,

Dec 20, 2004, 4:09:31 PM12/20/04

to

Matt C. wrote:

The naive approach is to simply do exp(ln(x)*y), right?

This is relatively straightforward, except that we'd prefer to use
binary logs and exponentiation:

FYL2X calculates the product of a scale factor (i.e. Y) and the binary
logarithm of the input number (X).

F2XM1 returns the result of binary exponentiation minus 1.0, so the
following code should do it:

; Assume the caller has loaded first Y, then X onto the fp stack!
power proc
fyl2x ; Binary log
f2xm1 ; Result - 1
fld1 ; Load 1.0
faddp st(1),st(0) ; Add together and pop
ret ; Return the result as ST(0)
power endp

Next is to detect some of the special cases:

Integer y?

Negative y?

Y close to zero?

X close to zero or one?

Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Grumble

unread,

Dec 20, 2004, 5:00:59 PM12/20/04

to

Terje Mathisen wrote:

> ***********************************************************************
> NOTICE: This e-mail transmission, and any documents, files or previous
> e-mail messages attached to it, may contain confidential or privileged
> information. If you are not the intended recipient, or a person
> responsible for delivering it to the intended recipient, you are
> hereby notified that any disclosure, copying, distribution or use of
> any of the information contained in or attached to this message is
> STRICTLY PROHIBITED. If you have received this transmission in error,
> please immediately notify the sender and delete the e-mail and attached
> documents. Thank you.
> ***********************************************************************

Terje,

Could you please not post this nonsense to Usenet?

--
Regards, Grumble

spam...@crayne.org

unread,

Dec 20, 2004, 9:37:31 PM12/20/04

to

TITLE Exponentiation Routines

; This module contains exponentiation routines:

; 1. EXP (e^x)
; 2. TEN2THEX (10^x)
; 3. Y2THEX (Y^x)
; 4. TWO2THEX (2^x) - which is actually a support routine

; All routines use the identity: Y^x = 2 ^ (x * log2 Y)
.386
INCLUDE COMMON.SEG

CC1 EQU 02H

PUBLIC EXP,TEN2THEX,Y2THEX,TWO2THEX
AGROUP GROUP ASEG
ASEG SEGMENT PARA PUBLIC USE16 'CODE'
ASSUME CS:AGROUP,DS:COMMON

;==============================================================================

; EXP(ST) = e^value in ST. Returns e^x in ST

EXP PROC NEAR
FLDL2E ;ST =log2 e, ST(1) = x
FMULP ST(1),ST ;ST = x * log2 e
CALL TWO2THEX ;ST = e^x
RET
EXP ENDP

;==============================================================================

; TEN2THEX(ST) = 10 ^ value in ST. Returns 10 ^ x in ST

TEN2THEX PROC NEAR
FLDL2T ;ST = log2 10
FMULP ST(1),ST ;ST = x * log2 10
CALL TWO2THEX ;ST = 10 ^ x
RET
TEN2THEX ENDP

;==============================================================================

; Y2THEX = ST ^ value in ST(1). Returns Y ^x in ST

; ST = Y > 0, ST(1) = x

Y2THEX PROC NEAR
FYL2X ;ST = Y * log2 x
CALL TWO2THEX ;ST = Y ^ x
RET
Y2THEX ENDP

;==============================================================================

; TWO2THEX = 2 ^ value in ST (can be negative). Returns 2 ^ x in
ST.

; To implement 2^x, we use F2XM1 [2^x-1] where -0.5 <= x <= 0.5
; with the appropriate scaling.

; Operation:
; 1. FSCALE produces x * 2^n. ST = n integer, ST(1) = x
; 2. F2XM1 produces 2^x -1. 0 <= x <= 0.5 (here)
; 3. X2 = X - X1 where X1 = X rounded to nearest integer
; X2 is positive and is the fractional part of X when
X > 0
; and is the distance rounded if X < 0.
; 4. If X2 > 0.5, subtract 0.5 to force to range 0 <= x
and note.
; 5. Calculate 2^n = (2 ^ X2) * (2 ^ X1) = 2 ^ (X2 + X1)
= 2 ^ X
; 6. If X2 was adjusted, multiply by 2 ^ 1/2.

TWO2THEX PROC NEAR
PUSH AX
FSTCW CNTL_WORD
FLDCW CW_DOWN ;Round down
; ;X
FLD ST(0) ;X,X
FRNDINT ;X1,X
FLDCW CNTL_WORD ;Normal rounding
FSUB ST(1),ST ;X1, X2
FXCH ;X2,X1
FLD HALF ;0.5, X2, X1
FXCH ;X2, 0.5, X1
FPREM ;ST = X2 (C1 = 0) or X2-0.5 (C1 = 1)
FSTSW AX ;Save status
FSTP ST(1) ;Dump 1/2
F2XM1 ;2 ^ X2 -1, X1
FLD1 ;1, 2 ^ X2 - 1, X1
FADDP ST(1),ST ;2 ^ X2, X1
TEST AH,CC1 ;Test C1 set
JZ NO_ADJ ;No, skip adjust by 2^1/2
FLD1 ;1, 2 ^ X2, X1
FADD ST,ST(0) ;2, 2 ^ X2, X1
FSQRT ;2 ^ 1/2, 2 ^ X2, X1
FMULP ST(1),ST ;2 ^ X2, X1
NO_ADJ: FSCALE ;(2 ^ X2) * (2 ^ X1) = 2 ^ (X2 + X1)
= 2 ^ X
FSTP ST(1) ;Dump X1
POP AX
RET
TWO2THEX ENDP

ASEG ENDS
END
_

Matt C.

unread,

Dec 21, 2004, 1:44:16 AM12/21/04

to

Ok, thanks. I'll try all these examples and see what happens.
Matt

Terje Mathisen

unread,

Dec 21, 2004, 3:47:15 AM12/21/04

to

Grumble wrote:

> Terje Mathisen wrote:
>
[Horrible lawyer-mandated, but legally useless NOTICE snipped]

> Terje,
>
> Could you please not post this nonsense to Usenet?

Grumble, I would have replied directly, but your email address was
obviously fake, so that's impossible. :-(

I work for a large norwegian-based company (Hydro), which started
earlier this year to append these RFC-breaking 'legal notice's to _all_
outgoing mail, even mail originating on a local unix system.

To get around this for non-company mail I have been forced to run a ssh
tunnel to my login host in the US, however this does not apply to Usenet
posts though, so most of the groups I freqent are OK.

c.l.a.x86 is an exception to this rule, since it is moderated. :-(

(Don't misunderstand me, moderation is very good! I am really thankful
that someone will handle the work of keeping clax more or less on topic
and flame-free.)

This means that my local news-server (news.hda.hydro.com) will take all
my clax posts, and automatically _mail_ them to the designated
moderator, with no way for me to override this process.

I am really sorry about it, but the only alteratives are for me to
totally stop posting to clax, or for our friendly moderator to install
some kind of filter that removes such notices.

OK?

spam...@crayne.org

unread,

Dec 21, 2004, 4:44:13 AM12/21/04

to

On Tue, 21 Dec 2004 08:47:15 +0000 (UTC), Terje Mathisen
<spam...@crayne.org> wrote:

>Grumble wrote:
>> Terje Mathisen wrote:

<snip>

>I am really sorry about it, but the only alteratives are for me to
>totally stop posting to clax, or for our friendly moderator to install
>some kind of filter that removes such notices.
>
>OK?

Or:

3. If your posts appear on your local news server, but you don't
get any replies, then probably the news server is not aware that
clax86 is a moderated group. If you can't get their tech support
folks to fix the problem, then email your post to:
clax86...@crayne.org
[This is the direct path to the CLAX86 robomoderator,
and is completely independent of the process by which your news
server
converts moderated posts to email.]

>
>Terje

--
Arargh412 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html

To reply by email, remove the garbage from the reply address.

Matt C.

unread,

Dec 21, 2004, 2:27:16 PM12/21/04

to

For some reason the examples give strange results for
non-powers of two for a base example
129^2 should be 16641, but instead gives me: 16130.96905234060
which is exactly what my original function gave me, i figured it was
my old function not working properly
I should note that I am using this function in a .dll library.
Also, my system is
Intel Celeron 2.7ghz with 512mb and windowsXP w/SP1 and SP2 installed.
Any other suggestions?

Terje Mathisen

unread,

Dec 21, 2004, 4:18:18 PM12/21/04

to

spam...@crayne.org wrote:

> On Tue, 21 Dec 2004 08:47:15 +0000 (UTC), Terje Mathisen

>>I am really sorry about it, but the only alteratives are for me to
>>totally stop posting to clax, or for our friendly moderator to install
>>some kind of filter that removes such notices.
>>

> Or:
>
> 3. If your posts appear on your local news server, but you don't
> get any replies, then probably the news server is not aware that
> clax86 is a moderated group. If you can't get their tech support
> folks to fix the problem, then email your post to:
> clax86...@crayne.org
> [This is the direct path to the CLAX86 robomoderator,
> and is completely independent of the process by which your news
> server
> converts moderated posts to email.]

Great!

I can easily do this, I just have to remember to convert the posting to
mail manually.

So this post should be RFC-compliant. :-)

Charles A. Crayne

unread,

Dec 21, 2004, 10:09:51 PM12/21/04

to

On Tue, 21 Dec 2004 21:18:18 +0000 (UTC)
Terje Mathisen <spam...@crayne.org> wrote:

:I just have to remember to convert the posting to
:mail manually.

If it would be easier for you, you might be interested to know that I can
now also offer the option to receive posts from clax86 by email. No one is
using this option, as yet, but if you would like to try it, just let me
know.

-- Chuck

Matt

unread,

Dec 25, 2004, 11:29:22 PM12/25/04

to

"Matt C." <spam...@crayne.org> wrote in message
news:n4ags05qqkh84ivon...@4ax.com...

Here is some code which will handle the exponentation x^y where y is a
positive integer or zero. Extrapolation to negative integers is easy: take
the reciprocal of x if y is negative and then take the absolute value of y.
If you need fractional exponents, then Terje's method is probably best.

double exp(double base, unsigned int exp)
{
double acc = 1.0;

while(exp > 0)
{
if (exp & 1)
acc *= base;
base *= base;
}

return acc;
}

This code will not lose any bits of precision except what is inherent in
floating-point. A simple assembly translation:

_exp:
fld1
fld qword [esp+4]
mov eax, [esp+12]

; st1 = acc
; st0 = base
test eax, eax
jz .done

.top:
shr eax, 1
jnc .skip
fmul st1, st0

.skip:
fmul st0, st0
jnz .top

.done:
ffreep st0
; st0 = acc
ret

If you're after speed, unrolling that should give a significant speed gain.
The critical path is going to be 32 FP multiplies which is 160 cycles on a
P4. You can cut that to 128 (20% faster) if you use SSE/SSE2 instead. The
disadvantage to SSE/SSE2 is limited compatibility.

-Matt

Justin L. Kennedy

unread,

Jan 5, 2005, 2:32:57 PM1/5/05

to

Matt <spam...@crayne.org> wrote:
> double exp(double base, unsigned int exp)
> {
> double acc = 1.0;

> while(exp > 0)
> {
> if (exp & 1)
> acc *= base;
> base *= base;
> }

> return acc;
> }

It does not appear that exp changes thus the loop will run infinitly. I
think you forgot to shift the lsb off, but I forget how exactly the
algorithm works.

--
Justin L. Kennedy
Georgia Institute of Technology, Atlanta Georgia, 30332
Email: jk...@prism.gatech.edu

Matt

unread,

Jan 10, 2005, 2:36:52 AM1/10/05

to

"Justin L. Kennedy" <spam...@crayne.org> wrote in message
news:crhbca$aa3$1...@news-int2.gatech.edu...

> Matt <spam...@crayne.org> wrote:
>> double exp(double base, unsigned int exp)
>> {
>> double acc = 1.0;
>
>> while(exp > 0)
>> {
>> if (exp & 1)
>> acc *= base;
>> base *= base;
>> }
>
>> return acc;
>> }
>
> It does not appear that exp changes thus the loop will run infinitly. I
> think you forgot to shift the lsb off, but I forget how exactly the
> algorithm works.

Yes, basically add "exp >>= 1;" to the end of that loop:

double exp(double base, unsigned int exp)
{
double acc = 1.0;

while(exp > 0)
{
if (exp & 1)
acc *= base;
base *= base;

exp >>= 1;
}

return acc;
}

Best-case each iteration is 5 cycles. Worst-case each iteration is about 19
cycles (branch taken and mispredicted).

-Matt

Terje Mathisen

unread,

Jan 10, 2005, 7:32:40 AM1/10/05

to

Matt wrote:

Assuming unpredictable bit patterns in the exponent, a branchless
version might be faster:

mov eax,[exp] ; Exponent
fld1 ; Accumulator, return value if (exp==0)
fld [base] ; Current power of base
test eax,eax ; exp > 0 ?
jz done
next:
fld1 ; Multiplicator for zero power
shr eax,1 ; Sets carry if (exp & 1), zero flag if done

fcmovc st,st(1) ; Overwrite 1.0 with base^power if carry

fmulp st(2),st ; acc *= (exp & 1)? base : 1.0;
fmul st,st ; base *= base;
jnz next
done:
fstp st ; FPOP to get rid of base power
ret

Since the two multiplications are independent, this version could run in
just one or two cycles more than the time for a single FMUL, and do so
without any lost time due to branches.