Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion greatest common divisor
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
PatD  
View profile  
 More options Nov 28 1999, 3:00 am
Newsgroups: comp.lang.asm.x86
From: P...@govnet.com (PatD)
Date: 1999/11/28
Subject: Re: greatest common divisor

some people have suggested (or asked why I didn't use)
subtraction-only methods...

why didn't I use it ?
easy, because it's bad !

a nice implementation of that idea can be seen at Paul Hsieh's website:
http://www.azillionmonkeys.com/qed/asmexample.html

it goes like this:

gcd: add     edx,eax
      jne     L1
      mov     eax,1
      jmp     L2
 L1:  mov     ebx,edx ; 1 dep: edx
      add     eax,eax ; 1
      sub     ebx,eax ; 2 dep: eax
      shr     eax,1   ; 2
      mov     ecx,ebx ; 3 dep: ebx
      sar     ebx,1fH ; 3
      and     ebx,ecx ; 4 dep: ecx
      add     eax,ebx ; 5 dep: ebx
      sub     edx,eax ; 6 dep: eax
      test    eax,eax ; 6
      jne     L1      ; 6
      mov     eax,edx
 L2:
(see the page for details)

now, remember the "original" C version ?

int gcd(int a, int b)
{
        int r;
        if(a == 0)
                return b;
        while(b != 0)
        {
                r = a % b;
                a = b;
                b = r;
        }
    return a;

}

what's the advantage of that one over a subtraction-only method ?

there's basically only one: speed !

imagine gcd(1,1000000) ! how many iterations are needed for that one ?
that's the point

the C version may use slow operations, but it only takes a couple of runs
to come up with the result

btw, that algorithm is some 2000 years old ! and still perfectly valid :)

now, on to the real stuff:

while I was toying around with a branchless array-version, I finally
decided to print out all intermediate values
and, after some long look into that...
(boring I might add... pages of pages of numbers only :)
noticed that a couple of steps
can simply be ...stepped over !

first of all, "simplifying" by 2 should be done as much as possible
the smaller the numbers, the faster we get a result
second: there's no need to work with max(a,b) !
it's enough to use max(a,b) - min(a,b) only

even better, no storage space for temporary values is needed
one can simply calculate the next values immediately ! :)

in short, all we have to do is:

get min(a,b)
get max(a,b)
calculate max(a,b) - min(a,b)
simplify those 2 values by 2 as much as possible
loop until max(a,b) - min(a,b) = 0

that's it already

back to our gcd(1,1000000) example
how many iterations do we need ?

1  1 000 000
1     15 625
1      1 953
1         61
1         15
1          7
1          3
1          1
1          0

done ! amazing, isn't it ? :)

btw, worst case input for that alg. is actually 2 consecutive Fibonacci numbers
(Fn = Fn-1 + Fn-2, F0=1, F1=1)

e.g. 89 and 55
the original c version does:
(89,55) -> (55,34) -> (34,21) -> (21,13) -> (13,8) -> (8,5) -> (5,3) -> (3,2) -> (2,1) -> (1,0)
"my" version:
(89,55) -> (55,17) -> (17,19) -> (17,1) -> (1,1) -> (1,0)

:)

(subtraction-only is left as an exercice for the reader)

finally, combining all of the above, some (lots) of ideas from replies
and a bit of thinking of my own, lead to this (final ?) asm version,
(I know, you've all been waiting for this, but I felt like boring you with details first :)

; int gcdAsm(int a, int b)
;
; computes gcd(a,b) according to:
; 1. a   even, b   even: gcd(a,b) = 2 * gcd(a/2,b/2), and remember how often this happened
; 2. a   even, b uneven: gcd(a,b) = gcd(a/2,b)
; 3. a uneven, b   even: gcd(a,b) = gcd(a,b/2)
; 4. a uneven, b uneven: a>b ? a -= b : b -= a, i.e. gcd(a,b) = gcd(min(a,b),max(a,b) - min(a,b))
; do 1., repeat 2. - 4. until a = 0 or b = 0
; return (a + b) corrected by the remembered value from 1.

        BITS 32
        GLOBAL _gcdAsm

        SECTION .text

_gcdAsm:
        push ebp
        mov ebp,esp
        push ebx
        push ecx
        push edx
        push edi

        mov eax,[ebp + 8]       ; eax = a (0 <= a <= 2^31 - 1 (= 2 147 483 647))
        mov ebx,[ebp + 12]      ; ebx = b (0 <= b <= 2^31 - 1)

        ; by definition: gcd(a,0) = a, gcd(0,b) = b, gcd(0,0) = 1 !
        mov ecx,eax
        or ecx,ebx
        bsf ecx,ecx                     ; extract the greatest common power of 2 of a and b...
        jnz notBoth0
        mov eax,1                       ; if a = 0 and b = 0, return 1
        jmp done
notBoth0:
        mov edi,ecx                     ; ...and remember it for later use

        test eax,eax
        jnz aNot0
        mov eax,ebx                     ; if a = 0, return b
        jmp done
aNot0:
        test ebx,ebx
        jz done                         ; if b = 0, return a

        bsf ecx,eax                     ; "simplify" a as much as possible
        shr eax,cl
        bsf ecx,ebx                     ; "simplify" b as much as possible
        shr ebx,cl

mainLoop:
        mov ecx,ebx
        sub ecx,eax                     ; b - a
        sbb edx,edx                     ; edx = 0 if b >= a or -1 if a > b
        and ecx,edx                     ; ecx = 0 if b >= a or b - a if a > b

        add eax,ecx                     ; a-new = min(a,b)
        sub ebx,ecx                     ; b-new = max(a,b)

        sub ebx,eax                     ; we're now sure that that difference is >= 0

        bsf ecx,eax                     ; "simplify" as much as possible by 2
        shr eax,cl

        bsf ecx,ebx                     ; "simplify" as much as possible by 2
        shr ebx,cl

        jnz mainLoop            ; keep looping until ebx = 0

        mov ecx,edi                     ; multiply back with the common power of 2
        shl eax,cl

done:
        pop edi
        pop edx
        pop ecx
        pop ebx
        mov esp,ebp
        pop ebp
        ret                                     ; eax = gcd(a,b)

this should even satisfy those that asked for "cleaner" code with less exit-points :)

("bsf" is really only an option on ppro, p2 or better
 but, since this is supposed to run on p2... :)

all comments, as usual, welcome

Yours

P

" A child of five could understand this, fetch me a child of five "
Groucho Marx


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.