Brian Kessler has uploaded this change for review.
math/big: implement lehmer's GCD algorithm
Updates #15833
Lehmer's GCD algorithm uses single precision calculations
to simulate several steps of multiple precision calculations
in Euclid's GCD algorithm which leads to a considerable
speed up. This implementation uses Collin's simplified
testing condition on the single digit cosequences that
requires only one quotient and avoids any possibility of
overflow.
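For illustration only (not part of this change): a minimal standalone sketch of the
single-word inner loop with the Collins stopping condition, using uint64 in place of
the package's Word type.

package main

import "fmt"

// lehmerStep runs the single-word inner loop on leading words a1 >= a2 and
// returns the cosequences that stand in for several multiple-precision
// Euclidean steps; even tracks the alternating signs of the cosequences.
func lehmerStep(a1, a2 uint64) (u0, u1, v0, v1 uint64, even bool) {
	var u2, v2 uint64
	u0, u1, u2 = 0, 1, 0
	v0, v1, v2 = 0, 0, 1
	for a2 >= v2 && a1-a2 >= v1+v2 { // Collins' stopping condition
		q := a1 / a2
		a1, a2 = a2, a1-q*a2
		u0, u1, u2 = u1, u2, u1+q*u2
		v0, v1, v2 = v1, v2, v1+q*v2
		even = !even
	}
	return
}

func main() {
	// arbitrary example inputs
	u0, u1, v0, v1, even := lehmerStep(0xfedcba9876543210, 0x0123456789abcdef)
	fmt.Println(u0, u1, v0, v1, even)
}

The cosequences from one such step drive a single multiple-precision update
A, B = u0*A + v0*B, u1*A + v1*B, with the signs reconstructed from even.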
name old time/op new time/op delta
GCD10x10/WithoutXY-4 1.82µs ±24% 0.28µs ± 6% -84.40% (p=0.008 n=5+5)
GCD10x10/WithXY-4 1.69µs ± 6% 1.71µs ± 6% ~ (p=0.595 n=5+5)
GCD10x100/WithoutXY-4 1.87µs ± 2% 0.56µs ± 4% -70.13% (p=0.008 n=5+5)
GCD10x100/WithXY-4 2.61µs ± 2% 2.65µs ± 4% ~ (p=0.635 n=5+5)
GCD10x1000/WithoutXY-4 2.75µs ± 2% 1.48µs ± 1% -46.06% (p=0.008 n=5+5)
GCD10x1000/WithXY-4 5.29µs ± 2% 5.25µs ± 2% ~ (p=0.548 n=5+5)
GCD10x10000/WithoutXY-4 10.7µs ± 2% 10.3µs ± 0% -4.38% (p=0.008 n=5+5)
GCD10x10000/WithXY-4 22.3µs ± 6% 22.1µs ± 1% ~ (p=1.000 n=5+5)
GCD10x100000/WithoutXY-4 93.7µs ± 2% 99.4µs ± 2% +6.09% (p=0.008 n=5+5)
GCD10x100000/WithXY-4 196µs ± 2% 199µs ± 2% ~ (p=0.222 n=5+5)
GCD100x100/WithoutXY-4 10.1µs ± 2% 2.5µs ± 2% -74.84% (p=0.008 n=5+5)
GCD100x100/WithXY-4 21.4µs ± 2% 21.3µs ± 7% ~ (p=0.548 n=5+5)
GCD100x1000/WithoutXY-4 11.3µs ± 2% 4.4µs ± 4% -60.87% (p=0.008 n=5+5)
GCD100x1000/WithXY-4 24.7µs ± 3% 23.9µs ± 1% ~ (p=0.056 n=5+5)
GCD100x10000/WithoutXY-4 26.6µs ± 1% 20.0µs ± 2% -24.82% (p=0.008 n=5+5)
GCD100x10000/WithXY-4 78.7µs ± 2% 78.2µs ± 2% ~ (p=0.690 n=5+5)
GCD100x100000/WithoutXY-4 174µs ± 2% 171µs ± 1% ~ (p=0.056 n=5+5)
GCD100x100000/WithXY-4 563µs ± 4% 561µs ± 2% ~ (p=1.000 n=5+5)
GCD1000x1000/WithoutXY-4 120µs ± 5% 29µs ± 3% -75.71% (p=0.008 n=5+5)
GCD1000x1000/WithXY-4 355µs ± 4% 358µs ± 2% ~ (p=0.841 n=5+5)
GCD1000x10000/WithoutXY-4 140µs ± 2% 49µs ± 2% -65.07% (p=0.008 n=5+5)
GCD1000x10000/WithXY-4 626µs ± 3% 628µs ± 9% ~ (p=0.690 n=5+5)
GCD1000x100000/WithoutXY-4 340µs ± 4% 259µs ± 6% -23.79% (p=0.008 n=5+5)
GCD1000x100000/WithXY-4 3.76ms ± 4% 3.82ms ± 5% ~ (p=0.310 n=5+5)
GCD10000x10000/WithoutXY-4 3.11ms ± 3% 0.54ms ± 2% -82.74% (p=0.008 n=5+5)
GCD10000x10000/WithXY-4 7.96ms ± 3% 7.69ms ± 3% ~ (p=0.151 n=5+5)
GCD10000x100000/WithoutXY-4 3.88ms ± 1% 1.27ms ± 2% -67.21% (p=0.008 n=5+5)
GCD10000x100000/WithXY-4 38.1ms ± 2% 38.8ms ± 1% ~ (p=0.095 n=5+5)
GCD100000x100000/WithoutXY-4 208ms ± 1% 25ms ± 4% -88.07% (p=0.008 n=5+5)
GCD100000x100000/WithXY-4 533ms ± 5% 525ms ± 4% ~ (p=0.548 n=5+5)
Change-Id: Ic1e007eb807b93e75f4752e968e98c1f0cb90e43
---
M src/math/big/int.go
M src/math/big/int_test.go
M src/math/big/rat.go
3 files changed, 121 insertions(+), 68 deletions(-)
diff --git a/src/math/big/int.go b/src/math/big/int.go
index 63a750c..a47eb12 100644
--- a/src/math/big/int.go
+++ b/src/math/big/int.go
@@ -466,7 +466,7 @@
return z
}
if x == nil && y == nil {
- return z.binaryGCD(a, b)
+ return z.lehmerGCD(a, b)
}
A := new(Int).Set(a)
@@ -505,64 +505,127 @@
return z
}
-// binaryGCD sets z to the greatest common divisor of a and b, which both must
-// be > 0, and returns z.
-// See Knuth, The Art of Computer Programming, Vol. 2, Section 4.5.2, Algorithm B.
-func (z *Int) binaryGCD(a, b *Int) *Int {
- u := z
- v := new(Int)
+// lehmerGCD sets z to the greatest common divisor of a and b,
+// both of which must be > 0, and returns z.
+// See Knuth, The Art of Computer Programming, Vol. 2, Section 4.5.2, Algorithm L.
+// This implementation uses the improved condition by Collin's requiring only one
+// quotient and avoiding the possibility of single Word overflow.
+// See Jebelean, "Improving the multiprecision Euclidean algorithm"
+// Design and Implementation of Symbolic Computation Systems pp 45-58
+func (z *Int) lehmerGCD(a, b *Int) *Int {
- // use one Euclidean iteration to ensure that u and v are approx. the same size
- switch {
- case len(a.abs) > len(b.abs):
- // must set v before u since u may be alias for a or b (was issue #11284)
- v.Rem(a, b)
- u.Set(b)
- case len(a.abs) < len(b.abs):
- v.Rem(b, a)
- u.Set(a)
+ A := new(Int)
+ B := new(Int)
+
+ // ensure A >= B
+ switch a.abs.cmp(b.abs) {
+ case -1:
+ B.Set(a)
+ A.Set(b)
default:
- v.Set(b)
- u.Set(a)
+ B.Set(b)
+ A.Set(a)
}
- // a, b must not be used anymore (may be aliases with u)
- // v might be 0 now
- if len(v.abs) == 0 {
- return u
- }
- // u > 0 && v > 0
-
- // determine largest k such that u = u' << k, v = v' << k
- k := u.abs.trailingZeroBits()
- if vk := v.abs.trailingZeroBits(); vk < k {
- k = vk
- }
- u.Rsh(u, k)
- v.Rsh(v, k)
-
- // determine t (we know that u > 0)
+ // temp variables for multiprecision update
t := new(Int)
- if u.abs[0]&1 != 0 {
- // u is odd
- t.Neg(v)
- } else {
- t.Set(u)
- }
+ r := new(Int)
+ s := new(Int)
+ w := new(Int)
- for len(t.abs) > 0 {
- // reduce t
- t.Rsh(t, t.abs.trailingZeroBits())
- if t.neg {
- v, t = t, v
- v.neg = len(v.abs) > 0 && !v.neg // 0 has no sign
- } else {
- u, t = t, u
+ // loop invariant A >= B
+ for len(B.abs) > 1 {
+
+ // Initialize the digits
+ var a1, a2, u0, u1, u2, v0, v1, v2 Word
+
+ m := len(B.abs) // m >= 2
+ n := len(A.abs) // n >= m >= 2
+
+ // extract the top Word of bits from A and B
+ h := nlz(A.abs[n-1])
+ a1 = (A.abs[n-1] << h) | (A.abs[n-2] >> (_W - h))
+ // B may have implicit zero words in the high bits if the lengths differ
+ switch {
+ case n == m:
+ a2 = (B.abs[n-1] << h) | (B.abs[n-2] >> (_W - h))
+ case n == m+1:
+ a2 = (B.abs[n-2] >> (_W - h))
+ default:
+ a2 = 0
}
- t.Sub(u, v)
+
+ // Since we are calculating with full words to avoid overflow
+ // we use even to track the sign of the cosequences
+ // for even iterations: u0, v1 >= 0 && u1, v0 <= 0
+ // for odd iterations: u0, v1 <= 0 && u1, v0 >= 0
+ // the first iteration starts with k=1 (odd)
+ even := false
+ // variables to track the cosequences
+ u0, u1, u2 = 0, 1, 0
+ v0, v1, v2 = 0, 0, 1
+
+ // Calculate the quotient and cosequences using Collin's stopping condition
+ for a2 >= v2 && (a1-a2) >= (v1+v2) {
+ q := a1 / a2
+ a1, a2 = a2, a1-q*a2
+ u0, u1, u2 = u1, u2, u1+q*u2
+ v0, v1, v2 = v1, v2, v1+q*v2
+ even = !even
+ }
+
+ // Multiprecision Step
+ if v0 != 0 {
+ // Simulate the effect of the single precision steps using the cosequences
+ // A = u0*A + v0*B
+ // B = u1*A + v1*B
+
+ t.abs = t.abs.setWord(u0)
+ s.abs = s.abs.setWord(v0)
+ t.neg = !even
+ s.neg = even
+
+ t = t.Mul(A, t)
+ s = s.Mul(B, s)
+
+ r.abs = r.abs.setWord(u1)
+ w.abs = w.abs.setWord(v1)
+ r.neg = even
+ w.neg = !even
+
+ r = r.Mul(A, r)
+ w = w.Mul(B, w)
+
+ A = A.Add(t, s)
+ B = B.Add(r, w)
+
+ } else {
+ // Single-digit calculations failed to simulate any quotients
+ // do a standard Euclidean step
+ t = t.Rem(A, B)
+ A, B, t = B, t, A
+ }
}
- return z.Lsh(u, k)
+ if len(B.abs) > 0 {
+ // standard Euclidean algorithm base case for B a single Word
+ if len(A.abs) > 1 {
+ // A is longer than a single Word
+ t = t.Rem(A, B)
+ A, B, t = B, t, A
+ }
+ if len(B.abs) > 0 {
+ // A and B are both a single Word
+ a1, a2 := A.abs[0], B.abs[0]
+ for a2 != 0 {
+ a1, a2 = a2, a1%a2
+ }
+ A.abs[0] = a1
+ }
+ }
+ *z = *A
+
+ return z
}
// Rand sets z to a pseudo-random number in [0, n) and returns z.
diff --git a/src/math/big/int_test.go b/src/math/big/int_test.go
index 65e24f1..f346c81 100644
--- a/src/math/big/int_test.go
+++ b/src/math/big/int_test.go
@@ -708,38 +708,28 @@
D := new(Int).GCD(X, Y, a, b)
if D.Cmp(d) != 0 {
- t.Errorf("GCD(%s, %s): got d = %s, want %s", a, b, D, d)
+ t.Errorf("GCD(%s, %s, %s, %s): got d = %s, want %s", x, y, a, b, D, d)
}
if x != nil && X.Cmp(x) != 0 {
- t.Errorf("GCD(%s, %s): got x = %s, want %s", a, b, X, x)
+ t.Errorf("GCD(%s, %s, %s, %s): got x = %s, want %s", x, y, a, b, X, x)
}
if y != nil && Y.Cmp(y) != 0 {
- t.Errorf("GCD(%s, %s): got y = %s, want %s", a, b, Y, y)
- }
-
- // binaryGCD requires a > 0 && b > 0
- if a.Sign() <= 0 || b.Sign() <= 0 {
- return
- }
-
- D.binaryGCD(a, b)
- if D.Cmp(d) != 0 {
- t.Errorf("binaryGcd(%s, %s): got d = %s, want %s", a, b, D, d)
+ t.Errorf("GCD(%s, %s, %s, %s): got y = %s, want %s", x, y, a, b, Y, y)
}
// check results in presence of aliasing (issue #11284)
a2 := new(Int).Set(a)
b2 := new(Int).Set(b)
- a2.binaryGCD(a2, b2) // result is same as 1st argument
+ a2.GCD(X, Y, a2, b2) // result is same as 1st argument
if a2.Cmp(d) != 0 {
- t.Errorf("binaryGcd(%s, %s): got d = %s, want %s", a, b, a2, d)
+ t.Errorf("aliased z = a GCD(%s, %s, %s, %s): got d = %s, want %s", x, y, a, b, a2, d)
}
a2 = new(Int).Set(a)
b2 = new(Int).Set(b)
- b2.binaryGCD(a2, b2) // result is same as 2nd argument
+ b2.GCD(X, Y, a2, b2) // result is same as 2nd argument
if b2.Cmp(d) != 0 {
- t.Errorf("binaryGcd(%s, %s): got d = %s, want %s", a, b, b2, d)
+ t.Errorf("aliased z = b GCD(%s, %s, %s, %s): got d = %s, want %s", x, y, a, b, b2, d)
}
}
diff --git a/src/math/big/rat.go b/src/math/big/rat.go
index f0f436e..99bc09b 100644
--- a/src/math/big/rat.go
+++ b/src/math/big/rat.go
@@ -422,7 +422,7 @@
neg := z.a.neg
z.a.neg = false
z.b.neg = false
- if f := NewInt(0).binaryGCD(&z.a, &z.b); f.Cmp(intOne) != 0 {
+ if f := NewInt(0).GCD(nil, nil, &z.a, &z.b); f.Cmp(intOne) != 0 {
z.a.abs, _ = z.a.abs.div(nil, z.a.abs, f.abs)
z.b.abs, _ = z.b.abs.div(nil, z.b.abs, f.abs)
if z.b.abs.cmp(natOne) == 0 {
Hi,
I had a go at implementing Lehmer's algorithm for the GCD and got pretty good speed up for a basic implementation. It should be straightforward to apply the same algorithm to the extended GCD case if this looks good.
Very cool, thank you @briankessler!
Patch set 1:Run-TryBot +1
TryBots are happy.
Patch set 1:TryBot-Result +1
Some suggestions for improvement, some questions, and some nitpicks regarding your comments.
16 comments:
Patch Set #1, Line 509: // both of which must be > 0, and returns z.
which both must be > 0
Patch Set #1, Line 511: Collin's
s/Collin's/Collins/
(according to the citations in the paper below)
please add a comma at the end of this line
s/pp 45-58/, pp 45-48./
Patch Set #1, Line 517: A := new(Int)
simpler:
A := new(Int).Set(a)
B := new(Int).Set(b)
// ensure A >= B
if A.abs.cmp(b.abs) < 0 {
A, B = B, A
}
(these are pointer swaps, so this is cheap)
But (read on):
In the end you want the result in z. That is, ideally you want A to be z so we can re-use z's memory and we don't need the assignment *z = *A before the return.
Because a and b are never used anymore in this function body (I believe), we can do:
// ensure a >= b
if a.abs.cmp(b.abs) < 0 {
a, b = b, a
}
// don't destroy incoming values of a and b
B := new(Int).Set(b) // do this first in case b is an alias of z
A := z.Set(a)
With the first assignment we make sure to have a copy of b. If b is an alias for z it doesn't matter, because b has been copied before we change z. Then we use z as A. If z is an alias for a, the assignment is a no-op (superfluous, but still a no-op). Otherwise it uses z's memory if it can. At the end, because A == z, we don't need the *z = *A assignment.
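For illustration, a minimal standalone sketch (not from this CL) of the aliasing
hazard described above, using the exported big.Int API and arbitrary values:

package main

import (
	"fmt"
	"math/big"
)

func main() {
	// safe order: copy b before writing through z
	z := big.NewInt(30)
	b := z // b aliases z
	B := new(big.Int).Set(b)
	A := z.Set(big.NewInt(42))
	fmt.Println(A, B) // 42 30: B kept b's original value

	// unsafe order: writing through z first also changes b,
	// so the later copy picks up the clobbered value
	z2 := big.NewInt(30)
	b2 := z2
	A2 := z2.Set(big.NewInt(42))
	B2 := new(big.Int).Set(b2)
	fmt.Println(A2, B2) // 42 42: b2's original value is lost
}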
Patch Set #1, Line 539: Initialize
A comment on comments:
In this package I've used all lower-case comments when the comment is short and when it's not a sentence. Here:
// initialize the digits
I've used Uppercase (first word) and end with a period, if the comment is a proper sentence. In that case I also try to do proper punctuation.
Please try to follow this existing style.
Patch Set #1, Line 568: Calculate
s/Calculate/calculate/ or use a period at the end.
no need for these parentheses (both sides of >=)
Patch Set #1, Line 577: // Multiprecision Step
lowercase M
These are all integers (pointers) not nat slices. There's no need to assign to the receiver as it can be updated in place:
t.Mul(A, t)
is sufficient (no need to assign to t).
ditto
ditto for all 4 lines
and here as well
and here
This throws away whatever underlying array belonged to z. What you want is to re-use that array if at all possible and avoid this assignment altogether.
Patch Set #1, Line 425: if f := NewInt(0).GCD(nil, nil, &z.a, &z.b); f.Cmp(intOne) != 0 {
why can't this be lehmerGCD? (both z.a and z.b are > 0)
I made most of the updates. There are still a few open items around the reuse of backing arrays. Also, I have the extended Lehmer algorithm working with similar performance improvements.
https://perf.golang.org/search?q=upload:20170829.1
It's basically identical except the update steps use the extended Euclidean algorithm. Should I include that with this CL or a subsequent one?
16 comments:
Patch Set #1, Line 509: // both of which must be > 0, and returns z.
which both must be > 0
Done
Patch Set #1, Line 511: Collin's
s/Collin's/Collins/ […]
Done
please add a comma at the end of this line
Done
s/pp 45-58/, pp 45-48. […]
Done
Patch Set #1, Line 517: A := new(Int)
simpler: […]
I updated to use this simplification and reuse z's memory. But I think the assignment is still necessary due to pointer swaps in the loop causing A and z to point to different Ints.
Patch Set #1, Line 539: Initialize
A comment on comments: […]
Done
Patch Set #1, Line 568: Calculate
s/Calculate/calculate/ or use a period at the end.
Done
no need for these parentheses (both sides of >=)
Done
Patch Set #1, Line 577: // Multiprecision Step
lowercase M
Done
These are all integers (pointers) not nat slices. […]
Related to the above comments about reusing z's backing array, updating these in place will throw out the backing arrays. I haven't benchmarked it, but do you think it's worth checking whether introducing a couple more temp variables here to avoid extra allocations is worthwhile?
ditto
Done
ditto for all 4 lines
Done
and here as well
Done
and here
Done
This throws away whatever underlying array belonged to z. […]
With the update above, I think this is still needed because A and z may not point to the same Int at the end due to the pointer swaps in the Euclidean update.
A, B, t = B, t, A
but after the change above, it's not wasting z's underlying nat array since that was used to back A.
Patch Set #1, Line 425: if f := NewInt(0).GCD(nil, nil, &z.a, &z.b); f.Cmp(intOne) != 0 {
why can't this be lehmerGCD? (both z.a and z. […]
No specific reason. I just used the generic GCD in case the internal implementation changed at some point in the future, but using lehmerGCD avoids the extra checks so I will update.
Brian Kessler uploaded patch set #2 to this change.
3 files changed, 116 insertions(+), 68 deletions(-)
I have some more minor nits. The Go code looks ok, but I need to think it through a bit more, or find another pair of eyes (I'm not familiar with the respective math; and this is a pretty crucial piece of code).
8 comments:
Patch Set #2, Line 7: lehmer's
s/lehmer/Lehmer/
s/Collin's/Collins'/
s/that/which/
s/for/For/
ditto
s/the/The/
and add a period at the end
Patch Set #2, Line 599: Single
c/Single/single/
(or add a period at the end of the sentence)
agreed that this is needed
one could perhaps do
if A != z {
*z = *A
}
but an integer is small, so it may not be worth it (or even be counterproductive)
Robert Griesemer removed Adam Langley from this change.
Aliaksandr, I removed Adam. While he would be great, he doesn't have any bandwidth at the moment.
I cleaned up all of the comments and commit message. I will make sure to follow the standard comment style in the future.
Definitely agree that it deserves a thorough confirmation. I chose the basic implementation from the first section of the Jebelean paper because it seemed to be the easiest to verify for correctness and didn't contain the edge cases of the version in Knuth. Would it be worthwhile to include a slow but definitely correct implementation of Euclid's algorithm solely for testing? There is currently checkGcd, called using quick.Check, which confirms d = x*a + b*y for the extended algorithm. I could include another quick.Check that confirms both implementations produce the same results for further confidence.
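For reference, the Bézout identity that checkGcd verifies can be exercised directly
through the exported API; a minimal sketch with arbitrary inputs:

package main

import (
	"fmt"
	"math/big"
)

func main() {
	a := big.NewInt(123456789)
	b := big.NewInt(987654321)
	x, y := new(big.Int), new(big.Int)
	d := new(big.Int).GCD(x, y, a, b)

	// check d == x*a + y*b
	rhs := new(big.Int).Mul(x, a)
	rhs.Add(rhs, new(big.Int).Mul(y, b))
	fmt.Println(d, rhs, d.Cmp(rhs) == 0) // 9 9 true
}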
7 comments:
Patch Set #2, Line 7: lehmer's
s/lehmer/Lehmer/
Done
s/Collin's/Collins'/
Done
s/that/which/
s/for/For/
Done
ditto
Done
s/the/The/ […]
Done
Patch Set #2, Line 599: Single
c/Single/single/ […]
Done
Brian Kessler uploaded patch set #3 to this change.
math/big: implement Lehmer's GCD algorithm
Updates #15833
Lehmer's GCD algorithm uses single precision calculations
to simulate several steps of multiple precision calculations
in Euclid's GCD algorithm which leads to a considerable
speed up. This implementation uses Collins' simplified
testing condition on the single digit cosequences which
requires only one quotient and avoids any possibility of
overflow.
3 files changed, 116 insertions(+), 68 deletions(-)
Patch Set 2:
Yes, that seems worthwhile: add the basic Euclid's algorithm to the testing code and use it to verify the fast version.
I'm pretty excited to see this happen as it should also speed up rational arithmetic significantly.
Brian Kessler uploaded patch set #4 to this change.
math/big: implement Lehmer's GCD algorithm
Updates #15833
Lehmer's GCD algorithm uses single precision calculations
to simulate several steps of multiple precision calculations
in Euclid's GCD algorithm which leads to a considerable
speed up. This implementation uses Collins' simplified
testing condition on the single digit cosequences which
requires only one quotient and avoids any possibility of
overflow.
3 files changed, 153 insertions(+), 68 deletions(-)
Patch Set 3:
Ok, I added the reference Euclid's algorithm for testing with quick.Check.
Patch Set 3:
Thanks. Just FYI I am ooo now for two weeks so most likely I won't get to this before I return. If you don't hear from me by 9/20 please ping me. Thanks.
R=gri
I am running the trybots and I've also added a tag to remind @gri when he returns mid-month.
Patch set 4:Run-TryBot +1
TryBots beginning. Status page: https://farmer.golang.org/try?commit=527d3d58
TryBots are happy.
Patch set 4:TryBot-Result +1
Patch Set 4:
Pinging to see if you had a chance to look at this again or identified any additional reviewers. Thanks.
Apologies for the delay.
2 comments:
File src/math/big/int_test.go:
Patch Set #4, Line 681: func (z *Int) euclidGCD(a, b *Int) *Int {
You are always throwing away z; also this is testing code. No need to pass in z. Just make this a function of a and b returning a result.
Patch Set #4, Line 701: if a.Cmp(zero) <= 0 || b.Cmp(zero) <= 0 {
if a.Sign() <= 0 || b.Sign() <= 0 ...
and get rid of zero.
Cleaned up the test.
2 comments:
Patch Set #4, Line 681: func (z *Int) euclidGCD(a, b *Int) *Int {
You are always throwing away z; also this is testing code. No need to pass in z. […]
Done
Patch Set #4, Line 701: if a.Cmp(zero) <= 0 || b.Cmp(zero) <= 0 {
if a.Sign() <= 0 || b.Sign() <= 0 ... […]
Done
Brian Kessler uploaded patch set #6 to this change.
math/big: implement Lehmer's GCD algorithm
Updates #15833
Lehmer's GCD algorithm uses single precision calculations
to simulate several steps of multiple precision calculations
in Euclid's GCD algorithm which leads to a considerable
speed up. This implementation uses Collins' simplified
testing condition on the single digit cosequences which
requires only one quotient and avoids any possibility of
overflow.
3 files changed, 151 insertions(+), 68 deletions(-)
LGTM.
I'd still love to have somebody with expertise in GCD look over this. I haven't forgotten. Sorry about all these delays.
Patch set 6:Run-TryBot +1
TryBots beginning. Status page: https://farmer.golang.org/try?commit=b4cd2a75
TryBots are happy.
Patch set 6:TryBot-Result +1
Patch Set 6: Run-TryBot+1
Sounds good. Let me know if there is anything else you'd like from me. I'm not sure if I could help find reviewers at all.
I also have the code for the extended algorithm ready to go. I was waiting on that CL in case there were updates on this one since they share the same single-digit code.
cc'ing Filippo, since this may optimize the crypto package.
Hi Brian;
Uri Mendlovic from Google who is familiar with Lehmer GCD took a look and he had the following comments:
----
Overall he agrees that the code is algorithmically correct. I'm going to approve this and check it in so we can move forward with incremental changes.
Thanks for doing this and bearing with me.
Patch set 6:Code-Review +2
Robert Griesemer merged this change.
Reviewed-on: https://go-review.googlesource.com/59450
Run-TryBot: Robert Griesemer <g...@golang.org>
TryBot-Result: Gobot Gobot <go...@golang.org>
Reviewed-by: Robert Griesemer <g...@golang.org>
---
M src/math/big/int.go
M src/math/big/int_test.go
M src/math/big/rat.go
3 files changed, 151 insertions(+), 68 deletions(-)
diff --git a/src/math/big/int.go b/src/math/big/int.go
index 73d48de..c5ff672 100644
--- a/src/math/big/int.go
+++ b/src/math/big/int.go
@@ -476,7 +476,7 @@
return z
}
if x == nil && y == nil {
- return z.binaryGCD(a, b)
+ return z.lehmerGCD(a, b)
}
A := new(Int).Set(a)
@@ -515,64 +515,122 @@
return z
}
-// binaryGCD sets z to the greatest common divisor of a and b, which both must
-// be > 0, and returns z.
-// See Knuth, The Art of Computer Programming, Vol. 2, Section 4.5.2, Algorithm B.
-func (z *Int) binaryGCD(a, b *Int) *Int {
- u := z
- v := new(Int)
+// lehmerGCD sets z to the greatest common divisor of a and b,
+// which both must be > 0, and returns z.
+// See Knuth, The Art of Computer Programming, Vol. 2, Section 4.5.2, Algorithm L.
+// This implementation uses the improved condition by Collins requiring only one
+// quotient and avoiding the possibility of single Word overflow.
+// See Jebelean, "Improving the multiprecision Euclidean algorithm",
+// Design and Implementation of Symbolic Computation Systems, pp 45-58.
+func (z *Int) lehmerGCD(a, b *Int) *Int {
- // use one Euclidean iteration to ensure that u and v are approx. the same size
- switch {
- case len(a.abs) > len(b.abs):
- // must set v before u since u may be alias for a or b (was issue #11284)
- v.Rem(a, b)
- u.Set(b)
- case len(a.abs) < len(b.abs):
- v.Rem(b, a)
- u.Set(a)
- default:
- v.Set(b)
- u.Set(a)
+ // ensure a >= b
+ if a.abs.cmp(b.abs) < 0 {
+ a, b = b, a
}
- // a, b must not be used anymore (may be aliases with u)
- // v might be 0 now
- if len(v.abs) == 0 {
- return u
- }
- // u > 0 && v > 0
+ // don't destroy incoming values of a and b
+ B := new(Int).Set(b) // must be set first in case b is an alias of z
+ A := z.Set(a)
- // determine largest k such that u = u' << k, v = v' << k
- k := u.abs.trailingZeroBits()
- if vk := v.abs.trailingZeroBits(); vk < k {
- k = vk
- }
- u.Rsh(u, k)
- v.Rsh(v, k)
-
- // determine t (we know that u > 0)
+ // temp variables for multiprecision update
t := new(Int)
- if u.abs[0]&1 != 0 {
- // u is odd
- t.Neg(v)
- } else {
- t.Set(u)
- }
+ r := new(Int)
+ s := new(Int)
+ w := new(Int)
- for len(t.abs) > 0 {
- // reduce t
- t.Rsh(t, t.abs.trailingZeroBits())
- if t.neg {
- v, t = t, v
- v.neg = len(v.abs) > 0 && !v.neg // 0 has no sign
- } else {
- u, t = t, u
+ // loop invariant A >= B
+ for len(B.abs) > 1 {
+
+ // initialize the digits
+ var a1, a2, u0, u1, u2, v0, v1, v2 Word
+
+ m := len(B.abs) // m >= 2
+ n := len(A.abs) // n >= m >= 2
+
+ // extract the top Word of bits from A and B
+ h := nlz(A.abs[n-1])
+ a1 = (A.abs[n-1] << h) | (A.abs[n-2] >> (_W - h))
+ // B may have implicit zero words in the high bits if the lengths differ
+ switch {
+ case n == m:
+ a2 = (B.abs[n-1] << h) | (B.abs[n-2] >> (_W - h))
+ case n == m+1:
+ a2 = (B.abs[n-2] >> (_W - h))
+ default:
+ a2 = 0
}
- t.Sub(u, v)
+
+ // Since we are calculating with full words to avoid overflow,
+ // we use 'even' to track the sign of the cosequences.
+ // For even iterations: u0, v1 >= 0 && u1, v0 <= 0
+ // For odd iterations: u0, v1 <= 0 && u1, v0 >= 0
+ // The first iteration starts with k=1 (odd).
+ even := false
+ // variables to track the cosequences
+ u0, u1, u2 = 0, 1, 0
+ v0, v1, v2 = 0, 0, 1
+
+ // calculate the quotient and cosequences using Collins' stopping condition
+ for a2 >= v2 && a1-a2 >= v1+v2 {
+ q := a1 / a2
+ a1, a2 = a2, a1-q*a2
+ u0, u1, u2 = u1, u2, u1+q*u2
+ v0, v1, v2 = v1, v2, v1+q*v2
+ even = !even
+ }
+
+ // multiprecision step
+ if v0 != 0 {
+ // simulate the effect of the single precision steps using the cosequences
+ // A = u0*A + v0*B
+ // B = u1*A + v1*B
+
+ t.abs = t.abs.setWord(u0)
+ s.abs = s.abs.setWord(v0)
+ t.neg = !even
+ s.neg = even
+
+ t.Mul(A, t)
+ s.Mul(B, s)
+
+ r.abs = r.abs.setWord(u1)
+ w.abs = w.abs.setWord(v1)
+ r.neg = even
+ w.neg = !even
+
+ r.Mul(A, r)
+ w.Mul(B, w)
+
+ A.Add(t, s)
+ B.Add(r, w)
+
+ } else {
+ // single-digit calculations failed to simulate any quotients
+ // do a standard Euclidean step
+ t.Rem(A, B)
+ A, B, t = B, t, A
+ }
}
- return z.Lsh(u, k)
+ if len(B.abs) > 0 {
+ // standard Euclidean algorithm base case for B a single Word
+ if len(A.abs) > 1 {
+ // A is longer than a single Word
+ t.Rem(A, B)
+ A, B, t = B, t, A
+ }
+ if len(B.abs) > 0 {
+ // A and B are both a single Word
+ a1, a2 := A.abs[0], B.abs[0]
+ for a2 != 0 {
+ a1, a2 = a2, a1%a2
+ }
+ A.abs[0] = a1
+ }
+ }
+ *z = *A
+ return z
}
// Rand sets z to a pseudo-random number in [0, n) and returns z.
diff --git a/src/math/big/int_test.go b/src/math/big/int_test.go
index bc2eef5..e42917b 100644
--- a/src/math/big/int_test.go
+++ b/src/math/big/int_test.go
@@ -675,6 +675,37 @@
return x.Cmp(d) == 0
}
+// euclidGCD is a reference implementation of Euclid's GCD
+// algorithm for testing against optimized algorithms.
+// Requirements: a, b > 0
+func euclidGCD(a, b *Int) *Int {
+
+ A := new(Int).Set(a)
+ B := new(Int).Set(b)
+ t := new(Int)
+
+ for len(B.abs) > 0 {
+ // A, B = B, A % B
+ t.Rem(A, B)
+ A, B, t = B, t, A
+ }
+ return A
+}
+
+func checkLehmerGcd(aBytes, bBytes []byte) bool {
+ a := new(Int).SetBytes(aBytes)
+ b := new(Int).SetBytes(bBytes)
+
+ if a.Sign() <= 0 || b.Sign() <= 0 {
+ return true // can only test positive arguments
+ }
+
+ d := new(Int).lehmerGCD(a, b)
+ d0 := euclidGCD(a, b)
+
+ return d.Cmp(d0) == 0
+}
+
var gcdTests = []struct {
d, x, y, a, b string
}{
@@ -708,38 +739,28 @@
D := new(Int).GCD(X, Y, a, b)
if D.Cmp(d) != 0 {
- t.Errorf("GCD(%s, %s): got d = %s, want %s", a, b, D, d)
+ t.Errorf("GCD(%s, %s, %s, %s): got d = %s, want %s", x, y, a, b, D, d)
}
if x != nil && X.Cmp(x) != 0 {
- t.Errorf("GCD(%s, %s): got x = %s, want %s", a, b, X, x)
+ t.Errorf("GCD(%s, %s, %s, %s): got x = %s, want %s", x, y, a, b, X, x)
}
if y != nil && Y.Cmp(y) != 0 {
- t.Errorf("GCD(%s, %s): got y = %s, want %s", a, b, Y, y)
- }
-
- // binaryGCD requires a > 0 && b > 0
- if a.Sign() <= 0 || b.Sign() <= 0 {
- return
- }
-
- D.binaryGCD(a, b)
- if D.Cmp(d) != 0 {
- t.Errorf("binaryGcd(%s, %s): got d = %s, want %s", a, b, D, d)
+ t.Errorf("GCD(%s, %s, %s, %s): got y = %s, want %s", x, y, a, b, Y, y)
}
// check results in presence of aliasing (issue #11284)
a2 := new(Int).Set(a)
b2 := new(Int).Set(b)
- a2.binaryGCD(a2, b2) // result is same as 1st argument
+ a2.GCD(X, Y, a2, b2) // result is same as 1st argument
if a2.Cmp(d) != 0 {
- t.Errorf("binaryGcd(%s, %s): got d = %s, want %s", a, b, a2, d)
+ t.Errorf("aliased z = a GCD(%s, %s, %s, %s): got d = %s, want %s", x, y, a, b, a2, d)
}
a2 = new(Int).Set(a)
b2 = new(Int).Set(b)
- b2.binaryGCD(a2, b2) // result is same as 2nd argument
+ b2.GCD(X, Y, a2, b2) // result is same as 2nd argument
if b2.Cmp(d) != 0 {
- t.Errorf("binaryGcd(%s, %s): got d = %s, want %s", a, b, b2, d)
+ t.Errorf("aliased z = b GCD(%s, %s, %s, %s): got d = %s, want %s", x, y, a, b, b2, d)
}
}
@@ -760,6 +781,10 @@
if err := quick.Check(checkGcd, nil); err != nil {
t.Error(err)
}
+
+ if err := quick.Check(checkLehmerGcd, nil); err != nil {
+ t.Error(err)
+ }
}
type intShiftTest struct {
diff --git a/src/math/big/rat.go b/src/math/big/rat.go
index f0f436e..b33fc69 100644
--- a/src/math/big/rat.go
+++ b/src/math/big/rat.go
@@ -422,7 +422,7 @@
neg := z.a.neg
z.a.neg = false
z.b.neg = false
- if f := NewInt(0).binaryGCD(&z.a, &z.b); f.Cmp(intOne) != 0 {
+ if f := NewInt(0).lehmerGCD(&z.a, &z.b); f.Cmp(intOne) != 0 {
z.a.abs, _ = z.a.abs.div(nil, z.a.abs, f.abs)
z.b.abs, _ = z.b.abs.div(nil, z.b.abs, f.abs)
if z.b.abs.cmp(natOne) == 0 {
1 comment:
Does the algorithm guarantee that we never run into overflow (of a Word) here and on the next two lines?
Can we document this?
Patch Set 6: Code-Review+2
Hi Brian;
Uri Mendlovic from Google who is familiar with Lehmer GCD took a look and he had the following comments:
- The performance is O(num words ^ 2), which can be improved. This should help for numbers with tens of words or more.
I looked at the subquadratic algorithms a bit and Niels Moller has a paper http://www.ams.org/journals/mcom/2008-77-261/S0025-5718-07-02017-0/S0025-5718-07-02017-0.pdf that suggests a payoff only around ~100 words, so I thought it wasn't worth the additional complexity for a first improvement. We can definitely add that extension as a follow-up.
- There is a variant that manipulates the leading words (a1, a2) by aligning them to the MSBit. This replaces the division/multiplication with shifts and may be faster.
I believe he may be referencing Sorenson's k-ary GCD algorithm
ftp://ftp.cs.wisc.edu/pub/techreports/1990/TR979.pdf, which looks basically comparable in performance to Lehmer's algorithm but more complex to implement.
- I saw that there is also another naive implementation for the case of extended GCD (when the coefficients are of interest). This implementation is much slower than both the old binary GCD and the proposed implementation. The new implementation can be modified to return the coefficients with small (x2) performance loss, thus replacing the duplicate implementation.
Yes, I have the extension to the cofactors ready to go but it probably will be a couple of weeks before I can clean it up to submit because I'm in the middle of a move.
Would you prefer a single unified implementation? I was planning on factoring out the shared single digit code, but having separate update and base cases for the extended algorithm to handle the extra bookkeeping for cofactors.
----
Overall he agrees that the code is algorithmically correct. I'm going to approve this and check it in so we can move forward with incremental changes.
Thanks for doing this and bearing with me.
- gri
The process was really great. Thanks for taking the time to thoroughly review, vet and improve the code.
Yes, the first line is just the remainder, so q*a2 <= a1 and a1 - q*a2 >= 0. u and v are positive and bounded by the size of the inputs (a1 and a2) so are always guaranteed to fit into a Word. Thus the products and sums on the following two lines will not overflow. Jebelean's paper discussed this as well. I can document this, but it will likely be a couple of weeks as I am in the middle of a move.
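That bound can also be sanity-checked mechanically. A rough sketch (not part of the
CL) that replays the inner loop on random normalized leading words and asserts that
the cosequence updates fit in a word, using 128-bit intermediates from math/bits;
sign tracking and the u0/v0 terms are omitted since only overflow is checked:

package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

// mulAdd returns x + q*y and reports whether the result overflows 64 bits.
func mulAdd(x, q, y uint64) (uint64, bool) {
	hi, lo := bits.Mul64(q, y)
	sum, carry := bits.Add64(x, lo, 0)
	return sum, hi != 0 || carry != 0
}

func main() {
	for i := 0; i < 100000; i++ {
		a1 := rand.Uint64() | 1<<63 // leading word is normalized (top bit set)
		a2 := rand.Uint64() % a1    // a1 > a2
		var u1, u2 uint64 = 1, 0
		var v1, v2 uint64 = 0, 1
		for a2 >= v2 && a1-a2 >= v1+v2 { // Collins' stopping condition
			q := a1 / a2
			a1, a2 = a2, a1-q*a2
			nu, ou := mulAdd(u1, q, u2) // u1 + q*u2
			nv, ov := mulAdd(v1, q, v2) // v1 + q*v2
			if ou || ov {
				panic("cosequence update overflowed a word")
			}
			u1, u2 = u2, nu
			v1, v2 = v2, nv
		}
	}
	fmt.Println("no overflow observed")
}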
Feel free to ping me in mid November if this is still open.
Patch Set 7:
Please take your time. The factored approach seems fine.
Note that the 1.10 freeze is around Nov. 1, so future work may have to wait until 1.11. We can still get it going and reviewed though.
Patch Set 7:
PS: Some comments from Uri Mendlovic:
The sub-quadratic variant may indeed provide significant acceleration only at around 100 words; that depends on the implementation. 100 words are possible in practical use cases, for example RSA-4096 numbers.
But this indeed can wait for future work. Consider adding a comment about the complexity.
The benefit of the shift-based variant is platform dependent, specifically on the latency ratio of single-word shift/multiplication. In modern CPUs the ratio is indeed small, so if Go doesn't add overhead to the multiplication the improvement will be small.
As for computing the coefficients, I thought this code could be put inside if statements. Probably there will be 3 such ifs, so separating the implementations is reasonable, but it will have some code duplication even if the single-word manipulation is partially shared. I will be glad to have a look at the code when Brian has it ready.