[go] math/big: comment division

58 views

Skip to first unread message

Russ Cox (Gerrit)

unread,

May 18, 2021, 9:41:55 PM5/18/21

to Robert Griesemer, Katie Hockman, goph...@pubsubhelper.golang.org, Russ Cox, golang-co...@googlegroups.com

Attention is currently required from: Robert Griesemer, Katie Hockman.

Russ Cox would like Robert Griesemer and Katie Hockman to review this change.

View Change

math/big: comment division

The comments in the code refer to Knuth and to Burnikel and Ziegler,
but Knuth's presentation is inscrutable, and our recursive division
code does not bear much resemblance to Burnikel and Ziegler's paper
(which is fine, ours is nicer).

Add a standalone explanation of division instead of referring to
difficult or not-directly-used references.

Change-Id: Ic1b35dc167fb29a69ee00e0b4a768ac9cc9e1324
---
M src/math/big/div.go
1 file changed, 599 insertions(+), 73 deletions(-)

diff --git a/src/math/big/div.go b/src/math/big/div.go
index 1330990..8142ec8 100644
--- a/src/math/big/div.go
+++ b/src/math/big/div.go
@@ -2,10 +2,494 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
+/*
+
+Multi-precision division. Here be dragons.
+
+Given u and v, where u is n+m digits, and v is n digits (with no leading zeros),
+the goal is to return quo, rem such that u = quo*v + rem, where 0 ≤ rem < v.
+That is, quo = ⌊u/v⌋ where ⌊x⌋ denotes the floor (truncation to integer) of x,
+and rem = u - quo·v.
+
+
+Long Division
+
+Division in a computer proceeds the same as long division in elementary school,
+but computers are not as good as schoolchildren at following vague directions,
+so we have to be much more precise about the actual steps and what can happen.
+
+We work from most to least significant digit of the quotient, doing:
+
+ • Guess a digit q, the number of v to subtract from the current
+   section of u to zero out the topmost digit.
+ • Check the guess by multiplying q·v and comparing it against
+   the current section of u, adjusting the guess as needed.
+ • Subtract q·v from the current section of u.
+ • Add q to the corresponding section of the result quo.
+
+When all digits have been processed, the final remainder is left in u
+and returned as rem.
+
+For example, here is a sketch of dividing 5 digits by 3 digits (n=3, m=2).
+
+                     q₂ q₁ q₀
+             _________________
+    v₂ v₁ v₀ ) u₄ u₃ u₂ u₁ u₀
+               ↓  ↓  ↓  |  |
+              [u₄ u₃ u₂]|  |
+            - [  q₂·v  ]|  |
+            ----------- ↓  |
+              [  rem  | u₁]|
+            - [    q₁·v   ]|
+               ----------- ↓
+                 [  rem  | u₀]
+               - [    q₀·v   ]
+                  ------------
+                    [  rem   ]
+
+Instead of creating new storage for the remainders and copying digits from u
+as indicated by the arrows, we use u's storage directly as both the source
+and destination of the subtractions, so that the remainders overwrite
+successive overlapping sections of u as the division proceeds, using a slice
+of u to identify the current section. This avoids all the copying as well as
+shifting of remainders.
+
+Division of u with n+m digits by v with n digits can in general produce
+at most m+1 digits, because:
+
+  • u < B^(n+m)               [B^(n+m) has n+m+1 digits]
+  • v ≥ B^(n-1)               [B^(n-1) is the smallest n-digit number]
+  • u/v < B^(n+m) / B^(n-1)   [divide bounds for u, v]
+  • u/v < B^(m+1)             [simplify]
+
+The first step is special: it takes the top n digits of u and divides them by
+the n digits of v, producing the first quotient digit and an n-digit remainder.
+In the example, q₂ = ⌊u₄u₃u₂ / v⌋.
+
+The first step divides n digits by n digits to ensure that it produces only a
+single digit.
+
+Each subsequent step appends the next digit from u to the remainder and divides
+those n+1 digits by the n digits of v, producing another quotient digit and a
+new n-digit remainder.
+
+Subsequent steps divide n+1 digits by n digits, an operation that in general
+might produce two digits. However, as used in the algorithm, that division is
+guaranteed to produce only a single digit. The dividend is of the form
+rem·B + d, where rem is a remainder from the previous step and d is a single
+digit, so:
+
+ • rem ≤ v - 1                 [rem is a remainder from dividing by v]
+ • rem·B ≤ v·B - B             [multiply by B]
+ • d ≤ B - 1                   [d is a single digit]
+ • rem·B + d ≤ v·B - 1         [add]
+ • rem·B + d < v·B             [change ≤ to <]
+ • (rem·B + d)/v < B           [divide by v]
+
+
+Guess and Check
+
+At each step we need to divide n+1 digits by n digits, but this is for the
+implementation of division by n digits, so we can't just invoke a division
+routine: we _are_ the division routine. Instead, we guess at the answer and
+then check it using multiplication. If the guess is wrong, we correct it.
+
+How can this guessing possibly be efficient? It turns out that the following
+statement (let's call it the Good Guess Guarantee) is true.
+
+If
+
+ • q = ⌊u/v⌋ where u is n+1 digits and v is n digits,
+ • q < B, and
+ • the topmost digit of v = vₙ₋₁ ≥ B/2,
+
+then q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ satisfies q ≤ q̂ ≤ q+2. (Proof below.)
+
+That is, if we know the answer has only a single digit and we guess an answer
+by ignoring the bottom n-1 digits of u and v, using a 2-by-1-digit division,
+then that guess is at least as large as the correct answer. It is also not
+too much larger: it is off by at most two from the correct answer.
+
+Note that in the first step of the overall division, which is an n-by-n-digit
+division, the 2-by-1 guess uses an implicit uₙ = 0.
+
+Note that using a 2-by-1-digit division here does not mean calling ourselves
+recursively. Instead, we use an efficient direct hardware implementation of
+that operation.
+
+Note that because q is u/v rounded down, q·v must not exceed u: u ≥ q·v.
+If a guess q̂ is too big, it will not satisfy this test. Viewed a different way,
+the remainder r̂ for a given q̂ is u - q̂·v, which must be positive. If it is
+negative, then the guess q̂ is too big.
+
+This gives us an way to compute q. First compute q̂ with 2-by-1-digit division.
+Then, while u < q̂·v, decrement q̂; this loop executes at most twice.
+
+
+Scaling Inputs
+
+The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2.
+For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off
+because the first digit 1 is smaller than B/2 = 5.
+
+We can ensure that v has a large top digit by multiplying u and v by the same
+small factor. Continuing the example, if we multiply 172 and 19 by 3, yielding
+the problem ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough
+⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here
+to multiply by 4, because that can be done with a shift. Specifically, we can
+always count the number of leading zeros i in the first digit of v and then
+shift both u and v left by i bits.
+
+Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will
+be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder
+by the scaling factor (shifting right i bits) when we finish.
+
+Note that these shifts happen before and after the entire division algorithm,
+not at each step in the per-digit iteration.
+
+Note the effect of scaling inputs on the size of the possible quotient.
+If u and v have n+m and n digits after scaling (u can gain a digit from
+scaling; v never does), then:
+
+  • u < B^(n+m)               [B^(n+m) has n+m+1 digits]
+  • v ≥ B^n / 2               [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2]
+  • u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v]
+  • u/v < 2 B^m               [simplify]
+
+The quotient can still have m+1 significant digits, but if so the top digit
+must be a 1. This provides a different way to handle the first digit of the
+result: compare the top n digits of u against v and fill in either a 0 or a 1.
+
+
+Refining Guesses
+
+Before we check whether u < q̂·v, we can adjust our guess to change it from
+q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋.
+Although not mentioned above, the Good Guess Guarantee also promises that this
+3-by-2-digit division guess is more precise and at most most one away from the
+real answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be
+done without n-digit math.
+
+If we have a guess q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ and we want to see if it also equal to
+⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, we can use the same check we would for the full division:
+if uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂, then the guess is too large and should be reduced.
+
+Checking uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ < 0,
+and
+
+    uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ = (uₙuₙ₋₁·B + uₙ₋₂) - q̂·(vₙ₋₁·B + vₙ₋₂)
+                             [splitting off the bottom digit]
+                           = (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂
+                             [regrouping]
+
+The expression (uₙuₙ₋₁ - q̂·vₙ₋₁) is the remainder of uₙuₙ₋₁ / vₙ₋₁.
+If the initial guess returns both q̂ and its remainder r̂, then checking
+whether uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as checking r̂·B + uₙ₋₂ < q̂·vₙ₋₂.
+
+If we find that r̂·B + uₙ₋₂ < q̂·vₙ₋₂, then we can adjust the guess by
+decrementing q̂ and adding vₙ₋₁ to r̂. We repeat until r̂·B + uₙ₋₂ ≥ q̂·vₙ₋₂.
+(As before, this fixup is only needed at most twice.)
+
+Now that q̂ = ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, as mentioned above it is at most one
+away from the correct q, and we've avoided doing any n-digit math.
+(If we need the new remainder, it can be computed as r̂·B + uₙ₋₂ - q̂·vₙ₋₂.)
+
+The final check u < q̂·v and the possible fixup must be done at full precision.
+For random inputs, a fixup at this step is exceedingly rare: the 3-by-2 guess
+is not often wrong at all. But still we must do the check. Note that since the
+3-by-2 guess is off by at most 1, it can be convenient to perform the final
+u < q̂·v as part of the computation of the remainder r = u - q̂·v. If the
+subtraction underflows, decremeting q̂ and adding one v back to r is enough to
+arrive at the final q, r.
+
+That's the entirety of long division: scale the inputs, and then loop over
+each output position, guessing, checking, and correcting the next output digit.
+
+For a 2n-digit number divided by an n-digit number (the worst size-n case for
+division complexity), this algorithm uses n+1 iterations, each of which must do
+at least the 1-by-n-digit multiplication q̂·v. That's O(n) iterations of
+O(n) time each, so O(n²) time overall.
+
+
+Recursive Division
+
+For very large inputs, it is possible to improve on the O(n²) algorithm.
+Let's call a group of n/2 real digits a (very) “wide digit”. We can run the
+standard long division algorithm explained above over the wide digits instead of
+the actual digits. This will result in many fewer steps, but the math involved in
+each step is more work.
+
+Where basic long division uses a 2-by-1-digit division to guess the initial q̂,
+the new algorithm must use a 2-by-1-wide digit division, which is of course
+really an n-by-n/2-digit division. That's OK: if we implement n-digit division
+in terms of n/2-digit division, the recursion will terminate when the divisor
+becomes small enough to handle with standard long division or even with the
+2-by-1 hardware instruction.
+
+For example, here is a sketch of dividing 10 digits by 4, proceeding with
+wide digits corresponding to two regular digits. The first step, still special,
+must leave off a (regular) digit, dividing 5 by 4 and producing a 4-digit
+remainder less than v. The middle steps divide 6 digits by 4, guaranteed to
+produce two output digits each (one wide digit) with 4-digit remainders.
+The final step must use what it has: the 4-digit remainder plus one more,
+5 digits to divide by 4.
+
+                           q₆ q₅ q₄ q₃ q₂ q₁ q₀
+                _______________________________
+    v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
+                  ↓  ↓  ↓  ↓  ↓  |  |  |  |  |
+                 [u₉ u₈ u₇ u₆ u₅]|  |  |  |  |
+               - [    q₆q₅·v    ]|  |  |  |  |
+               ----------------- ↓  ↓  |  |  |
+                    [    rem    |u₄ u₃]|  |  |
+                  - [     q₄q₃·v      ]|  |  |
+                  -------------------- ↓  ↓  |
+                          [    rem    |u₂ u₁]|
+                        - [     q₂q₁·v      ]|
+                        -------------------- ↓
+                                [    rem    |u₀]
+                              - [     q₀·v     ]
+                              ------------------
+                                   [    rem    ]
+
+An alternative would be to look ahead to how well n/2 divides into n+m and
+adjust the first step to use fewer digits as needed, making the first step
+more special to make the last step not special at all. For example, using the
+same input, we could choose to use only 4 digits in the first step, leaving
+a full wide digit for the last step:
+
+                           q₆ q₅ q₄ q₃ q₂ q₁ q₀
+                _______________________________
+    v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
+                  ↓  ↓  ↓  ↓  |  |  |  |  |  |
+                 [u₉ u₈ u₇ u₆]|  |  |  |  |  |
+               - [    q₆·v   ]|  |  |  |  |  |
+               -------------- ↓  ↓  |  |  |  |
+                 [    rem    |u₅ u₄]|  |  |  |
+               - [     q₅q₄·v      ]|  |  |  |
+               -------------------- ↓  ↓  |  |
+                       [    rem    |u₃ u₂]|  |
+                     - [     q₃q₂·v      ]|  |
+                     -------------------- ↓  ↓
+                             [    rem    |u₁ u₀]
+                           - [     q₁q₀·v      ]
+                           ---------------------
+                                   [    rem    ]
+
+Either way, each step is a 3-by-2-wide-digit division approximated first by
+a 2-by-1-wide-digit division, just as we did for regular digits in long division.
+Because the actual answer we want is a 3-by-2-wide-digit division, instead of
+multiplying q̂·v directly during the fixup, we can use the quick refinement
+from long division (an n/2-by-n/2 multiply) to correct q to its actual value
+and also compute the remainder (as mentioned above), and then stop after that,
+never doing a full n-by-n multiply.
+
+Instead of using an n-by-n/2-digit division to produce n/2 digits, we can add
+(not discard) one more real digit, doing an (n+1)-by-(n/2+1)-digit division that
+produces n/2+1 digits. That single extra digit tightens the Good Guess Guarantee
+to q ≤ q̂ ≤ q+1 and lets us drop long division's special treatment of the first
+digit. These benefits are discussed more after the Good Guess Guarantee proof
+below.
+
+
+How Fast is Recursive Division?
+
+For a 2n-by-n-digit division, this algorithm runs a 4-by-2 long division over
+wide digits, producing two wide digits plus a possible leading regular digit 1,
+which can be handled without a recursive call. That is, the algorithm uses two
+full iterations, each using an n-by-n/2-digit division and an n/2-by-n/2-digit
+multiplication, along with a few n-digit additions and subtractions. The standard
+n-by-n-digit multiplication algorithm requires O(n²) time, making the overall
+algorithm require time T(n) where
+
+	T(n) = 2T(n/2) + O(n) + O(n²)
+
+which, by the Bentley-Haken-Saxe theorem, ends up reducing to T(n) = O(n²).
+This is not an improvement over regular long division.
+
+When the number of digits n becomes large enough, Karatsuba's algorithm for
+multiplication can be used instead, which takes O(n^log₂3) = O(n^1.6) time.
+That makes the overall recursive division algorithm take O(n^1.6) time as well,
+which is an improvement, but again only for large enough numbers.
+
+It is not critical to make sure that every recursion does only two recursive
+calls. While in general the number of recursive calls can change the time
+analysis, in this case doing three calls does not change the analysis:
+
+	T(n) = 3T(n/2) + O(n) + O(n^log₂3)
+
+ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking
+time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the
+division does not hurt the asymptotic performance. Of course, it is likely
+still faster in practice to do two.
+
+
+Proof of the Good Guess Guarantee
+
+Given numbers x, y, let us break them into high and low parts, where the low part
+is at most s and both the quotient x/y and the high part of y are at most t:
+
+	x = x₁·s + x₀    0 ≤ x₀ < s    x/y < t
+	y = y₁·s + y₀    0 ≤ y₀ < s    t/2 ≤ y₁ < t
+
+And consider the two truncated quotients:
+
+	q = ⌊x/y⌋
+	q̂ = ⌊x₁/y₁]
+
+We will prove that q ≤ q̂ ≤ q+2.
+
+For a scaling factor s, x₁ = ⌊x/s⌋ and y₁ = ⌊y/s⌋. The guarantee makes
+no real demands on s: it is simply the magnitude of the digits dropped from
+both x and y. The guarantee makes only limited demands on t: it must be
+large enough to hold the quotient x/y, and y₁ must have the same size.
+
+To apply to the earlier discussion of 2-by-1 guesses in long division,
+we would choose:
+
+	s  = Bⁿ⁻¹
+	t  = B
+	x  = u
+	x₁ = uₙuₙ₋₁
+	x₀ = uₙ₋₂...u₀
+	y  = v
+	y₁ = vₙ₋₁
+	y₀ = vₙ₋₂...u₀
+
+These simpler variables avoid repeating those longer expressions in the proof.
+
+Note also that, by definition, truncating division ⌊x/y⌋ satisfies
+
+	x/y - 1 < ⌊x/y⌋ ≤ x/y.
+
+This fact will be used a few times in the proofs.
+
+Proof that q ≤ q̂:
+
+    q̂·y₁ = ⌊x₁/y₁⌋·y₁                      [definition of q̂]
+          > (x₁/y₁ - 1)·y₁                 [x₁/y₁ - 1 < ⌊x₁/y₁⌋]
+          = x₁ - y₁                        [distribute y₁]
+
+	So q̂·y₁ > x₁ - y₁.
+	Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1.
+
+    q̂ - q = q̂ - ⌊x/y⌋                      [definition of q]
+           ≥ q̂ - x/y                       [⌊x/y⌋ < x/y]
+           = (1/y)·(q̂·y - x)               [factor out 1/y]
+           ≥ (1/y)·(q̂·y₁·s - x)            [y = y₁·s + y₀ ≥ y₁·s]
+           ≥ (1/y)·((x₁ - y₁ + 1)·s - x)   [above: q̂·y₁ ≥ x₁ - y₁ + 1]
+           = (1/y)·(x₁·s - v₁·s + s - x)   [distribute s]
+           = (1/y)·(s - x₀ - y₁·s)         [-x = -x₁·s - x₀]
+           > -y₁·s / y                     [x₀ < s, so s - x₀ < 0; drop it]
+           ≥ -1                            [y₁·s ≤ y]
+
+	So q̂ - q > -1.
+	Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂.
+
+Proof that q̂ ≤ q+2:
+
+	x₁/y₁ - x/y = ⌊x₁/y₁⌋ - x/y            [definition of q̂]
+                = x₁·s/y₁·s - x/y          [multiply left term by s/s]
+                ≤ x/y₁·s - x/y             [x₁s ≤ x]
+                = (x/y)·(y/y₁·s - 1)       [factor out x/y]
+                = (x/y)·((y - y₁·s)/y₁·s)  [move -1 into y/y₁·s fraction]
+                = (x/y)·(y₀/y₁·s)          [y - y₁·s = y₀]
+                = (x/y)·(1/y₁)·(y₀/s)      [factor out 1/y₁]
+                < (x/y)·(1/y₁)             [y₀ < s, so y₀/s < 1]
+                ≤ (x/y)·(2/t)              [y₁ ≥ t/2, so 1/y₁ ≤ 2/t]
+                < t·(2/t)                  [x/y < t]
+                = 2                        [t·(2/t) = 2]
+
+    So x₁/y₁ - x/y < 2.
+
+	q̂ - q = ⌊x₁/y₁⌋ - ⌊x/y⌋                [definition of q̂, q]
+	       ≤ x₁/y₁ - ⌊x/y⌋                 [⌊x₁/y₁⌋ ≤ x₁/y₁]
+	       < x₁/y₁ - (x/y - 1)             [⌊x/y⌋ > x/y - 1]
+	       = (x₁/y₁ - x/y) + 1             [regrouping]
+	       < 2 + 1                         [above: x₁/y₁ - x/y < 2]
+	       = 3
+
+	So q̂ - q < 3.
+	Since q̂ - q is an integer, q̂ - q ≤ 2.
+
+Note that when x/y < t/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore
+q̂ - q ≤ 1.
+
+Note also that in the general case 2n-by-n division where we don't know that
+x/y < t, we do know that x/y < 2t, yielding the bound q̂ - q ≤ 4. So we could
+remove the special case first step of long division as long as we allow the
+first fixup loop to run up to four times. (Using a simple comparison to decide
+whether the first digit is 0 or 1 is still more efficient, though.)
+
+Finally, note that when dividing three leading base-B digits by two (scaled),
+we have t = B² and x/y < B = t/B, a much tighter bound than x/y < t.
+This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that
+⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an
+integer. For random x and y, the chance of this is 2/B, or, for large B,
+approximately zero. This means that after we produce the 3-by-2 guess in the
+long division algorithm, the fixup loop essentially never runs.
+
+In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit
+division has exactly the same effect: the probability of needing a fixup is the
+same 2/B. Even better, we can allow the general case x/y < 2t and the fixup
+probability only grows to 4/B, still essentially zero.
+
+
+References
+
+There are no great references for implementing long division; thus this comment.
+Here are some notes about what to expect from the obvious references.
+
+Knuth Volume 2 (Seminumerical Algorithms) section 4.3.1 is the usual canonical
+reference for long division, but that entire series is highly compressed, never
+repeating a necessary fact and leaving important insights to the exercises.
+For example, no rationale whatsoever is given for the calculation that extends
+q̂ from a 2-by-1 to a 3-by-2 guess, nor why it reduces the error bound.
+The proof that the calculation even has the desired effect is left to exercises.
+The solutions to those exercises provided at the back of the book are entirely
+calculations, still with no explanation as to what is going on or how you would
+arrive at the idea of doing those exact calculations. Nowhere is it mentioned
+that this test extends the 2-by-1 guess into a 3-by-2 guess. The proof of the
+Good Guess Guarantee is only for the 2-by-1 guess and argues by contradiction,
+making it difficult to understand how modifications like adding another digit
+or adjusting the quotient range affects the overall bound.
+
+All that said, Knuth remains the canonical reference. It is dense but packed
+full of information and references, and the proofs are simpler than many other
+presentations. The proofs above are reworkings of Knuth's to remove the
+arguments by contradiction and add explanations or steps that Knuth omitted.
+But beware of errors in older printings. Take the published errata with you.
+
+Brinch Hansen's “Multiple-length Division Revisited: a Tour of the Minefield”
+starts with a blunt critique of Knuth's presentation (among others) and then
+presents a more detailed and easier to follow treatment of long division,
+including an implementation in Pascal. But the algorithm and implementation
+work entirely in terms of 3-by-2 division, which is much less useful on modern
+hardware than an algorithm using 2-by-1 division. The proofs are a bit too
+focused on digit counting and seem needlessly complex, especially compared to
+the ones given above.
+
+Burnikel and Ziegler's “Fast Recursive Division” introduced the key insight of
+implmenenting division by an n-digit divisor using recursive calls to division
+by an n/2-digit divisor, relying on Karatsuba multiplication to yield a
+sub-quadratic run time. However, the presentation decisions are made almost
+entirely for the purpose of simplifying the run-time analysis, rather than
+simplifying the presentation. Instead of a single algorithm that loops over
+quotient digits, the paper presents two mutually-recursive algorithms, for
+2n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n
+algorithm.
+
+The proofs in the paper are remarkably complex, especially considering that
+the algorithm is at its core just long division on wide digits, so that the
+usual long division proofs apply essentially unaltered.
+*/
+
 package big
 
 import "math/bits"
 
+// div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v.
+// It uses z and z2 as the storage for q and r.
 func (z nat) div(z2, u, v nat) (q, r nat) {
 	if len(v) == 0 {
 		panic("division by zero")
@@ -18,6 +502,8 @@
 	}
 
 	if len(v) == 1 {
+		// Short division: long optimized for a single-word divisor.
+		// In that case, the 2-by-1 guess is all we need at each step.
 		var r2 Word
 		q, r2 = z.divW(u, v[0])
 		r = z2.setWord(r2)
@@ -28,7 +514,9 @@
 	return
 }
 
-// q = (x-r)/y, with 0 <= r < y
+// divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y.
+// It uses z as the storage for q.
+// Note that y is a single digit (Word), not a big number.
 func (z nat) divW(x nat, y Word) (q nat, r Word) {
 	m := len(x)
 	switch {
@@ -56,6 +544,8 @@
 	return divWVW(q, 0, x, d)
 }
 
+// divWVW overwrites z with ⌊x/y⌋, returning the remainder r.
+// The caller must ensure that len(z) = len(x).
 func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) {
 	r = xn
 	if len(x) == 1 {
@@ -70,34 +560,33 @@
 	return r
 }
 
-// q = (uIn-r)/vIn, with 0 <= r < vIn
-// Uses z as storage for q, and u as storage for r if possible.
-// See Knuth, Volume 2, section 4.3.1, Algorithm D.
-// Preconditions:
-//    len(vIn) >= 2
-//    len(uIn) >= len(vIn)
-//    u must not alias z
+// div returns q, r such that q = ⌊uIn/vIn⌋ and r = uIn%vIn = uIn - q·vIn.
+// It uses z and u as the storage for q and r.
+// The caller must ensure that len(vIn) ≥ 2 (use divW otherwise)
+// and that len(uIn) ≥ len(vIn) (the answer is 0, uIn otherwise).
 func (z nat) divLarge(u, uIn, vIn nat) (q, r nat) {
 	n := len(vIn)
 	m := len(uIn) - n
 
-	// D1.
+	// Scale the inputs so vIn's top bit is 1 (see “Scaling Inputs” above).
+	// vIn is treated as a read-only input (it may be in use by another
+	// goroutine), so we must make a copy.
+	// uIn is copied to u.
 	shift := nlz(vIn[n-1])
-	// do not modify vIn, it may be used by another goroutine simultaneously
 	vp := getNat(n)
 	v := *vp
 	shlVU(v, vIn, shift)
-
-	// u may safely alias uIn or vIn, the value of uIn is used to set u and vIn was already used
 	u = u.make(len(uIn) + 1)
 	u[len(uIn)] = shlVU(u[0:len(uIn)], uIn, shift)
 
-	// z may safely alias uIn or vIn, both values were used already
+	// The caller should not pass aliased z and u, since those are
+	// the two different outputs, but correct just in case.
 	if alias(z, u) {
-		z = nil // z is an alias for u - cannot reuse
+		z = nil
 	}
 	q = z.make(m + 1)
 
+	// Use basic or recursive long division depending on size.
 	if n < divRecursiveThreshold {
 		q.divBasic(u, v)
 	} else {
@@ -106,19 +595,17 @@
 	putNat(vp)
 
 	q = q.norm()
+
+	// Undo scaling of remainder.
 	shrVU(u, u, shift)
 	r = u.norm()
 
 	return q, r
 }
 
-// divBasic performs word-by-word division of u by v.
-// The quotient is written in pre-allocated q.
-// The remainder overwrites input u.
-//
-// Precondition:
-// - q is large enough to hold the quotient u / v
-//   which has a maximum length of len(u)-len(v)+1.
+// divBasic implements long division as described above.
+// It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r.
+// q must be large enough to hold ⌊u/v⌋.
 func (q nat) divBasic(u, v nat) {
 	n := len(v)
 	m := len(u) - n
@@ -126,45 +613,56 @@
 	qhatvp := getNat(n + 1)
 	qhatv := *qhatvp
 
-	// D2.
+	// Set up for divWW below, precomputing reciprocal argument.
 	vn1 := v[n-1]
 	rec := reciprocalWord(vn1)
+
+	// Compute each digit of quotient.
 	for j := m; j >= 0; j-- {
-		// D3.
+		// Compute the 2-by-1 guess q̂.
+		// The first iteration must invent a leading 0 for u.
 		qhat := Word(_M)
 		var ujn Word
 		if j+n < len(u) {
 			ujn = u[j+n]
 		}
+
+		// ujn ≤ vn1, or else q̂ would be more than one digit.
+		// For ujn == vn1, we set q̂ to the max digit M above.
+		// Otherwise, we compute the 2-by-1 guess.
 		if ujn != vn1 {
 			var rhat Word
 			qhat, rhat = divWW(ujn, u[j+n-1], vn1, rec)
 
-			// x1 | x2 = q̂v_{n-2}
+			// Refine q̂ to a 3-by-2 guess. See “Refining Guesses” above.
 			vn2 := v[n-2]
 			x1, x2 := mulWW(qhat, vn2)
-			// test if q̂v_{n-2} > br̂ + u_{j+n-2}
 			ujn2 := u[j+n-2]
-			for greaterThan(x1, x2, rhat, ujn2) {
+			for greaterThan(x1, x2, rhat, ujn2) { // x1x2 > r̂ u[j+n-2]
 				qhat--
 				prevRhat := rhat
 				rhat += vn1
-				// v[n-1] >= 0, so this tests for overflow.
+				// If r̂  overflows, then
+				// r̂ u[j+n-2]v[n-1] is now definitely > x1 x2.
 				if rhat < prevRhat {
 					break
 				}
+				// TODO(rsc): No need for a full mulWW.
+				// x2 += vn2; if x2 overflows, x1++
 				x1, x2 = mulWW(qhat, vn2)
 			}
 		}
 
-		// D4.
-		// Compute the remainder u - (q̂*v) << (_W*j).
-		// The subtraction may overflow if q̂ estimate was off by one.
+		// Compute q̂·v.
 		qhatv[n] = mulAddVWW(qhatv[0:n], v, qhat, 0)
 		qhl := len(qhatv)
 		if j+qhl > len(u) && qhatv[n] == 0 {
 			qhl--
 		}
+
+		// Subtract q̂·v from the current section of u.
+		// If it underflows, q̂·v > u, which we fix up
+		// by decrementing q̂ and adding v back.
 		c := subVV(u[j:j+qhl], u[j:], qhatv)
 		if c != 0 {
 			c := addVV(u[j:j+n], u[j:], v)
@@ -176,6 +674,8 @@
 			qhat--
 		}
 
+		// Save quotient digit.
+		// Caller may know the top digit is zero and not leave room for it.
 		if j == m && m == len(q) && qhat == 0 {
 			continue
 		}
@@ -185,30 +685,34 @@
 	putNat(qhatvp)
 }
 
-// greaterThan reports whether (x1<<_W + x2) > (y1<<_W + y2)
+// greaterThan reports whether the two digit numbers x1 x2 > y1 y2.
+// TODO(rsc): In contradiction to most of this file, x1 is the high
+// digit and x2 is the low digit. This should be fixed.
 func greaterThan(x1, x2, y1, y2 Word) bool {
 	return x1 > y1 || x1 == y1 && x2 > y2
 }
 
+// divRecursiveThreshold is the number of divisor digits
+// at which point divRecursive is faster than divBasic.
 const divRecursiveThreshold = 100
 
-// divRecursive performs word-by-word division of u by v.
-// The quotient is written in pre-allocated z.
-// The remainder overwrites input u.
-//
-// Precondition:
-// - len(z) >= len(u)-len(v)
-//
-// See Burnikel, Ziegler, "Fast Recursive Division", Algorithm 1 and 2.
+// divRecursive implements recursive division as described above.
+// It overwrites z with ⌊u/v⌋ and overwrites u with the remainder r.
+// z must be large enough to hold ⌊u/v⌋.
+// This function is just for allocating and freeing temporaries
+// around divRecursiveStep, the real implementation.
 func (z nat) divRecursive(u, v nat) {
-	// Recursion depth is less than 2 log2(len(v))
-	// Allocate a slice of temporaries to be reused across recursion.
+	// Recursion depth is (much) less than 2 log₂(len(v)).
+	// Allocate a slice of temporaries to be reused across recursion,
+	// plus one extra temporary not live across the recursion.
 	recDepth := 2 * bits.Len(uint(len(v)))
-	// large enough to perform Karatsuba on operands as large as v
 	tmp := getNat(3 * len(v))
 	temps := make([]*nat, recDepth)
+
 	z.clear()
 	z.divRecursiveStep(u, v, 0, tmp, temps)
+
+	// Free temporaries.
 	for _, n := range temps {
 		if n != nil {
 			putNat(n)
@@ -217,72 +721,92 @@
 	putNat(tmp)
 }
 
-// divRecursiveStep computes the division of u by v.
-// - z must be large enough to hold the quotient
-// - the quotient will overwrite z
-// - the remainder will overwrite u
+// divRecursiveStep is the actual implementation of recursive division.
+// It adds ⌊u/v⌋ to z and overwrites u with the remainder r.
+// z must be large enough to hold ⌊u/v⌋.
+// It uses temps[depth] (allocating if needed) as a temporary live across
+// the recursive call. It also uses tmp, but not live across the recursion.
 func (z nat) divRecursiveStep(u, v nat, depth int, tmp *nat, temps []*nat) {
+	// u is a subsection of the original and may have leading zeros.
+	// TODO(rsc): The v = v.norm() is useless and should be removed.
+	// We know (and require) that v's top digit is ≥ B/2.
 	u = u.norm()
 	v = v.norm()
-
 	if len(u) == 0 {
 		z.clear()
 		return
 	}
+
+	// Fall back to basic division if the problem is now small enough.
 	n := len(v)
 	if n < divRecursiveThreshold {
 		z.divBasic(u, v)
 		return
 	}
+
+	// Nothing to do if u is shorter than v (implies u < v).
 	m := len(u) - n
 	if m < 0 {
 		return
 	}
 
-	// Produce the quotient by blocks of B words.
-	// Division by v (length n) is done using a length n/2 division
-	// and a length n/2 multiplication for each block. The final
-	// complexity is driven by multiplication complexity.
+	// We consider B digits in a row as a single wide digit.
+	// (See “Recursive Division” above.)
+	//
+	// TODO(rsc): rename B to Wide, to avoid confusion with _B,
+	// which is something entirely different.
+	// TODO(rsc): Look into whether using ⌈n/2⌉ is better than ⌊n/2⌋.
 	B := n / 2
 
 	// Allocate a nat for qhat below.
 	if temps[depth] == nil {
-		temps[depth] = getNat(n)
+		temps[depth] = getNat(n) // TODO(rsc): Can be just B+1.
 	} else {
 		*temps[depth] = temps[depth].make(B + 1)
 	}
 
+	// Compute each wide digit of the quotient.
+	//
+	// TODO(rsc): Change the loop to be
+	//	for j := (m+B-1)/B*B; j > 0; j -= B {
+	// which will make the final step a regular step, letting us
+	// delete what amounts to an extra copy of the loop body below.
 	j := m
 	for j > B {
-		// Divide u[j-B:j+n] by vIn. Keep remainder in u
-		// for next block.
+		// Divide u[j-B:j+n] (3 wide digits) by v (2 wide digits).
+		// First make the 2-by-1-wide-digit guess using a recursive call.
+		// Then extend the guess to the full 3-by-2 (see “Refining Guesses”).
 		//
-		// The following property will be used (Lemma 2):
-		// if u = u1 << s + u0
-		//    v = v1 << s + v0
-		// then floor(u1/v1) >= floor(u/v)
-		//
-		// Moreover, the difference is at most 2 if len(v1) >= len(u/v)
-		// We choose s = B-1 since len(v)-s >= B+1 >= len(u/v)
+		// For the 2-by-1-wide-digit guess, instead of doing 2B-by-B-digit,
+		// we use a (2B+1)-by-(B+1) digit, which handles the possibility that
+		// the result has an extra leading 1 digit as well as guaranteeing
+		// that the computed q̂ will be off by at most 1 instead of 2.
+
+		// s is the number of digits to drop from the 3B- and 2B-digit chunks.
+		// We drop B-1 to be left with 2B+1 and B+1.
 		s := (B - 1)
-		// Except for the first step, the top bits are always
-		// a division remainder, so the quotient length is <= n.
+
+		// uu is the up-to-3B-digit section of u we are working on.
 		uu := u[j-B:]
 
+		// Compute the 2-by-1 guess q̂, leaving r̂ in uu[s:B+n].
 		qhat := *temps[depth]
 		qhat.clear()
 		qhat.divRecursiveStep(uu[s:B+n], v[s:], depth+1, tmp, temps)
 		qhat = qhat.norm()
-		// Adjust the quotient:
-		//    u = u_h << s + u_l
-		//    v = v_h << s + v_l
-		//  u_h = q̂ v_h + rh
-		//    u = q̂ (v - v_l) + rh << s + u_l
-		// After the above step, u contains a remainder:
-		//    u = rh << s + u_l
-		// and we need to subtract q̂ v_l
-		//
-		// But it may be a bit too large, in which case q̂ needs to be smaller.
+
+		// Extend to a 3-by-2 quotient and remainder.
+		// Because divRecursiveStep overwrote the top part of uu with
+		// the remainder r̂, the full uu already contains the equivalent
+		// of r̂·B + uₙ₋₂ from the “Refining Guesses” discussion.
+		// Subtracting q̂·vₙ₋₂ from it will compute the full-length remainder.
+		// If that subtraction underflows, q̂·v > u, which we fix up
+		// by decrementing q̂ and adding v back, same as in long division.
+
+		// TODO(rsc): Instead of subtract and fix-up, this code is computing
+		// q̂·vₙ₋₂ and decrementing q̂ until that product is ≤ u.
+		// But we can do the subtraction directly, as in the comment above
+		// and in long division, because we know that q̂ is wrong by at most one.
 		qhatv := tmp.make(3 * n)
 		qhatv.clear()
 		qhatv = qhatv.mul(qhat, v[:s])
@@ -309,6 +833,8 @@
 		j -= B
 	}
 
+	// TODO(rsc): Rewrite loop as described above and delete all this code.
+
 	// Now u < (v<<B), compute lower bits in the same way.
 	// Choose shift = B-1 again.
 	s := B - 1

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Russ Cox (Gerrit)

unread,

May 19, 2021, 6:53:55 AM5/19/21

to Russ Cox, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Attention is currently required from: Robert Griesemer, Katie Hockman.

Russ Cox uploaded patch set #2 to this change.

View Change

math/big: comment division

The comments in the code refer to Knuth and to Burnikel and Ziegler,
but Knuth's presentation is inscrutable, and our recursive division
code does not bear much resemblance to Burnikel and Ziegler's paper
(which is fine, ours is nicer).

Add a standalone explanation of division instead of referring to
difficult or not-directly-used references.

Change-Id: Ic1b35dc167fb29a69ee00e0b4a768ac9cc9e1324
---

M src/math/big/natdiv.go


1 file changed, 599 insertions(+), 73 deletions(-)

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Filippo Valsorda (Gerrit)

unread,

Jun 2, 2021, 11:13:56 AM6/2/21

to Russ Cox, goph...@pubsubhelper.golang.org, Filippo Valsorda, Go Bot, Katie Hockman, Robert Griesemer, golang-co...@googlegroups.com

Attention is currently required from: Robert Griesemer, Katie Hockman, Russ Cox.

Patch set 2:Code-Review +2

View Change

1 comment:

Patchset:
- Patch Set #2:
  Haven't checked that the comments match the algorithm line by line, but they generally make sense, help with understanding the file, seem at the very least more correct than the old ones, and don't change the code, so LGTM.

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Katie Hockman (Gerrit)

unread,

Jun 2, 2021, 1:37:43 PM6/2/21

to Russ Cox, goph...@pubsubhelper.golang.org, Filippo Valsorda, Go Bot, Robert Griesemer, golang-co...@googlegroups.com

Attention is currently required from: Robert Griesemer, Russ Cox.

View Change

2 comments:

Patchset:
- Patch Set #2:
  Sorry for the delay in reviewing this. I planned on looking at it this week or next. 👍
File src/math/big/natdiv.go:
- Patch Set #2, Line 125: an
  a

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Katie Hockman (Gerrit)

unread,

Jun 3, 2021, 11:18:39 AM6/3/21

to Russ Cox, goph...@pubsubhelper.golang.org, Filippo Valsorda, Go Bot, Robert Griesemer, golang-co...@googlegroups.com

Attention is currently required from: Robert Griesemer, Russ Cox.

View Change

10 comments:

Patchset:
- Patch Set #2:
  Sending over a few comments. I'm about halfway through reading it, and will hopefully get to the rest this afternoon.
File src/math/big/natdiv.go:
- Patch Set #2, Line 60: B
  You haven't yet defined B. Can you add a note somewhere above this section (ie. B is the base of a digit)?
- Patch Set #2, Line 106: uₙ₋₁
  This might just be strange gerrit formatting, but the vertical alignment is a bit off. It looks a bit like this is saying vₙ-1 in this view
- Patch Set #2, Line 126: most twice
  (Proof below.) ?
- Patch Set #2, Line 135:
```
same
small factor
```
  what small factor?
- Patch Set #2, Line 152: v never does
  Maybe elaborate why? It's at least not immediately apparent to me. I'm assuming we can make that assumption based on vₙ₋₁ ≥ B/2, but maybe a quick sentence explaining would be nice
- Patch Set #2, Line 169: most most
  just one most
- Patch Set #2, Line 223: 2-by-1-wide digit
  2-by-1-wide-digit
- Patch Set #2, Line 255: alternative
  Can you make it even more clear that we don't do this right now? e.g. "An alternative, which this code doesn't do, would be to look ahead..."
- Patch Set #2, Line 330: at most t
  nit: "x/y is at most t" is counter to "x/y < t", since according to that, x/y can't actually be t. Same with y1. Perhaps reword to "less than t".

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Katie Hockman (Gerrit)

unread,

Jun 4, 2021, 2:36:00 PM6/4/21

to Russ Cox, goph...@pubsubhelper.golang.org, Filippo Valsorda, Go Bot, Robert Griesemer, golang-co...@googlegroups.com

Attention is currently required from: Robert Griesemer, Russ Cox.

Patch set 2:Run-TryBot +1Code-Review +2Trust +1

View Change

6 comments:

File src/math/big/natdiv.go:
- Patch Set #2, Line 310: Karatsuba's algorithm
  Worth giving a summary of Karatsuba multiplication here, or just leave it up to the reader/references?
- Patch Set #2, Line 338: ]
  nit: looks like this is a square bracket, ], instead of a floor, ⌋
- Patch Set #2, Line 369: q̂·y₁ = ⌊x₁/y₁⌋·y₁ [definition of q̂]
  Consider adding one line above this which repeats "q̂ = ⌊x₁/y₁⌋", since it looks like this is actually the definition of q̂ multiplied by y₁ on both sides
  Same with the others below, just restating the starting equation helps reduce the cognitive load of the following steps imo
- Patch Set #2, Line 381: v
  y
- Patch Set #2, Line 386: >
  How did you get from "q̂ - q ≥ -1" (greater than or equal) in the final step, to "q̂ - q > -1" (greater than) here?
- Patch Set #2, Line 756: rename B to Wide, to avoid confusion with _B,
  yes please 👍

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Marios Nikolaou (Gerrit)

unread,

Jun 5, 2021, 12:33:34 AM6/5/21

to Russ Cox, goph...@pubsubhelper.golang.org, Katie Hockman, Filippo Valsorda, Go Bot, Robert Griesemer, golang-co...@googlegroups.com

Attention is currently required from: Robert Griesemer, Russ Cox.

View Change

1 comment:

File src/math/big/natdiv.go:
- Patch Set #2, Line 473: implmenenting
  implementing

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Russ Cox (Gerrit)

unread,

Jun 9, 2021, 10:52:48 AM6/9/21

to Russ Cox, goph...@pubsubhelper.golang.org, Marios Nikolaou, Katie Hockman, Filippo Valsorda, Go Bot, Robert Griesemer, golang-co...@googlegroups.com

Attention is currently required from: Marios Nikolaou, Robert Griesemer, Katie Hockman.

View Change

15 comments:

File src/math/big/natdiv.go:
- Patch Set #2, Line 60: B
  You haven't yet defined B. Can you add a note somewhere above this section (ie. […]
  Done

- Patch Set #2, Line 106: uₙ₋₁

- This might just be strange gerrit formatting, but the vertical alignment is a bit off. […]
  Indeed. How good it looks depends on how well the particular Unicode font was put together. I don't really understand why Gerrit's font seems to put the subscript n and the subscript 1 on different baselines.

- Patch Set #2, Line 126: most twice

- (Proof below. […]
  Done

- Patch Set #2, Line 135:
```
same
small factor
```
  what small factor?

- Done. (The wording was confusing. Changed to "the right amount", where that gets elaborated after the example.)

- Patch Set #2, Line 152: v never does

- Maybe elaborate why? It's at least not immediately apparent to me. […]
  Done

- Patch Set #2, Line 169: most most
  just one most

- Done

- Patch Set #2, Line 223: 2-by-1-wide digit
  2-by-1-wide-digit

- Done

- Patch Set #2, Line 255: alternative

- Can you make it even more clear that we don't do this right now? e.g. […]
  Done (text added after the example).

- Patch Set #2, Line 310: Karatsuba's algorithm

- Worth giving a summary of Karatsuba multiplication here, or just leave it up to the reader/reference […]
  Added a pointer to func karatsuba in nat.go but I will leave the rest of the algorithm to that comment or Google.

- Patch Set #2, Line 330: at most t

- nit: "x/y is at most t" is counter to "x/y < t", since according to that, x/y can't actually be t. […]
  Done

- Patch Set #2, Line 338: ]
  nit: looks like this is a square bracket, ], instead of a floor, ⌋

- Done

- Patch Set #2, Line 369: q̂·y₁ = ⌊x₁/y₁⌋·y₁ [definition of q̂]

- Consider adding one line above this which repeats "q̂ = ⌊x₁/y₁⌋", since it looks like this is actual […]
  Added that to the [explanation], thanks.

- Patch Set #2, Line 381: v
  y

- Done

- Patch Set #2, Line 386: >

- How did you get from "q̂ - q ≥ -1" (greater than or equal) in the final step, to "q̂ - q > -1" (grea […]
  q̂ - q = ... ≥ ... = ... ≥ ... ≥ ... = ... = ... > ... ≥ -1
  collapses to q̂ - q > -1 because of the > at the next to last step.

- Patch Set #2, Line 473: implmenenting
  implementing

- Done

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Russ Cox (Gerrit)

unread,

Jun 9, 2021, 10:52:48 AM6/9/21

to Russ Cox, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Attention is currently required from: Marios Nikolaou, Robert Griesemer, Katie Hockman.

Russ Cox uploaded patch set #3 to this change.

View Change

math/big: comment division

The comments in the code refer to Knuth and to Burnikel and Ziegler,
but Knuth's presentation is inscrutable, and our recursive division
code does not bear much resemblance to Burnikel and Ziegler's paper
(which is fine, ours is nicer).

Add a standalone explanation of division instead of referring to
difficult or not-directly-used references.

Change-Id: Ic1b35dc167fb29a69ee00e0b4a768ac9cc9e1324
---
M src/math/big/natdiv.go

1 file changed, 611 insertions(+), 73 deletions(-)

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Katie Hockman (Gerrit)

unread,

Jun 9, 2021, 11:02:41 AM6/9/21

to Russ Cox, goph...@pubsubhelper.golang.org, Marios Nikolaou, Filippo Valsorda, Go Bot, Robert Griesemer, golang-co...@googlegroups.com

Attention is currently required from: Marios Nikolaou, Robert Griesemer, Russ Cox.

Patch set 3:Code-Review +2Trust +1

View Change

2 comments:

File src/math/big/natdiv.go:
- Patch Set #2, Line 125: an
  a
  (just bumping this up)

- Patch Set #2, Line 369: q̂·y₁ = ⌊x₁/y₁⌋·y₁ [definition of q̂]

- Added that to the [explanation], thanks.
  Yeah that looks better, thanks!

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Russ Cox (Gerrit)

unread,

Jun 9, 2021, 11:08:46 AM6/9/21

to Russ Cox, goph...@pubsubhelper.golang.org, Go Bot, Marios Nikolaou, Katie Hockman, Filippo Valsorda, Robert Griesemer, golang-co...@googlegroups.com

Attention is currently required from: Marios Nikolaou, Robert Griesemer.

View Change

1 comment:

File src/math/big/natdiv.go:

- Patch Set #2, Line 125: an

- (just bumping this up)
  Done, thanks.

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Russ Cox (Gerrit)

unread,

Jun 9, 2021, 11:08:57 AM6/9/21

to Russ Cox, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Attention is currently required from: Marios Nikolaou, Robert Griesemer.

Russ Cox uploaded patch set #4 to this change.

View Change

math/big: comment division

The comments in the code refer to Knuth and to Burnikel and Ziegler,
but Knuth's presentation is inscrutable, and our recursive division
code does not bear much resemblance to Burnikel and Ziegler's paper
(which is fine, ours is nicer).

Add a standalone explanation of division instead of referring to
difficult or not-directly-used references.

Change-Id: Ic1b35dc167fb29a69ee00e0b4a768ac9cc9e1324
---
M src/math/big/natdiv.go
1 file changed, 611 insertions(+), 73 deletions(-)

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Russ Cox (Gerrit)

unread,

Jun 9, 2021, 11:09:22 AM6/9/21

to Russ Cox, goph...@pubsubhelper.golang.org, golang-...@googlegroups.com, Go Bot, Marios Nikolaou, Katie Hockman, Filippo Valsorda, Robert Griesemer, golang-co...@googlegroups.com

Russ Cox submitted this change.

View Change

Approvals:
  Filippo Valsorda: Looks good to me, approved
  Katie Hockman: Looks good to me, approved; Trusted
  Russ Cox: Trusted; Run TryBots

math/big: comment division

The comments in the code refer to Knuth and to Burnikel and Ziegler,
but Knuth's presentation is inscrutable, and our recursive division
code does not bear much resemblance to Burnikel and Ziegler's paper
(which is fine, ours is nicer).

Add a standalone explanation of division instead of referring to
difficult or not-directly-used references.

Change-Id: Ic1b35dc167fb29a69ee00e0b4a768ac9cc9e1324

Reviewed-on: https://go-review.googlesource.com/c/go/+/321078
Trust: Russ Cox <r...@golang.org>
Trust: Katie Hockman <ka...@golang.org>
Run-TryBot: Russ Cox <r...@golang.org>
Reviewed-by: Katie Hockman <ka...@golang.org>
Reviewed-by: Filippo Valsorda <fil...@golang.org>


---
M src/math/big/natdiv.go
1 file changed, 611 insertions(+), 73 deletions(-)

diff --git a/src/math/big/natdiv.go b/src/math/big/natdiv.go
index 1330990..882bb6d 100644
--- a/src/math/big/natdiv.go
+++ b/src/math/big/natdiv.go
@@ -2,10 +2,506 @@

+Division of u with n+m digits by v with n digits (in base B) can in general
+produce at most m+1 digits, because:

+This gives us a way to compute q. First compute q̂ with 2-by-1-digit division.
+Then, while u < q̂·v, decrement q̂; this loop executes at most twice, because
+q̂ ≤ q+2.


+
+
+Scaling Inputs
+
+The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2.
+For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off
+because the first digit 1 is smaller than B/2 = 5.
+

+We can ensure that v has a large top digit by multiplying both u and v by the
+right amount. Continuing the example, if we multiply both 172 and 19 by 3, we
+now have ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough


+⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here
+to multiply by 4, because that can be done with a shift. Specifically, we can
+always count the number of leading zeros i in the first digit of v and then
+shift both u and v left by i bits.
+
+Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will
+be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder
+by the scaling factor (shifting right i bits) when we finish.
+
+Note that these shifts happen before and after the entire division algorithm,
+not at each step in the per-digit iteration.
+
+Note the effect of scaling inputs on the size of the possible quotient.

+In the scaled u/v, u can gain a digit from scaling; v never does, because we
+pick the scaling factor to make v's top digit larger but without overflowing.
+If u and v have n+m and n digits after scaling, then:


+
+  • u < B^(n+m)               [B^(n+m) has n+m+1 digits]
+  • v ≥ B^n / 2               [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2]
+  • u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v]
+  • u/v < 2 B^m               [simplify]
+
+The quotient can still have m+1 significant digits, but if so the top digit
+must be a 1. This provides a different way to handle the first digit of the
+result: compare the top n digits of u against v and fill in either a 0 or a 1.
+
+
+Refining Guesses
+
+Before we check whether u < q̂·v, we can adjust our guess to change it from
+q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋.
+Although not mentioned above, the Good Guess Guarantee also promises that this

+3-by-2-digit division guess is more precise and at most one away from the real
+answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be done
+without n-digit math.

+the new algorithm must use a 2-by-1-wide-digit division, which is of course

+Today, the code in divRecursiveStep works like the first example. Perhaps in
+the future we will make it work like the alternative, to avoid a special case
+in the final iteration.

+(Karatsuba multiplication is implemented in func karatsuba in nat.go.)


+That makes the overall recursive division algorithm take O(n^1.6) time as well,
+which is an improvement, but again only for large enough numbers.
+
+It is not critical to make sure that every recursion does only two recursive
+calls. While in general the number of recursive calls can change the time
+analysis, in this case doing three calls does not change the analysis:
+
+	T(n) = 3T(n/2) + O(n) + O(n^log₂3)
+
+ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking
+time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the
+division does not hurt the asymptotic performance. Of course, it is likely
+still faster in practice to do two.
+
+
+Proof of the Good Guess Guarantee
+

+Given numbers x, y, let us break them into the quotients and remainders when
+divided by some scaling factor S, with the added constraints that the quotient
+x/y and the high part of y are both less than some limit T, and that the high
+part of y is at least half as big as T.
+
+	x₁ = ⌊x/S⌋        y₁ = ⌊y/S⌋
+	x₀ = x mod S      y₀ = y mod S
+
+	x  = x₁·S + x₀    0 ≤ x₀ < S    x/y < T
+	y  = y₁·S + y₀    0 ≤ y₀ < S    T/2 ≤ y₁ < T


+
+And consider the two truncated quotients:
+
+	q = ⌊x/y⌋
+	q̂ = ⌊x₁/y₁⌋
+
+We will prove that q ≤ q̂ ≤ q+2.
+

+The guarantee makes no real demands on the scaling factor S: it is simply the
+magnitude of the digits cut from both x and y to produce x₁ and y₁.
+The guarantee makes only limited demands on T: it must be large enough to hold
+the quotient x/y, and y₁ must have roughly the same size.


+
+To apply to the earlier discussion of 2-by-1 guesses in long division,
+we would choose:
+

+	S  = Bⁿ⁻¹
+	T  = B


+	x  = u
+	x₁ = uₙuₙ₋₁
+	x₀ = uₙ₋₂...u₀
+	y  = v
+	y₁ = vₙ₋₁
+	y₀ = vₙ₋₂...u₀
+
+These simpler variables avoid repeating those longer expressions in the proof.
+
+Note also that, by definition, truncating division ⌊x/y⌋ satisfies
+
+	x/y - 1 < ⌊x/y⌋ ≤ x/y.
+
+This fact will be used a few times in the proofs.
+
+Proof that q ≤ q̂:
+

+	q̂·y₁ = ⌊x₁/y₁⌋·y₁                      [by definition, q̂ = ⌊x₁/y₁⌋]
+	     > (x₁/y₁ - 1)·y₁                  [x₁/y₁ - 1 < ⌊x₁/y₁⌋]


+	     = x₁ - y₁                         [distribute y₁]
+
+	So q̂·y₁ > x₁ - y₁.
+	Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1.
+

+	q̂ - q = q̂ - ⌊x/y⌋                      [by definition, q = ⌊x/y⌋]
+	      ≥ q̂ - x/y                        [⌊x/y⌋ < x/y]


+	      = (1/y)·(q̂·y - x)                [factor out 1/y]

+	      ≥ (1/y)·(q̂·y₁·S - x)             [y = y₁·S + y₀ ≥ y₁·S]
+	      ≥ (1/y)·((x₁ - y₁ + 1)·S - x)    [above: q̂·y₁ ≥ x₁ - y₁ + 1]
+	      = (1/y)·(x₁·S - y₁·S + S - x)    [distribute S]
+	      = (1/y)·(S - x₀ - y₁·S)          [-x = -x₁·S - x₀]
+	      > -y₁·S / y                      [x₀ < S, so S - x₀ < 0; drop it]
+	      ≥ -1                             [y₁·S ≤ y]


+
+	So q̂ - q > -1.
+	Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂.
+
+Proof that q̂ ≤ q+2:
+

+	x₁/y₁ - x/y = x₁·S/y₁·S - x/y          [multiply left term by S/S]
+	            ≤ x/y₁·S - x/y             [x₁S ≤ x]
+	            = (x/y)·(y/y₁·S - 1)       [factor out x/y]
+	            = (x/y)·((y - y₁·S)/y₁·S)  [move -1 into y/y₁·S fraction]
+	            = (x/y)·(y₀/y₁·S)          [y - y₁·S = y₀]
+	            = (x/y)·(1/y₁)·(y₀/S)      [factor out 1/y₁]
+	            < (x/y)·(1/y₁)             [y₀ < S, so y₀/S < 1]
+	            ≤ (x/y)·(2/T)              [y₁ ≥ T/2, so 1/y₁ ≤ 2/T]
+	            < T·(2/T)                  [x/y < T]
+	            = 2                        [T·(2/T) = 2]


+
+	So x₁/y₁ - x/y < 2.
+

+	q̂ - q = ⌊x₁/y₁⌋ - q                    [by definition, q̂ = ⌊x₁/y₁⌋]
+	      = ⌊x₁/y₁⌋ - ⌊x/y⌋                [by definition, q = ⌊x/y⌋]


+	      ≤ x₁/y₁ - ⌊x/y⌋                  [⌊x₁/y₁⌋ ≤ x₁/y₁]
+	      < x₁/y₁ - (x/y - 1)              [⌊x/y⌋ > x/y - 1]
+	      = (x₁/y₁ - x/y) + 1              [regrouping]
+	      < 2 + 1                          [above: x₁/y₁ - x/y < 2]
+	      = 3
+
+	So q̂ - q < 3.
+	Since q̂ - q is an integer, q̂ - q ≤ 2.
+

+Note that when x/y < T/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore


+q̂ - q ≤ 1.
+
+Note also that in the general case 2n-by-n division where we don't know that

+x/y < T, we do know that x/y < 2T, yielding the bound q̂ - q ≤ 4. So we could


+remove the special case first step of long division as long as we allow the
+first fixup loop to run up to four times. (Using a simple comparison to decide
+whether the first digit is 0 or 1 is still more efficient, though.)
+
+Finally, note that when dividing three leading base-B digits by two (scaled),

+we have T = B² and x/y < B = T/B, a much tighter bound than x/y < T.


+This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that
+⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an
+integer. For random x and y, the chance of this is 2/B, or, for large B,
+approximately zero. This means that after we produce the 3-by-2 guess in the
+long division algorithm, the fixup loop essentially never runs.
+
+In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit
+division has exactly the same effect: the probability of needing a fixup is the

+same 2/B. Even better, we can allow the general case x/y < 2T and the fixup

+implementing division by an n-digit divisor using recursive calls to division


+by an n/2-digit divisor, relying on Karatsuba multiplication to yield a
+sub-quadratic run time. However, the presentation decisions are made almost
+entirely for the purpose of simplifying the run-time analysis, rather than
+simplifying the presentation. Instead of a single algorithm that loops over
+quotient digits, the paper presents two mutually-recursive algorithms, for
+2n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n
+algorithm.
+
+The proofs in the paper are remarkably complex, especially considering that
+the algorithm is at its core just long division on wide digits, so that the
+usual long division proofs apply essentially unaltered.
+*/
+
 package big
 
 import "math/bits"
 
+// div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v.
+// It uses z and z2 as the storage for q and r.
 func (z nat) div(z2, u, v nat) (q, r nat) {
 	if len(v) == 0 {
 		panic("division by zero")

@@ -18,6 +514,8 @@


 	}
 
 	if len(v) == 1 {
+		// Short division: long optimized for a single-word divisor.
+		// In that case, the 2-by-1 guess is all we need at each step.
 		var r2 Word
 		q, r2 = z.divW(u, v[0])
 		r = z2.setWord(r2)

@@ -28,7 +526,9 @@


 	return
 }
 
-// q = (x-r)/y, with 0 <= r < y
+// divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y.
+// It uses z as the storage for q.
+// Note that y is a single digit (Word), not a big number.
 func (z nat) divW(x nat, y Word) (q nat, r Word) {
 	m := len(x)
 	switch {

@@ -56,6 +556,8 @@


 	return divWVW(q, 0, x, d)
 }
 
+// divWVW overwrites z with ⌊x/y⌋, returning the remainder r.
+// The caller must ensure that len(z) = len(x).
 func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) {
 	r = xn
 	if len(x) == 1 {

@@ -70,34 +572,33 @@

@@ -106,19 +607,17 @@


 	putNat(vp)
 
 	q = q.norm()
+
+	// Undo scaling of remainder.
 	shrVU(u, u, shift)
 	r = u.norm()
 
 	return q, r
 }
 
-// divBasic performs word-by-word division of u by v.
-// The quotient is written in pre-allocated q.
-// The remainder overwrites input u.
-//
-// Precondition:
-// - q is large enough to hold the quotient u / v
-//   which has a maximum length of len(u)-len(v)+1.
+// divBasic implements long division as described above.
+// It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r.
+// q must be large enough to hold ⌊u/v⌋.
 func (q nat) divBasic(u, v nat) {
 	n := len(v)
 	m := len(u) - n

@@ -126,45 +625,56 @@

@@ -176,6 +686,8 @@


 			qhat--
 		}
 
+		// Save quotient digit.
+		// Caller may know the top digit is zero and not leave room for it.
 		if j == m && m == len(q) && qhat == 0 {
 			continue
 		}

@@ -185,30 +697,34 @@

@@ -217,72 +733,92 @@

@@ -309,6 +845,8 @@


 		j -= B
 	}
 
+	// TODO(rsc): Rewrite loop as described above and delete all this code.
+
 	// Now u < (v<<B), compute lower bits in the same way.
 	// Choose shift = B-1 again.
 	s := B - 1

3 is the latest approved patch-set.
The change was submitted with unreviewed changes in the following files:

The name of the file: src/math/big/natdiv.go
Insertions: 2, Deletions: 2.

```
@@ -124:125, +124:125 @@
-  This gives us an way to compute q. First compute q̂ with 2-by-1-digit division.
+  This gives us a way to compute q. First compute q̂ with 2-by-1-digit division.

@@ -183:184, +183:184 @@
-  	                        = (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂
+  	                      = (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂

```

To view, visit change 321078. To unsubscribe, or for help writing mail filters, visit settings.

Reply all

Reply to author

Forward

0 new messages