So the case when one of the numbers fits in a register is trivial.
Multiplication (Karatsuba variant):
y1 = x1*2^32 + n1
y2 = x2*2^32 + n2
y1*y2 = (x1*2^32 + n1) * (x2*2^32+n2)
= x1*2^32*x2*2^32 + x1*n2*2^32 + x2*n1*2^32 + n1*n2
= x1*x2 * 2^32 * 2^32 + (x1*n2+x2*n1)*2^32 + n1*n2
x1*n2 (trivial), x2*n1 (trivial), n1*n2 (trivial)
So this is easy. x1, x2 are big; n1, n2 < 2^32;
recurse until x1 or x2 < 2^32.
Exponentiation is easy, as it is just shifting.
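A minimal Python sketch of that split (Python's built-in ints stand in for the multi-word numbers; note that as written this is the 4-multiply schoolbook split, whereas true Karatsuba forms (x1+n1)*(x2+n2) and reuses the outer products to get by with 3 multiplies):

```python
B = 1 << 32  # limb base, matching the 2^32 in the post

def split_mul(y1, y2):
    """Multiply via the decomposition y = x*2^32 + n described above."""
    if y1 < B or y2 < B:
        return y1 * y2                 # one operand fits in a "register": trivial
    x1, n1 = divmod(y1, B)             # y1 = x1*2^32 + n1
    x2, n2 = divmod(y2, B)             # y2 = x2*2^32 + n2
    return (split_mul(x1, x2) * B * B  # x1*x2 * 2^32 * 2^32, via recursion
            + (x1 * n2 + x2 * n1) * B  # big*small products: "trivial"
            + n1 * n2)
```

Each level of the recursion strips one 32-bit limb, so this matches the post's scheme rather than Karatsuba's halving of the operand size.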
The problem is how to do integer division.
I have found the Newton-Raphson method,
so you can approximate the division x1/x2 as x1 * (1/x2),
but that requires floating- / fixed-point math.
Does someone know a method for fast big-integer division
which does not require floating- / fixed-point math?
The obvious method is just to count how many times
x2 fits in x1, but that's n^2 and slow
for really big ints.
Thanks
Sometimes online sometimes not
In "The Art of Assembly Language Programming," Randall Hyde devoted an
entire chapter to multiprecision arithmetic.
Nathan.
Fixed-point is just integer arithmetic with the binary point in a
different place.
-hpa
Of course it is. The difficulty of factoring large numbers
is the basis of public-key crypto.
> I have found the Newton-Raphson method, so you can
> approximate the division x1/x2 as x1 * (1/x2), but that requires floating-
> / fixed-point math. Does someone know a method for fast big-integer
> division which does not require floating- / fixed-point math?
> The obvious method is just to count how many times x2 fits in x1, but
> that's n^2 and slow for really big ints.
There is always the method of long division -- shift, compare
& subtract. About n^1.7
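In Python the shift/compare/subtract loop looks like this (one quotient bit per iteration; Python ints stand in for the multi-word registers):

```python
def long_div(x1, x2):
    """Classic long division: shift, compare & subtract.
    Returns (quotient, remainder) for nonnegative x1 and positive x2."""
    assert x2 > 0
    q, r = 0, 0
    for i in range(x1.bit_length() - 1, -1, -1):
        r = (r << 1) | ((x1 >> i) & 1)   # shift the next dividend bit into r
        q <<= 1
        if r >= x2:                      # compare
            r -= x2                      # subtract
            q |= 1
    return q, r
```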
-- Robert
> The problem is how to do integer division.
> I have found the Newton-Raphson method,
> so you can approximate the division x1/x2 as x1 * (1/x2),
> but that requires floating- / fixed-point math.
> Does someone know a method for fast big-integer division
> which does not require floating- / fixed-point math?
> The obvious method is just to count how many times
> x2 fits in x1, but that's n^2 and slow
> for really big ints.
> Thanks
Use Newton's method - this comparison says it is actually faster than
the common "schoolboy math" method:
http://www.treskal.com/kalle/exjobb/original-report.pdf
Here is the idea: compute not 1/n, but (2^k)/n for some large k.
Let x = (2^k)/n + e, where e is a small error term.
Then n*x^2 = (2^(2k))/n + (2^(k+1))*e + n*e^2,
and (2^(k+1))*x - n*x^2 = (2^(2k))/n - n*e^2.
Dividing by 2^k (which is very easy if we choose k to be a multiple of
32) shows that
2x - (n*x^2)/(2^k)
approximates (2^k)/n with a smaller error than x does. Iterate until
convergence, which will be rapid if your first estimate for (2^k)/n is
accurate (if 2^m is the largest power of 2 less than or equal to n, then
use 2^(k-m) as a first estimate).
Then p/n = (p * ((2^k)/n)) / (2^k).
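A sketch of this iteration with plain Python ints. Two deviations from the text above, flagged here as my own choices: I start from 2^(k-m-1) (one bit below the suggested 2^(k-m)) so the iterate climbs monotonically from below, and I add a short fix-up at the end so the result is exactly floor(2^k / n):

```python
def recip_newton(n, k):
    """Compute floor(2^k / n) by the integer iteration x <- 2x - (n*x^2)>>k."""
    assert n > 1 and k >= 2 * n.bit_length()
    m = n.bit_length() - 1            # 2^m <= n < 2^(m+1)
    x = 1 << (k - m - 1)              # first estimate, just below 2^k / n
    while True:
        x_new = 2 * x - ((n * x * x) >> k)
        if x_new <= x:                # stopped increasing: (nearly) converged
            break
        x = x_new
    # O(1) fix-up: make the answer exactly floor(2^k / n)
    while x * n > (1 << k):
        x -= 1
    while (x + 1) * n <= (1 << k):
        x += 1
    return x
```

Then, as the post says, p/n is recovered as (p * recip_newton(n, k)) >> k for k large enough.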
Vic
Where do you get 1.7 from? It really seems O(n^2) to me, or perhaps
more accurately O(nm) with n and m being the lengths of the dividend and
divisor, respectively.
-hpa
First of all, the canonical answer is to look at how the GNU
Multi-Precision package (GMP) does it!
The obvious method is also close to optimal:
Start with a guess for what y=1/x2 should be, then run log(n) iterations
of N-R to get n bits of precision.
> Does someone know a method for fast big-integer division which does not
> require floating- / fixed-point math?
The only halfway tricky part is that initial guess; the obvious method
is to approximate x2 with a double-precision fp value (simply combine
the leading ~64 bits or so), then calculate the fp reciprocal and
convert back to an integer.
The guess takes O(1) time; the N-R stages then work with twice as many
bits during each iteration, so after O(log(n)) iterations the total cost
is simply about twice that of the final stage.
The final multiplication by the calculated reciprocal costs about one
more full-size multiply.
Using this approach a division is only about 3 times slower than a
multiplication.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
And then, I guess, the resultant rounding for the integer case (is it
just 1 ULP if everything's done right?) may be corrected by
multiplying the quotient by the divisor and looking at the difference
between that product and the dividend.
Alex
Assuming the input numbers are exact integers with n bits, you only need
to calculate the reciprocal with n+2 bits or more, then round that up
(i.e. increment the bottom bit) before doing the multiplication.
The result will be slightly higher than the true result, but the error
must be less than an ulp (i.e. less than 1.0), so you can simply
truncate the final answer at the decimal point and it will be exact!
This is exactly the same algorithm which most compilers use these days
to replace constant division by a reciprocal multiplication and a shift.
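The rounding argument can be checked in a few lines of Python. Note the reciprocal here is computed with Python's own `//` purely to isolate the rounding step; in a real bignum library it would come from the Newton iteration discussed above:

```python
def exact_div(p, n):
    """Exact integer division via a rounded-up reciprocal, as described:
    compute the reciprocal with guard bits, round it UP, multiply, truncate."""
    assert p > 0 and n > 0
    k = p.bit_length() + n.bit_length() + 2   # guard bits ensure p*n < 2^k
    r = (1 << k) // n + 1                     # reciprocal of n, rounded up
    return (p * r) >> k                       # truncation gives floor(p / n)
```

The error introduced by rounding up is at most p/2^k < 1/n of a unit, so truncating at the radix point always lands on the exact quotient.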
On ~one-half the compares, you bail early (~half-way).
Maybe that should be n^1.5.
-- Robert
If you want to implement this, you should at least try something like
SRT, i.e. an algorithm that gets you more than a single bit/iteration!
The logical starting point is simply to get ~32 bits or ~53
bits/iteration by doing a reciprocal multiplication of the top part of
the remainder, getting an estimate, then back-multiply and subtract.
OTOH, any kind of long division will still be O(n^2), just with slightly
different constant terms.
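A rough Python sketch of that estimate / back-multiply / subtract loop. The names (`div_words`, `W`) are mine, and the divisor estimate is rounded up so a quotient chunk never overshoots, at the cost of an occasional extra iteration:

```python
W = 32  # quotient bits recovered per iteration

def div_words(p, n):
    """Long division getting ~W quotient bits per step: estimate a chunk
    of quotient from the leading words of the remainder, back-multiply,
    subtract. A sketch of the idea, not a tuned implementation."""
    assert p >= 0 and n > 0
    q, r = 0, p
    nb = n.bit_length()
    t = max(0, nb - W)               # keep only the top ~W bits of n
    nd = (n >> t) + 1                # rounded up: estimates never overshoot
    while r >= n:
        s = max(0, r.bit_length() - nb - W)
        qhat = (r >> (s + t)) // nd  # estimate from the top of the remainder
        if qhat == 0:
            qhat = 1                 # guarantee progress on the last bits
        q += qhat << s
        r -= (n << s) * qhat         # back-multiply and subtract
    return q, r
```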
>Robert Redelmeier wrote:
>> In alt.lang.asm H. Peter Anvin<h...@zytor.com> wrote in part:
>>>> There is always the method of long division -- shift, compare
>>>> & subtract. About n^1.7
>>>>
>>>
>>> Where do you get 1.7 from? It really seems O(n^2) to me,
>>> or perhaps more accurately O(nm) with n and m being the
>>> lengths of the dividend and divisor, respectively.
>>
>> On ~one-half the compares, you bail early (~half-way).
>> Maybe that should be n^1.5.
>
>If you want to implement this, you should at least try something like
>SRT, i.e. an algorithm that gets you more than a single bit/iteration!
>
>The logical starting point is simply to get ~32 bits or ~53
>bits/iteration by doing a reciprocal multiplication of the top part of
>the remainder, getting an estimate, then back-multiply and subtract.
>
>OTOH, any kind of long division will still be O(n^2), just with slightly
>different constant terms.
This is not quite correct, if by "long division" you mean unbalanced
2m-by-m division; see e.g. C. Burnikel, J. Ziegler: Fast Recursive
Division. MPI Informatik, Forschungsbericht MPI-I-98-1-022 (1998), or
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.565>
But this is normally not done in assembler, at least not the
high-level recursive algorithm.
Wolfgang
Of course not!
When I wrote "long div" I meant any algorithm which is similar to what
we all learned to use with pen & paper.
Most recursive (Karatsuba-style) algorithms get close to O(n^1.585),
while the fastest for really big numbers will all use FFT-based
multiplications, at O(n*log(n)*log(log(n))).
> Division. MPI Informatik, Forschungsbericht MPI-I-98-1-022 (1998) or
> <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.565>
>
> But this is normally not done with assembler, at least not the high
> level recursive algorithm.
:-)
Perhaps it is better if I shut up, because my routines are not so fast.
I use the paper mult mod 0FFFFFFFFh
[so the base of the number is 0FFFFFFFFh].
By "paper mult" I mean the way someone multiplies 2 decimal
numbers on paper with a pencil.
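That "paper" scheme looks roughly like this in Python (a sketch only; I assume base 2^32 here, where the post uses 0FFFFFFFFh, because a power-of-two base keeps the carry handling to plain shifts and masks):

```python
BASE = 1 << 32   # NOTE: the post uses base 0FFFFFFFFh (2^32 - 1);
                 # base 2^32 is assumed here for simpler carries

def to_limbs(x):
    """Split a nonnegative int into base-2^32 limbs, least significant first."""
    limbs = []
    while x:
        x, low = divmod(x, BASE)
        limbs.append(low)
    return limbs or [0]

def paper_mul(a, b):
    """Schoolbook ("pencil and paper") multiplication, limb by limb."""
    la, lb = to_limbs(a), to_limbs(b)
    out = [0] * (len(la) + len(lb))
    for i, ai in enumerate(la):
        carry = 0
        for j, bj in enumerate(lb):
            t = out[i + j] + ai * bj + carry
            carry, out[i + j] = divmod(t, BASE)
        out[i + len(lb)] += carry       # carry out of this row
    return sum(limb << (32 * k) for k, limb in enumerate(out))
```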
I don't understand the Karatsuba above.
> The problem is how to do integer division.
> I have found the Newton-Raphson method,
> so you can approximate the division x1/x2 as x1 * (1/x2),
> but that requires floating- / fixed-point math.
> Does someone know a method for fast big-integer division
> which does not require floating- / fixed-point math?
Yes, I use the paper div mod 0FFFFFFFFh
[so the base of the number is 0FFFFFFFFh].
How did I write these? I don't know; it all seems to go well
as if by miracle, but I know there can be errors
in these (mult and div) that I wrote.
> "Branimir Maksimovic" <bm...@nospicedham.hotmail.com> ha scritto nel
> messaggio news:20100412220303.02038de7@maxa...
> >
> > Starting point is:
> > http://www.x86asm.net/articles/working-with-big-numbers-using-x86-instructions/
> >
> > So the case when one of the numbers fits in a register is trivial.
> > Multiplication (Karatsuba variant):
> > y1 = x1*2^32 + n1
> > y2 = x2*2^32 + n2
> >
> > y1*y2 = (x1*2^32 + n1) * (x2*2^32+n2)
> > = x1*2^32*x2*2^32 + x1*n2*2^32 + x2*n1*2^32 + n1*n2
> >
> > = x1*x2 * 2^32 * 2^32 + (x1*n2+x2*n1)*2^32 + n1*n2
> >
> > x1*n2 (trivial), x2*n1 (trivial), n1*n2 (trivial)
> >
> > So this is easy. x1, x2 are big; n1, n2 < 2^32;
> > recurse until x1 or x2 < 2^32.
> > Exponentiation is easy, as it is just shifting.
>
> Perhaps it is better if I shut up, because my routines are not so fast.
>
> I use the paper mult mod 0FFFFFFFFh
> [so the base of the number is 0FFFFFFFFh].
> By "paper mult" I mean the way someone multiplies 2 decimal
> numbers on paper with a pencil.
That is the same thing as the Karatsuba method:
y is represented as x times some base raised to some power,
in this case 2^32.
This is only needed if you multiply two large numbers.
If one fits in a register, no problem.
>
> I don't understand the Karatsuba above.
Simple: just decompose the number as quotient * 2^32 + remainder
(you divide by 2^32 because dividing by that number is just shifting).
The recursion is there to reduce the number until it is small enough,
as in the x1*x2 * 2^32 * 2^32 part. So you decompose the multiplication
by recursion, as x1 or x2 will eventually fit in a single register.
This is for multiplication. There are enough posts here that I now know
how to do division.
Thanks to all who replied. Things are clear now.
Thank you.