Computing multiplicative inverses

Colin Plumb

Apr 6, 1994, 5:31:16 AM

I got to thinking some more, and wrote down the following ideas. I'm
posting them in the hopes they'll be useful to someone. The Newton's iteration
thing is, IMHO, really nifty.

Robert D. Silverman

Apr 6, 1994, 12:07:11 PM

In article <1994Apr6.0...@mnemosyne.cs.du.edu> co...@nyx10.cs.du.edu (Colin Plumb) writes:
:I got to thinking some more, and wrote down the following ideas. I'm
:posting them in the hopes they'll be useful to someone. The Newton's iteration
:thing is, IMHO, really nifty.
:
:More on finding multiplicative inverses
:
:The general method for finding multiplicative inverses (the Extended
:Euclidean Algorithm) involves repeated division. This is annoyingly

Negative. Use Lehmer's method. (see Knuth Vol. 2) it involves
NO division at all; only shifts and add/subtract.
--
Bob Silverman
These are my opinions and not MITRE's.
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"

Steve Tate

Apr 6, 1994, 1:18:12 PM

Colin Plumb (co...@nyx10.cs.du.edu) wrote:
> I got to thinking some more, and wrote down the following ideas. I'm
> posting them in the hopes they'll be useful to someone. The Newton's iteration
> thing is, IMHO, really nifty.

I only scanned through your posting (quite a long Usenet posting... don't
you have a job?? :-) ), but what I saw seemed similar to the following
reference --- if it's not the same, I think you would still find the
following paper interesting (as well as the references that it contains):

M. Mnuk. "A div(n) depth boolean circuit for smooth modular inverse",
Information Processing Letters, Vol. 38, No. 3, May 17 1991, p. 153ff.

A "smooth modular inverse" is one in which all prime factors of the
modulus are "small", which is what you were looking at, I think.

--
Steve Tate --- s...@cs.unt.edu | "A mind is like a parachute; it only
Dept. of Computer Sciences | works when it's open."
University of North Texas | --- Duke Chronicle
Denton, TX 76201 |

Graham Toal

Apr 6, 1994, 2:06:21 PM

::More on finding multiplicative inverses
::
::The general method for finding multiplicative inverses (the Extended
::Euclidean Algorithm) involves repeated division. This is annoyingly
:
:Negative. Use Lehmer's method. (see Knuth Vol. 2) it involves
:NO division at all; only shifts and add/subtract.

I don't know if this is relevant to the current discussion, but it's
a nice hack I thought more people would be interested in knowing if
they haven't come across it before.

Way back when, I used to do a little video-game programming on 8-bit
micros. We used all sorts of tricks for getting speed up. Lots of
stuff would be table driven, for instance, but every now and then
you had to do something like a multiply over a range that was too
big for a table. So we'd use the old shift-and-add trick. (Obviously
I'm talking about multiplication by a constant here)

For instance, n * 10 was equivalent to n * 8 + n * 2, which was
(n << 3) + (n << 1), which was ((n << 2) + n) << 1, which all worked out
very neatly on a processor that could shift by constants quickly, or
add, but not multiply.
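
A minimal sketch of the identity above, in Python purely for illustration (the real thing would have been 8-bit assembly). The function names are made up for this example. The parentheses matter: in Python, as in C, shifts bind less tightly than addition.

```python
def times10(n):
    """Multiply by 10 with two shifts and one add: 10*n = (4*n + n) * 2."""
    return ((n << 2) + n) << 1


def times10_two_terms(n):
    """The same product written as 8*n + 2*n."""
    return (n << 3) + (n << 1)


if __name__ == "__main__":
    for n in (0, 1, 7, 255, 65535):
        print(n, times10(n), times10_two_terms(n))
```

The first form needs one add and two shifts; the second needs one add and two shifts of the original value, so which is cheaper depends on how the processor shifts.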

Anyway, why am I telling you this, when it's stuff everyone knows?
Well, the same trick isn't quite so obvious for *division* by a
constant!

How we did it was like this: first of all, assume 16 bit ints (you
can scale this up yourself later; in fact we *were* using 16 bit ints
in our space games.) Let's say we want to divide by 10 instead of
multiplying by it... well, n / 10 is n / 10 * 65536/65536 which is
(n * 65536/10) / 65536 - so we'd code up a shift-and-add multiply
routine for multiplying by 6553, then do the divide just by extracting
the top two result bytes from the 4-byte product...

Now, the more astute of you will realise that this doesn't actually
give 100% accurate results near the boundaries, and the rounding error
doesn't always go the way you'd expect, but for what we used it for
it was a useful and very fast technique. I just thought I'd mention
it on the off chance that it might be useful for someone else out there
in some application or other...
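
A rough sketch of the trick, again in Python for illustration only. The constant 6553 is the one from the description above (65536/10, truncated); the function names and the rounded-up variant with 6554 are assumptions added for this example, not part of the original routine.

```python
RECIP10_TRUNC = 65536 // 10  # 6553, the multiplier described above


def div10_approx(n):
    """Approximate n // 10 for 0 <= n < 65536.

    Multiply by 6553 and keep the top 16 bits of the 32-bit product.
    """
    return (n * RECIP10_TRUNC) >> 16


def div10_rounded_up(n):
    """Variant (an assumption, not from the post): multiplier rounded up to 6554."""
    return (n * 6554) >> 16


if __name__ == "__main__":
    print(div10_approx(1234))  # 123
    print(div10_approx(10))    # 0, one less than the true quotient
```

In this sketch the truncated constant is never too high and at most one too low over the 16-bit range; exact multiples of 10 are the obvious casualties. Rounding the constant up instead gives exact quotients for n up to 16388 and first goes wrong at n = 16389.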

G

Colin Plumb

Apr 6, 1994, 11:41:35 PM

I ought to add to that post just a few details:

Instead of x += x - a*x*x, you can make the iteration step
x *= (2-a*x) and save one addition per step, reducing it to
two multiplies and one add.

You can also use a third-order recurrence,
t = a*x - 1;
x *= t*t - t + 1;

For an additional two adds and a multiply, this triples the number of
valid bits. If you want, you can use this to take a 3-bit estimate to
a 9-bit estimate, avoiding the bit twiddling you would otherwise need
to build a 4-bit estimate and save an iteration. It depends on your
multiply speed.
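
Both recurrences can be tried out with a short sketch. It assumes the iteration is being used to invert an odd a modulo a power of two (2^32 here), which is what the remark about 3-bit estimates suggests; the word size, the function names and the choice of x = a as the starting estimate (a*a = 1 mod 8 for any odd a) are assumptions made for this example. The cubic step works because, with a*x = 1 + t, multiplying x by 1 - t + t*t gives a*x = 1 + t^3.

```python
MASK = (1 << 32) - 1  # work modulo 2**32 (assumed word size)


def inverse_quadratic(a):
    """Inverse of odd a mod 2**32 using x *= 2 - a*x."""
    assert a & 1, "a must be odd"
    x = a                       # a*a == 1 (mod 8): 3 valid bits
    for _ in range(4):          # 3 -> 6 -> 12 -> 24 -> 48 valid bits
        x = (x * (2 - a * x)) & MASK
    return x


def inverse_cubic(a):
    """Inverse of odd a mod 2**32 using the third-order step."""
    assert a & 1, "a must be odd"
    x = a                       # 3 valid bits
    for _ in range(3):          # 3 -> 9 -> 27 -> 81 valid bits
        t = (a * x - 1) & MASK
        x = (x * (t * t - t + 1)) & MASK
    return x


if __name__ == "__main__":
    a = 0x12345677
    print(hex(inverse_quadratic(a)), hex(inverse_cubic(a)))
    print((a * inverse_quadratic(a)) & MASK)  # 1
```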
--
-Colin

Colin Plumb

Apr 7, 1994, 3:42:24 AM

In article <2numnf$q...@linus.mitre.org>,
Robert D. Silverman <b...@gauss.mitre.org> wrote:
>:Euclidean Algorithm) involves repeated division. This is annoyingly
>
>Negative. Use Lehmer's method. (see Knuth Vol. 2) it involves
>NO division at all; only shifts and add/subtract.

Um, you've misremembered. Knuth credits the binary method (shifts and
subtracts) to J. Stein in 1961 (J. Comp. Phys. 1 (1967), 397-405).
Lehmer's method (AMM 45 (1938), 227-233) uses the leading digits to
reduce the number of multiple-precision steps, but still uses the classic
divide method, and produces the same quotients.
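
For reference, the binary method can be sketched as below. This is a plain textbook-style version written for illustration, not a transcription of Knuth's presentation, and the function name is made up here.

```python
def binary_gcd(u, v):
    """Greatest common divisor of non-negative ints using shifts and subtracts only."""
    if u == 0:
        return v
    if v == 0:
        return u
    shift = 0
    while ((u | v) & 1) == 0:   # strip factors of 2 common to both
        u >>= 1
        v >>= 1
        shift += 1
    while (u & 1) == 0:         # make u odd
        u >>= 1
    while v != 0:
        while (v & 1) == 0:     # make v odd; 2 is no longer a common factor
            v >>= 1
        if u > v:               # keep u <= v
            u, v = v, u
        v -= u                  # odd minus odd is even (or zero)
    return u << shift           # restore the common factors of 2


if __name__ == "__main__":
    print(binary_gcd(48, 18))   # 6
```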

Exercise 34 (using Lehmer-like methods on the binary algorithm) is credited
to R.W. Gosper.

I'll also have to work through Exercise 35, extending the binary algorithm
to an extended one.

Since you're showing me where I'm stuck everywhere, can you remind me
of any fast high-precision division (with remainder) algorithms? All
this reduces the need for it considerably, but it's still a useful
primitive. The best I can think of right now for computing a/b is to
use Newton's method to form an estimate of 1/b to sufficiently many
decimal places, then multiply by a to form a quotient estimate, multiply
by b and subtract to form a remainder estimate, and do final corrections.
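
The approach just described might look roughly like this. It is a sketch under stated assumptions: the reciprocal is kept as a scaled integer approximating 2^k / b rather than in decimal places, the starting estimate and iteration count are picked for simplicity, and all names are invented for the example. Python's big integers stand in for a multiple-precision package.

```python
def reciprocal(b, k):
    """Approximate floor(2**k / b) by Newton's iteration.

    Requires b > 0 and k >= b.bit_length().
    """
    r = 1 << (k - b.bit_length())        # at most a factor of 2 below 2**k / b
    for _ in range(k.bit_length() + 1):  # precision roughly doubles per step
        r = (r * ((2 << k) - b * r)) >> k
    return r


def divide(a, b):
    """Return (q, rem) with a == q*b + rem and 0 <= rem < b, for a >= 0, b > 0."""
    k = max(a.bit_length(), b.bit_length())
    q = (a * reciprocal(b, k)) >> k      # quotient estimate
    rem = a - q * b                      # remainder estimate
    while rem >= b:                      # final corrections
        q += 1
        rem -= b
    while rem < 0:
        q -= 1
        rem += b
    return q, rem


if __name__ == "__main__":
    print(divide(1000, 7))               # (142, 6)
```

The correction loops are what guarantee the result; the Newton estimate only has to be close enough that they run a handful of times.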

I thought I remembered something a bit more direct, that didn't require
final corrections, but I can't find the reference now.
--
-Colin

Colin Plumb

Apr 7, 1994, 3:51:15 AM

In article <1994040617...@an-teallach.com>,
Graham Toal <gt...@an-teallach.com> wrote:
>Anyway, why am I telling you this, when it's stuff everyone knows?
>Well, the same trick isn't quite so obvious for *division* by a
>constant!
>
>How we did it was like this: first of all, assume 16 bit ints (you
>can scale this up yourself later; in fact we *were* using 16 bit ints
>in our space games.) Let's say we want to divide by 10 instead of
>multiplying by it... well, n / 10 is n / 10 * 65536/65536 which is
>(n * 65536/10) / 65536 - so we'd code up a shift-and-add multiply
>routine for multiplying by 6553, then do the divide just by extracting
>the top two result bytes from the 4-byte product...

This general technique is analyzed in P.D. Barrett, "Implementing the
Rivest Shamir and Adleman public key encryption algorithm on a standard
digital signal processor," Advances in Cryptology: Proc. Crypto '86,
Lecture Notes in Computer Science 263, Springer-Verlag 1987. (This is
out of another bibliography I have, in the Crypto '93 proceedings,
so it may refer to other papers for the bulk of the work.)

Basically, yes, this works. Apparently something similar is done in the
HP Precision compilers to do division by constants.
--
-Colin

Kurt Huwig

Apr 11, 1994, 5:25:21 PM

I used a different method to avoid a long division. I use the public
exponents 3, 5, 7, 11 and 13; they are all smaller than 16 and therefore fit
in a nibble (because I use an 8-bit microprocessor). Now I perform the
Euclidean algorithm, dividing by 3 first. If the remainder is 0, I try 5, and
so on. Only when the modulus is a multiple of 3*5*7*11*13 = 15015 do I have to
choose other primes. If the remainder is 1, then I am nearly finished: I just
calculate the additive inverse.
In the other cases (remainder > 1) I use a table with the factors arising
from the rest of the Euclidean algorithm:

xxx = yyy * 13 + 7
13 = 1 * 7 + 6
7 = 1 * 6 + 1

-> 1 = 2*7 - 13 = ... = -(2*yyy + 1) * 13 (+ zzz * xxx)
-(2*yyy + 1) is the wanted multiplicative inverse of 13 (mod xxx).

xxx = modulus
(I've got some (!) problems with my arrow keys, I beg pardon...)
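
A small sketch of the scheme, in Python for illustration. It assumes the exponent e is one of the small primes listed above and that xxx is whatever modulus the exponent must be inverted against; the function name and the brute-force search standing in for the table are assumptions of this example. Only the first step divides the long number, and only by a nibble-sized value.

```python
def small_exponent_inverse(e, modulus):
    """Inverse of a small prime e modulo `modulus`, using one long division.

    Writes modulus = y*e + r, then finds small u, v with u*r - v*e == 1, so
    1 == u*modulus - (u*y + v)*e and the inverse is -(u*y + v) mod modulus.
    """
    y, r = divmod(modulus, e)   # the only division involving the long number
    if r == 0:
        raise ValueError("e divides the modulus; try the next exponent")
    # Stand-in for the table lookup: u is the inverse of r modulo e.
    u = next(k for k in range(1, e) if (k * r) % e == 1)
    v = (u * r - 1) // e
    return modulus - (u * y + v)


if __name__ == "__main__":
    print(small_exponent_inverse(13, 20))   # 17, since 13*17 = 221 = 11*20 + 1
```

When the remainder r is 1, u is 1 and v is 0, so the result is simply modulus - y, the additive inverse of the quotient.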

And if you want a big public exponent, exchange it with the secret one, or
multiply the public exponent by a big random number and use the modulus
to hide it...
Using this method and an RSA-ASIC for DM 100,- I calculate a random RSA
key pair within 1-2 seconds on an 8051 microprocessor (similar to an 8086
without segments), with an 8-bit accumulator and 1 million cycles/second.

I also read parts of Knuth's book and some others (the funniest one was a
speedup using redundant conversions of the divisor, which needed just a 4-bit
lookahead to determine one bit of the result, and then one bit more for one
bit more of the divisor). But why do all this work if you just need an RSA key???

Kurt
I think I make myself perfectly clear:

Step 1: Find the plan! Kurt Huwig Universitaet Saarbruecken
Step 2: Save the world!
Step 3: Get out of my house! ku...@stud.uni-sb.de

Let's get crackin'!