
Searching for primes


Colin Plumb
Apr 3, 1994, 6:46:26 PM

Fast sieving for prime numbers

I've been working on efficient generation of "strong" primes. That is,
primes where p-1 has a large prime factor. You need to perform a similar
operation in DSS, when you need a 160-bit prime q, and a longer prime p,
for which p-1 is a multiple of q.

So here are some useful techniques:

First of all, trial division by small primes is a very useful thing.
Basically, each prime p will weed out 1/p of the candidates. If this
can be done at a cost of about one multiply, it's thousands of times
faster than a Fermat test to the base 2, the cheapest
pseudo-primality test.

Obviously the smaller the prime, the more lucrative the trial division.
(I assume you remove multiples of 2, and maybe other small primes,
directly.)

Testing a 512-bit candidate prime on a 32-bit machine (16 words long)
requires 512 modular squarings. The multiplies are by 2, so can be
simplified to a negligible contribution. Each modular squaring
requires 16*17/2 multiplies to form the product and 16*16 multiplies to
perform modular reduction (assuming Montgomery multiplication; other
algorithms are more expensive). That's a total of (17+32)*8 = 49 * 8 =
392 single-precision multiplies per squaring, and it takes 512 of them
to perform a Fermat test, for a total of about 200,000 single-precision
multiplies, plus other overhead. On a 16-bit machine, it's worse by a
factor of 4.

While you want to do further tests to make sure with a high degree of
certainty, for performance estimation purposes, a Fermat test to the
base 2 catches essentially all non-primes.

So consider dealing with a candidate number. Our options are to do a
trial division by p, with cost c1, or not, before proceeding to a more
certain test (further division and/or a stronger test) with cost c2.

If we do the trial division, 1/p of the time the candidate will be
divisible, so the cost to consider the number will be c1. 1-1/p of the
time, it will not prove divisible, so the cost will be c1+c2. The
average is c1+(1-1/p)*c2.

If we do not do it, the cost is simply c2.

We should not do the test when:

c2 < c1+(1-1/p)*c2

<=> p*c2 < p*c1 + (p-1)*c2

<=> c2 < p*c1

<=> p > c2/c1

Given that c2/c1 is 20,000 or more, it is most efficient to do a LOT of
trial division (with all primes up to 20,000 or more!) before falling
back on a Fermat test. (BTW, does anyone know of a test intermediate
in power and expense?)

So now the problem arises, how to do trial division most efficiently.
Well, there's an efficient algorithm for numbers of the form b+k, where
b is a large base and k is a single-precision offset. Searching k
sequentially is acceptable for cryptographic uses, and you can use a
strong random pattern if you'd like instead.

What you do is form a table of remainders, r[i] = b mod p[i], for each
prime in the trial division sieve. Then you examine r[i]+k mod p[i].
If this is 0, the number is divisible and non-primality has been
proved. If non-zero, continue and keep checking.

As long as r[i] + k (which is bounded above by p[i] + k) fits within one
machine word, this requires just one division.
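
In C, the inner test is just this (a rough sketch; the function and
variable names are mine, and the multiprecision-by-single-word
division that computes each r[i] is not shown):

#include <stdint.h>

/* Returns 1 if b + k is divisible by one of the sieve primes, given
 * the precomputed remainders r[i] = b mod p[i]; returns 0 if the
 * candidate survives every test. */
static int has_small_factor(const uint32_t *p, const uint32_t *r,
                            int nprimes, uint32_t k)
{
    for (int i = 0; i < nprimes; i++) {
        /* the sum is done in 64 bits so r[i] + k can't wrap */
        if (((uint64_t)r[i] + k) % p[i] == 0)
            return 1;       /* b + k has a small factor */
    }
    return 0;
}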

However, on many machines division is a little bit expensive. There is
a way to replace this with multiplication if you like.

Let the word size be b = 2^k. k is typically 16 or 32. p[i] is odd.
Thus, gcd(p[i], b) = 1, and p[i] has a multiplicative inverse mod b.
Call this inverse v[i].

y = x*v[i] mod b is the unique number such that x = y*p[i] mod b. If
y*p[i] < b, then y*p[i] mod b = y*p[i], and y = x/p[i]. This happens
exactly when x is a multiple of p[i]. In that case, y =
x*v[i] mod b = x/p[i], and is bounded by 0 <= y <= floor((b-1)/p[i]).

You can easily precompute d[i] = floor((b-1)/p[i]), and thus reduce the
test to see if p[i] divides x to x*v[i] mod b <= d[i]. "mod b" is
trivial to evaluate, so it's just a multiply and a compare to do a
"trial division".

(To compute multiplicative inverses mod b, see my earlier posting on
the subject.)
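
Here's a sketch of that in C for b = 2^32 (the names are mine; I've
used Newton's iteration for the inverse instead of the Euclidean
method, since it's only a few lines either way):

#include <stdint.h>

/* Multiplicative inverse of an odd p modulo 2^32. */
static uint32_t inv_mod_b(uint32_t p)
{
    uint32_t v = p;        /* correct to 3 bits, since p is odd */
    v *= 2 - p * v;        /* 6 bits  */
    v *= 2 - p * v;        /* 12 bits */
    v *= 2 - p * v;        /* 24 bits */
    v *= 2 - p * v;        /* 48 >= 32 bits: done */
    return v;
}

/* One "trial division": does p divide x?  v = inv_mod_b(p) and
 * d = 0xFFFFFFFFu / p are precomputed once per prime. */
static int divides(uint32_t x, uint32_t v, uint32_t d)
{
    return x * v <= d;     /* x*v is reduced mod 2^32 for free */
}

In the search above, x is r[i] + k (as long as it fits in a word).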


Now, this works fine as long as the range to be searched for primes
fits into a word size. But if the candidates to be evaluated are
congruent modulo a 160-bit prime (or even a larger one), there are
problems.

Well, okay, let's consider them. We want to consider numbers of the
form b+k*delta, not just b+k. The condition being tested for is

b + k*delta == 0 (mod p[i]).

<=> r[i] + k*delta == 0 (mod p[i])

Assuming delta is relatively prime to p[i] (a good bet in every
situation I know of - delta is usually a large prime itself), then it
has an inverse D[i] mod p[i]. Multiplying both sides by D[i], you have

b*D[i] + k*delta*D[i] == 0 (mod p[i])
<=> b*D[i] + k == 0 (mod p[i])
<=> r[i]*D[i] + k == 0 (mod p[i])

H'm! You can just adjust the base value r[i] = b mod p[i] based on the
delta and then search through a large range of possible k's.

That's quite a nice way to do things.
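
In C the adjustment looks something like this (a sketch; delta mod
p[i] is one more multiprecision-by-single-word division, not shown,
and the single-word inverse here uses the extended Euclidean
algorithm):

#include <stdint.h>

/* Inverse of a modulo m, assuming gcd(a, m) = 1, by the extended
 * Euclidean algorithm on single words. */
static uint32_t inv_mod(uint32_t a, uint32_t m)
{
    int64_t t = 0, newt = 1;
    int64_t r = m, newr = a % m;
    while (newr != 0) {
        int64_t q = r / newr;
        int64_t tmp = t - q * newt;  t = newt;  newt = tmp;
        tmp = r - q * newr;  r = newr;  newr = tmp;
    }
    return (uint32_t)(t < 0 ? t + m : t);
}

/* Fold delta into the remainder table: afterwards,
 * b + k*delta == 0 (mod p)  exactly when  radj + k == 0 (mod p). */
static uint32_t adjust_for_delta(uint32_t r, uint32_t delta_mod_p, uint32_t p)
{
    uint64_t D = inv_mod(delta_mod_p, p);   /* D = delta^-1 mod p */
    return (uint32_t)((r * D) % p);         /* radj = r*D mod p   */
}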


Actually, there's one more thing to consider. There's the cost of
setting up these tables. For the small primes, which receive a lot of
traffic in the form of candidate primes, as much precomputation as is
possible is worthwhile. But as you get higher, a given prime may see
fewer candidates, so is it worth while? It's one multi/single
precision divide to form the initial base value, several divides to
compute the multiplicative inverse (I have the average case behaviour
of the Euclidean algorithm here somewhere - it's pretty good), another
divide to find the limit (b-1)/p[i], and another multi-precision divide
to deal with the delta.

So taking the 512-bit case above, that's 16 divides plus some change,
and another 16 for a delta. Call it a bit less than 40 in the worst
case.

I need to see how many trial divisions this cost is amortized over
before you find a prime. Well, PGP provides a useful example. When it
generates primes, it prints a . every time a candidate passes the trial
division sieve, and a + every time it passes a stronger
pseudo-primality test. The number of .'s is a good indication of the
number of candidates that must be examined by a trial divisor high in
the table. In the 512-bit range, 40 to 80 .'s before you get to the
end are typical of what I've seen, with outliers in both directions.
So that means that each of the trial divisions gets charged with 1 to
1/2 of a division due to setup costs. If a division is a few times the
cost of a multiply, that reduces the value of the highest prime to stop
at from around 20,000 to, say, 5,000. The additional setup to use
multiplies instead of divides is a handful of divides (maybe 6), so
unless they're virtually the same cost, this is an amortized 1/10 of a
division per trial division multiply, and it's a rare machine on which
divisions are no more than 10% more expensive than multiplies.


Future projects are to get actual timings of these operations on
various platforms and to check these theoretical calculations against
reality. Who knows, maybe on a high-end
workstation, it's cheaper to load one prime and do a division than to
load an inverse, multiply, and load a comparison value, since memory
bandwidth is the critical resource. But the tables are read sequentially,
so it shouldn't be bad.


Would anyone care to take a crack at improving this analysis?
--
-Colin

Robert D. Silverman
Apr 5, 1994, 1:54:40 PM

In article <1994Apr3.2...@mnemosyne.cs.du.edu> co...@nyx10.cs.du.edu (Colin Plumb) writes:
:Fast sieving for prime numbers

:
:I've been working on efficient generation of "strong" primes. That is,
:primes where p-1 has a large prime factor. You need to perform a similar

(1) Please define large. Please tell us why you are concerned about whether p-1
has a large prime factor. I suspect I know the reason. If my suspicion is
true, your concerns are not relevant.
(2) The reason for "strong" prime generation does not exist anymore.
When people were suggesting use of 256 bit keys for RSA there was
a remote possibility that the Pollard P-1 algorithm might be used
to attack a key. Today it is impossible.

To begin with, that in itself was a red herring. If the key is 256
bits, the factors are around 120 bits. For P-1 to be effective,
ALL the prime factors of p-1 must be less than about 10^8, with
perhaps a single additional factor less than 10^14. (assuming an
FFT extension to the algorithm). The chances of this happening are
vanishingly small. The largest factor EVER found with P-1 has 34 decimal
digits. This is 109 bits; still 10 bits smaller than one gets with
a 256-bit key.

For larger keys, the chance that p-1 has no prime factor greater
than 10^14 is virtually nonexistent.

:operation in DSS, when you need a 160-bit prime q, and a longer prime p,


:for which p-1 is a multiple of q.

:

rest deleted....

(3) Finally, the rest of what you propose is unnecessary.

ECM renders the P-1 algorithm obsolete. And finding a "strong" prime
by your definition does not protect from an ECM attack getting lucky.
It is about as likely that an ECM attack will succeed as the chance that
a random 100-digit prime can be found by P-1.

It is possible to generate large primes very fast in such a way that
you can prove they are prime very quickly. Take two moderate primes
p and q [moderate means p is (say) 20-30 digits and q is ~80 digits].
Now simply look at numbers of the form rpq + 1 and rpq - 1 for small
to moderate r. Consider only those r such that rpq + 1 and rpq -1
are not divisible by small primes. If you suspect one of these is prime,
then it is trivial to construct a primality proof, since you have a
complete factorization of N-1 or N+1 for the candidate N = rpq +- 1.

If p ~ 20 digits and q ~80 digits isn't good enough, use slightly
larger. The potential values of r which should be tried can be
constructed as a wheel.
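
The search loop itself is only a few lines with a bignum library (GMP
here, purely as an illustration; the wheel over r and the actual
N-1 / N+1 primality proof are not shown):

#include <stdio.h>
#include <gmp.h>

/* Try N = r*p*q + 1 and N = r*p*q - 1 for even r up to max_r and
 * report the ones that pass a probabilistic test.  The proof step
 * based on the known factorization of N-1 or N+1 is omitted. */
static void search(const mpz_t p, const mpz_t q, unsigned long max_r)
{
    mpz_t pq, n;
    mpz_init(pq);
    mpz_init(n);
    mpz_mul(pq, p, q);
    for (unsigned long r = 2; r <= max_r; r += 2) {  /* r even so N is odd */
        mpz_mul_ui(n, pq, r);
        mpz_add_ui(n, n, 1);                   /* N = r*p*q + 1 */
        if (mpz_probab_prime_p(n, 25))
            gmp_printf("N-1 form: %Zd\n", n);
        mpz_sub_ui(n, n, 2);                   /* N = r*p*q - 1 */
        if (mpz_probab_prime_p(n, 25))
            gmp_printf("N+1 form: %Zd\n", n);
    }
    mpz_clear(pq);
    mpz_clear(n);
}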

:So here are some useful techniques:


:
:First of all, trial division by small primes is a very useful thing.
:Basically, each prime p will weed out 1/p of the candidates. If this

Negative. Each prime does NOT weed out 1/p. For example, if you sieve
by 5, you must remember that 1/3 of the numbers you hit have already been
sieved out by 3. The fraction is really: (1-1/2)(1-1/3)(1-1/5).....(1-1/p)
By Mertens' theorem this is approximately exp(-gamma)/log p.

:can be done at a cost of about one multiply, it's significantly


:thousand times faster than a Fermat test to the base 2, the cheapest

One is usually doing these divisions on multi-precise numbers. They
are significantly more expensive than one multiply.

:pseudo-primality test.


:
:Obviously the smaller the prime, the more lucrative the trial division.

Secondly, even if one wants to do this, one should use a SIEVE, not
trial division.

--
Bob Silverman
These are my opinions and not MITRE's.
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"

Colin Plumb
Apr 6, 1994, 5:28:08 AM

In article <2ns8l0$i...@linus.mitre.org>,
Robert D. Silverman <b...@gauss.mitre.org> wrote:

Oh, my, Bob Silverman! I'm in distinguished company.

>In article <1994Apr3.2...@mnemosyne.cs.du.edu> co...@nyx10.cs.du.edu (Colin Plumb) writes:
>:Fast sieving for prime numbers
>:
>:I've been working on efficient generation of "strong" primes. That is,
>:primes where p-1 has a large prime factor. You need to perform a similar
>
>(1) Please define large. Please tell us why you are concerned about whether
> p-1 has a large prime factor. I suspect I know the reason. If my
> suspicion is true, your concerns are not relevent.

What you suspect is partially true, which is that the old literature
all recommends it. But the issue which brought it up for me was
DSS, which requires that 160-bit q divide 512-to-1024-bit p-1.

So I looked at PGP's "strong primes" code which does something similar and
thus came up with the terminology you saw.

(Someone else mentioned that the Blum, Blum and Shub generator
has certain properties only when (p-1)/2 and maybe (p-3)/4 are prime,
but I haven't double-checked that. Some related thinking can be applied
to that case, but it's not the same.)

I had heard that it was unnecessary to bother with such testing these days
for RSA or discrete log primes, but thank you for confirming it so strongly.
Could I discuss it with you enough to understand it to the point where I
can explain it to others successfully?

> ECM renders the P-1 algorithm obsolete. And finding a "strong" prime
> by your definition does not protect from an ECM attack getting lucky.
> It is about as likely that an ECM attack will succeed as the chance that
> a random 100-digit prime can be found by P-1.

I'm sorry, but ECM? Elliptic Curve something?

>It is possible to generate large primes very fast in such a way that
>you can prove they are prime very quickly. Take two moderate primes
>p and q [moderate means p is (say) 20-30 digits and q is ~80 digits].
>Now simply look at numbers of the form rpq + 1 and rpq - 1 for small
>to moderate r. Consider only those r such that rpq + 1 and rpq -1
>are not divisible by small primes. If you suspect one of these is prime,
>then it is trivial to construct a primality proof since you have a
>complete factorization of p-1 or p+1.

Yes, I've heard of this. Thank you for refreshing my memory.

>If p ~ 20 digits and q ~80 digits isn't good enough, use slightly
>larger. The potential values of r which should be tried can be
>constructed as a wheel.
>

I have to translate everything into bits. That's 70 bits and 270 bits,
which provides primes up to 340 bits, or up to 380 assuming 30-digit
p and 10-bit r. My standard benchmark is 512 bits for RSA and 1024
for Diffie-Hellman primes, so that's a bit small, but yes, I've seen
such techniques. (Although I'm not sure how to apply them based
on a factorization of p+1 rather than p-1.)

>:First of all, trial division by small primes is a very useful thing.
>:Basically, each prime p will weed out 1/p of the candidates. If this
>
>Negative. Each prime does NOT weed out 1/p. For example, if you sieve
>by 5, you must remember that 1/3 of the numbers you hit have already been
>sieved out by 3. The fraction is really: (1-1/2)(1-1/3)(1-1/5).....(1-1/p)
>By Mertens' theorem this is approximately exp(-gamma)/log p.

Um... yes, each prime does weed out 1/p. That's what you wrote in
your equation and what I said, but maybe English is causing our
problems. Each prime p weeds out 1/p of the numbers *which get that far*.
Testing by 3 passes 2/3 and testing by 5 passes 4/5, so testing by both
passes 8/15, which is what I meant. I wanted the probability that, if
a number was a candidate prime being tested for divisibility by p, it
would be found divisible. Since I'm assuming a non-parallel machine,
trial division will stop at that point.

>:can be done at a cost of about one multiply, it's significantly
>:thousand times faster than a Fermat test to the base 2, the cheapest
>
>One is usually doing these divisions on multi-precise numbers. They
>are significantly more expensive than one multiply.

I was talking about doing one division first and then caching the
remainder to use in searching a range. Again, I apologize if I confused
you with my discussion of trial division. Trial division is the
*effect* I'm getting, but the implementation is much more efficient.

>:pseudo-primality test.
>:
>:Obviously the smaller the prime, the more lucrative the trial division.
>
>Secondly, even if one wants to do this, one should use a SIEVE, not
>trial division.

Um... I don't quite understand the distinction. Obviously, the implementation
will be highly optimized, but from a mathematical point of view, it's
exactly equivalent to trial division. If a sieve (and things like the
MPQS have convinced me that it's not obvious what the term means) is
something specific that is more efficient than trial division ==> If
trial division is implemented in such a way that it costs one single-
precision integer multiply per trial <==, then I'm *very* interested.

I asked, you may recall, if there was something more effective than trial
division but less expensive than a Fermat test to the base 2.

You can cut out some of the first trial divisions by, say, starting
on a multiple of 2*3*5*7 = 210 and using a table of the offsets from
that position that are divisible by none of the above (there are
1*2*4*6 = 48 of them). Assuming you were clever enough the first time
to try only odd numbers, this eliminates testing the 35 odd numbers
in the range 1..209 which are divisible by 3, 14 of the remaining 70
which are divisible by 5, and 8 of the remaining 56 which are divisible
by 7, leaving only the remaining 48 to test. That saves 105+70+56 = 231
trial divisions per span of 210 numbers searched, or just over 2 trial
divisions per number tested the more naive way. I'd have to do some
fiddling with the average number of trials per number to see just how
much of a saving that is. Making use of the prime table itself for
this table (you have to add 1 and remove the low primes, but that's not
hard) would let you go up higher, since the table goes at least to
210*11 = 2310 and maybe 2310*13 = 30030.
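
Building that offset table is trivial, by the way (a toy sketch):

#include <stdio.h>

/* Print the 48 offsets modulo 210 = 2*3*5*7 that are divisible by
 * none of 2, 3, 5 and 7. */
int main(void)
{
    int k, count = 0;
    for (k = 1; k < 210; k++)
        if (k % 2 && k % 3 && k % 5 && k % 7)
            printf("%d%c", k, ++count % 12 ? ' ' : '\n');
    printf("\ntotal: %d\n", count);     /* prints 48 */
    return 0;
}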

Again, I emphasize that "trial division" does not describe the
implementation; after some precomputation (which essentially is a trial
division), the cost to attempt one division is one machine-word
multiplication. I thought I described that technique.

Anyway, now that you've torn this apart, maybe you can take a crack
at my next piece on computing multiplicative inverses...

Thanks for your comments!
--
-Colin

Robert D. Silverman
Apr 6, 1994, 12:01:43 PM

In article <1994Apr6.0...@mnemosyne.cs.du.edu> co...@nyx10.cs.du.edu (Colin Plumb) writes:
:In article <2ns8l0$i...@linus.mitre.org>,

:Robert D. Silverman <b...@gauss.mitre.org> wrote:
:
:Oh, my, Bob Silverman! I'm in distinguished company.
:
:>In article <1994Apr3.2...@mnemosyne.cs.du.edu> co...@nyx10.cs.du.edu (Colin Plumb) writes:
:>:Fast sieving for prime numbers
:>:
:>:I've been working on efficient generation of "strong" primes. That is,
:>:primes where p-1 has a large prime factor. You need to perform a similar
:>
:>(1) Please define large. Please tell us why you are concerned about whether
:> p-1 has a large prime factor. I suspect I know the reason. If my
:> suspicion is true, your concerns are not relevent.
:
:What you suspect is partially true, which is that the old literature
:all recommends it. But the issue which brought it up for me was
:DSS, which requires that 160-bit q divide 512-to-1024-bit p-1.

No problem. But 160 bits is still way too big for Pollard P-1 to work,
so you need not worry. [i.e. if your prime p has p-1 divisible by
a 160 bit prime, P-1 will NEVER find it].

stuff deleted...

:>:First of all, trial division by small primes is a very useful thing.


:>:Basically, each prime p will weed out 1/p of the candidates. If this
:>
:>Negative. Each prime does NOT weed out 1/p. For example, if you sieve
:>by 5, you must remember that 1/3 of the numbers you hit have already been
:>sieved out by 3. The fraction is really: (1-1/2)(1-1/3)(1-1/5).....(1-1/p)
:>By Mertens' theorem this is approximately exp(-gamma)/log p.
:
:Um... yes, each prime does weed out 1/p. That's what you wrote in
:your equation and what I said, but maybe English is causing our
:problems. Each prime p weeds out 1/p of the numbers *which get that far*.

This is correct. 2 weeds out half. 3 now weeds out
1/3 *of the remainder*, but only an additional 1/6 of the original set.
I took your "weed out 1/p" to mean that each prime p weeds out an
additional 1/p from the original set of candidates and NOT 1/p from
the candidates that still remain... "Fuzzy English"......
I took "each weeds out 1/p" to mean that after sieving out 2,3,5,...P,
one had eliminated 1/2 + 1/3 + 1/5 +1/7 + ... 1/P of the original
candidates. And that is not true.



:>:can be done at a cost of about one multiply, it's significantly


:>:thousand times faster than a Fermat test to the base 2, the cheapest
:>
:>One is usually doing these divisions on multi-precise numbers. They
:>are significantly more expensive than one multiply.
:
:I was talking about doing one division first and then cacheing the
:remainder to use in searching a range. Again, I apologize if I confused
:you with my discussion of trial division. Trial division is the
:*effect* I'm getting, but the implementation is much more efficient.

Please say exactly what you are doing. If I take the candidate list
{n, n+1, n+2, n+3, n+4 ..... n+h} and divide n by 2,3,5,7....
looking for a zero remainder, then divide n+1 by 2,3,5,7... etc.
I am doing trial division.

If I compute h1 = n mod 2 , h2 = n mod 3, h3 = n mod 5, etc.
then set up an array of locations x[0], x[1], x[2], ... x[h] and
starting at x[2-h1] I put a '1' in x[2-h1], x[2-h1+2], x[2-h1+4]... and
a 1 in x[3-h2], x[3-h2+3], x[3-h2+6]... etc. I am SIEVING out those
n+i in my list that are divisible by 2,3,5,...

This does not involve trial division.

The total cost for this is O(h loglog P) where P is the largest
candidate divisor. You can reduce this using Brent's "wheel" version
of the sieve to essentially O(h(1 + epsilon)).
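
In code the whole thing is just (a rough sketch; the initial
remainders n mod p[i] come from multiprecision divisions not shown
here):

#include <stdint.h>
#include <string.h>

/* Sieve the range n .. n+h: set x[j] = 1 whenever n + j is divisible
 * by one of the primes, given rem[i] = n mod primes[i].  Afterwards,
 * x[j] == 0 means n + j survived and is worth a pseudo-primality
 * test. */
static void sieve_range(uint8_t *x, uint32_t h,
                        const uint32_t *primes, const uint32_t *rem,
                        int nprimes)
{
    memset(x, 0, (size_t)h + 1);
    for (int i = 0; i < nprimes; i++) {
        uint32_t p = primes[i];
        uint32_t j = rem[i] ? p - rem[i] : 0;  /* first j with p | n+j */
        for (; j <= h; j += p)
            x[j] = 1;
    }
}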

:>:Obviously the smaller the prime, the more lucrative the trial division.


:>
:>Secondly, even if one wants to do this, one should use a SIEVE, not
:>trial division.
:Um... I don't quite understand the distinction. Obviously, the implementation

See above

:Again, I emphasize that "trial division" does not describe the


:implementation; after some precomputation (which essentially is a trial
:division), the cost to attempt one division is one machine-word
:multiplication. I thought I described that technique.

When done with a sieve, the cost to test ALL the numbers in the
range [n, n+h] for divisibility by p is h/p. That is why sieve
based factoring algorithms are so effective.

On the other hand, your cost to test all the numbers for divisibility
by p [assuming 1 mult = 1 division attempt] is h multiplications.
Multiplication takes at least several cycles for just one of them.
The sieve procedure only takes ONE cycle to put x[i] = 1.

Compare k*h (where k is #cycles/mult) with h/p.

:Anyway, now that you've torn this parat, maybe you can take a crack


:at my next piece on computing multiplicative inverses...

When I see it...

Basically to compute 1/a mod p takes O(log p) multiplications mod p
when using a variation of Euclid's algorithm. If p is a multi-precise
number (very big, say), there are FFT techniques to reduce the cost,
or Lehmer's binary method. See Knuth Vol. 2

Robert I. Eachus
Apr 6, 1994, 7:17:54 AM

In article <1994Apr6.0...@mnemosyne.cs.du.edu> co...@nyx10.cs.du.edu (Colin Plumb) writes:

> (Someone else mentioned that the Blum, Blum and Shub generator
> has certain properties only when (p-1)/2 and maybe (p-3)/4 are prime,
> but I haven't double-checked that. Some related thinking can be applied
> to that case, but it's not the same.)

Probably me. Blum, Blum, and Shub depends on using an N = P * Q,
where P and Q are of the form 4p+3, with p, 2p+1, and 4p+3 all prime.
The proofs of the properties referred to (polynomially-perfect if
factoring is hard, etc.) all depend on that. (For example, if you
can determine a factor of p*q, you obviously can factor N.)

--

Robert I. Eachus

with Standard_Disclaimer;
use Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...

Colin Plumb
Apr 7, 1994, 3:25:19 AM

In article <2numd7$q...@linus.mitre.org>,

Robert D. Silverman <b...@gauss.mitre.org> wrote:
>In article <1994Apr6.0...@mnemosyne.cs.du.edu> co...@nyx10.cs.du.edu (Colin Plumb) writes:
>:In article <2ns8l0$i...@linus.mitre.org>,
>:Robert D. Silverman <b...@gauss.mitre.org> wrote:
>No problem. But 160 bits is still way to big for Pollard P-1 to work,
>so you need not worry. [i.e. if your prime p has p-1 divisible by
>a 160 bit prime, P-1 will NEVER find it].

Oh, yes, I'm aware of that.

>This is correct. 2 weeds out half. 3 now weeds out
>1/3 *of the remainder*, but only an additional 1/6 of the original set.
>I took your "weed out 1/p" to mean that each prime p weeds out an
>additional 1/p from the original set of candidates and NOT 1/p from
>the candidates that still remain... "Fuzzy English"......
>I took "each weeds out 1/p" to mean that after sieving out 2,3,5,...P,
>one had eliminated 1/2 + 1/3 + 1/5 +1/7 + ... 1/P of the original
>candidates. And that is not true.

Well, obviously. 1/2+1/3+1/5 = 31/30. There aren't many primes left after
that.

>Please say exactly what you are doing. If I take the candidate list
>{n, n+1, n+2, n+3, n+4 ..... n+h} and divide n by 2,3,5,7....
>looking for a zero remainder, then divide n+1 by 2,3,5,7... etc.
>I am doing trial division.

That is what PGP does and I was thinking of doing.
I'm too lazy to derive the math, but doing trial division by every
prime less than 20,000 (2261 of them) involves an average of 75.923
trials per number. And 2.833% of the numbers pass all these trial
divisions.

>If I compute h1 = n mod 2 , h2 = n mod 3, h3 = n mod 5, etc.
>then set up an array of locations x[0], x[1], x[2], ... x[h] and
>starting at x[2-h1] I put a '1' in x[2-h1], x[2-h1+2], x[2-h1+4]... and
>a 1 in x[3-h2], x[3-h2+3], x[3-h2+6]... etc. I am SIEVING out those
>n+i in my list that are divisible by 2,3,5,...
>
>This does not involve trial division.

H'm... you're right, for high primes (where trial division is inefficient
because any given test weeds out so few possibilities), this is much more
efficient. The annoying thing is that you have to decide on how big to
make your sieve ahead of time; you can't just continue until you
succeed. And the low numbers (2, 3, 5) use up a lot of time.

Now, you could combine things. Make a table of the offsets modulo 210 that
are not divisible by 2, 3, 5 or 7. Then sieve by 11 and up, then take the
remainder mod 210 and use the table of offsets to guide your scan of
the sieve looking for possibilities. That way, you can avoid the most
time-consuming early sieve steps. And the cost of making the sieve very
large would not be so great.

You can easily skip even storing the even numbers in the sieve. You
can avoid multiples of 3 as well by breaking the table into two and
storing 6n+1 in one and 6n-1 in the other. Avoiding multiples of 5 as
well requires breaking the table into 8, one for each of
30n+{1,7,11,13,17,19,23,29}. That cuts the number of values stored from
10/30 to 8/30. It probably isn't worthwhile for the complexity.
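
For what it's worth, the mod-30 version is not much code; equivalently
you can pack the 8 residue classes into the 8 bits of a byte, one byte
per block of 30 (a sketch, my names):

#include <stdint.h>

/* The residues mod 30 that are coprime to 2, 3 and 5. */
static const uint8_t wheel30[8] = {1, 7, 11, 13, 17, 19, 23, 29};

/* bitpos[r] is which bit stores residue r, or -1 if r has a factor
 * of 2, 3 or 5 and is not stored at all. */
static int8_t bitpos[30];

static void init_bitpos(void)
{
    int r, b;
    for (r = 0; r < 30; r++) bitpos[r] = -1;
    for (b = 0; b < 8; b++)  bitpos[wheel30[b]] = (int8_t)b;
}

/* Mark offset j (from a base that is a multiple of 30) as composite. */
static void mark_composite(uint8_t *table, uint32_t j)
{
    int8_t b = bitpos[j % 30];
    if (b >= 0)
        table[j / 30] |= (uint8_t)(1 << b);
}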

>The total cost for this is O(h loglog P) where P is the largest
>candidate divisor. You can reduce this using Brent's "wheel" version
>of the sieve to essentially O(h(1 + epsilon)).

Can you describe this variation? Not that log log P is that big a penalty...

>
>When done with a sieve, the cost to test ALL the numbers in the
>range [n, n+h] for divisibility by p is h/p. That is why sieve
>based factoring algorithms are so effective.

The only annoying thing is that (as far as I can tell so far), you have to
decide on h before starting. If you're looking for one prime, that gets
a bit wasteful.

On the other hand, the average number of times you mark any given number
as composite is 2.55 (I wonder if that series $\sum_{i=1}^\infty 1/p_i$
converges? If so, does it converge to e or something lower?), so even the
first few don't make it too bad...

I'll have to code this up and compare speed. Thank you for getting me
out of a conceptual rut! Also, by picking a range, finding all the
primes in that range, and choosing one at random (well, really, you try
candidates at random), I think that would slightly improve the
randomness of the primes. For example, the second of a pair of twin
primes would become much more likely.

>:Anyway, now that you've torn this parat, maybe you can take a crack
>:at my next piece on computing multiplicative inverses...

>When I see it...

>Basically to compute 1/a mod p takes O(log p) multiplications mod p
>when using a variation of Euclid's algorithm. If p is a multi-precise
>number (very big, say), there are FFT techniques to reduce the cost,
>or Lehmer's binary method. See Knuth Vol. 2

Wow, I'm embarrassed. I knew about the binary algorithm (heck, I reinvented
it once when I forgot Euclid's), but I had put it out of my mind.
And it seems that despite much browsing through Knuth, I had never stopped
and looked at Lehmer's method, which is DAMN nifty. (Everyone reading,
have a look at Knuth Vol 2 section 4.5.2, algorithm L. Discussion starts
on page 327 in the second edition.)

I don't see the multiplication variation you mention or the FFT
techniques. Assuming that the FFT is only useful for the same range
that appreciates Schoenhage-Strassen multiplication, I think I can
disregard that, but if the variation that uses multiplications isn't
the binary one, then I'd like to hear about it. Do you mind explaining
it?

>--
>Bob Silverman
>These are my opinions and not MITRE's.
>Mitre Corporation, Bedford, MA 01730
>"You can lead a horse's ass to knowledge, but you can't make him think"

Well, I think you're making me think. But thank you very much for getting
me unstuck from ruts I've been stuck in.

(If I also optimize modulo calculations by combining small primes into
a number less than the wordsize, I can significantly reduce that cost,
too.)
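
Concretely (a sketch): 3*5*7*11*13*17*19*23*29 = 3234846615 still fits
in 32 bits, so one pass over the multiprecision number gives the
residues modulo all nine primes at once:

#include <stdint.h>

/* Remainder of a little-endian multiprecision number mod one word. */
static uint32_t mp_mod_word(const uint32_t *n, int nwords, uint32_t m)
{
    uint64_t r = 0;
    int i;
    for (i = nwords - 1; i >= 0; i--)
        r = ((r << 32) | n[i]) % m;
    return (uint32_t)r;
}

/* Residues of n modulo 3, 5, 7, ..., 29 via a single pass over n. */
static void small_residues(const uint32_t *n, int nwords, uint32_t out[9])
{
    static const uint32_t p[9] = {3, 5, 7, 11, 13, 17, 19, 23, 29};
    uint32_t r = mp_mod_word(n, nwords, 3234846615u);  /* 3*5*...*29 */
    int i;
    for (i = 0; i < 9; i++)
        out[i] = r % p[i];         /* single-word work from here on */
}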

I hope folks will tell me if this discussion drifts from signal into noise.
--
-Colin

Mike Duvos
Apr 7, 1994, 10:10:13 AM

b...@gauss.mitre.org (Robert D. Silverman) writes:

>:can be done at a cost of about one multiply, it's significantly
>:thousand times faster than a Fermat test to the base 2, the cheapest

>One is usually doing these divisions on multi-precise numbers. They
>are significantly more expensive than one multiply.

This is not necessarily true. Trial division to weed out small prime
factors is usually done with a table of remainders and an offset from
some starting value. So although the number being checked is
multiprecision, the division (or multiply) is a single precision operation.

--
Mike Duvos $ PGP 2.3a Public Key available $
m...@netcom.com $ via Finger. $

Mark R Kaehny
Apr 7, 1994, 10:55:58 PM


>
>The only annoying this is that (as far as I can tell so far), you have to
>decide on h before starting. If you're loking for one prime, that gets
>a bit wasteful.

This is simple - you know about how big a number you are looking for --
simply use one of the estimates of the number of primes < a number N,
PI(N). A simple one is N/ln(N), so the probability of a random number
of a given size being prime is 1/ln(N) (roughly). Calculate h
appropriately from this.
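
For example (rough numbers, just to illustrate): for a 512-bit
candidate, ln(2^512) = 512*ln 2 ~ 355, so about one odd number in 177
is prime. A window of a few thousand odd candidates is then
overwhelmingly likely to contain at least one prime, and h can be
sized accordingly.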

>And it seems that despite much browsing through Knuth, I had never stopped
>and looked at Lehmer's method, which is DAMN nifty. (Everyone reading,
>have a look at Knuth Vol 2 section 4.5.2, algorithm L. Discussion starts
>on page 327 in the second edition.)
>

For random numbers I found that Lehmer's GCD against the product of the
first several primes > 2 was actually faster than trial division on
some architectures (RISCs, generally). This is in the case where
sieving is useless since I am given the particular number. Some small
trial division followed by a strong pseudo-prime test weeded things out
fastest for numbers up to and over 100 digits ...


Mark Kaehny
kae...@csd4.csd.uwm.edu

7500...@vax1.dcu.ie
Apr 12, 1994, 2:33:19 PM

In article <2odko3$j...@qualcomm.com>, ka...@unix.ka9q.ampr.org (Phil Karn) writes:
> I'm following this discussion with interest because I'm currently
> generating some prime moduli for use with Diffie-Hellman. (DH is now
> included in RSAREF, so I'm using it as the basis of an experimental IP
> security protocol).
>
> My understanding of the criteria for a DH modulus p is that both p and
> (p-1)/2 should be prime, i.e, p should be a "strong" prime.
>
> I know that strong primes are no longer thought to be required for RSA
> key generation, but I understand that they're still a good idea for
> DH, which depends on the discrete logarithm problem rather than
> factoring (see LaMacchia's and Odlyzko's 1991 paper on discrete
> logarithms, URL http://martigny.ai.mit.edu/~bal/field.ps).
>
> Is this still the current consensus?
>
> Phil
>

The very practical Pollard rho method for finding discrete logs has
a complexity of O(sqrt(p_L)), where p_L is the largest factor of p-1. So
choosing (p-1)/2 prime seems like a good idea.

To speed things up a bit, try making the prime (p-1)/2 = 1 mod 4; then
p = 3 mod 8, so 2 is a quadratic non-residue mod p and you can
conveniently use g=2 as a primitive root. Also use 160 bit exponents. The
complexity of finding discrete logs using most methods is a function of
the size of the modulus, not the exponent. The best way to attack a small
exponent is to use Pollard's kangaroo method, which has complexity 2^80 for a
160-bit exponent (I think; see J.M. Pollard's brilliant 1978 paper in
Math. Comp. Vol. 32, pp. 918-924).

Michael Scott
