Merging Robert Gerbicz' perfect power code

Bill Hart

unread,

Jun 20, 2009, 9:05:22 PM6/20/09

to mpir-dev

I decided to try and write an mpn version of Robert's perfpow.c code.

I've discovered some functions which we'll need at the mpn level to do this:

mpn_remove
mpn_powm

These already exist at the mpz level, and they just need to be moved
down. They are already implemented as mpn functions essentially, so
it's trivial to create these functions.

Does anyone want to volunteer to do this? It will take me at least
another day to rewrite the mpn_perfect_power_p function.

Bill.

Bill Hart

unread,

Jun 20, 2009, 9:28:06 PM6/20/09

to mpir-dev, gerrob

Robert, in your perfect power test, at some point you have:

bound_for_p = numbits/trial_division_depth_in_bits; /* previously we
sieved up to 2^10 = 1024 */

/* primepi is an upper bound for number of primepowers up to bound_for_p */
primepi = (unsigned long) ((double) 2.0*bound_for_p/log(bound_for_p));

This looks wrong to me. Specifically bound_for_p looks like it should
be numbits - trial_division_depth_in_bits. Then primepi looks like a
wrong approximation. Can you let me know what the correct expressions
should be or explain a little why the above should be correct?

Does this affect the runtime of your code for large integers?

Bill.

2009/6/21 Bill Hart <goodwi...@googlemail.com>:

Bill Hart

unread,

Jun 20, 2009, 10:36:13 PM6/20/09

to mpir-dev, gerrob

Ignore me, your bound is absolutely correct. That's a very neat trick actually.

Bill Hart

unread,

Jun 21, 2009, 12:45:03 PM6/21/09

to mpir-dev, gerrob

I've attached a first attempt at an mpn version of Robert Gerbicz'
fast perfect power test code, minus some functions I need to make it
work. It doesn't compile yet, and I have probably introduced oodles of
bugs to the code. However I'm going to stop working on it for now,
until we can get some of the functions described below, implemented at
the mpn level in MPIR.

My impression of the code is that this is a *really* clever algorithm
and should beat pretty much anything out there when coded efficiently.
Therefore it is really important that we get this into MPIR.

I did correct what I think was a minor bug. On what is now line 328 I
replaced smallprimes[j]*smallprimes[j] < i + bound_for_p with
smallprimes[j]*smallprimes[j] < i + sq, which I think is correct.
Robert can you verify this!

I also replaced the primes q with just unsigned longs. I can imagine
one day someone having hardware to test numbers n of N = 10*2^40
*bits* for whether they are perfect powers. As the space requirement
for this function is approximately N log N, this seems to be a
sensible bound for the forseeable future. To perform the test in this
case we need primes up to 2^40. At one point we look for primes of the
form q = p*2k+1, starting with q = 2*p + 1. But we only generate
primes q = p*2k + 1 for as long as p^k <= bound_for_p < B (i.e. fits
in a limb). In general q will be much smaller than p^k, and thus if
the latter fits in a limb, so does q. Therefore I changed quite a bit
of the code to use single limbs instead of mpz_t's. I think this will
be OK, but if Robert could also verify this, I would be more
comfortable.

Anyhow, we now need the following functions:

* b = powm(a, e, m) which sets b = a^e mod m, where a, b, e and m are
all mp_limb_t's
(we should pull a function out of FLINT long_extras module for this).

* mpn_root(r, ap, an, p) returns 1 is {ap, an} is a p-th power (else
0), setting r to the root if it is
(actually we don't need the root itself)
Currently we have mpz_root which basically uses mpn_rootrem which
computes the root and
remainder and checks if the remainder is 0. This is probably
sufficient for this code which does
O(1) root computations. (In general one can do much better than
mpz_root to test if something
is a root. One only needs to work with an/p bits and compute a root.
Then check the p-th power
of the bottom limb of this root gives the correct bottom limb of
{ap, an}. If not it is not a root. Of
course then we need to perform a complete root with remainder check).

* mpn_remove_1(bp, ap, an, p) remove the highest possible power of p
(which is 1 limb) from
{ap, an} and return the exponent.
Currently we have mpz_remove, but nothing optimised for the case
when p is 1 limb.

* multimod_reduce_ui(mp_limb_t * res, ap, an, mp_limb_t * primes, count)
Given an array of count primes, each of which fits in a limb, reduce
{ap, an} by each of the primes
in the array and return the residues in the array of limbs res.
Robert did this with a tree, but it can be done slightly more
efficiently. It turns out that you can do
the bottom layer or two of the tree using umul_ppmm when multiplying
up, and then udiv_qrnnd for
the last layer when doing the reductions. This gives a substantial
speedup in practice.
It is important that this function allows aliasing between primes
and res to save space.

Some optimisations for later might include:

* Factor out the prime sieve. MPIR could have a globally defined array
of primes, computed up to some
limit, which could be extended as needed. It seems silly to compute
such a list of primes every time
mpn_perfect_power_p is called. We need to be careful about doing
this however, as global objects are
not threadsafe.

* Have an optimised version of mpn_perfect_power_p which takes a
precomputed "tree", then make
mpn_perfect_power_p just a wrapper for this function which keeps
track of some global precomputed tree
so that if someone calls mpn_perfect_power_p many times, the tree
does not need to be computed
multiple times. Again this needs to be done carefully for thread
safety reasons. This was handled in
FLINT with thread local storage, but that is not available on Sun or
Cygwin and it also means the
precomputation needs to be done per thread, which is unsatisfactory.

* It's not clear to me that we need to take the "tree" out as far as
we do. We should probably check at
each step that we haven't already exceeded the size of the original
n when multiplying up. The reason
is that reduction is trivial once our modulus is bigger than n
itself. But this is an optimisation that can be
used in the multimodular reduction code.

* An additional copy of the input is made (n --> u) if no shifting is
required, to avoid destroying the input
to this function. This copy might be avoided somehow. It's probably
not a big deal, as it only happens
one time in 64 (or 32 on a 32 bit machine), and it is certainly not
where the complexity in the algorithm
lies, so I left it for now.

Bill.

2009/6/21 Bill Hart <goodwi...@googlemail.com>:

perfpow.c

Reply all

Reply to author

Forward