Factorization of big integers is taking too long

Hans W Borchers

unread,
Mar 13, 2015, 1:20:16 PM3/13/15
to julia...@googlegroups.com
I got interested in factorizing some larger integers such as N = 3^100 + 2.
In all my attempts, factor(N) did not return and had to be interrupted:

    julia> N = big(3)^100 + 2
    julia> factor(N)
    ^CERROR: interrupt
     in finalizer at ./base.jl:126
     in + at gmp.jl:243
     in factor at primes.jl:111

It is calling GMP, but the GMP library itself cannot be the reason, as the gmp
package in R returns the factorization within seconds:

    R> library(gmp)
    R> N <- bigz(3)^100 + 2
    R> factorize(N)
    Big Integer ('bigz') object of length 3:
    [1] 31721   246451584544723     65924521656039679831393482841
    R> system.time(factorize(N))
     user  system elapsed
    3.738   0.000   3.730

Is this a bug? Did I do something wrong?
The first factor, 31721, is not even large. Mathematical software such as
GAP or PARI/GP will factorize this in much less than a second.

PS: Versioninfo
    Julia Version 0.3.6; System: Linux (x86_64-linux-gnu)
    CPU: Intel(R) Core(TM) i3-3217U CPU @ 1.80GHz; WORD_SIZE: 64
    BLAS: libblas.so.3; LAPACK: liblapack.so.3
    LIBM: libopenlibm; LLVM: libLLVM-3.3

Jake Bolewski

unread,
Mar 13, 2015, 1:25:14 PM3/13/15
to julia...@googlegroups.com
This is falling back to the generic integer factor(), so it looks like no GMP method is wrapped. The generic version will be terribly slow for BigInts. It would be easy to add if you would like to submit a pull request.

julia> N = big(3)^100 + 2
515377520732011331036461129765621272702107522003

julia> @which factor(N)
factor{T<:Integer}(n::T<:Integer) at primes.jl:79

Steven G. Johnson

unread,
Mar 13, 2015, 2:46:21 PM3/13/15
to julia...@googlegroups.com


On Friday, March 13, 2015 at 1:25:14 PM UTC-4, Jake Bolewski wrote:
This is falling back to the generic integer factor(), so it looks like no GMP method is wrapped. The generic version will be terribly slow for BigInts. It would be easy to add if you would like to submit a pull request.

At first glance, I'm not seeing a built-in factorization method in GMP.  Googling "factorization gmp" turns up a bunch of algorithms that people have built on top of GMP, but nothing in GMP itself.   (We should see what algorithm/code R is using.)

Simon Byrne

unread,
Mar 13, 2015, 3:54:37 PM3/13/15
to julia...@googlegroups.com
The R gmp docs say that they use the Pollard Rho algorithm, and there is an implementation of it in the GMP demos directory:

So presumably they're using that code?

simon

Hans W Borchers

unread,
Mar 13, 2015, 4:38:18 PM3/13/15
to julia...@googlegroups.com
That's right; I looked into the source of the 'gmp' package, and the C code
in there is the same as the demo code you are linking to.
It uses Pollard's Rho for factoring and Miller-Rabin for primality testing.
That code is under the GPL license.

As Miller-Rabin is already used for Julia's isprime() function, it would be
nice to implement Pollard Rho in pure Julia and demonstrate how fast it can be.

But bear in mind that there are improvements and newer variants of this
algorithm that should be taken into account.
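
For concreteness, a rough pure-Julia sketch of the textbook Pollard Rho
(Floyd's cycle finding plus a recursive split) could look like the code below.
This is only an illustration, not the GMP demo code; the fixed starting value
and the retry-with-the-next-c strategy are arbitrary choices, and on recent
Julia versions isprime lives in the Primes package (on 0.3 it is in Base):

    using Primes: isprime            # Base.isprime on Julia 0.3/0.4

    # One round of Pollard Rho with the iteration x -> x^2 + c (mod n).
    # Returns a (not necessarily prime) nontrivial factor of composite n.
    function pollard_rho(n::BigInt, c::BigInt = big(1))
        iseven(n) && return big(2)
        x = big(2); y = big(2); d = big(1)
        while d == 1
            x = (x * x + c) % n          # tortoise: one step
            y = (y * y + c) % n
            y = (y * y + c) % n          # hare: two steps
            d = gcd(abs(x - y), n)
        end
        return d < n ? d : pollard_rho(n, c + 1)   # no factor found: retry with new c
    end

    # Split n completely into prime factors (no multiplicity bookkeeping).
    function rho_factorize(n::BigInt)
        n == 1 && return BigInt[]
        isprime(n) && return [n]
        d = pollard_rho(n)
        return sort(vcat(rho_factorize(d), rho_factorize(div(n, d))))
    end

A tuned version would strip small factors by trial division first and use
Brent's variant of the cycle finding.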

Hans W Borchers

unread,
Mar 15, 2015, 8:25:52 AM3/15/15
to julia...@googlegroups.com
I thought about writing my own `factorize` function. While testing the argument checking of `factor`, I ran into the following:

    julia> factor(0)
    ERROR: number to be factored must be positive
     in factor at primes.jl:79

    julia> factor("0")
    ERROR: `factor` has no method matching factor(::ASCIIString)

which is to be expected, I think, but the following killed my Julia process:

    julia> factor('0')

    signal (11): Segmentation fault
    rem at promotion.jl:174
       
    Segmentation fault (core dumped)

I assume this happens because characters are also (almost) integers.

Anyway, it should under no circumstances kill the whole process.
This is Julia 0.3.6. Does this behaviour still occur under version 0.4?

Simon Byrne

unread,
Mar 15, 2015, 8:33:15 AM3/15/15
to julia...@googlegroups.com
You're right: the issue is that Char <: Integer, but segmentation
faults still shouldn't happen. I actually see a stack overflow in
0.3.6 (on OS X).

In 0.4, it is simply a MethodError, as Char is no longer a subtype of Integer.
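
(Quick check on a recent Julia version:)

    julia> Char <: Integer
    false

    julia> isa('0', Integer)
    false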

-simon

Tim Holy

unread,
Mar 15, 2015, 9:19:47 AM3/15/15
to julia...@googlegroups.com
It's more than a question of which algorithm to use: the immutability of
numbers means that every single bignum addition or multiplication requires
allocation. So currently, julia is going to have a hard time competing with a
hand-rolled method that allocates a cache of numbers to (re)use as scratch
space for calculations. You can do this if you're willing to manage all these
temporaries manually.

The best solution would be to make julia smart enough to do that "behind the
scenes." This is a very interesting but nontrivial problem (with potential
costs if it's "close but not smart enough").

--Tim

Hans W Borchers

unread,
Mar 15, 2015, 9:39:23 AM3/15/15
to julia...@googlegroups.com
Tim, sorry, I do not understand what you mean here.
Do you argue that it is not possible to write a relatively fast implementation of, e.g., Pollard's Rho algorithm in Julia?

Tim Holy

unread,
Mar 15, 2015, 9:54:51 AM3/15/15
to julia...@googlegroups.com
I don't know the algorithm, but my point is that
c = a*b
will, for a BigInt, allocate memory. Because Numbers are immutable and BigInts
are Numbers, you can't write an algorithm mult!(c, a, b) that stores the
result in a pre-allocated "scratch space" c. Hence, you're doomed to
inefficiency compared to an implementation in C that uses such tricks. Don't get
me wrong: in the context of the overall Julia ecosystem our current design is
the right one (immutability is a really awesome and important property), but there
are downsides.

But you don't have to live with the status quo. If you need efficient BigInt
calculations, perhaps a MutableBigInts.jl package would be in order. You could
define an OpaqueBigInt type (which would _not_ be a subtype of Number) and
implement mult!(c, a, b), add!(c, a, b), etc. for that type. You could
basically copy the BigInt implementation in Base; gmp.jl is only 500 lines of
code and the changes might be very minimal, so you may be able to knock this
out in an afternoon :-). That is, unless making them not-numbers has too many
negative consequences. (You may find yourself creating additional methods of
other functions.)
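
To make that concrete, here is a rough sketch (in today's Julia syntax) of the
kind of in-place operation I mean, calling GMP directly. It assumes the libgmp
shipped with Julia can be found under the name :libgmp, and mutating a Base
BigInt like this is only safe if you treat the value strictly as private
scratch space. (Recent Julia versions also have unexported helpers such as
Base.GMP.MPZ.mul! that do the same job.)

    # Multiply/add into a preallocated BigInt via GMP, so a hot loop can reuse
    # one scratch value instead of allocating a fresh BigInt on every step.
    mul!(c::BigInt, a::BigInt, b::BigInt) =
        (ccall((:__gmpz_mul, :libgmp), Cvoid,
               (Ref{BigInt}, Ref{BigInt}, Ref{BigInt}), c, a, b); c)

    add!(c::BigInt, a::BigInt, b::BigInt) =
        (ccall((:__gmpz_add, :libgmp), Cvoid,
               (Ref{BigInt}, Ref{BigInt}, Ref{BigInt}), c, a, b); c)

    a, b    = big(3)^50, big(3)^50 + 2
    scratch = big(0)              # allocated once
    mul!(scratch, a, b)           # scratch now holds a*b, no new BigInt allocated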

Best,
--Tim

andy hayden

unread,
Mar 16, 2015, 1:01:28 AM3/16/15
to julia...@googlegroups.com
Perhaps this (factor of large big int) is a good perf/test case for https://github.com/JuliaLang/julia/pull/10084

Bill Hart

unread,
Mar 16, 2015, 2:14:00 AM3/16/15
to julia...@googlegroups.com
Factoring large integers is an open research problem. The issue definitely has nothing to do with Julia's handling of bignums, or with its use of GMP (there is no competitive factoring algorithm in GMP).

Fixing that problem properly is on the order of about 120,000 lines of C code, if you want to be truly up-to-date.

Here is a list of algorithms you need to factor big numbers fast:

trial division
perfect power testing
Lehman
Pollard rho
SQUFOF
p-1
p+1
elliptic curve method
quadratic sieve (self initialising, multiple polynomial, small, large and double large prime variants)
general number field sieve (active open research area)

then you need to be able to check for primality reliably to certify that you have actually fully factored your numbers, for which you will want

a lookup table for small primes
trial division
strong probable prime test
Lucas test
Baillie-PSW
Miller-Rabin
Pocklington-Lehmer p-1 test
Morrison's p+1 test (+ Brillhart, Lehmer, Selfridge improvements)
APR-CL or ECPP

You might like to look at the msieve library and the GMP-ECM library. That will take most of the burden off you. If you want to factor numbers bigger than about 100 decimal digits, you need something like CADO-NFS. If you want to factor numbers bigger than 200 digits you need a miracle.

Hopefully Flint will also have something relatively competitive by the end of the summer, though it isn't specialised just for these problems. We are still missing APR-CL, qsieve and ECM, which we plan to add over the next few months to a year, and we don't have any plans for a general number field sieve at this point.

Note that the primality testing in GMP is for probable prime testing only. It sometimes (exceedingly rarely) says composite numbers are prime. It does no primality proving (i.e. it has no APR-CL or ECPP).
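
For illustration, that probable-prime test can be reached from Julia with a
one-line ccall. This is only a sketch; it assumes the libgmp shipped with
Julia is visible under the name :libgmp.

    # mpz_probab_prime_p returns 2 (definitely prime, in practice only for
    # small inputs), 1 (probably prime) or 0 (definitely composite).
    probab_prime(n::BigInt, reps::Integer = 25) =
        ccall((:__gmpz_probab_prime_p, :libgmp), Cint, (Ref{BigInt}, Cint), n, reps)

    probab_prime(big(2)^127 - 1)   # 1, i.e. "probably prime", not a proof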

Hans W Borchers

unread,
Mar 16, 2015, 2:42:38 AM3/16/15
to julia...@googlegroups.com
Thanks, I know. When I need to factor large numbers I can use specialized software, and for mid-sized problems I can easily live with PARI/GP (or perhaps Flint in the future). And I can live with probabilistic prime testing.

All that is not my problem. I would like to have factor() in Julia work as fast as in R or Python, and certainly not try to factor big integers with trial division. Elliptic curve or quadratic sieve methods may come later.

Bill Hart

unread,
Mar 16, 2015, 2:45:41 AM3/16/15
to julia...@googlegroups.com
I'm not sure what you mean by "big". Without ECM or QS you won't efficiently factor numbers bigger than about 20 digits (except in random cases where you happen to get lucky).

Bill.

Bill Hart

unread,
Mar 16, 2015, 2:56:48 AM3/16/15
to julia...@googlegroups.com
To make a concrete suggestion that might help, Julia might benefit from One Line Factor and Lehman's algorithm. There are C implementations here:


These are efficient for numbers up to around 44 bits or so. They have heuristic complexity O(n^(1/3)).

If you want to go up to 64 bits, you probably want SQUFOF with a fallback to Lehman. That gives you at least heuristic complexity O(n^(1/4)):


Feel free to derive BSD or MIT licensed versions of these for Julia.
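
To give a flavour of how little code One Line Factor needs, here is a rough
Julia sketch of the idea (an illustrative reimplementation, not the linked C
file, and without the trial-division preprocessing a real implementation does
first):

    # Hart's one line factor: look for s with s^2 mod n a perfect square.
    # Intended for odd n with no tiny factors; watch for Int overflow in n*i
    # once n gets much above ~44 bits (switch to Int128 or BigInt there).
    ceil_sqrt(x::Integer) = (r = isqrt(x); r * r == x ? r : r + 1)

    function one_line_factor(n::Integer; iters = 100_000)
        for i in 1:iters
            s = ceil_sqrt(n * i)
            m = mod(s * s, n)
            t = isqrt(m)
            if t * t == m                   # s^2 mod n is a perfect square
                g = gcd(n, s - t)
                1 < g < n && return g       # nontrivial factor
            end
        end
        return nothing                      # give up; the caller falls back
    end

    one_line_factor(59 * 61)   # returns 59: already at i = 1, 60^2 - 3599 = 1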

Bill.

On 16 March 2015 at 07:42, Hans W Borchers <hwbor...@gmail.com> wrote:

Bill Hart

unread,
Mar 16, 2015, 3:03:39 AM3/16/15
to julia...@googlegroups.com
I should mention that if you want to use the factor_one_line.c file unmodified under a BSD or MIT license, I'd have to seek permission from the other copyright holder (who is unlikely to withhold it). Reimplementing the algorithms in Julia is definitely fine.

Hans W Borchers

unread,
Mar 16, 2015, 4:15:41 AM3/16/15
to julia...@googlegroups.com
Bill, I think it has become clear what I mean. I agree that it is unreasonable to define exactly up to what limit a general-purpose scientific system should be able to factorize numbers. But I feel Julia should be able to operate in about the same range as other comparable systems, like R or Python.

With its relatively simple demo version of Pollard's Rho, GMP is capable of factorizing the Fermat number F7 = 2^(2^7) + 1 = 59649589127497217 * 5704689200685129054721 within 200 secs (on a slow computer). More specialized systems like Python/SAGE or PARI/GP do F8 (78 digits) in less than a second (with hand-crafted C code, I guess).

I think factorization in the range of 30-50 digits should be possible with Julia. I will write my own version of Pollard's Rho (with Brent's cycle-finding approach) and see how far I get. If Julia sticks with trial division, fine with me. Thanks, but at the moment I'm not interested in the C versions.
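
For reference, a generic sketch of such a Brent-style Pollard Rho in Julia
might look as follows. The starting value, the constant c and the batch size
are arbitrary illustrative choices, and the caller should retry with a
different c whenever n itself is returned:

    # Pollard Rho with Brent's cycle finding and batched gcds: one gcd per
    # `batch` squarings instead of one per step.
    function pollard_brent(n::BigInt, c::BigInt = big(1); batch::Int = 128)
        iseven(n) && return big(2)
        y, g, r, q = big(2), big(1), 1, big(1)
        x = ys = y
        while g == 1
            x = y
            for _ in 1:r                      # advance the "hare" r steps
                y = (y * y + c) % n
            end
            k = 0
            while k < r && g == 1
                ys = y                        # remember position for backtracking
                for _ in 1:min(batch, r - k)
                    y = (y * y + c) % n
                    q = (q * abs(x - y)) % n  # accumulate differences, gcd later
                end
                g = gcd(q, n)
                k += batch
            end
            r *= 2
        end
        if g == n                             # batch overshot: redo step by step
            while true
                ys = (ys * ys + c) % n
                g = gcd(abs(x - ys), n)
                g > 1 && break
            end
        end
        return g                              # a nontrivial factor, or n on failure
    end

    pollard_brent(big(3)^100 + 2)   # should quickly find a factor (31721 divides n)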

Bill Hart

unread,
Mar 16, 2015, 5:27:21 AM3/16/15
to julia...@googlegroups.com
Bear in mind that you can get (very) lucky with Pollard's Rho. It won't perform comparably for all numbers of the same size.

lapeyre....@gmail.com

unread,
Mar 19, 2015, 6:26:10 PM3/19/15
to julia...@googlegroups.com
Tim is correct in a sense. I translated some bignum code a while ago, I think it was the Pollard rho method (I don't quite remember the details), and the inability to reuse storage for a BigInt caused a big performance hit. (I thought about a workaround, but unfortunately I don't remember what it was, or whether I even tried it.)

Stefan wrote a simple stop-gap factor() routine and noted in a comment that it needs to be improved. But Bill Hart is correct: it can't do 3^100 + 2 no matter how efficient BigInt arithmetic is.

The Maxima routines and the Perl module by Dana Jacobsen are good, but they would require translating them and asking the authors to liberalize the license.

I wrapped the msieve C library:

https://github.com/jlapeyre/PrimeSieve.jl

It does 3^100 + 2 in less than a second on my machine.

Hans W Borchers

unread,
Mar 20, 2015, 3:01:45 AM3/20/15
to julia...@googlegroups.com
As Bill noted, it is known that Pollard Rho does not "perform comparably for
all numbers of the same size." Still, I got interested in trying it out myself.
I wrote a simple implementation and got this result:

    julia> n = big(3)^100 + 2
    julia> @time factorize(n)
    elapsed time: 74.539066901 seconds (5360852768 bytes allocated, 23.12% gc time)
    Dict{BigInt,Int64} with 3 entries:
      246451584544723          => 1
      31721                    => 1
      65924521656039679831393… => 1

So I, as Bill would say, "was lucky." Of course, this is not of real use, but
I learned some things, and I learned that Julia with BigInt may not yet be
ready for number theory applications.

Bill Hart

unread,
Mar 20, 2015, 3:08:52 AM3/20/15
to julia...@googlegroups.com
If you are interested in some relatively serious number theory, there is the Nemo library we are writing in Julia.


Unfortunately it is between versions at the moment (see the rewrite branch; not everything is even pushed to that). For factoring integers we currently just call the factorization in PARI/GP. By the end of the summer we hope to have something much better.

Bill.

Andy Hayden

unread,
Mar 20, 2015, 3:16:13 AM3/20/15
to julia...@googlegroups.com
Hans, it would be interesting to see how your Pollard Rho example compares on the pooling bigint branch https://github.com/JuliaLang/julia/pull/10084. It ought to allocate far less memory...

janvanoort

unread,
Nov 4, 2015, 2:25:43 PM11/4/15
to julia...@googlegroups.com
That's interesting. I have been working for a week now on a somewhat improved
version of Pollard's Rho, "feeding" it with something other than a random
number in order to take away the non-deterministic behaviour. My
implementation is written in Java 8 (I have had no time to port it to Julia
yet), and it factors 3^100 + 2 in about 800 milliseconds on a Xeon E3-1265L
with 8 MB cache, on Linux (Ubuntu Server). If anyone's interested, feel free
to ping me.

Bill Hart

unread,
Nov 4, 2015, 3:10:26 PM11/4/15
to julia-users
Do you mean you are factoring something other than random numbers, or do you mean your Pollard Rho is completely deterministic?

It's not a good measure of performance to time just one factorisation with Pollard Rho, since you could just pick the parameters such that it essentially succeeds immediately. You should give times for a range of numbers.

Bill.

janvanoort

unread,
Nov 4, 2015, 3:39:49 PM11/4/15
to julia...@googlegroups.com
You are right. What I mean is that my Pollard Rho is completely deterministic.

Times for ranges of numbers:

bit length < 20:            100 µs - 300 µs
20 < bit length < 40:       around 3 ms
bit length ~1024 (random):  3 ms - 800 ms, depending on the size of the smallest prime factor

(For this post, in order to check, I just made it factor the Mersenne number 2^1117 - 1; it came up with the factor 53617 in 860 ms.)

Bill Hart

unread,
Nov 4, 2015, 3:47:21 PM11/4/15
to julia-users
With Pollard Rho it would be better to list the times as a function of the smallest factor.

It would be interesting to compare timings for some known numbers such as Fermat numbers (F_n = 2^(2^n) + 1).
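
(For anyone wanting to reproduce these in Julia, the Fermat numbers themselves
are trivial to generate; the helper below is just for convenience:)

    fermat(k) = big(2)^(2^k) + 1   # F_k = 2^(2^k) + 1
    fermat(6)                      # 18446744073709551617, i.e. F_6 below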

Here are some timings from a Google summer of code project we had this year:

F_6 (18446744073709551617)
pollard rho : 0.001583 seconds
ECM         : 0.012700 seconds

F_7 (340282366920938463463374607431768211457)
pollard rho : 113.476116 seconds
ECM         : 0.613939 seconds

F_8 (115792089237316195423570985008687907853269984665640564039457584007913129639937)
pollard rho : 47.584017 seconds
ECM         : 0.292101 seconds

It's possible he improved the timings a little later, but these should serve as a guide anyway.

Bill.

janvanoort

unread,
Nov 4, 2015, 4:17:22 PM11/4/15
to julia...@googlegroups.com
Thanks for the input. My deterministic Pollard Rho does: 

F6: 2.0 milliseconds

F7: failure

F8: 6.98 seconds
