storing big integers as prime factors

39 views
Skip to first unread message

Thomas Ligon

unread,
Aug 6, 2022, 10:08:35 AM8/6/22
to sympy
Is there any way to do this, perhaps just by saying that I want to keep prime factors instead of decimal digits?
Background: I am writing software (using Python and SymPy) to calculate the coefficients of a power series. By using Rational(0) to initialize sums and Rational(1,2) to avoid getting a floating point, I have results that are always rational, specifically quotients of large integers and have reached the point where the integers have about 50 digits. I have a (fairly complex) recursive function that calculates the coefficient of x**(n+1) when the coefficients of x**n are known. When calculating the sum of two rational numbers, this most likely calculates the greatest common denominator, something which is sure to be hard. If all of the integers were always stored in terms of their prime factors instead of decimal digits, it would be much more efficient.
Is there such a feature out there somewhere?

Oscar Benjamin

unread,
Aug 6, 2022, 12:15:44 PM8/6/22
to sympy
Integer gcd is not that slow. Factorising large integers into primes is though.

If you represent integers in terms of their prime factorisation then
how can you efficiently add two integers?
> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/7171be2f-67a7-4d48-8eac-5d37571677c4n%40googlegroups.com.

Aaron Meurer

unread,
Aug 7, 2022, 1:41:59 AM8/7/22
to sy...@googlegroups.com
As Oscar pointed out, it's not clear that this would actually be more efficient. However, some things that could help, based on what you've said, are

- Use Poly and any relevant Poly methods to represent and manipulate polynomial expressions
- Make sure gmpy2 is installed. If it is, SymPy will automatically use it for much faster integer operations, especially when using Poly.
- Rewrite your recursive function to operate non-recursively. Recursive functions are slower in Python than normal loops. 

Aaron Meurer

--

thomas...@gmail.com

unread,
Aug 7, 2022, 3:37:28 AM8/7/22
to sy...@googlegroups.com
If the numbers are represented as prime factors to begin with, finding the gcd is easier. Then, calculating the sum requires multiplying out the numerators, which should be easier than factoring the decimal representation. But that's just my relatively naïve observation.

The power series I am dealing with was invented in a paper that is about 150 years old and the author calculated some of the coefficients up to a power of 9 and used the prime-factor representation for the denominator and the decimal representation for the numerator. One of the larger coefficients has a value of 2114557858/(2**12*3**6*5*3). I don't know how much paper he needed to calculate that one. Most subsequent papers calculate to a power of 5.

My software can calculate to a power of 23 and then reaches the limits of my PC. Looking at the resource monitor tells me that I can probably benefit a lot from more RAM, but I haven't done that yet.

Based on your observations, I think the best way forward is the simple solution (more RAM), since we haven't received a response indicating that another solution is easy.

Tom
(Dr. Thomas S. Ligon)
thomas...@gmail.com
Frohnloher Str. 6a
81475 Muenchen
Germany
Tel. +49(89)74575075
To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAHVvXxTbrJwOfx8X%3DFHdZ8e-J3w0_qAt6iWBxW2POzJ9%3DNb9zw%40mail.gmail.com.

thomas...@gmail.com

unread,
Aug 7, 2022, 3:55:23 AM8/7/22
to sy...@googlegroups.com

Thanks to both of you for the recommendations. I will follow them.

 

Tom

(Dr. Thomas S. Ligon)

thomas...@gmail.com

Frohnloher Str. 6a
81475 Muenchen
Germany
Tel. +49(89)74575075

 

From: sy...@googlegroups.com <sy...@googlegroups.com> On Behalf Of Aaron Meurer
Sent: Sunday, August 7, 2022 7:42 AM
To: sy...@googlegroups.com
Subject: Re: [sympy] storing big integers as prime factors

 

As Oscar pointed out, it's not clear that this would actually be more efficient. However, some things that could help, based on what you've said, are

Oscar Benjamin

unread,
Aug 7, 2022, 8:08:56 AM8/7/22
to sympy
On Sun, 7 Aug 2022 at 08:37, <thomas...@gmail.com> wrote:
>
> If the numbers are represented as prime factors to begin with, finding the gcd is easier. Then, calculating the sum requires multiplying out the numerators, which should be easier than factoring the decimal representation. But that's just my relatively naïve observation.

The prime factorisation of a sum of two integers does not necessarily
have any relationship to the prime factorisation of the summands.
Consider 8 + 5 = 13 which in terms of prime factorisations looks like:

{2, 2, 2} + {5} = {13}

How can you compute the prime factorisation of the result (13) without
multiplying out the other factorisations?

If you don't need to add two integers and you only need integer
multiplication, exact division, gcd then it could be possible to do
something more efficient with the prime factorisations. If you have to
compute the result of an addition explicitly and then factorise it
then this is going to be very inefficient as soon as the numbers get
large.

If you have some reason to think that your factorisations will always
share lots of prime factors in common then maybe it is possible to
implement addition more efficiently without needing to factorise large
integers. In general though you should avoid any approach that
requires factorising large integers since that is a famously hard
problem (gcd is much easier).

> The power series I am dealing with was invented in a paper that is about 150 years old and the author calculated some of the coefficients up to a power of 9 and used the prime-factor representation for the denominator and the decimal representation for the numerator. One of the larger coefficients has a value of 2114557858/(2**12*3**6*5*3). I don't know how much paper he needed to calculate that one. Most subsequent papers calculate to a power of 5.

For a computer those integers are not really large. Below you talk
about running out of RAM which implies that the numbers must be a lot
bigger than this.

> My software can calculate to a power of 23 and then reaches the limits of my PC. Looking at the resource monitor tells me that I can probably benefit a lot from more RAM, but I haven't done that yet.
>
> Based on your observations, I think the best way forward is the simple solution (more RAM), since we haven't received a response indicating that another solution is easy.

If the numbers are really so large that you are running out of RAM
then getting more RAM probably won't help that much. I expect that
even if you do get another term in the series the next term would fill
your new larger RAM.

If the prime factorisation approach can work then it would potentially
deliver a much better improvement. One way to implement the prime
factorisation idea in SymPy would just be to use a symbol for each
prime factor. Assuming you only need multiplication, division and
cancellation of fractions that would work fine: SymPy's Mul already
represents that naturally (although I'd probably use monomials from
polys for this).

--
Oscar

thomas...@gmail.com

unread,
Aug 8, 2022, 3:26:30 AM8/8/22
to sy...@googlegroups.com

In order to install gmpy2 I ran pip install and added an import statement. Do I need anything else?

 

Tom

(Dr. Thomas S. Ligon)

thomas...@gmail.com

Frohnloher Str. 6a
81475 Muenchen
Germany
Tel. +49(89)74575075

 

From: sy...@googlegroups.com <sy...@googlegroups.com> On Behalf Of Aaron Meurer
Sent: Sunday, August 7, 2022 7:42 AM
To: sy...@googlegroups.com
Subject: Re: [sympy] storing big integers as prime factors

 

As Oscar pointed out, it's not clear that this would actually be more efficient. However, some things that could help, based on what you've said, are

Oscar Benjamin

unread,
Aug 8, 2022, 5:47:46 AM8/8/22
to sympy
On Mon, 8 Aug 2022 at 08:26, <thomas...@gmail.com> wrote:
>
> In order to install gmpy2 I ran pip install and added an import statement. Do I need anything else?

If gmpy2 is installed then it will be used automatically by the polys
module for internal calculations. It is not used by Rational so
whether it gets used depends on exactly how you do the calculations.

If you are really just doing calculations with pure rational numbers
then the most efficient thing would probably be to use gmpy2's mpq
type directly (i.e. no need for SymPy at all)

--
Oscar
Reply all
Reply to author
Forward
0 new messages