Arbitrary Precision Problem Persists

77 views
Skip to first unread message

Amy Valhausen

unread,
Apr 9, 2016, 2:56:01 PM4/9/16
to sympy
In the thread ; https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI

I was seeking feedback and help with the problem of type ;

( 1.414213562^6000) % 400)

After reviewing all the excellent collaboration at the above link, Im feeling
a little lost and overwhelmed.  So many good suggestions were made and
also a lot of bugs, and bottlenecks discovered - Im not sure which approach
is best for what I am hoping to solve?

Ive been reviewing the problems with the different approaches, including precision loss
with use of floats, some of the suggested functions, etc.

Oscar mentioned the solution of using a string value and storing it as a possibility?

My hope is to arrive at a method that returns lossless precision, if I dont have to sacrifice speed then
thats great but if preserving accuracy comes at a cost of speed thats ok with me.

I'd like also to get the highest precision available, if possible past the 15 digit limit, the larger the better.

What do you guys think is the best code solution for ;
given any rational  between 1.0x and 2, for example "1.414213562"
being raised to a power as high as 6000
with modulus returned of 400

( 1.414213562^6000) % 400)    ?

So that the greatest and most accurate precision is kept?

I have been testing out some of the code ideas such as ;

from sympy import mpmath
mpmath.mp.dps = 821
x = mpmath.mpf('1.414213562') ** 6000

Thanks Oscar, Aaron and everyone!  I have a few questions to ask, but first wanted to say this entire thread has been so meaningful as a learning process for me!

I am an intermediate VB programmer and newbie to python and this single thread you have all been so gracefully collaborating together on has taught me more and helped me to understand the issues surrounding this than any other source online I have researched and I am saving this entire thread for future use and review!

I have been trying to keep pace with all the excellent insights and suggestions as well as potential bottlenecks.

casevh

unread,
Apr 9, 2016, 5:04:16 PM4/9/16
to sympy
Using Python's included Fraction type, you can calculate ((n**6000) % 400) exactly and then convert the exact result into a float.

>>> from fractions import Fraction
>>> float((Fraction("1.414213562")**6000) % 400)
314.3479803094588

The Fraction type isn't very fast but the calculation is exact.

However, if your value for n changes a small amount, your result will change by a large amount.

>>> float((Fraction("1.414213562")**6000) % 400)
314.3479803094588
>>> float((Fraction("1.4142135623")**6000) % 400)
82.204377195306
>>> float((Fraction("1.41421356237")**6000) % 400)
164.74604114158947
>>> float((Fraction("1.414213562373")**6000) % 400)
394.37793561523273

Your value of n looks suspiciously like an approximation to sqrt(2). If that's correct, the result would be:

>>> pow(2,3000,400)
176

Since we are using an approximation, the least significant digits (the ones retained by the mod operation) are incorrect. How many digits in the approximation of sqrt(2) are needed to produce a stable result?

(Note: I'm the maintainer of gmpy2, a module for very fast arbitrary precision arithmetic. If it is present on your system, it will be automatically used by mpmath. I'll use gmpy2 directly since I'm more familiar with it.)

>>> import gmpy2
>>> gmpy2.get_context().precision=10000
>>> s=str(gmpy2.sqrt(2))
>>> float((gmpy2.mpq(s[:900])**6000) % 400)
397.6970528425263
>>> float((gmpy2.mpq(s[:901])**6000) % 400)
317.77259005351806
>>> float((gmpy2.mpq(s[:902])**6000) % 400)
381.7876974957164
>>> float((gmpy2.mpq(s[:903])**6000) % 400)
381.7876974957164
>>> float((gmpy2.mpq(s[:904])**6000) % 400)
370.4278485701384
>>> float((gmpy2.mpq(s[:905])**6000) % 400)
48.15587878502279
>>> float((gmpy2.mpq(s[:906])**6000) % 400)
345.64468558337177
>>> float((gmpy2.mpq(s[:907])**6000) % 400)
154.42196588552062
>>> float((gmpy2.mpq(s[:908])**6000) % 400)
175.2996939157355
>>> float((gmpy2.mpq(s[:909])**6000) % 400)
175.82163711649088
>>> float((gmpy2.mpq(s[:910])**6000) % 400)
175.9782200767175
>>> 

You need over 900 accurate digits in the approximation of the sqrt(2) to get the expected answer.

casevh

Amy Valhausen

unread,
Apr 9, 2016, 6:36:10 PM4/9/16
to sympy
Hi Case,

thanks for your reply - in the thread cited ; https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI

The impression I get (please keep in mind Im a newbie), was that most
of the gentleman came to the conclusion that floats of any kind used for this problem would result in data loss and inaccurate results, Oscar seemed to think that numeric values would have to be stored as strings to prevent this.  The issue of using 900 plus digits was also covered so I understand that part.

As I mentioned in the cited link above, I want to be able to use any rational number between 1.0x to 2 as the base, I picked a truncated value of sqrt(2) as an example only.

The value you mentioned of 176 for ;

( 1.414213562^6000) % 400)

Was one of several values reported by folks working on this issue at the link ;
https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI

But they determined that it, along with other several values like 32, etc....
were wrong... some seemed to think that the value of 271 was closer,
but there were so many different approaches used, functions, data types
used that all gave different answers it left me a bit perplexed in the end
to grasp which of the methods was most accurate.  Towards the end of
the thread https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI
Oscar and Aaron seem to have worked out what I think is a better overall approach but I cant understand or put together what they mean enough to
write it as code and experiment with....

Amy Valhausen

unread,
Apr 9, 2016, 6:39:24 PM4/9/16
to sympy
They also told me and described in length how using the float will cause data loss
as it cuts values off when they get too big, there was a discussion of
float32, float96, float 128 issues - someone says supposedly a float 128
could solve the problem but then this cant actually be used as float128
isnt actually float128 on most systems, it just reverts to the system ceiling
which is low on my system as I am running win xp on a 32 bit system - but
they kept saying regardless, that using float will not work for what I need
- it wont be accurate enough?

casevh

unread,
Apr 10, 2016, 2:04:57 AM4/10/16
to sympy


On Saturday, April 9, 2016 at 3:36:10 PM UTC-7, Amy Valhausen wrote:
Hi Case,

thanks for your reply - in the thread cited ; https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI

The impression I get (please keep in mind Im a newbie), was that most
of the gentleman came to the conclusion that floats of any kind used for this problem would result in data loss and inaccurate results, Oscar seemed to think that numeric values would have to be stored as strings to prevent this.  The issue of using 900 plus digits was also covered so I understand that part.
 
Floating point arithmetic has several quirks that come into play when solving this type of problem. I will skip some of the really obscure details but I hope this helpful.

Floating-point representation

Most computer arithmetic is performed using binary, or radix-2, arithmetic. Most human arithmetic is performed using decimal, or radix-10, arithmetic. When you work with the value 1.4142, you are specifying a decimal value. When a decimal value is converted to a binary floating point number, most values cannot be stored exactly. It the same as trying to store 1/3 in a calculator - the value that is stored is close but not exact. There are a few implications:

1) The use of the Decimal type can sometimes help since it can store the value exactly. Decimal("1.4142") is stored exactly.

2) Notice that I used a string to initialize the Decimal instance. It is a common error to see Decimal(1.4142). The Python interpreter first convert the characters 1.4142 into a binary, not-quite-exact, approximation, then Decimal() converts this approximation. The conversion is exact, but it is the exact version of a slightly wrong value.

>>> Decimal("1.4142")
Decimal('1.4142')
>>> Decimal(1.4142)
Decimal('1.4141999999999999015898310972261242568492889404296875')

When working with arbitrary-precision, it is important to delay conversion to a floating-point format as long as possible. You don't want to change from one radix to another (i.e. from binary to decimal format) or increase the precision. Each conversion will (almost always) create a very small error.

Floating-point values

Floating-point numbers are are stored in a condensed format. Instead of storing all the possible digits (for decimal format) or bits (for binary format), only the first few digits/bits are stored. The remaining digits/bits are just replaced by 0 and a count is kept of the number of zeros. If the count of zeros is too large and positive, the number is eventually replaced by Infinity. If the count is large and negative, the zeros are effectively placed before the digits. If the count becomes too negative, the number is eventually replaced by 0.

The most common floating point format uses 64 bits. The precision (the number of leading bits that are kept) is 53. This is approximately 15 decimal digits. The exponent (the count of the zeros) is slightly more than 1000. This is approximately 308 decimal digits. This format in a standard known as IEEE-754. There are other formats that use smaller limits (i.e. float32) or larger limits (i.e. float128)

Calculating ((n ** 6000) % 400) encounters both the limits of floating point values. The value of 1.4142**6000 is approximately 1.1614417884357118e+903. The value of the exponent exceeds the limits of a float64 so a value of Infinity (or OverflowError) is returned by Python.

The other problem is that mod cares about the least significant bits but the condensed format only keeps the most significant bits and the least significant bits are replaced by 0.

To get accurate results using floating point arithmetic, we need to use a format that has a sufficiently large exponent range to avoid the Infinity/OverflowError and sufficient precision to not replace the least significant bits with zero. A precision greater than 903 decimal digits (or 903*math.log2(10) = 3000 bits) is the needed so ensure that some of the least significant information is still present.

There are several common data types that can utilize arbitrary precision and large exponents ranges.

1) The decimal.Decimal type included with Python.
2) The mpmath.mpf type that is part of the mpmath module and is used by sympy.
3) The gmpy2.mpfr type that is part of the gmpy2 module. 

Here is an example using gmpy2. The result is rounded to 53 bits.

>>> import gmpy2
>>> gmpy2.get_context().precision=3100
>>> a=(gmpy2.mpfr("1.4142")**6000) % 400
>>> gmpy2.round2(a, 53)
mpfr('271.04818100863093')

In this example, the value 1.4142 was kept as a string until it was converted directly to extended precision type. Then the result was rounded to 53 bits.

Another way to calculate (1.4142**6000) % 400 is to delay the conversion to a floating point until after the calculation is done. This done by using rational, or fractional, arithmetic. The value 1.4142 is exactly 7071/5000. The fraction 7071/5000 is raised to the 6000 power exactly and then % 400 calculates the remainder exactly. Since the numerator and denominator have unlimited size, the values can be computed exactly. The final remainder is then converted to a float.

>>> float((Fraction("1.4142")**6000) % 400)
271.04818100863093

The two values agree.

So why don't we just use rational arithmetic? As the numerator and denominator become larger, the calculations slow down. Floating point arithmetic has consistent performance. 

casevh 

Oscar Benjamin

unread,
Apr 10, 2016, 5:12:54 PM4/10/16
to sympy
On 9 April 2016 at 19:56, Amy Valhausen <amy.vau...@gmail.com> wrote:
> In the thread ; https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI
>
> I was seeking feedback and help with the problem of type ;
>
> ( 1.414213562^6000) % 400)
>
> After reviewing all the excellent collaboration at the above link, Im
> feeling
> a little lost and overwhelmed. So many good suggestions were made and
> also a lot of bugs, and bottlenecks discovered - Im not sure which approach
> is best for what I am hoping to solve?

Yes the thread did bring up a few issues and there were some bugs. I
think the reason for those bugs is that sympy's Float algebra is less
well used and tested than its other features. Modulo division with
floats for example seems an unusual usuage to me. That's because for a
problem like the one you have posed it's usually best not to use
float.

> Ive been reviewing the problems with the different approaches, including
> precision loss
> with use of floats, some of the suggested functions, etc.

The short answer is to just use rational numbers unless you can't for
some reason. So if 1.4142 is supposed to be an exact rational number
then:

Rational("1.4142")**6000 % 400

This will give you the exact answer is a rational number.
Unfortunately the rational number has a really big numerator and
denominator (print it to see) so this isn't necessarily a useful form
for viewing the output. To see the first say 50 decimal digits just
use evalf():

>>> ((Rational("1.4142")**6000) % 400).evalf(50)
271.04818100863092181593910948838929251832458036415

> Oscar mentioned the solution of using a string value and storing it as a
> possibility?

That was a suggestion for how the implementation of sympy could
change. I wasn't suggesting that to you to use. For your problem you
can pass the decimal string to Rational. But as already pointed out in
this thread it must be Rational("1.4142") and not Rational(1.4142).

> My hope is to arrive at a method that returns lossless precision, if I dont
> have to sacrifice speed then
> thats great but if preserving accuracy comes at a cost of speed thats ok
> with me.
>
> I'd like also to get the highest precision available, if possible past the
> 15 digit limit, the larger the better.

So see above. You can compute the result exactly as a rational number
(effectively having infinite precision). However this isn't always a
useful form so you can also evaluate it with evalf and ask for as many
digits as you desire. The exact answer can be written in decimal but
it needs 24003 digits so you can print it with:

>>> ((Rational("1.4142")**6000) % 400).evalf(24100)

As you can imagine though 24003 digits is usually too many to be
useful so it really depends what you're trying to do.

--
Oscar
Reply all
Reply to author
Forward
0 new messages