
square roots algo


jacob navia

Jan 11, 2021, 9:39:39 AM
Hi math wizards!

I have developed an extended precision math package with 448 bits. The
basic operations are all written in assembly, and I am now approximately
twice as fast as MPFR.

BUT...

(yes there is always a "but" :-)

MPFR does more square roots than me per millisecond!

Surely MPFR has a better algo than my pathetic Newton iteration... I was
at 1720 square roots/ms, and MPFR is at 6899. I have tried to improve it
with Halley's method and now I am at 2200/ms, still MUCH TOO SLOW.

Can you point me to some method that would be more appropriate?

Thanks in advance!

P.S.
Here are the timings for the basic operations:

MPFR
The four operations with 5000000 iterations. Mon Jan 11 15:32:50 2021
Addition : 2.2837e-05 ms. (43788.58869) per ms
Subtraction : 1.39152e-05 ms. (71863.86110) per ms
Multiplication: 4.5159e-05 ms. (22143.98016) per ms
Mult. (int) : 1.59318e-05 ms. (62767.54667) per ms
Division : 0.00011619 ms. (8606.57784) per ms
Division(int) : 4.0373e-05 ms. (24769.02881) per ms

My software:
The four operations with 5000000 iterations. Mon Jan 11 15:35:28 2021
Addition : 1.00136e-05 ms. (99864.18471) per ms
Subtraction : 9.862e-06 ms. (101399.31048) per ms
Multiplication: 2.06244e-05 ms. (48486.25899) per ms
Mult. (int) : 6.87133e-06 ms. (145532.16261) per ms
Squaring : 2.18158e-05 ms. (45838.33735) per ms
Division : 0.000120662 ms. (8287.59960) per ms
Division(int) : 4.5838e-05 ms. (21815.96056) per ms

Note that I also use Newton iteration for division... there I am almost
as slow as MPFR...

jacob navia

Jan 11, 2021, 9:41:18 AM
Sorry, forgot to say: all timings were done on an Apple M1 chip
(Mac mini, ARM64 architecture).

Richard Fateman

Jan 11, 2021, 3:27:05 PM

> > MPFR does more square roots than me per millisecond!
> >
> > Surely MPFR has a better algo than my pathetic newton iteration...

Its algorithm is described at https://www.mpfr.org/algorithms.pdf, page 13. It refers to the integer square root, which is described elsewhere...

And the source code is public.
I don't know about 448 bits, but the general idea is probably to distinguish
even and odd exponents, take the integer square root of the significand using
an extravagantly accurate but cheap-to-compute initial approximation, and
then iterate a fixed number of times (no need to test for convergence).
This may be the wrong news group to ask this question.

RJF

jacob navia

Jan 15, 2021, 9:49:42 AM
Thanks for your answer. I have tried several methods:
1. a simple Newton approximation
2. a more sophisticated root-finding method (also Newton-based)

They all work but are slow.

I started browsing the MPFR code, and it is based on GMP, so their
methods aren't applicable to me since I use a fixed number of bits
(448). But browsing their code I found the code for their log function,
and they use the AGM (arithmetic-geometric mean), which I used long ago
but was stuck with problems in the last bits. I am using

log(x): let w = (x-1)/(x+1); then

log(x)/2 = w + (w^3)/3 + (w^5)/5 + (w^7)/7 + ...

but I need around 80 iterations to get to 448 bits of precision... I
thought of somehow accelerating this by calculating the next step
symbolically, then deriving a faster formula for convergence, but I do
not know enough maths to do that.

Thanks for your answer
jacob

James Cloos

Jan 15, 2021, 2:01:27 PM
perhaps https://www.sollya.org/ can help?

-JimC
--
James Cloos <cl...@jhcloos.com> OpenPGP: 0x997A9F17ED7DAEA6

Richard Fateman

Jan 16, 2021, 1:54:46 PM
Without knowing the details of your program, it is not possible to diagnose why your sqrt
routine is slow. Maybe you are using full precision when it is not necessary (in early
iterations), or using expensive division when it is not needed either.
As for the log iteration, that seems like a bad method. There are full algorithms, in the
Lisp language, in the Maxima source, too. There is also the MP system by Bailey,
though it seems to have been eclipsed by GMP.
https://www.davidhbailey.com/dhbpapers/mpf90.pdf

Is there a particular important calculation that requires exactly 448 bits?
RJF

jacob navia

Jan 16, 2021, 6:37:07 PM
On 16/01/2021 at 19:54, Richard Fateman wrote:

> Is there a particular important calculation that requires exactly 448 bits?
> RJF
>

The format I use is seven 64-bit numbers (the mantissa), a 32-bit
exponent and a 32-bit sign, making the number 64 bytes wide: a cache
line in many processors.

It is designed for speed, not space. 32 bits to store the sign bit seems
ludicrous, but it is fast; 32 bits for the exponent is also too much, but
it is fast. I could have saved the space and used 32 more bits, but the
algorithms would then need to treat the first part of the mantissa
differently, which would really slow things down.

So I decided to keep seven 64-bit integers as the mantissa (448 bits),
which gives around 135 decimal places.

The number of atoms in the universe is between 10^78 and 10^82, so this
format allows you to count each atom, and even the number of neutrons
in each one if you want. It is a fast format, enough for most practical
purposes.

I wanted a very high precision system, but with a fixed (if large)
precision...

This system can run on an inexpensive ARM64 chip (system cost US$25),
making very high precision calculations available on such systems.

Now that Apple has boarded the ARM train, it shines on the M1, a very
fast machine.


Thanks for your time


jacob navia

Jan 16, 2021, 6:40:43 PM
On 15/01/2021 at 20:01, James Cloos wrote:
> perhaps https://www.sollya.org/ can help?
>
> -JimC
>
Thank you very much!

Very interesting

Richard Fateman

Jan 19, 2021, 8:19:30 PM
The rationale for particular sizes of exponent and mantissa was a subject of discussion in the
IEEE floating-point committee. As I recall, the number of electrons in the universe is
not so relevant; I forget the details. Certainly 32 bits for the sign is excessive. Dunno
about the exponent. It would be nice if you also allowed for IEEE special items like
NaN, infinity, signed zero, etc.

If you are doing a sequence of floating-point computations in which each operation loses
some information to round-off, the usefulness of a particular length mantissa
depends on how long a sequence you have.

I have repeatedly posted one-line programs in Mathematica that lose
a decimal digit about every 3 times through a loop. The computation, repeated, is
x = 2*x - x

Cheers
RJF

Mostowski Collapse

Jan 20, 2021, 2:22:47 AM
What was the rounding mode?
Round towards bullshit?

Richard Fateman wrote:

Richard Fateman

Jan 20, 2021, 10:42:32 AM
I think that may be a fair description of the default (and only) rounding mode.
If you have a copy of Mathematica, try this

x=1.00000000000000000000
Do[x = 2*x - x, 120]
If[x==0.0,BS]

returns BS.


jacob navia

Jan 20, 2021, 11:38:48 AM
This program

1 #include "qhead.h"
2 int main(void)
3 {
4 Qfloat q[1],tmp[1];
5
6 asctoq("1.00000000000000000000",q,NULL);
7 for (int i=0; i<1200000; i++) {
8 qmul(q,qtwo,tmp); // tmp = q*2
9 qsub(q,tmp,q); // q = (q*2) - q
10 }
11 qprint(q);
12 }

prints (after 1 200 000 iterations)

1

The syntax shown uses the raw library. Using my compiler system
(lcc-win) you can use these floats as a primitive type, so lines 8-9
become simply:

q = (q*2) - q;

Nasser M. Abbasi

Jan 20, 2021, 12:31:47 PM
This also prints 1 in Mathematica

x = 1.00000000000000000000`;
Do[x = 2*x - x, 120];
Print[x]

1

While

x = 1.00000000000000000000;
Do[x = 2*x - x, 120];
Print[x]

0.*10^37


The difference is the use of the backtick ` at the end
of the real number.

This tells Mathematica to use machine precision, avoiding the issue.

--Nasser


Richard Fateman

Jan 20, 2021, 1:58:05 PM
On Wednesday, January 20, 2021 at 9:31:47 AM UTC-8, Nasser M. Abbasi wrote:

>
> The difference is the use of the backtick ` at end
> of the real number.
>
> This tells mathematica to use machine precision avoiding the issue.

Using machine precision has other issues, of course. The computation without the backtick is
done in HIGHER (software-implemented) precision by Mathematica. What you are pointing
out is that in this example, the lower precision, implemented better by the hardware IEEE
floating-point arithmetic, is apparently better than the higher-precision Mathematica software.
Also note the very terrible result, 0.*10^37, which compares EQUAL to zero. Imagine having
written an iterative algorithm that checks for convergence, and "succeeds" not when
a result is zero, but when a result has no significance. It is possible to program around
such issues, if you are aware that this is an issue at all, and are sufficiently clever. But
if you write a program f[x_]:= Do[x=2*x-x,120], you might not know what kind of number you
are given. I suppose you could convert every input to machine floats, but is that what you want?
RJF

Mostowski Collapse

Jan 20, 2021, 3:24:51 PM
Did somebody already file a bug?
The result doesn't make any sense,
neither with radix=10 nor with radix=2.

Nasser M. Abbasi wrote:

Richard Fateman

Jan 20, 2021, 11:28:36 PM
This is supposedly a feature, not a bug. I reported it many years ago, in a published review
of Mathematica, circa 1992.
The value x has many interesting properties, including x==x+1; also
x==1 && x==2 is True.

It is possible to counteract this phenomenon by explicit calls to SetPrecision,
but how are you supposed to know that? No other system that I am
aware of requires such a thing.
RJF

Mostowski Collapse

Jan 23, 2021, 5:21:14 PM
Thanks, but no thanks. I'd rather use:
https://de.wikipedia.org/wiki/Maxima_%28Computeralgebrasystem%29

(%i1) x:1.00000000;
(%o1) 1.0
(%i2) x:2*x-x;
(%o2) 1.0


Richard Fateman wrote:

anti...@math.uni.wroc.pl

Jan 24, 2021, 8:34:18 AM
Richard Fateman <fat...@gmail.com> wrote:
> I have repeatedly posted one-line programs in Mathematica that lose
> a decimal digit about every 3 times through a loop. The computation, repeated, is
> x = 2*x - x

This is a very artificial example. AFAICS x = 4*x(1 - 1) would be
a bit closer to real life. It naturally loses 1 bit of
accuracy every iteration, and with Mathematica probably 2 bits
per iteration.

--
Waldek Hebisch

nob...@nowhere.invalid

Jan 24, 2021, 11:22:16 AM

anti...@math.uni.wroc.pl wrote:
>
> Richard Fateman <fat...@gmail.com> wrote:
> > The computation, repeated, is
> > x = 2*x - x
>
> This is a very artificial example. AFAICS x = 4*x(1 - 1) would be
> a bit closer to real life. It naturally loses 1 bit of
> accuracy every iteration, and with Mathematica probably 2 bits
> per iteration.
>

I suspect a typo somewhere:

ITERATES(4*x*(1 - 1), x, 1, 10)

[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Martin.

anti...@math.uni.wroc.pl

Jan 27, 2021, 8:33:14 PM
Should be: x = 4*x(1 - x), considered as a mapping from the interval
[0, 1] to itself.

Extra info: there is a countable set of values for which exact
arithmetic eventually gives zero. Most of them are irrational,
but 0, 1/2 and 1 are rational, so if you want to see loss of
precision, x should be different from those rational values.

BTW, the case with coefficient 4 is easy for theoretical analysis.
People have done a lot of experiments studying what happens if
you replace 4 by something slightly smaller. When the coefficient
is small enough, the resulting sequence converges to 0
regardless of the starting value.

--
Waldek Hebisch