Wanted: An integer SQRT algorithm in assembly

G N Dodson

unread,

Dec 23, 1997, 3:00:00 AM12/23/97

to

Title says it all really. I don't get on well with the fp op's but need to
calculate correlation coefficients which involve SQRT's.

cheers,

Dan.

--
----------------------------------------------------------------------------
Was Windows 95% Hype?
----------------------------------------------------------------------------
Sarah, Gerald, Andrew and MaRDi Acorn StrongARM Powered
http://www.argonet.co.uk/users/gerald.dodson/
----------------------------------------------------------------------------
Forever StrongARM'd (well until a faster one comes along)
----------------------------------------------------------------------------

Ian Bannister

unread,

Dec 24, 1997, 3:00:00 AM12/24/97

to

In article <na.41c12647fc.a7...@argonet.co.uk>, G N Dodson

<gerald...@argonet.co.uk> wrote:
>
> Title says it all really. I don't get on well with the fp op's but need to
> calculate correlation coefficients which involve SQRT's.
>

The exact solution depends on what you are using to represent the numbers.
In integers you can use a fixed point system to increase precision so you
could have an accurate square root of even the lowest numbers such as 2.
You can also optimise if the range of numbers you're using is limited.

A look-up table would seem to be the answer. It could operate like a set of
maths tables which generally have a couple of values added together to
improve accuracy without using a lot of memory for the tables.

If you're more exact with the requirements I might have a go. Unless speed
is very important though, it may be better to go for fp (the problems are
not great) than trying to code a completely general routine for all
integers.

Floating point code to make R1=sqrt(R0):-
FLTD f0,R0
SQTD f0,f0
FIXSZ R1,f0

This is a pure integer function. To use the 16:16 fixed point system you
could MOV R0,R0,LSL#16 before converting it to float

--
Ian Bannister ____________________________
||---------------------/ \
|| <-* banni...@argonet.co.uk *->
||---------------------\____________________________/
Wed,24 Dec 1997.17:12:40

Mechanoid

unread,

Dec 24, 1997, 3:00:00 AM12/24/97

to

In article <na.41c12647fc.a7...@argonet.co.uk>, G N Dodson
<gerald...@argonet.co.uk> wrote:

> Title says it all really. I don't get on well with the fp op's but need to
> calculate correlation coefficients which involve SQRT's.

Here's mine:

; Optimised 32-bit integer sqrt
; Calculates square root of R0
; Result returned in R0, uses R1 and R2
; Worst case execution time exactly 70 S cycles

.intsqr
MOV r1,#0
CMP r1,r0,LSR #8
BEQ bit8 ; &000000xx numbers

CMP r1,r0,LSR #16
BEQ bit16 ; &0000xxxx numbers

CMP r1,r0,LSR #24
BEQ bit24 ; &00xxxxxx numbers

SUBS r2,r0,#&40000000
MOVCS r0,r2
MOVCS r1,#&10000

ADD r2,r1,#&4000
SUBS r2,r0,r2,LSL#14
MOVCS r0,r2
ADDCS r1,r1,#&8000

ADD r2,r1,#&2000
SUBS r2,r0,r2,LSL#13
MOVCS r0,r2
ADDCS r1,r1,#&4000

.bit24
ADD r2,r1,#&1000
SUBS r2,r0,r2,LSL#12
MOVCS r0,r2
ADDCS r1,r1,#&2000

ADD r2,r1,#&800
SUBS r2,r0,r2,LSL#11
MOVCS r0,r2
ADDCS r1,r1,#&1000

ADD r2,r1,#&400
SUBS r2,r0,r2,LSL#10
MOVCS r0,r2
ADDCS r1,r1,#&800

ADD r2,r1,#&200
SUBS r2,r0,r2,LSL#9
MOVCS r0,r2
ADDCS r1,r1,#&400

.bit16
ADD r2,r1,#&100
SUBS r2,r0,r2,LSL#8
MOVCS r0,r2
ADDCS r1,r1,#&200

ADD r2,r1,#&80
SUBS r2,r0,r2,LSL#7
MOVCS r0,r2
ADDCS r1,r1,#&100

ADD r2,r1,#&40
SUBS r2,r0,r2,LSL#6
MOVCS r0,r2
ADDCS r1,r1,#&80

ADD r2,r1,#&20
SUBS r2,r0,r2,LSL#5
MOVCS r0,r2
ADDCS r1,r1,#&40

.bit8
ADD r2,r1,#&10
SUBS r2,r0,r2,LSL#4
MOVCS r0,r2
ADDCS r1,r1,#&20

ADD r2,r1,#&8
SUBS r2,r0,r2,LSL#3
MOVCS r0,r2
ADDCS r1,r1,#&10

ADD r2,r1,#&4
SUBS r2,r0,r2,LSL#2
MOVCS r0,r2
ADDCS r1,r1,#&8

ADD r2,r1,#&2
SUBS r2,r0,r2,LSL#1
MOVCS r0,r2
ADDCS r1,r1,#&4

ADD r2,r1,#&1
CMP r0,r2
ADDCS r1,r1,#&2

MOV r0,r1,LSR#1 ; result in R0

Optimised by myself, but I dunno where I got the original from.

If by 'integer' you meant a fixed point routine, then I've got one of them
too.

Cheers,

--
Dan "Mech" Maloney.
__ _______ ______ __
/ |/ / __/ ___/ /_/ / # Mechanoid the Haemorroidal Android
/ /|_/ / _// /__/ __ / # Disclaimer: I may have been wrong deliberately
/_/ /_/___/\___/_/ /_/ # mailto:mech...@usa.net

Mark Olleson

unread,

Dec 25, 1997, 3:00:00 AM12/25/97

to

In message <na.41c12647fc.a7...@argonet.co.uk>

G N Dodson <gerald...@argonet.co.uk> wrote:

> Title says it all really. I don't get on well with the fp op's but need to
> calculate correlation coefficients which involve SQRT's.

If you want an algorithm for doing this in integer precision, try using the
Newton-Raphson method. [this is in fact an iterative approximation method,
but you'll probably find that its plenty quick enough in integer precision.]

IIRC any (+ve!) seed value is good for calculating SQRT, although you can
use a look-up table to choose a result that is somewhere in the ball-park.

If you get stuck e-mail me and I'll work out the algorithm needed

PS you can also do division using N-R and a look-up table. I'd hazard a guess
that for some divisions it'll be better than the onrolled algorithms..
>

--
.................................................................
......Make Everything as Simple as possible, but not simpler.....
.................................................................

Mark Olleson - Department of Electrical & Electronic Engineering
Newcastle University, UK. M.J.O...@ncl.ac.uk

Antoon Pardon

unread,

Jan 5, 1998, 3:00:00 AM1/5/98

to

Mark Olleson (M.J.O...@ncl.ac.uk) wrote:
: In message <na.41c12647fc.a7...@argonet.co.uk>

: G N Dodson <gerald...@argonet.co.uk> wrote:

: > Title says it all really. I don't get on well with the fp op's but need to
: > calculate correlation coefficients which involve SQRT's.

: If you want an algorithm for doing this in integer precision, try using the
: Newton-Raphson method. [this is in fact an iterative approximation method,
: but you'll probably find that its plenty quick enough in integer precision.]

: IIRC any (+ve!) seed value is good for calculating SQRT, although you can
: use a look-up table to choose a result that is somewhere in the ball-park.

To pick a good starting value. Take a value that has half as many
significant bits as the argument.

--
All opinions expressed herein are currently under revision
==========================================================
Antoon Pardon Brussels Free University Computing Centre
==========================================================

Jon Harrop

unread,

Jan 8, 1998, 3:00:00 AM1/8/98

to

Ian Bannister <banni...@argonet.co.uk> wrote in article
<na.5994f547fd.a...@argonet.co.uk>...
> In article <na.41c12647fc.a7...@argonet.co.uk>, G N Dodson

> <gerald...@argonet.co.uk> wrote:
> >
> > Title says it all really. I don't get on well with the fp op's but need
to
> > calculate correlation coefficients which involve SQRT's.
> >

> The exact solution depends on what you are using to represent the
numbers.

> In integers you can use a fixed point system...

Then they are no-longer integers... :-)

Jon.

Jon Harrop

unread,

Jan 8, 1998, 3:00:00 AM1/8/98

to

Antoon Pardon <apa...@rc4.vub.ac.be> wrote in article
<68qjqe$j...@rc1.vub.ac.be>...

> Mark Olleson (M.J.O...@ncl.ac.uk) wrote:
> : In message <na.41c12647fc.a7...@argonet.co.uk>

> : G N Dodson <gerald...@argonet.co.uk> wrote:
>
> : > Title says it all really. I don't get on well with the fp op's but
need to
> : > calculate correlation coefficients which involve SQRT's.
>

> : If you want an algorithm for doing this in integer precision, try using
the
> : Newton-Raphson method. [this is in fact an iterative approximation
method,
> : but you'll probably find that its plenty quick enough in integer
precision.]
>
> : IIRC any (+ve!) seed value is good for calculating SQRT, although you
can
> : use a look-up table to choose a result that is somewhere in the
ball-park.
>
> To pick a good starting value. Take a value that has half as many
> significant bits as the argument.

For the NR algorithm the number of decimal places accuracy (when near
to the solution) doubles with each iteration. This makes it useful for
postage
stamp calculations but a bitwise solution is generally preferable on a
computer.

A 51 cycle routine is available. IIRC a complete discussion of its
derivation
may be found at Robin Watt's home page (at .ox).

I have mailed Mr Dodson all my information on this matter.

Cheers,
Jon.