Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss
Groups keyboard shortcuts have been updated
Dismiss
See shortcuts

Square Root

139 views
Skip to first unread message

David Chester

unread,
Oct 3, 1998, 3:00:00 AM10/3/98
to
Hi,
Anyone know where I can get hold of a Integer Square root algorithm for ARM assembler.

Oh yeah, and where is the FAQ for this newsgroup kept?
--
David Chester

Peter Teichmann

unread,
Oct 4, 1998, 3:00:00 AM10/4/98
to
In message <2c76468f48%Da...@chescam.demon.co.uk>
David Chester <Da...@chescam.demon.co.uk> wrote:

> Hi,
> Anyone know where I can get hold of a Integer Square root algorithm for ARM assembler.

You could use the following iteration formula:

Y1 = ( Y0 + W / Y0 ) / 2

where Y1 = (N+1)th estimation
Y0 = Nth estimation
W = number you want the square root from

you stop the iteration if Y1=Y0.

care should be taken with the first estimation,if you take 1 it is fast
for small sqrts and slow for big. If you take 65536 it is the other way
round.

ARM code [r0=sqrt(r0)]:

stmfd r13!,{r14}

mov r3,r0 ;r3=W
mov r0,#16384*2 ;r0=start estimation * 2

loop
mov r14,r0,lsr#1
mov r0,r3
mov r1,r14
bl FDIV ;r0=r0/r1 r2=scrapped
add r0,r0,r14
cmp r14,r0,lsr#1
bne loop

mov r0,r0,lsr#1

ldmfd r13!,{pc}^

For the division look at my homepage please:
http://rcswww.urz.tu-dresden.de/~teich-p/adiv1e.html

On a Strongarm 202.8 MHz it takes between 2 and 2.5 seconds with the above
start estimation for 1000000 sqrts (2.5 for sqrt(1) and sqrt(4e9).


--
Peter Teichmann

----------------------------------------------------------------------------
Email: tei...@rcs.urz.tu-dresden.de WWW: rcswww.urz.tu-dresden.de/~teich-p

Mark Olleson

unread,
Oct 4, 1998, 3:00:00 AM10/4/98
to

> Hi,
> Anyone know where I can get hold of a Integer Square root algorithm for
> ARM assembler.
>

> Oh yeah, and where is the FAQ for this newsgroup kept?

I'm, fairly sure that this has been asked before on c.s.a. - A search on
DejaNews will probably find it.

The nornal way of calculating square roots is by the use of the
Newton-Raphson theorum which is an iterative approximation.

--
.................................................................
......Make Everything as Simple as possible, but not simpler.....
.................................................................

Mark Olleson, Electronic Engineer
ma...@vapour-trail.demon.co.uk, mjol...@iee.org.uk
A sorcerer is one who by commerce with the Devil has a full intention of
attaining his own ends.

Andreas Joos

unread,
Oct 4, 1998, 3:00:00 AM10/4/98
to
David Chester <Da...@chescam.demon.co.uk> wrote:

DC> Anyone know where I can get hold of a Integer Square root algorithm for
DC> ARM assembler.

This article appeared sometime in usent:

From: Mechanoid <dmal...@argonet.co.uk>
Reply-To: Mechanoid <dmal...@argonet.co.uk>
X-Newsreader: NewsAgent 0.84 for RISC OS

In article <na.41c12647fc.a7...@argonet.co.uk>, G N Dodson
<gerald...@argonet.co.uk> wrote:

> Title says it all really. I don't get on well with the fp op's but need to
> calculate correlation coefficients which involve SQRT's.

Here's mine:


; Optimised 32-bit integer sqrt
; Calculates square root of R0
; Result returned in R0, uses R1 and R2
; Worst case execution time exactly 70 S cycles

.intsqr
MOV r1,#0
CMP r1,r0,LSR #8
BEQ bit8 ; &000000xx numbers

CMP r1,r0,LSR #16
BEQ bit16 ; &0000xxxx numbers

CMP r1,r0,LSR #24
BEQ bit24 ; &00xxxxxx numbers

SUBS r2,r0,#&40000000
MOVCS r0,r2
MOVCS r1,#&10000

ADD r2,r1,#&4000
SUBS r2,r0,r2,LSL#14
MOVCS r0,r2
ADDCS r1,r1,#&8000

ADD r2,r1,#&2000
SUBS r2,r0,r2,LSL#13
MOVCS r0,r2
ADDCS r1,r1,#&4000

.bit24
ADD r2,r1,#&1000
SUBS r2,r0,r2,LSL#12
MOVCS r0,r2
ADDCS r1,r1,#&2000

ADD r2,r1,#&800
SUBS r2,r0,r2,LSL#11
MOVCS r0,r2
ADDCS r1,r1,#&1000

ADD r2,r1,#&400
SUBS r2,r0,r2,LSL#10
MOVCS r0,r2
ADDCS r1,r1,#&800

ADD r2,r1,#&200
SUBS r2,r0,r2,LSL#9
MOVCS r0,r2
ADDCS r1,r1,#&400

.bit16
ADD r2,r1,#&100
SUBS r2,r0,r2,LSL#8
MOVCS r0,r2
ADDCS r1,r1,#&200

ADD r2,r1,#&80
SUBS r2,r0,r2,LSL#7
MOVCS r0,r2
ADDCS r1,r1,#&100

ADD r2,r1,#&40
SUBS r2,r0,r2,LSL#6
MOVCS r0,r2
ADDCS r1,r1,#&80

ADD r2,r1,#&20
SUBS r2,r0,r2,LSL#5
MOVCS r0,r2
ADDCS r1,r1,#&40

.bit8
ADD r2,r1,#&10
SUBS r2,r0,r2,LSL#4
MOVCS r0,r2
ADDCS r1,r1,#&20

ADD r2,r1,#&8
SUBS r2,r0,r2,LSL#3
MOVCS r0,r2
ADDCS r1,r1,#&10

ADD r2,r1,#&4
SUBS r2,r0,r2,LSL#2
MOVCS r0,r2
ADDCS r1,r1,#&8

ADD r2,r1,#&2
SUBS r2,r0,r2,LSL#1
MOVCS r0,r2
ADDCS r1,r1,#&4

ADD r2,r1,#&1
CMP r0,r2
ADDCS r1,r1,#&2

MOV r0,r1,LSR#1 ; result in R0


Optimised by myself, but I dunno where I got the original from.

If by 'integer' you meant a fixed point routine, then I've got one of them
too.

Cheers,

--
Dan "Mech" Maloney.
__ _______ ______ __
/ |/ / __/ ___/ /_/ / # Mechanoid the Haemorroidal Android
/ /|_/ / _// /__/ __ / # Disclaimer: I may have been wrong deliberately
/_/ /_/___/\___/_/ /_/ # mailto:mech...@usa.net

---
* Origin: ArgoNet, but does not reflect its views (2:246/1416.500)

Robert Harley

unread,
Oct 5, 1998, 3:00:00 AM10/5/98
to
David Chester <Da...@chescam.demon.co.uk> writes:

>Anyone know where I can get hold of a Integer Square root algorithm for ARM assembler.


Here's what I use, in C. It returns floor(sqrt(n)):

==============================================================================
typedef unsigned int u32;

u32 isqrt(const u32 n);

/*-- isqrt -----------------------------------------------------------------*/

u32 isqrt(u32 n) {
u32 s, t;

#define sqrtBit(k) \
t = s+(1UL<<(k-1)); t <<= k+1; if (n >= t) { n -= t; s |= 1UL<<k; }

s = 0UL;
#ifdef __alpha
if (n >= 1UL<<62) { n -= 1UL<<62; s = 1UL<<31; }
sqrtBit(30); sqrtBit(29); sqrtBit(28); sqrtBit(27); sqrtBit(26);
sqrtBit(25); sqrtBit(24); sqrtBit(23); sqrtBit(22); sqrtBit(21);
sqrtBit(20); sqrtBit(19); sqrtBit(18); sqrtBit(17); sqrtBit(16);
sqrtBit(15);
#else
if (n >= 1UL<<30) { n -= 1UL<<30; s = 1UL<<15; }
#endif
sqrtBit(14); sqrtBit(13); sqrtBit(12); sqrtBit(11); sqrtBit(10);
sqrtBit(9); sqrtBit(8); sqrtBit(7); sqrtBit(6); sqrtBit(5);
sqrtBit(4); sqrtBit(3); sqrtBit(2); sqrtBit(1);
if (n > s<<1) s |= 1UL;

#undef sqrtBit

return s;
} /* end isqrt */
==============================================================================

I compiled it with a cross-compiler from x86 Linux to ARM, running the
compiler on Alpha Linux emulating an x86 (!):

==============================================================================
>/usr/local/gnuarm/arm-unknown-coff/bin/gcc -O2 -fomit-frame-pointer isqrt.c -o isqrt.s -S
>cat isqrt.s

@ Generated by gcc 2.8.1 for ARM/coff
.text
.align 0
.global _isqrt
_isqrt:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, current_function_anonymous_args = 0
mov r2, #0
cmn r0, #-1073741823
addhi r0, r0, #-1073741824
movhi r2, #32768
.L2:
add r3, r2, #8192
mov r3, r3, asl #15
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #16384
.L3:
add r3, r2, #4096
mov r3, r3, asl #14
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #8192
.L4:
add r3, r2, #2048
mov r3, r3, asl #13
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #4096
.L5:
add r3, r2, #1024
mov r3, r3, asl #12
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #2048
.L6:
add r3, r2, #512
mov r3, r3, asl #11
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #1024
.L7:
add r3, r2, #256
mov r3, r3, asl #10
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #512
.L8:
add r3, r2, #128
mov r3, r3, asl #9
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #256
.L9:
add r3, r2, #64
mov r3, r3, asl #8
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #128
.L10:
add r3, r2, #32
mov r3, r3, asl #7
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #64
.L11:
add r3, r2, #16
mov r3, r3, asl #6
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #32
.L12:
add r3, r2, #8
mov r3, r3, asl #5
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #16
.L13:
add r3, r2, #4
mov r3, r3, asl #4
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #8
.L14:
add r3, r2, #2
mov r3, r3, asl #3
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #4
.L15:
add r3, r2, #1
mov r3, r3, asl #2
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #2
.L16:
cmp r0, r2, asl #1
movls r0, r2
orrhi r0, r2, #1
mov pc, lr
==============================================================================

You could do the obvious optimisation replacing:

add r3, r2, #8192
mov r3, r3, asl #15
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #16384

with:

add r3, r2, #8192
cmp r0, r3, asl #15
subcs r0, r0, r3, asl #15
orrcs r2, r2, #16384

and so on, for 4 cycles per bit.

Bye,
Rob.

Rich Walker

unread,
Oct 5, 1998, 3:00:00 AM10/5/98
to

> Hi,


> Anyone know where I can get hold of a Integer Square root
> algorithm for ARM assembler.

I find this piece of C is optimal, in the sense that NorCroft C on the ARM
produces exactly the assembler I'd expect.

static __pure unsigned int isqrt(unsigned int i)
{
int max;
int j;
unsigned int a, c, d, s;
a=0; c=0;
if (i>=0x10000) max=15; else max=7;
s = 1<<(max*2);
j=max;
do {
d = c + (a<<(j+1)) + s;
if (d<=i) { c=d; a |= 1<<j; };
if (d!=i) { s>>=2; j--; }
} while ((j>=0)&&(d!=i));
return a;
}

You may want to remove the __pure, and tell your C compiler some other way
that this is a pure function.

you may also wish to unroll this...


cheers,Rich.

--
Rich Walker: r...@shadow.org.uk (Shadow Robot Project)
http://www.shadow.org.uk 251 Liverpool Road
+44(0)171 700 2487 London N1 1LX
"Sometimes after an electrical storm I see in 5 dimensions"
-- Cornfed Pig, Duckman.

Wilco.D...@arm.com

unread,
Oct 6, 1998, 3:00:00 AM10/6/98
to
In article <2c76468f48%Da...@chescam.demon.co.uk>,

David Chester <Da...@chescam.demon.co.uk> wrote:
> Hi,
> Anyone know where I can get hold of a Integer Square root algorithm for ARM
assembler.


In article 1996/06/20 "3 cycle/bit square root (algorithm & code)" I wrote:

<Derivation of the algorithm removed>

; IN : n 32 bit unsigned integer
; OUT: root = INT (SQRT (n))
; TMP: offset

MOV offset, #3 << 30
MOV root, #1 << 30
[ unroll for i = 0 .. 15
CMP n, root, ROR #2 * i
SUBHS n, n, root, ROR #2 * i
ADC root, offset, root, LSL #1
]
BIC root, root, #3 << 30 ; for rounding add: CMP n, root ADC root, #1

This routine takes 3 * 16 + 3 = 51 cycles for a root of a 32 bit integer.
Of course, there are many other possibilities. The routine can be used with
some extra code which uses early termination if n is less than 32 bit. Also
fixed point versions are possible, or roots of 64 bit numbers. Even looped
versions are no problem: Shift in 8 or 16 bits at a time,
so you need only to unroll 4 or 8 times (overhead 6 cycles).

----

I also found some C code that is optimized for ARM (4 cycle per bit):

typedef unsigned uint32;

#define iter1(N) \
try = root + (1 << (N)); \
if (n >= try << (N)) \
{ n -= try << (N); \
root |= 2 << (N); \
}

uint32 sqrt (uint32 n)
{
uint32 root = 0, try;
iter1 (15); iter1 (14); iter1 (13); iter1 (12);
iter1 (11); iter1 (10); iter1 ( 9); iter1 ( 8);
iter1 ( 7); iter1 ( 6); iter1 ( 5); iter1 ( 4);
iter1 ( 3); iter1 ( 2); iter1 ( 1); iter1 ( 0);
return root >> 1;
}

> Oh yeah, and where is the FAQ for this newsgroup kept?

There isn't one - maybe we need to create one?

Wilco

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own

Wilco.D...@arm.com

unread,
Oct 6, 1998, 3:00:00 AM10/6/98
to
In article <48904B3108%r...@shadow.org.uk>,
r...@shadow.org.uk wrote:
> In message <2c76468f48%Da...@chescam.demon.co.uk>

> David Chester <Da...@chescam.demon.co.uk> wrote:
>
> > Hi,
> > Anyone know where I can get hold of a Integer Square root
> > algorithm for ARM assembler.
>
> I find this piece of C is optimal, in the sense that NorCroft C on the ARM
> produces exactly the assembler I'd expect.
>
> static __pure unsigned int isqrt(unsigned int i)
> {
> int max;
> int j;
> unsigned int a, c, d, s;
> a=0; c=0;
> if (i>=0x10000) max=15; else max=7;
> s = 1<<(max*2);
> j=max;
> do {
> d = c + (a<<(j+1)) + s;
> if (d<=i) { c=d; a |= 1<<j; };
> if (d!=i) { s>>=2; j--; }
> } while ((j>=0)&&(d!=i));
> return a;
> }

You may want to replace it with this, it uses the same early
termination, but takes only 9 cycles per bit...

typedef unsigned int uint32;

uint32 isqrt3(uint32 n)
{
uint32 root = 0, bit, trial;
bit = (n >= 0x10000) ? 1<<30 : 1<<14;
do
{
trial = root+bit;
if (n >= trial)
{
n -= trial;
root = trial+bit;
}
root >>= 1;
bit >>= 2;
} while (bit);
return root;

Steven Singer

unread,
Oct 7, 1998, 3:00:00 AM10/7/98
to
Wilco.D...@arm.com wrote:
: In article <2c76468f48%Da...@chescam.demon.co.uk>,
: David Chester <Da...@chescam.demon.co.uk> wrote:
: > Oh yeah, and where is the FAQ for this newsgroup kept?

: There isn't one - maybe we need to create one?

With that in mind, and so that next time this question comes up there
will be a reference point, I've taken every code suggestion in this
thread and edited them together to make a document on integer square
root routines for ARM processors. I've even dug up Wilco's original
article on his 3 cycle/bit square root routine which shows his
derivation.

You can find it at http://www.finesse.demon.co.uk/steven/sqrt.html

This does not mean I'm volunteering to produce a FAQ, but if I get some
positive feedback I may be persuaded to similarly archive other code
snippets.

- Steven

David Chester

unread,
Oct 11, 1998, 3:00:00 AM10/11/98
to
Many thanks people,

setting up a FAQ would probably be a good Idea, maby some pointers to sites with arm code on?
--
David Chester

0 new messages