Oh yeah, and where is the FAQ for this newsgroup kept?
--
David Chester
> Hi,
> Anyone know where I can get hold of a Integer Square root algorithm for ARM assembler.
You could use the following iteration formula:
Y1 = ( Y0 + W / Y0 ) / 2
where Y1 = (N+1)th estimation
Y0 = Nth estimation
W = number you want the square root from
you stop the iteration if Y1=Y0.
care should be taken with the first estimation,if you take 1 it is fast
for small sqrts and slow for big. If you take 65536 it is the other way
round.
ARM code [r0=sqrt(r0)]:
stmfd r13!,{r14}
mov r3,r0 ;r3=W
mov r0,#16384*2 ;r0=start estimation * 2
loop
mov r14,r0,lsr#1
mov r0,r3
mov r1,r14
bl FDIV ;r0=r0/r1 r2=scrapped
add r0,r0,r14
cmp r14,r0,lsr#1
bne loop
mov r0,r0,lsr#1
ldmfd r13!,{pc}^
For the division look at my homepage please:
http://rcswww.urz.tu-dresden.de/~teich-p/adiv1e.html
On a Strongarm 202.8 MHz it takes between 2 and 2.5 seconds with the above
start estimation for 1000000 sqrts (2.5 for sqrt(1) and sqrt(4e9).
--
Peter Teichmann
----------------------------------------------------------------------------
Email: tei...@rcs.urz.tu-dresden.de WWW: rcswww.urz.tu-dresden.de/~teich-p
> Hi,
> Anyone know where I can get hold of a Integer Square root algorithm for
> ARM assembler.
>
> Oh yeah, and where is the FAQ for this newsgroup kept?
I'm, fairly sure that this has been asked before on c.s.a. - A search on
DejaNews will probably find it.
The nornal way of calculating square roots is by the use of the
Newton-Raphson theorum which is an iterative approximation.
--
.................................................................
......Make Everything as Simple as possible, but not simpler.....
.................................................................
Mark Olleson, Electronic Engineer
ma...@vapour-trail.demon.co.uk, mjol...@iee.org.uk
A sorcerer is one who by commerce with the Devil has a full intention of
attaining his own ends.
DC> Anyone know where I can get hold of a Integer Square root algorithm for
DC> ARM assembler.
This article appeared sometime in usent:
From: Mechanoid <dmal...@argonet.co.uk>
Reply-To: Mechanoid <dmal...@argonet.co.uk>
X-Newsreader: NewsAgent 0.84 for RISC OS
In article <na.41c12647fc.a7...@argonet.co.uk>, G N Dodson
<gerald...@argonet.co.uk> wrote:
> Title says it all really. I don't get on well with the fp op's but need to
> calculate correlation coefficients which involve SQRT's.
Here's mine:
; Optimised 32-bit integer sqrt
; Calculates square root of R0
; Result returned in R0, uses R1 and R2
; Worst case execution time exactly 70 S cycles
.intsqr
MOV r1,#0
CMP r1,r0,LSR #8
BEQ bit8 ; &000000xx numbers
CMP r1,r0,LSR #16
BEQ bit16 ; &0000xxxx numbers
CMP r1,r0,LSR #24
BEQ bit24 ; &00xxxxxx numbers
SUBS r2,r0,#&40000000
MOVCS r0,r2
MOVCS r1,#&10000
ADD r2,r1,#&4000
SUBS r2,r0,r2,LSL#14
MOVCS r0,r2
ADDCS r1,r1,#&8000
ADD r2,r1,#&2000
SUBS r2,r0,r2,LSL#13
MOVCS r0,r2
ADDCS r1,r1,#&4000
.bit24
ADD r2,r1,#&1000
SUBS r2,r0,r2,LSL#12
MOVCS r0,r2
ADDCS r1,r1,#&2000
ADD r2,r1,#&800
SUBS r2,r0,r2,LSL#11
MOVCS r0,r2
ADDCS r1,r1,#&1000
ADD r2,r1,#&400
SUBS r2,r0,r2,LSL#10
MOVCS r0,r2
ADDCS r1,r1,#&800
ADD r2,r1,#&200
SUBS r2,r0,r2,LSL#9
MOVCS r0,r2
ADDCS r1,r1,#&400
.bit16
ADD r2,r1,#&100
SUBS r2,r0,r2,LSL#8
MOVCS r0,r2
ADDCS r1,r1,#&200
ADD r2,r1,#&80
SUBS r2,r0,r2,LSL#7
MOVCS r0,r2
ADDCS r1,r1,#&100
ADD r2,r1,#&40
SUBS r2,r0,r2,LSL#6
MOVCS r0,r2
ADDCS r1,r1,#&80
ADD r2,r1,#&20
SUBS r2,r0,r2,LSL#5
MOVCS r0,r2
ADDCS r1,r1,#&40
.bit8
ADD r2,r1,#&10
SUBS r2,r0,r2,LSL#4
MOVCS r0,r2
ADDCS r1,r1,#&20
ADD r2,r1,#&8
SUBS r2,r0,r2,LSL#3
MOVCS r0,r2
ADDCS r1,r1,#&10
ADD r2,r1,#&4
SUBS r2,r0,r2,LSL#2
MOVCS r0,r2
ADDCS r1,r1,#&8
ADD r2,r1,#&2
SUBS r2,r0,r2,LSL#1
MOVCS r0,r2
ADDCS r1,r1,#&4
ADD r2,r1,#&1
CMP r0,r2
ADDCS r1,r1,#&2
MOV r0,r1,LSR#1 ; result in R0
Optimised by myself, but I dunno where I got the original from.
If by 'integer' you meant a fixed point routine, then I've got one of them
too.
Cheers,
--
Dan "Mech" Maloney.
__ _______ ______ __
/ |/ / __/ ___/ /_/ / # Mechanoid the Haemorroidal Android
/ /|_/ / _// /__/ __ / # Disclaimer: I may have been wrong deliberately
/_/ /_/___/\___/_/ /_/ # mailto:mech...@usa.net
---
* Origin: ArgoNet, but does not reflect its views (2:246/1416.500)
>Anyone know where I can get hold of a Integer Square root algorithm for ARM assembler.
Here's what I use, in C. It returns floor(sqrt(n)):
==============================================================================
typedef unsigned int u32;
u32 isqrt(const u32 n);
/*-- isqrt -----------------------------------------------------------------*/
u32 isqrt(u32 n) {
u32 s, t;
#define sqrtBit(k) \
t = s+(1UL<<(k-1)); t <<= k+1; if (n >= t) { n -= t; s |= 1UL<<k; }
s = 0UL;
#ifdef __alpha
if (n >= 1UL<<62) { n -= 1UL<<62; s = 1UL<<31; }
sqrtBit(30); sqrtBit(29); sqrtBit(28); sqrtBit(27); sqrtBit(26);
sqrtBit(25); sqrtBit(24); sqrtBit(23); sqrtBit(22); sqrtBit(21);
sqrtBit(20); sqrtBit(19); sqrtBit(18); sqrtBit(17); sqrtBit(16);
sqrtBit(15);
#else
if (n >= 1UL<<30) { n -= 1UL<<30; s = 1UL<<15; }
#endif
sqrtBit(14); sqrtBit(13); sqrtBit(12); sqrtBit(11); sqrtBit(10);
sqrtBit(9); sqrtBit(8); sqrtBit(7); sqrtBit(6); sqrtBit(5);
sqrtBit(4); sqrtBit(3); sqrtBit(2); sqrtBit(1);
if (n > s<<1) s |= 1UL;
#undef sqrtBit
return s;
} /* end isqrt */
==============================================================================
I compiled it with a cross-compiler from x86 Linux to ARM, running the
compiler on Alpha Linux emulating an x86 (!):
==============================================================================
>/usr/local/gnuarm/arm-unknown-coff/bin/gcc -O2 -fomit-frame-pointer isqrt.c -o isqrt.s -S
>cat isqrt.s
@ Generated by gcc 2.8.1 for ARM/coff
.text
.align 0
.global _isqrt
_isqrt:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, current_function_anonymous_args = 0
mov r2, #0
cmn r0, #-1073741823
addhi r0, r0, #-1073741824
movhi r2, #32768
.L2:
add r3, r2, #8192
mov r3, r3, asl #15
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #16384
.L3:
add r3, r2, #4096
mov r3, r3, asl #14
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #8192
.L4:
add r3, r2, #2048
mov r3, r3, asl #13
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #4096
.L5:
add r3, r2, #1024
mov r3, r3, asl #12
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #2048
.L6:
add r3, r2, #512
mov r3, r3, asl #11
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #1024
.L7:
add r3, r2, #256
mov r3, r3, asl #10
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #512
.L8:
add r3, r2, #128
mov r3, r3, asl #9
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #256
.L9:
add r3, r2, #64
mov r3, r3, asl #8
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #128
.L10:
add r3, r2, #32
mov r3, r3, asl #7
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #64
.L11:
add r3, r2, #16
mov r3, r3, asl #6
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #32
.L12:
add r3, r2, #8
mov r3, r3, asl #5
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #16
.L13:
add r3, r2, #4
mov r3, r3, asl #4
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #8
.L14:
add r3, r2, #2
mov r3, r3, asl #3
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #4
.L15:
add r3, r2, #1
mov r3, r3, asl #2
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #2
.L16:
cmp r0, r2, asl #1
movls r0, r2
orrhi r0, r2, #1
mov pc, lr
==============================================================================
You could do the obvious optimisation replacing:
add r3, r2, #8192
mov r3, r3, asl #15
cmp r0, r3
rsbcs r0, r3, r0
orrcs r2, r2, #16384
with:
add r3, r2, #8192
cmp r0, r3, asl #15
subcs r0, r0, r3, asl #15
orrcs r2, r2, #16384
and so on, for 4 cycles per bit.
Bye,
Rob.
> Hi,
> Anyone know where I can get hold of a Integer Square root
> algorithm for ARM assembler.
I find this piece of C is optimal, in the sense that NorCroft C on the ARM
produces exactly the assembler I'd expect.
static __pure unsigned int isqrt(unsigned int i)
{
int max;
int j;
unsigned int a, c, d, s;
a=0; c=0;
if (i>=0x10000) max=15; else max=7;
s = 1<<(max*2);
j=max;
do {
d = c + (a<<(j+1)) + s;
if (d<=i) { c=d; a |= 1<<j; };
if (d!=i) { s>>=2; j--; }
} while ((j>=0)&&(d!=i));
return a;
}
You may want to remove the __pure, and tell your C compiler some other way
that this is a pure function.
you may also wish to unroll this...
cheers,Rich.
--
Rich Walker: r...@shadow.org.uk (Shadow Robot Project)
http://www.shadow.org.uk 251 Liverpool Road
+44(0)171 700 2487 London N1 1LX
"Sometimes after an electrical storm I see in 5 dimensions"
-- Cornfed Pig, Duckman.
In article 1996/06/20 "3 cycle/bit square root (algorithm & code)" I wrote:
<Derivation of the algorithm removed>
; IN : n 32 bit unsigned integer
; OUT: root = INT (SQRT (n))
; TMP: offset
MOV offset, #3 << 30
MOV root, #1 << 30
[ unroll for i = 0 .. 15
CMP n, root, ROR #2 * i
SUBHS n, n, root, ROR #2 * i
ADC root, offset, root, LSL #1
]
BIC root, root, #3 << 30 ; for rounding add: CMP n, root ADC root, #1
This routine takes 3 * 16 + 3 = 51 cycles for a root of a 32 bit integer.
Of course, there are many other possibilities. The routine can be used with
some extra code which uses early termination if n is less than 32 bit. Also
fixed point versions are possible, or roots of 64 bit numbers. Even looped
versions are no problem: Shift in 8 or 16 bits at a time,
so you need only to unroll 4 or 8 times (overhead 6 cycles).
----
I also found some C code that is optimized for ARM (4 cycle per bit):
typedef unsigned uint32;
#define iter1(N) \
try = root + (1 << (N)); \
if (n >= try << (N)) \
{ n -= try << (N); \
root |= 2 << (N); \
}
uint32 sqrt (uint32 n)
{
uint32 root = 0, try;
iter1 (15); iter1 (14); iter1 (13); iter1 (12);
iter1 (11); iter1 (10); iter1 ( 9); iter1 ( 8);
iter1 ( 7); iter1 ( 6); iter1 ( 5); iter1 ( 4);
iter1 ( 3); iter1 ( 2); iter1 ( 1); iter1 ( 0);
return root >> 1;
}
> Oh yeah, and where is the FAQ for this newsgroup kept?
There isn't one - maybe we need to create one?
Wilco
-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
You may want to replace it with this, it uses the same early
termination, but takes only 9 cycles per bit...
typedef unsigned int uint32;
uint32 isqrt3(uint32 n)
{
uint32 root = 0, bit, trial;
bit = (n >= 0x10000) ? 1<<30 : 1<<14;
do
{
trial = root+bit;
if (n >= trial)
{
n -= trial;
root = trial+bit;
}
root >>= 1;
bit >>= 2;
} while (bit);
return root;
With that in mind, and so that next time this question comes up there
will be a reference point, I've taken every code suggestion in this
thread and edited them together to make a document on integer square
root routines for ARM processors. I've even dug up Wilco's original
article on his 3 cycle/bit square root routine which shows his
derivation.
You can find it at http://www.finesse.demon.co.uk/steven/sqrt.html
This does not mean I'm volunteering to produce a FAQ, but if I get some
positive feedback I may be persuaded to similarly archive other code
snippets.
- Steven
setting up a FAQ would probably be a good Idea, maby some pointers to sites with arm code on?
--
David Chester