44 views

Skip to first unread message

Oct 3, 1998, 3:00:00 AM10/3/98

to

Hi,

Anyone know where I can get hold of a Integer Square root algorithm for ARM assembler.

Anyone know where I can get hold of a Integer Square root algorithm for ARM assembler.

Oh yeah, and where is the FAQ for this newsgroup kept?

--

David Chester

Oct 4, 1998, 3:00:00 AM10/4/98

to

In message <2c76468f48%Da...@chescam.demon.co.uk>

David Chester <Da...@chescam.demon.co.uk> wrote:

David Chester <Da...@chescam.demon.co.uk> wrote:

> Hi,

> Anyone know where I can get hold of a Integer Square root algorithm for ARM assembler.

You could use the following iteration formula:

Y1 = ( Y0 + W / Y0 ) / 2

where Y1 = (N+1)th estimation

Y0 = Nth estimation

W = number you want the square root from

you stop the iteration if Y1=Y0.

care should be taken with the first estimation,if you take 1 it is fast

for small sqrts and slow for big. If you take 65536 it is the other way

round.

ARM code [r0=sqrt(r0)]:

stmfd r13!,{r14}

mov r3,r0 ;r3=W

mov r0,#16384*2 ;r0=start estimation * 2

loop

mov r14,r0,lsr#1

mov r0,r3

mov r1,r14

bl FDIV ;r0=r0/r1 r2=scrapped

add r0,r0,r14

cmp r14,r0,lsr#1

bne loop

mov r0,r0,lsr#1

ldmfd r13!,{pc}^

For the division look at my homepage please:

http://rcswww.urz.tu-dresden.de/~teich-p/adiv1e.html

On a Strongarm 202.8 MHz it takes between 2 and 2.5 seconds with the above

start estimation for 1000000 sqrts (2.5 for sqrt(1) and sqrt(4e9).

--

Peter Teichmann

----------------------------------------------------------------------------

Email: tei...@rcs.urz.tu-dresden.de WWW: rcswww.urz.tu-dresden.de/~teich-p

Oct 4, 1998, 3:00:00 AM10/4/98

to

> Hi,

> Anyone know where I can get hold of a Integer Square root algorithm for

> ARM assembler.

>

> Oh yeah, and where is the FAQ for this newsgroup kept?

I'm, fairly sure that this has been asked before on c.s.a. - A search on

DejaNews will probably find it.

The nornal way of calculating square roots is by the use of the

Newton-Raphson theorum which is an iterative approximation.

--

.................................................................

......Make Everything as Simple as possible, but not simpler.....

.................................................................

Mark Olleson, Electronic Engineer

ma...@vapour-trail.demon.co.uk, mjol...@iee.org.uk

A sorcerer is one who by commerce with the Devil has a full intention of

attaining his own ends.

Oct 4, 1998, 3:00:00 AM10/4/98

to

David Chester <Da...@chescam.demon.co.uk> wrote:

DC> Anyone know where I can get hold of a Integer Square root algorithm for

DC> ARM assembler.

This article appeared sometime in usent:

From: Mechanoid <dmal...@argonet.co.uk>

Reply-To: Mechanoid <dmal...@argonet.co.uk>

X-Newsreader: NewsAgent 0.84 for RISC OS

In article <na.41c12647fc.a7...@argonet.co.uk>, G N Dodson

<gerald...@argonet.co.uk> wrote:

> Title says it all really. I don't get on well with the fp op's but need to

> calculate correlation coefficients which involve SQRT's.

Here's mine:

; Optimised 32-bit integer sqrt

; Calculates square root of R0

; Result returned in R0, uses R1 and R2

; Worst case execution time exactly 70 S cycles

.intsqr

MOV r1,#0

CMP r1,r0,LSR #8

BEQ bit8 ; &000000xx numbers

CMP r1,r0,LSR #16

BEQ bit16 ; &0000xxxx numbers

CMP r1,r0,LSR #24

BEQ bit24 ; &00xxxxxx numbers

SUBS r2,r0,#&40000000

MOVCS r0,r2

MOVCS r1,#&10000

ADD r2,r1,#&4000

SUBS r2,r0,r2,LSL#14

MOVCS r0,r2

ADDCS r1,r1,#&8000

ADD r2,r1,#&2000

SUBS r2,r0,r2,LSL#13

MOVCS r0,r2

ADDCS r1,r1,#&4000

.bit24

ADD r2,r1,#&1000

SUBS r2,r0,r2,LSL#12

MOVCS r0,r2

ADDCS r1,r1,#&2000

ADD r2,r1,#&800

SUBS r2,r0,r2,LSL#11

MOVCS r0,r2

ADDCS r1,r1,#&1000

ADD r2,r1,#&400

SUBS r2,r0,r2,LSL#10

MOVCS r0,r2

ADDCS r1,r1,#&800

ADD r2,r1,#&200

SUBS r2,r0,r2,LSL#9

MOVCS r0,r2

ADDCS r1,r1,#&400

.bit16

ADD r2,r1,#&100

SUBS r2,r0,r2,LSL#8

MOVCS r0,r2

ADDCS r1,r1,#&200

ADD r2,r1,#&80

SUBS r2,r0,r2,LSL#7

MOVCS r0,r2

ADDCS r1,r1,#&100

ADD r2,r1,#&40

SUBS r2,r0,r2,LSL#6

MOVCS r0,r2

ADDCS r1,r1,#&80

ADD r2,r1,#&20

SUBS r2,r0,r2,LSL#5

MOVCS r0,r2

ADDCS r1,r1,#&40

.bit8

ADD r2,r1,#&10

SUBS r2,r0,r2,LSL#4

MOVCS r0,r2

ADDCS r1,r1,#&20

ADD r2,r1,#&8

SUBS r2,r0,r2,LSL#3

MOVCS r0,r2

ADDCS r1,r1,#&10

ADD r2,r1,#&4

SUBS r2,r0,r2,LSL#2

MOVCS r0,r2

ADDCS r1,r1,#&8

ADD r2,r1,#&2

SUBS r2,r0,r2,LSL#1

MOVCS r0,r2

ADDCS r1,r1,#&4

ADD r2,r1,#&1

CMP r0,r2

ADDCS r1,r1,#&2

MOV r0,r1,LSR#1 ; result in R0

Optimised by myself, but I dunno where I got the original from.

If by 'integer' you meant a fixed point routine, then I've got one of them

too.

Cheers,

--

Dan "Mech" Maloney.

__ _______ ______ __

/ |/ / __/ ___/ /_/ / # Mechanoid the Haemorroidal Android

/ /|_/ / _// /__/ __ / # Disclaimer: I may have been wrong deliberately

/_/ /_/___/\___/_/ /_/ # mailto:mech...@usa.net

---

* Origin: ArgoNet, but does not reflect its views (2:246/1416.500)

Oct 5, 1998, 3:00:00 AM10/5/98

to

David Chester <Da...@chescam.demon.co.uk> writes:

>Anyone know where I can get hold of a Integer Square root algorithm for ARM assembler.

Here's what I use, in C. It returns floor(sqrt(n)):

==============================================================================

typedef unsigned int u32;

u32 isqrt(const u32 n);

/*-- isqrt -----------------------------------------------------------------*/

u32 isqrt(u32 n) {

u32 s, t;

#define sqrtBit(k) \

t = s+(1UL<<(k-1)); t <<= k+1; if (n >= t) { n -= t; s |= 1UL<<k; }

s = 0UL;

#ifdef __alpha

if (n >= 1UL<<62) { n -= 1UL<<62; s = 1UL<<31; }

sqrtBit(30); sqrtBit(29); sqrtBit(28); sqrtBit(27); sqrtBit(26);

sqrtBit(25); sqrtBit(24); sqrtBit(23); sqrtBit(22); sqrtBit(21);

sqrtBit(20); sqrtBit(19); sqrtBit(18); sqrtBit(17); sqrtBit(16);

sqrtBit(15);

#else

if (n >= 1UL<<30) { n -= 1UL<<30; s = 1UL<<15; }

#endif

sqrtBit(14); sqrtBit(13); sqrtBit(12); sqrtBit(11); sqrtBit(10);

sqrtBit(9); sqrtBit(8); sqrtBit(7); sqrtBit(6); sqrtBit(5);

sqrtBit(4); sqrtBit(3); sqrtBit(2); sqrtBit(1);

if (n > s<<1) s |= 1UL;

#undef sqrtBit

return s;

} /* end isqrt */

==============================================================================

I compiled it with a cross-compiler from x86 Linux to ARM, running the

compiler on Alpha Linux emulating an x86 (!):

==============================================================================

>/usr/local/gnuarm/arm-unknown-coff/bin/gcc -O2 -fomit-frame-pointer isqrt.c -o isqrt.s -S

>cat isqrt.s

@ Generated by gcc 2.8.1 for ARM/coff

.text

.align 0

.global _isqrt

_isqrt:

@ args = 0, pretend = 0, frame = 0

@ frame_needed = 0, current_function_anonymous_args = 0

mov r2, #0

cmn r0, #-1073741823

addhi r0, r0, #-1073741824

movhi r2, #32768

.L2:

add r3, r2, #8192

mov r3, r3, asl #15

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #16384

.L3:

add r3, r2, #4096

mov r3, r3, asl #14

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #8192

.L4:

add r3, r2, #2048

mov r3, r3, asl #13

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #4096

.L5:

add r3, r2, #1024

mov r3, r3, asl #12

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #2048

.L6:

add r3, r2, #512

mov r3, r3, asl #11

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #1024

.L7:

add r3, r2, #256

mov r3, r3, asl #10

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #512

.L8:

add r3, r2, #128

mov r3, r3, asl #9

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #256

.L9:

add r3, r2, #64

mov r3, r3, asl #8

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #128

.L10:

add r3, r2, #32

mov r3, r3, asl #7

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #64

.L11:

add r3, r2, #16

mov r3, r3, asl #6

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #32

.L12:

add r3, r2, #8

mov r3, r3, asl #5

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #16

.L13:

add r3, r2, #4

mov r3, r3, asl #4

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #8

.L14:

add r3, r2, #2

mov r3, r3, asl #3

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #4

.L15:

add r3, r2, #1

mov r3, r3, asl #2

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #2

.L16:

cmp r0, r2, asl #1

movls r0, r2

orrhi r0, r2, #1

mov pc, lr

==============================================================================

You could do the obvious optimisation replacing:

add r3, r2, #8192

mov r3, r3, asl #15

cmp r0, r3

rsbcs r0, r3, r0

orrcs r2, r2, #16384

with:

add r3, r2, #8192

cmp r0, r3, asl #15

subcs r0, r0, r3, asl #15

orrcs r2, r2, #16384

and so on, for 4 cycles per bit.

Bye,

Rob.

Oct 5, 1998, 3:00:00 AM10/5/98

to

> Hi,

> Anyone know where I can get hold of a Integer Square root

> algorithm for ARM assembler.

I find this piece of C is optimal, in the sense that NorCroft C on the ARM

produces exactly the assembler I'd expect.

static __pure unsigned int isqrt(unsigned int i)

{

int max;

int j;

unsigned int a, c, d, s;

a=0; c=0;

if (i>=0x10000) max=15; else max=7;

s = 1<<(max*2);

j=max;

do {

d = c + (a<<(j+1)) + s;

if (d<=i) { c=d; a |= 1<<j; };

if (d!=i) { s>>=2; j--; }

} while ((j>=0)&&(d!=i));

return a;

}

You may want to remove the __pure, and tell your C compiler some other way

that this is a pure function.

you may also wish to unroll this...

cheers,Rich.

--

Rich Walker: r...@shadow.org.uk (Shadow Robot Project)

http://www.shadow.org.uk 251 Liverpool Road

+44(0)171 700 2487 London N1 1LX

"Sometimes after an electrical storm I see in 5 dimensions"

-- Cornfed Pig, Duckman.

Oct 6, 1998, 3:00:00 AM10/6/98

to

In article <2c76468f48%Da...@chescam.demon.co.uk>,

David Chester <Da...@chescam.demon.co.uk> wrote:

> Hi,

> Anyone know where I can get hold of a Integer Square root algorithm for ARM

assembler.

David Chester <Da...@chescam.demon.co.uk> wrote:

> Hi,

> Anyone know where I can get hold of a Integer Square root algorithm for ARM

assembler.

In article 1996/06/20 "3 cycle/bit square root (algorithm & code)" I wrote:

<Derivation of the algorithm removed>

; IN : n 32 bit unsigned integer

; OUT: root = INT (SQRT (n))

; TMP: offset

MOV offset, #3 << 30

MOV root, #1 << 30

[ unroll for i = 0 .. 15

CMP n, root, ROR #2 * i

SUBHS n, n, root, ROR #2 * i

ADC root, offset, root, LSL #1

]

BIC root, root, #3 << 30 ; for rounding add: CMP n, root ADC root, #1

This routine takes 3 * 16 + 3 = 51 cycles for a root of a 32 bit integer.

Of course, there are many other possibilities. The routine can be used with

some extra code which uses early termination if n is less than 32 bit. Also

fixed point versions are possible, or roots of 64 bit numbers. Even looped

versions are no problem: Shift in 8 or 16 bits at a time,

so you need only to unroll 4 or 8 times (overhead 6 cycles).

----

I also found some C code that is optimized for ARM (4 cycle per bit):

typedef unsigned uint32;

#define iter1(N) \

try = root + (1 << (N)); \

if (n >= try << (N)) \

{ n -= try << (N); \

root |= 2 << (N); \

}

uint32 sqrt (uint32 n)

{

uint32 root = 0, try;

iter1 (15); iter1 (14); iter1 (13); iter1 (12);

iter1 (11); iter1 (10); iter1 ( 9); iter1 ( 8);

iter1 ( 7); iter1 ( 6); iter1 ( 5); iter1 ( 4);

iter1 ( 3); iter1 ( 2); iter1 ( 1); iter1 ( 0);

return root >> 1;

}

> Oh yeah, and where is the FAQ for this newsgroup kept?

There isn't one - maybe we need to create one?

Wilco

-----------== Posted via Deja News, The Discussion Network ==----------

http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own

Oct 6, 1998, 3:00:00 AM10/6/98

to

In article <48904B3108%r...@shadow.org.uk>,

r...@shadow.org.uk wrote:

> In message <2c76468f48%Da...@chescam.demon.co.uk>

> David Chester <Da...@chescam.demon.co.uk> wrote:

>

> > Hi,

> > Anyone know where I can get hold of a Integer Square root

> > algorithm for ARM assembler.

>

r...@shadow.org.uk wrote:

> In message <2c76468f48%Da...@chescam.demon.co.uk>

> David Chester <Da...@chescam.demon.co.uk> wrote:

>

> > Hi,

> > Anyone know where I can get hold of a Integer Square root

> > algorithm for ARM assembler.

>

> I find this piece of C is optimal, in the sense that NorCroft C on the ARM

> produces exactly the assembler I'd expect.

>

> static __pure unsigned int isqrt(unsigned int i)

> {

> int max;

> int j;

> unsigned int a, c, d, s;

> a=0; c=0;

> if (i>=0x10000) max=15; else max=7;

> s = 1<<(max*2);

> j=max;

> do {

> d = c + (a<<(j+1)) + s;

> if (d<=i) { c=d; a |= 1<<j; };

> if (d!=i) { s>>=2; j--; }

> } while ((j>=0)&&(d!=i));

> return a;

> }

> produces exactly the assembler I'd expect.

>

> static __pure unsigned int isqrt(unsigned int i)

> {

> int max;

> int j;

> unsigned int a, c, d, s;

> a=0; c=0;

> if (i>=0x10000) max=15; else max=7;

> s = 1<<(max*2);

> j=max;

> do {

> d = c + (a<<(j+1)) + s;

> if (d<=i) { c=d; a |= 1<<j; };

> if (d!=i) { s>>=2; j--; }

> } while ((j>=0)&&(d!=i));

> return a;

> }

You may want to replace it with this, it uses the same early

termination, but takes only 9 cycles per bit...

typedef unsigned int uint32;

uint32 isqrt3(uint32 n)

{

uint32 root = 0, bit, trial;

bit = (n >= 0x10000) ? 1<<30 : 1<<14;

do

{

trial = root+bit;

if (n >= trial)

{

n -= trial;

root = trial+bit;

}

root >>= 1;

bit >>= 2;

} while (bit);

return root;

Oct 7, 1998, 3:00:00 AM10/7/98

to

Wilco.D...@arm.com wrote:

: In article <2c76468f48%Da...@chescam.demon.co.uk>,

: David Chester <Da...@chescam.demon.co.uk> wrote:

: > Oh yeah, and where is the FAQ for this newsgroup kept?

: There isn't one - maybe we need to create one?

: In article <2c76468f48%Da...@chescam.demon.co.uk>,

: David Chester <Da...@chescam.demon.co.uk> wrote:

: > Oh yeah, and where is the FAQ for this newsgroup kept?

: There isn't one - maybe we need to create one?

With that in mind, and so that next time this question comes up there

will be a reference point, I've taken every code suggestion in this

thread and edited them together to make a document on integer square

root routines for ARM processors. I've even dug up Wilco's original

article on his 3 cycle/bit square root routine which shows his

derivation.

You can find it at http://www.finesse.demon.co.uk/steven/sqrt.html

This does not mean I'm volunteering to produce a FAQ, but if I get some

positive feedback I may be persuaded to similarly archive other code

snippets.

- Steven

Oct 11, 1998, 3:00:00 AM10/11/98

to

Many thanks people,

setting up a FAQ would probably be a good Idea, maby some pointers to sites with arm code on?

--

David Chester

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu