ARM7TDMI leading zero count source code

Phil

unread,

Jan 24, 2002, 5:32:40 AM1/24/02

to

Does anyone have any fast source to calculate the number of leading zeros
(like the v5 clz instruction) on ARM7TDMI ?

Any suggestions how to do this are welcome.

Thanks,

Phil

Nemo

unread,

Jan 24, 2002, 7:58:16 AM1/24/02

to

On 24 Jan, Phil wrote:

> Does anyone have any fast source to calculate the number of leading zeros
> (like the v5 clz instruction) on ARM7TDMI ?

The not-particularly-clever version:

; > R0: bitfield to examine
; < R0: number of high zeroes
; R1,flags: corrupted

HighZeroes MOVS R1, R0
MOVEQ R0, #32
MOVMI R0, #0
MOVLE PC, R14

MOVS R0, R1, LSR#16
MOVEQ R1, R1, LSL#16
MOVEQ R0, #16
MOVNE R0, #0

TST R1, #255<<24
MOVEQ R1, R1, LSL#8
ADDEQ R0, R0, #8

TST R1, #15<<28
MOVEQ R1, R1, LSL#4
ADDEQ R0, R0, #4

TST R1, #3<<30
MOVEQ R1, R1, LSL#2
ADDEQ R0, R0, #2

TST R1, #1<<31
ADDEQ R0, R0, #1
MOV PC, R14

--
Nemo ne...@20000.org

Message has been deleted

Phil

unread,

Jan 24, 2002, 6:50:06 PM1/24/02

to

Are you sure the ffs below is correct ? I tried it with something like
0x85400000 as argument and it gave 26 as result ?
Inserting 0x80000000 returns 12...

"Bill Pringlemeir" <bpring...@yahoo.com> wrote in message
news:uelkfv...@yahoo.com...

>
> >>>> On 24 Jan, Phil wrote:
> >> Does anyone have any fast source to calculate the number of
> >> leading zeros (like the v5 clz instruction) on ARM7TDMI ?
>

> >>>>> "Nemo" == Nemo <ne...@20000.org> writes:
> Nemo> The not-particularly-clever version:
>
> I have a very similar function.
>
> /* fls (find last set) function: Same as log2.
> * on entry R0 is the number you want to find the last bit of
> * on exit R0 contains either 0 if the number was 0, or 1-32 for
> * the bit set R1 is trashed.
> */
> _fls: sub r1,r1,r1
> cmp r0,#0x8000
> addhi r1,r1,#16
> movhi r0,r0, lsr #16
> cmp r0,#0x80
> addhi r1,r1,#8
> movhi r0,r0, lsr #8
> cmp r0,#0x8
> addhi r1,r1,#4
> movhi r0,r0, lsr #4
> cmp r0,#0x2
> addhi r1,r1,#2
> movhi r0,r0, lsr #2
> cmp r0,#0x1
> addhi r1,r1,#1
> movhi r0,r0, lsr #1
> add r0,r0,r1
> mov pc, lr
>
> I think that you could do something clever by using an add with shift
> and flag set (ie. addhis r0,r1,r0,rsr #8) to combine the compare (or
> tst) add and mov instructions, for at least part of the algorithm. For
> example the accumulated bit count could be done in the lower
>
> Also, this function finds the last bit. If there is some way to
> quickly change a value from 000..0001xxx.. -> 000..0001000.. then you
> can use the code below. However, I couldn't come up with anything
> when I thought about it briefly.
>
> /* ffs (find first set) function:
> * on entry R0 is the number you want to find the first bit of
> * on exit R0 contains either 0 if the number was 0, or 1-32 for
> * the bit set R1 is trashed.
> */
> _ffs:
> rsb r1, r0, #0
> ands r1, r1, r0 /* r1 = X with 0 or 1 bits set
*/
>
> /* R1 * 0x4d7651fUL */
> addne r0, r1, r1, lsl #2
> rsbne r0, r0, r0, lsl #11
> addne r1, r1, r0, lsl #8
> rsbne r1, r1, r1, lsl #5
>
> ldrneb r0, [pc, r1, lsr #27] /* result taken from top 5 bits
*/
>
> mov pc, lr
>
> Lffs_table:
> .byte 0, 1, 2, 4, 9, 19, 6, 13
> .byte 26, 21, 11, 23, 14, 29, 27, 22
> .byte 12, 25, 18, 5, 10, 20, 8, 17
> .byte 3, 7, 15, 31, 30, 28, 24, 16
>
> This is from some code shamelessly stolen from this newsgroup.
>
> regards,
> Bill Pringlemeir.
>
> --
> All Polish dudes should get dogs , huh? If you've seen one mental
> problem, you've seen them all, right?

cal...@adresa.je.u.sigu.com

unread,

Jan 24, 2002, 7:44:11 PM1/24/02

to

Bill Pringlemeir <bpring...@yahoo.com> wrote:
BP> Also, this function finds the last bit. If there is some way to
BP> quickly change a value from 000..0001xxx.. -> 000..0001000.. then you
BP> can use the code below. However, I couldn't come up with anything
BP> when I thought about it briefly.

Maybe AND with 0xFFFFFFF8?? I mean, your example AND 0xFFFFFFF8?

Could that work?

I must say that I didn't look at the code, but I will now... :) Maybe I said
something stupid trying to help here... ;))))))))

--
"Maslinovs li majmuncicu bacu ?" upita Hrvato lize Hitleru crtu.
"Nisam ja nikog bombardiro !" rece Dzonia drka "Ja samo Amerikankau masiru sretanm !" By runf

Damir Lukic
cal...@fly.srk.fer.hr

cal...@adresa.je.u.sigu.com

unread,

Jan 24, 2002, 7:41:10 PM1/24/02

to

Phil <ph...@miracle-designs-removeme.com> wrote:
P> Does anyone have any fast source to calculate the number of leading zeros
P> (like the v5 clz instruction) on ARM7TDMI ?

:))))

We had this on exam! :)) Not on ARM, but on FRISC (don't ask, CPU developed
at our faculty)... :))

And, guess what, I didn't get any points on that, just because they didn't
like it! :))))) Nevermind, I have other exam (ARM involved) on saturday...
:))

Keep your fingers crossed, people! :))

--
Bugaro jebe imbecilan u svlacionici psihijataro cvrkuce
svakih 15 minuta. By runf

Damir Lukic
cal...@fly.srk.fer.hr

Message has been deleted

Phil

unread,

Jan 26, 2002, 5:22:33 AM1/26/02

to

Thanks for your help guys!

I went for the binary approach... About 18 instructions for clz (including
0)

"Bill Pringlemeir" <bpring...@yahoo.com> wrote in message

news:u4rlbt...@yahoo.com...
> Phil> Are you sure the ffs below is correct ? I tried it with
> Phil> something like 0x85400000 as argument and it gave 26 as result
> Phil> ? Inserting 0x80000000 returns 12...
>
> Bill> Sorry, I thought you meant the fls code. It was newer. I did
> Bill> several values with the ffs routine. 0x04d7651fUL is suppose
> Bill> to be a perfect hash of the set 0x00000001, 0x00000002,
> Bill> 0x00000004, ..., 0x40000000, 0x80000000. Either my constant
> Bill> multiplication is wrong or I have messed up the table.

>
> >> Lffs_table:
> >> .byte 0, 1, 2, 4, 9, 19, 6, 13
> >> .byte 26, 21, 11, 23, 14, 29, 27, 22
> >> .byte 12, 25, 18, 5, 10, 20, 8, 17
> >> .byte 3, 7, 15, 31, 30, 28, 24, 16
>

> In case you care, this table should work with the same constant
> multiplier. Thanks for pointing that out. I did this by hand, so
> there might be more errors! I haven't checked it programmatically yet.
>
> Lffs_table:
> .byte 0, 1, 2, 24, 3, 19, 6, 25
> .byte 22, 4, 20, 10, 16, 7, 12, 26
> .byte 31, 23, 18, 5, 21, 9, 15, 11
> .byte 30, 17, 8, 14, 29, 13, 28, 27
>
> regards,
> Bill Pringlemeir.
>
> --
> The goal of science is to inspire the mind.

Arjan Bakker

unread,

Jan 27, 2002, 9:08:03 AM1/27/02

to

Try this one:

@r0 is reg to clz
clz:
movs r1,r0,lsr #16
moveq r1,r0
moveq r0,#16
movne r0,#0
movs r2,r1,lsr #8
addeq r0,r0,#8
moveq r2,r1
movs r1,r2,lsr #4
addeq r0,r0,#4
moveq r1,r2
movs r2,r1,lsr #2
addeq r0,r0,#2
moveq r2,r1
movs r1,r2,lsr #1
addeq r0,r0,#1
subne r0,r0,#1
addcc r0,r0,#1

Vadim Borshchev

unread,

Jan 27, 2002, 11:59:36 AM1/27/02

to

Try the following code. Compile to assembly output and do your best
optimizing it :)

/* Count leading zeros */

static const unsigned clz_magic = 0x7dcd629;
static const char clz_table[] = {
0, 31, 9, 30, 3, 8, 18, 29,
2, 5, 7, 14, 12, 17, 22, 28,
1, 10, 4, 19, 6, 15, 13, 23,
11, 20, 16, 24, 21, 25, 26, 27
};

unsigned clz (unsigned x) {
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return x ? clz_table[((clz_magic * x) + clz_magic) >> 27] : 32;
}

/* Count trailing zeros */

static const unsigned ctz_magic = 0xfb9ac52;
static const char ctz_table[] = {
31, 0, 22, 1, 28, 23, 13, 2,
29, 26, 24, 17, 19, 14, 9, 3,
30, 21, 27, 12, 25, 16, 18, 8,
20, 11, 15, 7, 10, 6, 5, 4
};

unsigned ctz (unsigned x) {
return (x &= -x) ? ctz_table[(x * ctz_magic) >> 27] : 32;
}

Sincerely,
Vadim Borshchev

Adam SPeight

unread,

Jan 31, 2002, 7:12:40 PM1/31/02

to

Nemo <ne...@20000.org> wrote in message news:<da54dbfd4a%ne...@20000.org>...

3 2 1
10987654321098765432109876543210

-> r0 number
<- r0 =number of leading zero bits
r1 =0
Compact version

MOV R1,R0
MOV r0,#0
.loop MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
BCC loop

unrolled version

MOV r1,r0
MOV R0,#0
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1