Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

ARM7TDMI leading zero count source code

121 views
Skip to first unread message

Phil

unread,
Jan 24, 2002, 5:32:40 AM1/24/02
to
Does anyone have any fast source to calculate the number of leading zeros
(like the v5 clz instruction) on ARM7TDMI ?

Any suggestions how to do this are welcome.

Thanks,

Phil


Nemo

unread,
Jan 24, 2002, 7:58:16 AM1/24/02
to
On 24 Jan, Phil wrote:

> Does anyone have any fast source to calculate the number of leading zeros
> (like the v5 clz instruction) on ARM7TDMI ?

The not-particularly-clever version:


; > R0: bitfield to examine
; < R0: number of high zeroes
; R1,flags: corrupted

HighZeroes MOVS R1, R0
MOVEQ R0, #32
MOVMI R0, #0
MOVLE PC, R14

MOVS R0, R1, LSR#16
MOVEQ R1, R1, LSL#16
MOVEQ R0, #16
MOVNE R0, #0

TST R1, #255<<24
MOVEQ R1, R1, LSL#8
ADDEQ R0, R0, #8

TST R1, #15<<28
MOVEQ R1, R1, LSL#4
ADDEQ R0, R0, #4

TST R1, #3<<30
MOVEQ R1, R1, LSL#2
ADDEQ R0, R0, #2

TST R1, #1<<31
ADDEQ R0, R0, #1
MOV PC, R14

--
Nemo ne...@20000.org

Message has been deleted

Phil

unread,
Jan 24, 2002, 6:50:06 PM1/24/02
to
Are you sure the ffs below is correct ? I tried it with something like
0x85400000 as argument and it gave 26 as result ?
Inserting 0x80000000 returns 12...


"Bill Pringlemeir" <bpring...@yahoo.com> wrote in message
news:uelkfv...@yahoo.com...


>
> >>>> On 24 Jan, Phil wrote:
> >> Does anyone have any fast source to calculate the number of
> >> leading zeros (like the v5 clz instruction) on ARM7TDMI ?
>

> >>>>> "Nemo" == Nemo <ne...@20000.org> writes:
> Nemo> The not-particularly-clever version:
>
> I have a very similar function.
>
> /* fls (find last set) function: Same as log2.
> * on entry R0 is the number you want to find the last bit of
> * on exit R0 contains either 0 if the number was 0, or 1-32 for
> * the bit set R1 is trashed.
> */
> _fls: sub r1,r1,r1
> cmp r0,#0x8000
> addhi r1,r1,#16
> movhi r0,r0, lsr #16
> cmp r0,#0x80
> addhi r1,r1,#8
> movhi r0,r0, lsr #8
> cmp r0,#0x8
> addhi r1,r1,#4
> movhi r0,r0, lsr #4
> cmp r0,#0x2
> addhi r1,r1,#2
> movhi r0,r0, lsr #2
> cmp r0,#0x1
> addhi r1,r1,#1
> movhi r0,r0, lsr #1
> add r0,r0,r1
> mov pc, lr
>
> I think that you could do something clever by using an add with shift
> and flag set (ie. addhis r0,r1,r0,rsr #8) to combine the compare (or
> tst) add and mov instructions, for at least part of the algorithm. For
> example the accumulated bit count could be done in the lower
>
> Also, this function finds the last bit. If there is some way to
> quickly change a value from 000..0001xxx.. -> 000..0001000.. then you
> can use the code below. However, I couldn't come up with anything
> when I thought about it briefly.
>
> /* ffs (find first set) function:
> * on entry R0 is the number you want to find the first bit of
> * on exit R0 contains either 0 if the number was 0, or 1-32 for
> * the bit set R1 is trashed.
> */
> _ffs:
> rsb r1, r0, #0
> ands r1, r1, r0 /* r1 = X with 0 or 1 bits set
*/
>
> /* R1 * 0x4d7651fUL */
> addne r0, r1, r1, lsl #2
> rsbne r0, r0, r0, lsl #11
> addne r1, r1, r0, lsl #8
> rsbne r1, r1, r1, lsl #5
>
> ldrneb r0, [pc, r1, lsr #27] /* result taken from top 5 bits
*/
>
> mov pc, lr
>
> Lffs_table:
> .byte 0, 1, 2, 4, 9, 19, 6, 13
> .byte 26, 21, 11, 23, 14, 29, 27, 22
> .byte 12, 25, 18, 5, 10, 20, 8, 17
> .byte 3, 7, 15, 31, 30, 28, 24, 16
>
> This is from some code shamelessly stolen from this newsgroup.
>
> regards,
> Bill Pringlemeir.
>
> --
> All Polish dudes should get dogs , huh? If you've seen one mental
> problem, you've seen them all, right?


cal...@adresa.je.u.sigu.com

unread,
Jan 24, 2002, 7:44:11 PM1/24/02
to
Bill Pringlemeir <bpring...@yahoo.com> wrote:
BP> Also, this function finds the last bit. If there is some way to
BP> quickly change a value from 000..0001xxx.. -> 000..0001000.. then you
BP> can use the code below. However, I couldn't come up with anything
BP> when I thought about it briefly.

Maybe AND with 0xFFFFFFF8?? I mean, your example AND 0xFFFFFFF8?

Could that work?

I must say that I didn't look at the code, but I will now... :) Maybe I said
something stupid trying to help here... ;))))))))


--
"Maslinovs li majmuncicu bacu ?" upita Hrvato lize Hitleru crtu.
"Nisam ja nikog bombardiro !" rece Dzonia drka "Ja samo Amerikankau masiru sretanm !" By runf

Damir Lukic
cal...@fly.srk.fer.hr

cal...@adresa.je.u.sigu.com

unread,
Jan 24, 2002, 7:41:10 PM1/24/02
to
Phil <ph...@miracle-designs-removeme.com> wrote:
P> Does anyone have any fast source to calculate the number of leading zeros
P> (like the v5 clz instruction) on ARM7TDMI ?

:))))

We had this on exam! :)) Not on ARM, but on FRISC (don't ask, CPU developed
at our faculty)... :))

And, guess what, I didn't get any points on that, just because they didn't
like it! :))))) Nevermind, I have other exam (ARM involved) on saturday...
:))

Keep your fingers crossed, people! :))


--
Bugaro jebe imbecilan u svlacionici psihijataro cvrkuce
svakih 15 minuta. By runf

Damir Lukic
cal...@fly.srk.fer.hr

Message has been deleted
Message has been deleted
Message has been deleted

Phil

unread,
Jan 26, 2002, 5:22:33 AM1/26/02
to
Thanks for your help guys!

I went for the binary approach... About 18 instructions for clz (including
0)


"Bill Pringlemeir" <bpring...@yahoo.com> wrote in message

news:u4rlbt...@yahoo.com...
> Phil> Are you sure the ffs below is correct ? I tried it with
> Phil> something like 0x85400000 as argument and it gave 26 as result
> Phil> ? Inserting 0x80000000 returns 12...
>
> Bill> Sorry, I thought you meant the fls code. It was newer. I did
> Bill> several values with the ffs routine. 0x04d7651fUL is suppose
> Bill> to be a perfect hash of the set 0x00000001, 0x00000002,
> Bill> 0x00000004, ..., 0x40000000, 0x80000000. Either my constant
> Bill> multiplication is wrong or I have messed up the table.


>
> >> Lffs_table:
> >> .byte 0, 1, 2, 4, 9, 19, 6, 13
> >> .byte 26, 21, 11, 23, 14, 29, 27, 22
> >> .byte 12, 25, 18, 5, 10, 20, 8, 17
> >> .byte 3, 7, 15, 31, 30, 28, 24, 16
>

> In case you care, this table should work with the same constant
> multiplier. Thanks for pointing that out. I did this by hand, so
> there might be more errors! I haven't checked it programmatically yet.
>
> Lffs_table:
> .byte 0, 1, 2, 24, 3, 19, 6, 25
> .byte 22, 4, 20, 10, 16, 7, 12, 26
> .byte 31, 23, 18, 5, 21, 9, 15, 11
> .byte 30, 17, 8, 14, 29, 13, 28, 27
>
> regards,
> Bill Pringlemeir.
>
> --
> The goal of science is to inspire the mind.


Arjan Bakker

unread,
Jan 27, 2002, 9:08:03 AM1/27/02
to

Try this one:

@r0 is reg to clz
clz:
movs r1,r0,lsr #16
moveq r1,r0
moveq r0,#16
movne r0,#0
movs r2,r1,lsr #8
addeq r0,r0,#8
moveq r2,r1
movs r1,r2,lsr #4
addeq r0,r0,#4
moveq r1,r2
movs r2,r1,lsr #2
addeq r0,r0,#2
moveq r2,r1
movs r1,r2,lsr #1
addeq r0,r0,#1
subne r0,r0,#1
addcc r0,r0,#1

Vadim Borshchev

unread,
Jan 27, 2002, 11:59:36 AM1/27/02
to

Try the following code. Compile to assembly output and do your best
optimizing it :)


/* Count leading zeros */

static const unsigned clz_magic = 0x7dcd629;
static const char clz_table[] = {
0, 31, 9, 30, 3, 8, 18, 29,
2, 5, 7, 14, 12, 17, 22, 28,
1, 10, 4, 19, 6, 15, 13, 23,
11, 20, 16, 24, 21, 25, 26, 27
};

unsigned clz (unsigned x) {
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return x ? clz_table[((clz_magic * x) + clz_magic) >> 27] : 32;
}


/* Count trailing zeros */

static const unsigned ctz_magic = 0xfb9ac52;
static const char ctz_table[] = {
31, 0, 22, 1, 28, 23, 13, 2,
29, 26, 24, 17, 19, 14, 9, 3,
30, 21, 27, 12, 25, 16, 18, 8,
20, 11, 15, 7, 10, 6, 5, 4
};

unsigned ctz (unsigned x) {
return (x &= -x) ? ctz_table[(x * ctz_magic) >> 27] : 32;
}

Sincerely,
Vadim Borshchev

Adam SPeight

unread,
Jan 31, 2002, 7:12:40 PM1/31/02
to
Nemo <ne...@20000.org> wrote in message news:<da54dbfd4a%ne...@20000.org>...

3 2 1
10987654321098765432109876543210

-> r0 number
<- r0 =number of leading zero bits
r1 =0
Compact version

MOV R1,R0
MOV r0,#0
.loop MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
BCC loop


unrolled version

MOV r1,r0
MOV R0,#0
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1
MOV r1,r1,lsl#1
MOVCS pc,r14
ADDCC r0,r0,#1

0 new messages