FFS and FLS words

閲覧: 49 回
最初の未読メッセージにスキップ

The Beez

未読、
2017/09/09 12:56:202017/09/09
To: 4tH-compiler
Hi 4tH-ers!

A recent discussion on c.l.f. brought it to my attention that in some circumstances BITSET.4TH may be too slow. If you don't remember, these are implementations of BSD functions ffs() and fls(), dealing with the first or the last bit set in in a cell.

After studying several functions I didn't really find what I was looking for: a fast function that was easily extended to 64 bit. So I decided to design my own. The new one is completely unrolled and only uses additions and bitwise words. It works bytewise and uses an array to get the bit inside the byte.

I tested all positive numbers (zero included) to make sure it was absolutely identical and then I benchmarked it. Results:

habe@linux-471m:~/4th.dev> time pp4th -x ffsfls.4th (lastbit old)

real    1m40.189s
user    1m40.014s
sys     0m0.072s
habe@linux-471m:~/4th.dev> time pp4th -x ffsfls.4th (lastbit new)

real    0m7.975s
user    0m7.968s
sys     0m0.008s
habe@linux-471m:~/4th.dev>

habe@linux-471m:~/4th.dev> time pp4th -x ffsfls.4th (firstbit old)

real    0m11.172s
user    0m11.169s
sys     0m0.000s
habe@linux-471m:~/4th.dev> time pp4th -x ffsfls.4th (firstbit new)

real    0m6.844s
user    0m6.840s
sys     0m0.000s
habe@linux-471m:~/4th.dev>

You got it right: the performance of the old LASTBIT was abysmal. I managed to speed it up 3 fold by designing a new one. You see, the old one shifted right until only the number "1" was left. The new one works the other way around: it shifts left until the high bit is set. The new ones though do 100 million tests in about 7 seconds, another 2-4 times speedup.

Maybe I'll try some other stuff, just for fun. In the meanwhile, I think a 12 fold speedup for LASTBIT isn't half bad..

Hans Bezemer
全員に返信
投稿者に返信
転送
新着メール 0 件