Hi 4tH-ers!
A recent discussion on c.l.f. brought it to my attention that in some circumstances BITSET.4TH may be too slow. If you don't remember, these are implementations of BSD functions ffs() and fls(), dealing with the first or the last bit set in in a cell.
After studying several functions I didn't really find what I was looking for: a fast function that was easily extended to 64 bit. So I decided to design my own. The new one is completely unrolled and only uses additions and bitwise words. It works bytewise and uses an array to get the bit inside the byte.
I tested all positive numbers (zero included) to make sure it was absolutely identical and then I benchmarked it. Results:
habe@linux-471m:~/
4th.dev> time pp4th -x ffsfls.4th (lastbit old)
real 1m40.189s
user 1m40.014s
sys 0m0.072s
habe@linux-471m:~/
4th.dev> time pp4th -x ffsfls.4th (lastbit new)
real 0m7.975s
user 0m7.968s
sys 0m0.008s
habe@linux-471m:~/
4th.dev>
habe@linux-471m:~/
4th.dev> time pp4th -x ffsfls.4th (firstbit old)
real 0m11.172s
user 0m11.169s
sys 0m0.000s
habe@linux-471m:~/
4th.dev> time pp4th -x ffsfls.4th (firstbit new)
real 0m6.844s
user 0m6.840s
sys 0m0.000s
habe@linux-471m:~/
4th.dev>
You got it right: the performance of the old LASTBIT was abysmal. I managed to speed it up 3 fold by designing a new one. You see, the old one shifted right until only the number "1" was left. The new one works the other way around: it shifts left until the high bit is set. The new ones though do 100 million tests in about 7 seconds, another 2-4 times speedup.
Maybe I'll try some other stuff, just for fun. In the meanwhile, I think a 12 fold speedup for LASTBIT isn't half bad..
Hans Bezemer