Bit Scan Reverse in Racket

30 views
Skip to first unread message

Dominik Pantůček

unread,
Feb 25, 2021, 3:52:38 AM2/25/21
to Racket Users
Hello Racketeers,

I'm slightly stuck with speeding up some calculations and the reason is
that I need to compute the index of highest set bit in a number. So for
1 it is 0, for 2 it is 1, for 8 3, 1023 9 and 1024 10 ...

The fastest Racket code I can come up with is as follows:

(begin-encourage-inline
(define (highest-bit num)
(let loop ((num num)
(bit 0))
(if (unsafe-fx> num 0)
(loop (unsafe-fxrshift num 1)
(unsafe-fx+ bit 1))
bit))))

(Yes, it returns incorrect result for 0, but that must be checked
elsewhere as the result cannot be defined for 0 anyway).

However this is a single instruction operation on x86 (bsr) or two
instruction operation (rbit and clz if I am not mistaken) on ARM. Dunno
about PPC yet.

I looked at CS internals a bit and although there is a "name collision"
as the bsr mnemonics is used for ret (branch subroutine return?), it
should be fairly easy to add something like fxbsr (-> fixnum? fixnum?)
to Racket core.

I also experimented with x64asm package but the results were even worse
than the aforementioned tight loop (there is a lot of overhead in
define-λ! generated functions).

As the routine is typically called 40k-60k times per frame in my code
(real-time rendering) it could really make a difference.

Should I try to add it? How should it be named? What about BC?

And more generic question - aren't there more similar operations that
can be implemented efficiently on all supported CPUs that might be
useful in general? Yes, I am aiming at SIMD as many of you know :)
Especially with the expanded number of available FP registers to 8 on
64-bit x86 CS there surely is quite some potential to it.


Cheers,
Dominik

Jens Axel Søgaard

unread,
Feb 25, 2021, 4:01:38 AM2/25/21
to Dominik Pantůček, Racket Users

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/a2a4fe83-877d-d7ed-4812-bd628c128659%40trustica.cz.


--
--
Jens Axel Søgaard

Dominik Pantůček

unread,
Feb 25, 2021, 6:26:37 AM2/25/21
to racket...@googlegroups.com


On 25. 02. 21 10:01, Jens Axel Søgaard wrote:
> Try integer-length.
>
> https://docs.racket-lang.org/reference/generic-numbers.html?q=integer-length#%28def._%28%28quote._~23~25kernel%29._integer-length%29%29

Oh, I completely missed this one. Thank you!!!

>
> I don't know how it is implemented.

And now I do. On CS it is implemented in terms of fxlength, which is a
SW implementation, but still an order of magnitude faster than pure
Racket implementation.

So for my project, integer-length is an improvement I wanted. But the
general question remains. Instead of almost 100 instructions, it could
be just one or two. I'll look into it later on if no-one else finds it
interesting enough to fiddle with cpnanopass.ss and x86_64.ss ;-)


Cheers,
Dominik
Reply all
Reply to author
Forward
0 new messages