TL;DR: Challenge: try to beat these integer square root algorithms using pure Go!
So, because I started working on a little (at the moment highly unreliable)
math library, I've been trying out a bunch of different integer square root algorithms. This is the fastest code I've been able to come up with:
func Sqrt(x uint64) uint64 {
// The most significant bit of the square root of x is equal to log2(x)/2.
// Using uint instead of uint64 guarantees native word size, which is
// more than 60% faster on my x86 Netbook.
r := uint(1 << (u64.Log2(x) >> 1))
s := r >> 1
t := r + s
// Try to set a bit in the intermediate result, see if the square of the
// resulting number is smaller than the input. If not, set it as the new
// intermediate result. Shift a bit to the right, repeat.
for s > 0 {
if uint64(t)*uint64(t) <= x {
r = t
}
s >>= 1
t = r + s
}
return uint64(r)
}
This kind of surprised me, being such a simple algorithm, and involving a multiplication. It's the fastest option on an x86 simply because it actually avoids using uint64 most of the time. When that optimisation is taken out (or in the case of int32/uint32) it still appears to tie with
Jack Crenshaw's algorithm, which I adapted as follows:
func Sqrt(x uint64) (r uint64) {
// p starts at the highest power of four less or equal to x
p := uint64(1 << ((Log2(x) >> 1) << 1))
for p > 0 {
if x >= r+p {
x -= r + p
r += p << 1
}
r >>= 1
p >>= 2
}
return
}
However, using the standard math package with casting to/from float64 is still two to three times faster (although I'm not entirely sure if I'm benchmarking in a way that is representative of real world situations - if not, the difference is probably even bigger). The FPU is hard to beat when it comes to square roots. This kind of frustrates me, because with everything else so far I've already managed to beat it :P. Any ideas?
PS: Also, I'm not entirely sure if using
u64.Log2 is really effective as an optimisation with the last algorithm: with evenly distributed numbers, the highest bit is set in 50% of the cases, right? So is it worth it? Should I assume even distribution when designing these algorithms?