I have done this (or the equivalent) in my own code, because it was the
most efficient way I could find to compute integer square roots.
The reason is that floating point square roots have hardware support and
execute (at least in my environment) considerably quicker than anything
else I found.
The downside is that, because of limited floating-point precision, the
floating-point square root converted back to an integer is NOT always
equal to the required integer square root, so further processing is
required to test for this and correct the result where necessary.
[The floating-point result was sometimes off by one from the correct
result.] Even so, in timed tests this (with the follow-up corrections)
was clearly quicker than any other approach I looked at.
(So, it /might/ be good, but probably only as part of some larger
routine that handles any required corrections. Maybe on other
architectures or with other argument sizes the correction would not be
required. My environment was Windows 10 on a Dell laptop with an Intel i7,
calculating square roots for 64-bit arguments. No, I don't think I have
the timing results any more...)
Mike.