Rules about optimisation are often misunderstood - perhaps because
"optimisation" is such a general term, used for everything from
enabling a compiler switch, through algorithmic changes, to fine-tuning
hand-written assembly routines.
The first rule of optimising is from Knuth - "Premature optimisation is
the root of all evil." The emphasis is on /premature/. That
encompasses not putting effort into optimisation that is not needed -
but equally, it allows for optimisation when you know that it /is/ needed.
Measurement is useful - invaluable, even - when you are comparing
different solutions, finding bottlenecks, or identifying which changes
had which effects. But it is not necessary when you can get a good
enough estimate of the scales involved. If you want to plan a route
across a country, you might want to calculate carefully the times for
driving, trains, or taking a plane - but you can already rule out
walking from the start as a poor "algorithm", and also rule out rockets
and submarines for various reasons. You don't need any sort of
measurements to start with.
The OP here says specifically that he needs to do lots of square roots,
and needs them fast, with small integers. Until I know otherwise, I
assume he has a reasonable idea of what he is doing, and can reasonably
estimate that a software-only, full-range square root function with
double-precision IEEE accuracy is going to be too slow. It makes sense
to think about this from the start, not after benchmarking and
measurement - it could easily be a make-or-break issue for the algorithm
and his whole project. I assume that the OP /does/ have useful data,
such as the number of square roots per second and the clock speed of the
device, even if he does not have /all/ the relevant data.
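
As a rough sketch of the kind of estimate I mean: divide the clock speed
by the number of square roots needed per second, and you have an upper
bound on the cycles available for each one. The function name and the
figures in the comment are made up for illustration - they are not the
OP's numbers.

```c
#include <stdint.h>

/*
 * Upper bound on the cycles available for each square root, assuming
 * (unrealistically) that the processor does nothing else.
 * Hypothetical helper; sqrts_per_second must be non-zero.
 */
static uint32_t cycles_per_sqrt(uint32_t clock_hz, uint32_t sqrts_per_second)
{
    return clock_hz / sqrts_per_second;
}

/*
 * Example with invented figures: a 16 MHz device needing 100000 roots
 * per second has cycles_per_sqrt(16000000u, 100000u) cycles for each.
 */
```

If the library's software square root costs more than that budget on the
target, the question is settled before any benchmarking starts.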
There are also plenty of optimisations that are very "cheap" to do - and
in general, should /always/ be in place. Basic compiler optimisation -
at least "-O1", but usually "-O2" or "-Os" - should be standard. It
costs nothing, gives better code (assembly code with "-O1" is usually a
lot easier to understand than code from "-O0"), and enables better
static warning and error checking.
If you are using floating point, "-ffast-math" should also be standard
for most uses - unless you /really/ need full IEEE support.
You don't need to measure anything before using these - just as you
don't need to time your car journeys before going out of first gear.
And there is also the common misconception that "optimised code" is
difficult to understand, difficult to write, and difficult to debug.
Sometimes that is the case - often it is not. The OP needs the square
roots of small integers - the first idea that comes to my mind is a
lookup table, which is pretty simple.
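
A minimal sketch of such a table, assuming that "small" means 0 to 255
and that the integer part of the root is enough - the OP has confirmed
neither. The names isqrt_table, isqrt_table_init and isqrt_small are
mine, purely for illustration.

```c
#include <stdint.h>

#define ISQRT_TABLE_SIZE 256u   /* assumed input range: 0..255 */

static uint8_t isqrt_table[ISQRT_TABLE_SIZE];

/*
 * Fill the table once at start-up. The root is tracked incrementally:
 * as n goes up by one, floor(sqrt(n)) goes up by at most one, so no
 * floating point and no library sqrt() is needed.
 */
void isqrt_table_init(void)
{
    unsigned root = 0;
    for (unsigned n = 0; n < ISQRT_TABLE_SIZE; n++) {
        if ((root + 1u) * (root + 1u) <= n) {
            root++;
        }
        isqrt_table[n] = (uint8_t)root;
    }
}

/* The "fast" square root: a single array read. The uint8_t argument
 * cannot index outside the 256-entry table. */
static inline uint8_t isqrt_small(uint8_t n)
{
    return isqrt_table[n];
}
```

After one call to isqrt_table_init(), each isqrt_small(n) is just an
indexed load. On a small microcontroller you might prefer a const table
generated offline and placed in flash, and if fractional results are
needed the entries could hold fixed-point values instead - it depends on
what the OP actually needs.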