Yes, in my typical usage I know exactly what cpu the code will run on,
and can tune precisely.
For PC software that is delivered in source form, this will also
work - it is up to the user to pick flags suitable for their system.
(You could put -march=native in makefiles, which is the ideal choice for
people using software on the same system as they use to build it, but
makes it less portable to other systems.)
For software delivered as binaries, it's a different matter. The best
you can usually do is guess a minimal requirement for typical users.
And if you have code that is particularly performance sensitive, and
particularly affected by the exact processor capabilities (such as the
POPCNT instruction or the available SIMD extensions), then you can use
gcc "function multiversioning" and the "target_clones" function
attribute to automatically generate several versions of a function,
tuned to different processors. This adds a layer of indirection to the call,
which spoils some of the benefits of having a fast popcnt instruction -
it is best used on a function that does a bit more work, such as one
that calls popcount() in a loop. But it is simple to use:
__attribute__((target_clones("default,popcnt")))
int popcount(unsigned int x) {
    return __builtin_popcount(x);
}
Then you just use "popcount()" as a normal function in your code, and
you'll get either a generic version (using a lookup table, I think) or a
popcnt instruction if your cpu supports it.
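As a rough sketch of the "function that does a bit more work" case, the attribute can go on the loop itself, so the indirection is paid once per call rather than once per word. The name count_set_bits() and the POPCOUNT_CLONES macro are made up for illustration, and the guard is a conservative assumption on my part (gcc on x86-64 Linux with ifunc support), not a complete feature test:

```c
#include <stddef.h>

/* target_clones needs an x86 target and ifunc support in the C library.
 * This guard is a conservative assumption, not a complete feature test:
 * it enables the attribute only for gcc on x86-64 Linux. */
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__x86_64__) && defined(__linux__)
#define POPCOUNT_CLONES __attribute__((target_clones("default,popcnt")))
#else
#define POPCOUNT_CLONES
#endif

/* Hypothetical helper: total number of set bits in an array.
 * With the attribute active, gcc emits a generic clone and a popcnt
 * clone of the whole loop, and picks one for the running cpu. */
POPCOUNT_CLONES
unsigned long count_set_bits(const unsigned int *data, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (unsigned long)__builtin_popcount(data[i]);
    }
    return total;
}
```

Callers just use count_set_bits() as a normal function. Where the guard does not match, the macro expands to nothing and you get a single ordinary version.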
>
>> The important point is not to try to figure out clever tricks in your
>> own code, but to work /with/ the compiler as best you can. "Clever"
>> tricks have a tendency to work against the compiler, perhaps relying on
>> undefined behaviour (though not in this case as far as I have noticed)
>> or limit its ability to do wider optimisations such as constant
>> propagation. The place for clever tricks these days is in the compiler,
>> rather than in the user code.
>
> That is generally true, and probably true in this case as well because a
> factor of two (as here) is not much of a price to pay in most software
> that will use a popcount function. It's fast even when it's not the
> fastest.
>
> There's also the problem that you might be writing code that can't
> assume a particular compiler. It's then not going to be 100% clear what
> working with the compiler really means.
>
I would say "working with the compiler" means something like:
#if defined(__GNUC__)
    // Optimised version using gcc extensions
#elif defined(_MSC_VER)
    // Optimised version using MSVC extensions
#elif ...
#else
    // Fallback generic version
#endif
You can't tune for every compiler, but you might want to tune for (or
"work with") specific compilers.
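Filled in for the popcount case, that pattern might look roughly like the sketch below. The name popcount32() is just for illustration. The MSVC branch assumes an x86 or x64 target whose cpu actually has the popcnt instruction, since as far as I know the __popcnt intrinsic does not fall back to anything else:

```c
#if defined(__GNUC__)
/* gcc and compatible compilers (clang also defines __GNUC__). */
static inline int popcount32(unsigned int x)
{
    return __builtin_popcount(x);
}
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
/* MSVC intrinsic. Assumption: the target cpu supports popcnt. */
static inline int popcount32(unsigned int x)
{
    return (int)__popcnt(x);
}
#else
/* Generic fallback: clear the lowest set bit until none remain. */
static inline int popcount32(unsigned int x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}
#endif
```

The rest of the code only ever sees popcount32(), so adding a branch for another compiler later does not touch any callers.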