The (outer) guard variable doesn't need to be atomic, because the worst
that can happen when it fails to reflect that initialization has been
performed is that the code goes into needless mutex locking and deeper
checking, which is just an efficiency concern. And that can only happen
the first few times.
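The control flow described above (outer guard check, then mutex locking
and a deeper check) can be sketched in user code roughly as below. This
is only an illustration of the shape, not gcc's generated code, and the
names `compute`, `lazy_value`, `outer_guard`, `init_mutex` and
`is_initialized` are all made up for the example. Note one deliberate
difference: in portable user code a plain `bool` that is read while
another thread may write it is a data race, so the sketch uses a
`std::atomic<bool>` for the outer guard; the compiler's own runtime can
rely on platform knowledge that user code can't.

```cpp
#include <atomic>
#include <mutex>

namespace {
    std::atomic<bool>   outer_guard{ false };       // Checked first, without locking.
    std::mutex          init_mutex;                 // Serializes the initialization.
    bool                is_initialized = false;     // Only written while holding init_mutex.
    int                 value;                      // The lazily initialized "static".
}

int compute() { return 42; }    // Hypothetical initializer.

int& lazy_value()
{
    // Fast path: once the outer guard reads true, nothing more is done.
    if( !outer_guard.load( std::memory_order_acquire ) ) {
        // Slow path: lock, then do the deeper check under the mutex.
        std::lock_guard<std::mutex> lock( init_mutex );
        if( !is_initialized ) {
            value = compute();
            is_initialized = true;
            outer_guard.store( true, std::memory_order_release );
        }
    }
    return value;
}
```

A thread that sees a stale `false` in the outer guard just takes the
mutex, finds `is_initialized` already set, and returns: wasted time, but
no wrong result.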
Using an atomic (outer) guard variable would avoid needless rounds of
mutex locking, but accessing that variable can be slower and would then
be a cost incurred on every access of the static. Evidently the gcc devs
decided not to do that. I.e., they /support/ the maximal efficiency for
a general solution, without guaranteeing it, because it can't be
guaranteed against all kinds of user code.
For most calls, in practice all calls after the first, the overhead is
then exactly the same as in the C++03 era, before C++ got thread
support, namely a simple check of an ordinary non-atomic boolean flag.
One way to ensure that in user code is to call the function from the
main thread, before it can possibly be called from other threads.
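A minimal sketch of that approach, assuming a hypothetical function
`greeting` with a function-local static, and a hypothetical helper
`run_workers` that is meant to be called from the main thread:

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical function with a function-local static.
const std::string& greeting()
{
    static const std::string the_greeting = "Hello";    // Initialized on first call.
    return the_greeting;
}

// Meant to be called from the main thread. Returns the number of worker
// threads that observed the expected value.
int run_workers( const int n_threads )
{
    assert( n_threads >= 0 );
    greeting();     // First call: initialization completes before any worker exists.

    std::atomic<int> n_ok{ 0 };
    std::vector<std::thread> threads;
    for( int i = 0; i < n_threads; ++i ) {
        threads.emplace_back( [&n_ok]{ if( greeting() == "Hello" ) { ++n_ok; } } );
    }
    for( auto& t : threads ) { t.join(); }
    return n_ok.load();
}
```

Since starting a thread synchronizes with the start of that thread's
function, every worker sees the static as already initialized, so none
of them should need the mutex-locking path.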
- Alf