Thanks a million for that; it really helps clear things up. The funny part
is that MSVC still calls the constructor for the thread_local in main, even
if we delete the lines of code that actually use it. For instance:
________________________
int main()
{
    // Usage in main() commented out, so main never touches g_per_thread:
    //ct_per_thread& self = g_per_thread;
    //std::cout << "main::(" << self.m_id << ")\n";
    {
        std::thread threads[THREADS];
        for (unsigned long i = 0; i < THREADS; ++i)
        {
            threads[i] = std::thread(ct_worker);
        }
        for (unsigned long i = 0; i < THREADS; ++i)
        {
            threads[i].join();
        }
    }
    unsigned long a = g_per_thread_ctor.load(CT_MB_RLX);
    // + 1 accounts for main's own ct_per_thread, which is not destroyed
    // until after main() returns. This assumes main actually got one.
    unsigned long b = g_per_thread_dtor.load(CT_MB_RLX) + 1;
    std::cout << "a = " << a << "\n";
    std::cout << "b = " << b << "\n";
    assert(a == b);
    std::cout << "\n\nmain() - exit\n";
    std::cout.flush();
    return 0;
}
________________________
MSVC outputs:
________________________
ct_worker::(1)
ct_worker::(2)
ct_worker::(3)
ct_worker::(4)
a = 5
b = 5
main() - exit
________________________
GCC outputs:
________________________
ct_worker::(1)
ct_worker::(2)
ct_worker::(0)
ct_worker::(3)
a = 4
b = 5
Assertion failed!
________________________
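GCC is giving the output I expected from the standard: main never uses
g_per_thread, so it never constructs a ct_per_thread, leaving a at 4 while
b is 5. MSVC constructs a ct_per_thread in main no matter what. Humm...
Although, if I am reading [basic.start.dynamic] correctly, it is
implementation-defined whether the dynamic initialization of a
namespace-scope thread_local happens before the thread's first statement
or is deferred until first use, so both compilers may be conforming here.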