thread local ctor on main...

Chris M. Thomasson

Dec 15, 2018, 1:25:44 AM

Fwiw, there is an interesting difference between GCC and MSVC. MSVC
automatically calls the thread local constructors. Take this program
into account:
__________________

#include <iostream>
#include <thread>
#include <atomic>
#include <mutex>
#include <cassert>

#define THREADS 4

#define CT_MB_ACQ std::memory_order_acquire
#define CT_MB_REL std::memory_order_release
#define CT_MB_RLX std::memory_order_relaxed

static std::atomic<unsigned long> g_per_thread_ctor(0);
static std::atomic<unsigned long> g_per_thread_dtor(0);
static std::mutex g_cout_mutex;

// One instance exists per thread; the ctor and dtor bump the global counters.
struct ct_per_thread
{
    unsigned long m_id;

    ct_per_thread()
        : m_id(g_per_thread_ctor.fetch_add(1, CT_MB_RLX))
    {

    }

    ~ct_per_thread()
    {
        g_per_thread_dtor.fetch_add(1, CT_MB_RLX);
    }
};

static thread_local ct_per_thread g_per_thread;

void ct_worker()
{
    ct_per_thread& self = g_per_thread;

    g_cout_mutex.lock();
    std::cout << "ct_worker::(" << self.m_id << ")\n";
    g_cout_mutex.unlock();
}

int main()
{
    ct_per_thread& self = g_per_thread;

    std::cout << "main::(" << self.m_id << ")\n";

    {
        std::thread threads[THREADS];

        for (unsigned long i = 0; i < THREADS; ++i)
        {
            threads[i] = std::thread(ct_worker);
        }

        for (unsigned long i = 0; i < THREADS; ++i)
        {
            threads[i].join();
        }
    }

    unsigned long a = g_per_thread_ctor.load(CT_MB_RLX);
    // + 1 accounts for main's own instance, which is assumed to have been
    // constructed and is not destroyed until the main thread exits.
    unsigned long b = g_per_thread_dtor.load(CT_MB_RLX) + 1;

    std::cout << "a = " << a << "\n";
    std::cout << "b = " << b << "\n";

    assert(a == b);

    std::cout << "\n\nmain() - exit\n";
    std::cout.flush();

    return 0;
}
__________________

This should produce output like the following, where a is always equal to b:
__________________
ct_worker::(1)
ct_worker::(2)
ct_worker::(3)
ct_worker::(4)
a = 5
b = 5
__________________

However, on GCC, when one omits the first two lines of code in main:
__________________
ct_per_thread& self = g_per_thread;

std::cout << "main::(" << self.m_id << ")\n";
__________________

GCC does not call the ctor for main's copy of g_per_thread, while MSVC
does. Try getting rid of those two lines of code and run it under GCC. It
should go something like:
__________________
ct_worker::(0)
ct_worker::(1)
ct_worker::(2)
ct_worker::(3)
a = 4
b = 5
__________________

Where a does not equal b. Humm...

Should the ctor for a thread local be automatically called on thread
creation, even main, or... Should it be delayed until a thread actually
uses it? Need to dig into the standard.

Chris M. Thomasson

Dec 15, 2018, 2:01:42 AM

On 12/14/2018 10:25 PM, Chris M. Thomasson wrote:
> Fwiw, there is an interesting difference between GCC and MSVC. MSVC
> automatically calls the thread local constructors. Take this program
> into account:
> __________________
>
> #include <iostream>
> #include <thread>
> #include <atomic>
> #include <mutex>
> #include <cassert>
>
>
> #define THREADS 4
>
>
> #define CT_MB_ACQ std::memory_order_acquire
> #define CT_MB_REL std::memory_order_release
> #define CT_MB_RLX std::memory_order_relaxed
>
>
> static std::atomic<unsigned long> g_per_thread_ctor(0);
> static std::atomic<unsigned long> g_per_thread_dtor(0);
> static std::mutex g_cout_mutex;
>
>
> struct ct_per_thread
> {
>     unsigned long m_id;
>
>     ct_per_thread()
>         : m_id(g_per_thread_ctor.fetch_add(1, CT_MB_RLX))
>     {
>
>     }
>
>     ~ct_per_thread()
>     {
>         g_per_thread_dtor.fetch_add(1, CT_MB_RLX);
>     }
> };

[...]

Can a thread local be a non-POD type?

Paavo Helde

Dec 15, 2018, 10:07:11 AM

On 15.12.2018 8:25, Chris M. Thomasson wrote:
>
> Should the ctor for a thread local be automatically called on thread
> creation, even main, or... Should it be delayed until a thread actually
> uses it? Need to dig into the standard.

The standard says a thread local "shall be initialized before its first
odr-use". I guess this is for supporting the "zero overhead" principle:
no resources should be wasted for construction of a thread local in the
threads which don't use it.
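
For anyone who wants the lazy behaviour on every compiler, one option is a block-scope thread_local behind an accessor function. A block-scope thread_local is initialized the first time control passes through its declaration in a given thread, so a thread that never calls the accessor never constructs the object. This is only a sketch: per_thread_state, per_thread_instance and count_lazy_ctors are made-up names, not part of the program posted above.

```cpp
#include <atomic>
#include <thread>
#include <vector>

static std::atomic<unsigned long> g_ctor_count(0);

struct per_thread_state
{
    unsigned long m_id;

    per_thread_state()
        : m_id(g_ctor_count.fetch_add(1, std::memory_order_relaxed))
    {
    }
};

// Hypothetical accessor. The object is constructed on the first call made
// by each thread, and never for a thread that does not call it.
per_thread_state& per_thread_instance()
{
    thread_local per_thread_state instance;
    return instance;
}

// Starts `count` workers that each touch their instance once, joins them,
// and returns how many constructors ran during this call. The calling
// thread deliberately never calls per_thread_instance().
unsigned long count_lazy_ctors(unsigned count)
{
    unsigned long before = g_ctor_count.load(std::memory_order_relaxed);

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < count; ++i)
    {
        threads.emplace_back([] { per_thread_instance(); });
    }

    for (std::thread& t : threads)
    {
        t.join();
    }

    return g_ctor_count.load(std::memory_order_relaxed) - before;
}
```

Called from a main() that never touches the accessor, count_lazy_ctors(4) should report 4 constructions, one per worker and none for main.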

Chris M. Thomasson

Dec 15, 2018, 2:48:47 PM

Thanks a million, that really helps clear things up. The funny part is
that MSVC still calls the constructor for the thread local in main, even
if we delete the lines of code that actually use it. For instance:
________________________
int main()
{
    //ct_per_thread& self = g_per_thread;

    //std::cout << "main::(" << self.m_id << ")\n";

    {
        std::thread threads[THREADS];

        for (unsigned long i = 0; i < THREADS; ++i)
        {
            threads[i] = std::thread(ct_worker);
        }

        for (unsigned long i = 0; i < THREADS; ++i)
        {
            threads[i].join();
        }
    }

    unsigned long a = g_per_thread_ctor.load(CT_MB_RLX);
    unsigned long b = g_per_thread_dtor.load(CT_MB_RLX) + 1;

    std::cout << "a = " << a << "\n";
    std::cout << "b = " << b << "\n";

    assert(a == b);

    std::cout << "\n\nmain() - exit\n";
    std::cout.flush();

    return 0;
}

________________________

MSVC outputs:
________________________

ct_worker::(1)
ct_worker::(2)
ct_worker::(3)
ct_worker::(4)
a = 5
b = 5

main() - exit
________________________

GCC outputs:
________________________
ct_worker::(1)
ct_worker::(2)
ct_worker::(0)
ct_worker::(3)
a = 4
b = 5

Assertion failed!
________________________

GCC is giving the correct output according to the standard. MSVC
constructs a ct_per_thread in main no matter what. Humm...
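
Since both behaviours appear to be allowed, the hard-coded "+ 1" is the fragile part of the check. One possible way to make it independent of the compiler is to let the constructor record that it ran in the current thread, using a plain thread_local bool that needs no dynamic initialization. This is a sketch under that assumption; t_constructed_here, tracked_per_thread and ctor_dtor_counts_balance are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

static std::atomic<unsigned long> g_ctor_count(0);
static std::atomic<unsigned long> g_dtor_count(0);

// Constant-initialized flag: set by the constructor below, so each thread
// can tell whether its own instance was ever constructed.
static thread_local bool t_constructed_here = false;

struct tracked_per_thread
{
    unsigned long m_id;

    tracked_per_thread()
        : m_id(g_ctor_count.fetch_add(1, std::memory_order_relaxed))
    {
        t_constructed_here = true;
    }

    ~tracked_per_thread()
    {
        g_dtor_count.fetch_add(1, std::memory_order_relaxed);
    }
};

static thread_local tracked_per_thread g_tracked;

// Runs `count` workers that each use their instance, then checks that every
// constructed instance is either destroyed or is the caller's live one.
bool ctor_dtor_counts_balance(unsigned count)
{
    std::vector<unsigned long> ids(count);
    std::vector<std::thread> threads;

    for (unsigned i = 0; i < count; ++i)
    {
        threads.emplace_back([&ids, i] { ids[i] = g_tracked.m_id; });
    }

    for (std::thread& t : threads)
    {
        t.join();
    }

    // Read the flag first; it says whether the calling thread owns a live
    // instance, whichever initialization strategy the compiler chose.
    bool mine_is_alive = t_constructed_here;

    unsigned long a = g_ctor_count.load(std::memory_order_relaxed);
    unsigned long b = g_dtor_count.load(std::memory_order_relaxed)
                    + (mine_is_alive ? 1 : 0);

    return a == b;
}
```

With this, a == b should hold whether the calling thread's instance was constructed eagerly or never constructed at all.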

James Kuyper

Dec 15, 2018, 3:37:15 PM

On 12/15/18 02:01, Chris M. Thomasson wrote:
...
> Can a thread local be a non-POD type?

The only restrictions on the use of thread_local are it "shall be
applied only to the names of variables of namespace or block scope and
to the names of static data members." (7.1.1p3). Whether or not it's a
POD type has no relevance.
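
To illustrate the three permitted placements with a non-POD type, here is a small sketch using std::string. All names in it are invented for the example. A second thread modifies its own copies, and the calling thread's copies stay untouched.

```cpp
#include <string>
#include <thread>

// Namespace scope.
thread_local std::string g_label = "namespace";

struct label_holder
{
    // Static data member.
    static thread_local std::string s_label;
};

thread_local std::string label_holder::s_label = "member";

// Block scope.
std::string& block_label()
{
    thread_local std::string label = "block";
    return label;
}

// Joins the calling thread's three copies into one string.
std::string labels_seen_here()
{
    return g_label + "/" + label_holder::s_label + "/" + block_label();
}

// Appends "!" to all three copies in a second thread, stores what that
// thread saw in `seen_by_worker`, and returns what the caller still sees.
std::string labels_after_other_thread_writes(std::string& seen_by_worker)
{
    std::thread worker([&seen_by_worker] {
        g_label += "!";
        label_holder::s_label += "!";
        block_label() += "!";
        seen_by_worker = labels_seen_here();
    });

    worker.join();

    return labels_seen_here();
}
```

The caller should get back "namespace/member/block", while the worker saw "namespace!/member!/block!", since every thread has its own std::string objects.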

Paavo Helde

Dec 15, 2018, 5:05:53 PM

On 15.12.2018 21:48, Chris M. Thomasson wrote:
> On 12/15/2018 7:06 AM, Paavo Helde wrote:
>> On 15.12.2018 8:25, Chris M. Thomasson wrote:
>>>
>>> Should the ctor for a thread local be automatically called on thread
>>> creation, even main, or... Should it be delayed until a thread actually
>>> uses it? Need to dig into the standard.
>>
>> The standard says a thread local "shall be initialized before its
>> first odr-use". I guess this is for supporting the "zero overhead"
>> principle: no resources should be wasted for construction of a thread
>> local in the threads which don't use it.
>>
>
> Thanks a million for it really helps clear things up. The funny part is
> that MSVC still calls the constructor for the thread local in main, even
> if we delete the lines of code that actually use it.

Everything is "before" an event which never happens, so MSVC is arguably
correct here from a legal viewpoint.

Chris M. Thomasson

Dec 15, 2018, 10:46:55 PM

It does seem to violate the "zero-overhead" principle.

Chris M. Thomasson

Dec 15, 2018, 11:11:53 PM

I am very familiar with POSIX thread "specific" data for threads:

http://pubs.opengroup.org/onlinepubs/007904975/functions/pthread_getspecific.html

Never really messed around with C++11 thread specific data. I just
wanted to ask this question here wrt when ctors/dtors can be called. I
am not used to a ctor being called before its first use like MSVC does
wrt thread_local. I like GCC's behavior better because it fits in with
the POSIX way of per-thread data.

Fwiw, I think C11 threads are more in line with POSIX:

https://en.cppreference.com/w/c/thread/tss_get
https://en.cppreference.com/w/c/thread/tss_set

There is no ambiguity here.
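
For comparison, a rough sketch of the POSIX style, written as C++ that calls the pthread API. Nothing is created until a thread explicitly stores a value, and the key's destructor only runs for threads that stored a non-null one. The helper names (get_worker_state, run_tss_workers and so on) are made up, and error checking is omitted for brevity.

```cpp
#include <pthread.h>

#include <atomic>
#include <cstddef>
#include <vector>

static pthread_key_t g_state_key;
static std::atomic<int> g_created(0);
static std::atomic<int> g_destroyed(0);

struct worker_state
{
    int id;
};

// Called by the system at thread exit, only for threads whose value for
// the key is non-null.
extern "C" void destroy_worker_state(void* p)
{
    delete static_cast<worker_state*>(p);
    g_destroyed.fetch_add(1, std::memory_order_relaxed);
}

// Explicit first-use creation: the state exists only after this is called.
worker_state* get_worker_state()
{
    void* p = pthread_getspecific(g_state_key);

    if (p == nullptr)
    {
        p = new worker_state{ g_created.fetch_add(1, std::memory_order_relaxed) };
        pthread_setspecific(g_state_key, p);
    }

    return static_cast<worker_state*>(p);
}

extern "C" void* tss_worker(void*)
{
    get_worker_state();
    return nullptr;
}

struct tss_counts
{
    int created;
    int destroyed;
};

// The calling thread never stores a value, so it gets no state and no
// destructor call.
tss_counts run_tss_workers(int count)
{
    g_created.store(0);
    g_destroyed.store(0);

    pthread_key_create(&g_state_key, destroy_worker_state);

    std::vector<pthread_t> threads(static_cast<std::size_t>(count));

    for (pthread_t& t : threads)
    {
        pthread_create(&t, nullptr, tss_worker, nullptr);
    }

    for (pthread_t& t : threads)
    {
        pthread_join(t, nullptr);
    }

    pthread_key_delete(g_state_key);

    return tss_counts{ g_created.load(), g_destroyed.load() };
}
```

Here the created and destroyed counts should both equal the number of workers, with nothing allocated on behalf of the calling thread.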

David Brown

Dec 16, 2018, 7:20:18 AM

This is a matter of optimisation and code quality - both compilers
generate correct code. And while gcc's method is more efficient in this
case, it could be less efficient in other cases (needing more overhead
when you actually use the thread locals).

The "zero overhead" principle is not about having zero overhead for
things in your code that you don't use. It is about having zero
overhead for features in the language that you don't use. The idea is
that when the language (or the standard library) introduces new
features, they will not cause overhead in code that does not use them.

Overhead for objects or functions that you don't use is purely a matter
of optimisation.
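
As an illustration of the access-overhead point: with a lazily initialized thread local, each access may have to go through an "is it initialized yet?" check. Whether that happens, and whether the optimiser removes it, depends on the implementation, so treat this as an assumption. A common way to keep any such cost out of a hot loop is to bind a reference once, much like the "self" reference in the original program. The names accumulator and add_range are invented for this sketch.

```cpp
#include <cstddef>

struct accumulator
{
    unsigned long m_total;

    // User-provided, non-constexpr constructor, so the thread local below
    // may need dynamic initialization.
    accumulator()
        : m_total(0)
    {
    }
};

static thread_local accumulator g_accumulator;

// Adds `count` values to the calling thread's running total and returns it.
unsigned long add_range(const unsigned long* values, std::size_t count)
{
    // Single odr-use of the thread local: any lazy-initialization check
    // happens here, once, instead of on every loop iteration.
    accumulator& self = g_accumulator;

    for (std::size_t i = 0; i < count; ++i)
    {
        self.m_total += values[i];
    }

    return self.m_total;
}
```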

Chris M. Thomasson

Dec 16, 2018, 6:00:39 PM

Fair enough. I am just not really used to a thread local ctor being
called before I use it, coming from C and POSIX; C11 even.

Cholo Lennon

Dec 17, 2018, 10:32:38 PM

On 12/15/18 3:25 AM, Chris M. Thomasson wrote:
> Fwiw, there is an interesting difference between GCC and MSVC. MSVC
> automatically calls the thread local constructors. Take this program
> into account:
> __________________
>
> #include <iostream>
> #include <thread>
> #include <atomic>
> #include <mutex>
> #include <cassert>
>
>
> #define THREADS 4

Unbelievable, 2018 and people still using the preprocessor to define
constants :-O

>
> #define CT_MB_ACQ std::memory_order_acquire
> #define CT_MB_REL std::memory_order_release
> #define CT_MB_RLX std::memory_order_relaxed
>

Again, unbelievable :-O

--
Cholo Lennon
Bs.As.
ARG

Chris M. Thomasson

Dec 17, 2018, 11:27:22 PM

On 12/17/2018 7:32 PM, Cholo Lennon wrote:
> On 12/15/18 3:25 AM, Chris M. Thomasson wrote:
>> Fwiw, there is an interesting difference between GCC and MSVC. MSVC
>> automatically calls the thread local constructors. Take this program
>> into account:
>> __________________
>>
>> #include <iostream>
>> #include <thread>
>> #include <atomic>
>> #include <mutex>
>> #include <cassert>
>>
>>
>> #define THREADS 4
>
> Unbelievable, 2018 and people still using the preprocessor to define
> constants :-O

A habit from C: Yikes!

>> #define CT_MB_ACQ std::memory_order_acquire
>> #define CT_MB_REL std::memory_order_release
>> #define CT_MB_RLX std::memory_order_relaxed
>>
>
> Again, unbelievable :-O

Actually, imvvho, these can be "useful". It allows one to change memory
order. Imagine observing what happens when one defines CT_MB_ACQ to
std::memory_order_relaxed? Btw, they are slightly shorter... I should
label them with do not panic?

[...]

David Brown

Dec 18, 2018, 7:12:57 AM

How about:

const auto ct_mb_acq = std::memory_order_acquire;

?

But don't give them names like this and then change them "to see what
happens". Use a name like "per_thread_memory_order", or something that
identifies how you are using the constant. /Then/ you can change it to
see the effect.

You do that with "THREADS" - it is named by usage, and you can change
the value without making the program nonsensical. It would be highly
confusing to write:

#define FOUR 4

std::thread threads[FOUR];

and then experiment with:

#define FOUR 8
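
Put together, the suggestion might look something like the sketch below: typed constants instead of macros, each named after what it is used for. The names worker_thread_count, id_counter_order and next_worker_id are only examples.

```cpp
#include <atomic>
#include <cstddef>

// Named by usage, so changing the value keeps the program meaningful.
constexpr std::size_t worker_thread_count = 4;

// The ordering used for the id counter. Swapping in another
// std::memory_order here changes one specific use rather than redefining
// what "acquire" or "relaxed" means across the program.
constexpr std::memory_order id_counter_order = std::memory_order_relaxed;

static std::atomic<unsigned long> g_next_id(0);

// Hands out consecutive ids starting at 0.
unsigned long next_worker_id()
{
    return g_next_id.fetch_add(1, id_counter_order);
}
```

Unlike the macros, these constants have types and obey scope, and they can still be changed in one place to experiment.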