Thread-safe initialization of static objects

Bonita Montero

Sep 6, 2023, 2:15:29 PM

Since C++11, static local and global data are initialized in a thread-safe
manner. This is usually done with double-checked locking (DCL). DCL needs
a flag alongside the static data which shows whether the data is already
initialized, and a mutex which guards the initialization.
I assumed that an implementation would simply use one central mutex
for all static data objects, since a mutex includes a kernel semaphore,
which is a costly resource.
To find out whether this is true I wrote the application below:

#include <chrono>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

using namespace std;

template<unsigned Thread>
struct SleepAtInitialize
{
    SleepAtInitialize()
    {
        this_thread::sleep_for( 1s );
    }
};

int main()
{
    // Calls fn.operator()<0>(), fn.operator()<1>(), ... once per index.
    auto unroll = []<size_t ... Indices>( index_sequence<Indices ...>, auto fn )
    {
        ((fn.template operator ()<Indices>()), ...);
    };
    constexpr unsigned N_THREADS = 10;
    vector<jthread> threads;
    threads.reserve( N_THREADS );
    unroll( make_index_sequence<N_THREADS>(),
        [&]<unsigned Thread>()
        {
            // Each thread initializes its own, distinct static object.
            threads.emplace_back(
                [&]<unsigned IObj>( integral_constant<unsigned, IObj> )
                {
                    static SleepAtInitialize<IObj> guard;
                }, integral_constant<unsigned, Thread>() );
        } );
}

With this code, SleepAtInitialize<IObj> is instantiated exactly once per
thread, so the threads don't share a central object. If a central mutex
were used, the above code would run for about 10s. But the code runs for
about one second, i.e. there is an individual mutex for each instantiation
of SleepAtInitialize<>. That means the creation of the mutex also happens
as part of the static initialization. The mutex constructor is noexcept,
so mutexes must always be created on their first use. This is done during
the DCL-locked creation of the static object. So in the end static
initialization should be declared to throw a system_error. But I can't
find anything about that in the standard.

Pavel

Sep 6, 2023, 6:44:30 PM

Bonita Montero wrote:
> With C++11 static local or global data is initialized thread-safe.
> This is usually done with double checked locking (DCL). DCL needs
> a flag along with the static data which shows if the data already
> is initialized and a mutex which guards the initialization.

Not needed. A test-and-set instruction on a flag -- that is itself
constant-initialized -- is sufficient.
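
A minimal sketch of the flag-only approach described above, under stated assumptions: the names once_with_flag, kUninit, kBusy, kDone and count_initializations are hypothetical, the test-and-set is expressed as a compare-exchange on a three-state atomic int, and waiters simply yield. It is not what any particular runtime does, and it ignores initializers that throw.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical guard states. The guard is a plain atomic int, so it is
// constant-initialized and needs no runtime setup of its own.
constexpr int kUninit = 0;
constexpr int kBusy   = 1;
constexpr int kDone   = 2;

template <typename Init>
void once_with_flag(std::atomic<int>& state, Init init)
{
    // Fast path: the object is already initialized.
    if (state.load(std::memory_order_acquire) == kDone)
        return;

    int expected = kUninit;
    // Atomic test-and-set: exactly one thread moves kUninit -> kBusy.
    if (state.compare_exchange_strong(expected, kBusy,
                                      std::memory_order_acquire)) {
        init();  // a real implementation must also handle a throwing init
        state.store(kDone, std::memory_order_release);
        return;
    }

    // Everyone else waits in user space until kDone is published.
    while (state.load(std::memory_order_acquire) != kDone)
        std::this_thread::yield();
}

// Demo: returns how often the initializer ran when n threads race for it.
int count_initializations(int n)
{
    std::atomic<int> state{kUninit};
    int runs = 0;  // written only by the single winning thread
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] { once_with_flag(state, [&] { ++runs; }); });
    for (auto& t : threads)
        t.join();
    assert(n <= 0 || state.load() == kDone);
    return runs;
}
```

Calling count_initializations(8) should report a single run of the initializer, however the threads interleave.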

Bonita Montero

Sep 6, 2023, 10:54:51 PM

On 07.09.2023 at 00:44, Pavel wrote:
> Bonita Montero wrote:
>> With C++11 static local or global data is initialized thread-safe.
>> This is usually done with double checked locking (DCL). DCL needs
>> a flag along with the static data which shows if the data already
>> is initialized and a mutex which guards the initialization.

> Not needed. A test-and-set instruction on a flag -- that is itself
> constant-initialized -- is sufficient.

Using only one flag would require spin-locking. However, spin-locking
is not possible in userspace because a thread holding a spinlock could
keep other threads spinning for a long time. Therefore, there is no
getting around a solution with a mutex. And both mutex creation and
mutex synchronization may fail.

Pavel

Sep 7, 2023, 12:18:22 AM

Bonita Montero wrote:
> On 07.09.2023 at 00:44, Pavel wrote:
>> Bonita Montero wrote:
>>> With C++11 static local or global data is initialized thread-safe.
>>> This is usually done with double checked locking (DCL). DCL needs
>>> a flag along with the static data which shows if the data already
>>> is initialized and a mutex which guards the initialization.
>
>> Not needed. A test-and-set instruction on a flag -- that is itself
>> constant-initialized -- is sufficient.
>
> Using only one flag would require spin-locking. However, spin-locking
> is not possible in userspace

Not true, a loop with yield in the body is very possible.

> because a thread holding a spinlock could
> keep other threads spinning for a long time. Therefore, there is no
> getting around a solution with a mutex.

It does not have to be a mutex; it can be call_once, a semaphore, anything,
actually.

> And creation and mutex synchronization may fail.

If initialization synchronization fails, the initialization can catch
and terminate. No need to throw a system error. No need to use C++
synchronization primitives in the initialization code either; nothing
prevents the implementation from doing this in a platform-specific
manner.
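
As an illustration of the call_once alternative mentioned above, here is a small sketch using std::call_once with a std::once_flag. The function name run_call_once_demo is made up for this example, and nothing here claims that compilers implement static initialization this way.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Demo: n_threads threads all try to run the same initializer through one
// std::once_flag; the return value is how many times it actually ran.
int run_call_once_demo(int n_threads)
{
    std::once_flag flag;  // default-constructible, no explicit setup call
    int runs = 0;         // only touched inside the once-only initializer

    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back([&] {
            std::call_once(flag, [&] { ++runs; });
        });
    for (auto& t : threads)
        t.join();

    assert(runs <= 1);
    return runs;
}
```

The guard state lives entirely in the once_flag object, so no mutex has to be created on demand by the caller.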

Bonita Montero

Sep 7, 2023, 12:27:51 AM

On 07.09.2023 at 06:18, Pavel wrote:

> not true, a loop with yield in the body is very possible.

No one would accept that because it would make the waiters wait
much longer than necessary.

> does not have to be a mutex, can be call_once, a semaphore, anything,
> actually.

Every applicable facility for that would rely on kernel synchronization.
You'd need to create a binary semaphore for that and you need to
synchronize on it; both may fail.

> If initialization synchronization fails, the initialization can catch

> and terminate. ...

Nothing like that is specified.

Chris M. Thomasson

Sep 7, 2023, 2:34:01 AM

On 9/6/2023 3:44 PM, Pavel wrote:
> Bonita Montero wrote:
>> With C++11 static local or global data is initialized thread-safe.
>> This is usually done with double checked locking (DCL). DCL needs
>> a flag along with the static data which shows if the data already
>> is initialized and a mutex which guards the initialization.
> Not needed. A test-and-set instruction on a flag -- that is itself
> constant-initialized -- is sufficient.

[...]

You also need to use the appropriate memory barriers. An acquire after
the first check, and a release before making the object visible.

// pseudo code, it's been a while.
// Damn, I used to work with threads all of the time.
___________________________
static foo* g_foo = nullptr;

foo* local = g_foo; // atomic load

if (! local)
{
    hash_lock(&g_foo);

    local = g_foo; // atomic load

    if (! local)
    {
        local = new foo;

        // release mb #LoadStore | #StoreStore

        g_foo = local; // atomic store
    }

    else
    {
        // acquire mb #LoadStore | #LoadLoad
    }

    hash_unlock(&g_foo);
}

else
{
    // acquire mb #LoadStore | #LoadLoad
}

local->foobar();
___________________________

Iirc, that is a bare bones DCL.
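
For comparison, the pseudo code above can be sketched with C++11 atomics. This is only one possible rendering: the struct foo body, get_foo, all_threads_see_same_foo and the single g_foo_lock mutex (standing in for hash_lock/hash_unlock) are assumptions made for the example, and the object is deliberately never freed.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct foo {                       // placeholder type for the example
    int value = 42;
    int foobar() const { return value; }
};

std::atomic<foo*> g_foo{nullptr};
std::mutex g_foo_lock;             // stand-in for hash_lock(&g_foo)

foo* get_foo()
{
    // First check: the acquire load pairs with the release store below.
    foo* local = g_foo.load(std::memory_order_acquire);
    if (!local) {
        std::lock_guard<std::mutex> guard(g_foo_lock);
        // Second check under the lock; the lock itself orders this load.
        local = g_foo.load(std::memory_order_relaxed);
        if (!local) {
            local = new foo;       // intentionally leaked in this sketch
            // Release: publish only after construction has completed.
            g_foo.store(local, std::memory_order_release);
        }
    }
    return local;
}

// Demo: every thread must observe the same, fully constructed object.
bool all_threads_see_same_foo(int n)
{
    std::vector<foo*> seen(n, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = get_foo(); });
    for (auto& t : threads)
        t.join();
    for (foo* p : seen)
        if (p == nullptr || p != seen[0])
            return false;
    assert(seen[0]->foobar() == 42);
    return true;
}
```

The acquire load and release store take the place of the explicit barriers in the pseudo code; the inner load can be relaxed because the mutex already provides the ordering on the slow path.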

Bonita Montero

Sep 7, 2023, 2:53:52 AM

On 07.09.2023 at 08:33, Chris M. Thomasson wrote:

> You also need to use the appropriate memory barriers. ...
That's all inside the Wikipedia example about DCL. But the discussion
was about whether the thread-safe initialization may fail before or
after the object's constructor is called because the mutex creation
or the kernel synchronization may fail.

Chris M. Thomasson

Sep 7, 2023, 3:17:15 AM

Usually, the hash table of mutexes is created before any of the program's
logic is executed...

Paavo Helde

Sep 7, 2023, 5:56:00 AM

06.09.2023 21:15 Bonita Montero wrote:
> With C++11 static local or global data is initialized thread-safe.
> This is usually done with double checked locking (DCL). DCL needs
> a flag along with the static data which shows if the data already
> is initialized and a mutex which guards the initialization.
> I assumed that an implementation simply would use a central mutex
> for all static data objects since it includes a kernel semaphore,
> which is a costly resource.
> To find out if this is true I wrote the below application:

[...]

> SleepAtInitialize<IObj> is instantiated only once per thread with this
> code. So the threads don't share a central object. If there would be
> a central mutex used the above code would run for about 10s. But the
> code does run about one second,

There is a note in the standard: "[Note: This definition permits
initialization of a sequence of ordered variables concurrently with
another sequence. —end note]"

> i.e. there are individual mutexes per
> each instantiation of SleepAtInitialize<>. I.e. the creation of the
> mutex is also done with the static initialization. The mutex constructor
> is noexcept, so mutexes always must be created on their first use. This
> is done while the DCL-locked creation of the static object. So at last
> static initialization should be declared to throw a system_error. But
> I can't find anything about that in the standard.

Some debugging with VS2022 seems to indicate it is using a Windows
critical section for thread-safe statics initialization.
EnterCriticalSection() does not return any error code and of course does
not throw any C++ exceptions either, so it is supposed to never fail.

Yes, it's true it can throw a Windows structured exception
EXCEPTION_POSSIBLE_DEADLOCK (after 30 days by default). But this would
be considered as a fault in the program. This is what the C++ standard
says about deadlocks (again a footnote):

"The implementation must not introduce any deadlock around execution of
the initializer. Deadlocks might still be caused by the program logic;
the implementation need only avoid deadlocks due to its own
synchronization operations."

So I gather that in case the thread-safe static init synchronization
fails, there must be a bug in the implementation. No C++ exceptions
would be thrown anyway.

Bonita Montero

Sep 7, 2023, 8:18:03 AM

On 07.09.2023 at 09:16, Chris M. Thomasson wrote:

> Usually, the hash table of mutexes is created before
> any of the programs logic is executed...

That sounds too complex for me. Creating individual mutexes on
demand would be o.k. and I think a lock-free stack with a pool
of mutexes would be fancy. A hashtable is too slow for that.

Pavel

Sep 7, 2023, 10:45:22 AM

Bonita Montero wrote:
> On 07.09.2023 at 06:18, Pavel wrote:
>
>> not true, a loop with yield in the body is very possible.
>
> No one would accept that because that would make the waiters to wait
> much more longer than necessary.

How so? Waiters will wait for the time of initialization. The
initializing thread will be yielded to and receive virtually as many and
as complete time slices as it would under any other scheduling
discipline so it will complete its job in same or virtually same time
(actually, the higher the system contention level is, the more efficient
user-space waiting with yield becomes). Hence the time waiters will wait
is exactly or virtually the same as when using a mutex.

Also it should be taken into account that all above (and below) is only
relevant to the rare case when the initialization is contended; in other
words, "no one would accept" would be an exaggeration of the year even
if your speculation on "wait much longer" were true -- which it isn't.

>
>> does not have to be a mutex, can be call_once, a semaphore, anything,
>> actually.
>
> Every applicable facility for that would rely on kernel-synchronization.
> You'd need to create a binary semaphore for that and you need so syn-
> chronize on that; both may fail.

try_lock in a loop with a sleep or yield wouldn't fail. But as said
above, it's not needed.

Regardless, your argument assumes too much C++-morphism in the
implementation whereas the implementation can use any platform-specific
approach available to it. E.g. pthread_once on Linux does not fail if
given valid arguments (which C++ implementation can provide). Other
platforms may have tools of their own to do the job.

>
>> If initialization synchronization fails, the initialization can catch
>> and terminate. ...
>
> Nothing like that is specified.

Correct, the above was wrong. But initialization can catch and try
again. This does not have to be specified as it is not observable from
outside.

Bonita Montero

Sep 7, 2023, 12:39:36 PM

On 07.09.2023 at 16:44, Pavel wrote:

> How so? Waiters will wait for the time of initialization.

> The initializing thread will be yielded ...

Initializing is usually much faster than a whole timeslice,
so yielding would be unacceptable. That's just a stupid idea.

> try_lock in a loop with a sleep or yield wouldn't fail. ...

Do you really think someone would accept spinning with that?
This means that an initializing thread which is scheduled away
while holding the mutex might keep other threads spinning for
a long time.

> Regardless, your argument assumes too much C++-morphism in the
> implementation whereas the implementation can use any platform
> -specific approach available to it. E.g. pthread_once on Linux

> does not fail if given valid arguments ...

pthread_once could be implemented with a single central semaphore for
all operations, and if the implementers know that the synchronization
itself doesn't fail and the semaphore is pre-allocated by the runtime,
it's possible to get through that synchronization without error.
But check this code:

#include <chrono>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

using namespace std;

template<unsigned Thread>
struct SleepAtInitialize
{
    SleepAtInitialize()
    {
        this_thread::sleep_for( 1s );
    }
};

int main()
{
    // Calls fn.operator()<0>(), fn.operator()<1>(), ... once per index.
    auto unroll = []<size_t ... Indices>( index_sequence<Indices ...>, auto fn )
    {
        ((fn.template operator ()<Indices>()), ...);
    };
    constexpr unsigned N_THREADS = 10;
    vector<jthread> threads;
    threads.reserve( N_THREADS );
    unroll( make_index_sequence<N_THREADS>(),
        [&]<unsigned Thread>()
        {
            // Each thread initializes its own, distinct static object.
            threads.emplace_back(
                [&]<unsigned IObj>( integral_constant<unsigned, IObj> )
                {
                    static SleepAtInitialize<IObj> guard;
                }, integral_constant<unsigned, Thread>() );
        } );
}

This code runs with individual mutexes per object, i.e. the time
taken is about one second (with MSVC, libc++ and libstdc++). So
when individual mutexes are used the initialization may fail.

> Correct, the above was wrong. But initialization can catch and try

> again. ...

That's a bumbler's solution.

Scott Lurndal

Sep 7, 2023, 1:28:01 PM

Bonita Montero <Bonita....@gmail.com> writes:
>On 07.09.2023 at 16:44, Pavel wrote:
>
>> How so? Waiters will wait for the time of initialization.
>> The initializing thread will be yielded ...
>
>Initializing is usually much faster than a whole timeslice,
>so yielding would be unacceptable. That's just a stupid idea.

So tell us, on which operating systems will there be more than
one thread running when application static objects are
initialized (which happens generally before the application
'main' function is called, and thus before the application
has a chance to create any threads)?

Richard Damon

Sep 7, 2023, 2:07:05 PM

While GLOBAL static objects get initialized before main starts, function
local static objects don't get initialized until the first call of the
function. These will need some synchronization if the function is called
from multiple threads at "the same time".

Also, some global object could start up a thread in its constructor.
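
To make the point about function-local statics concrete, here is a small sketch. The names Tracked, instance, g_constructions and constructions_after_concurrent_calls are invented for this example. It shows that the object is not constructed before the first call, and that it is constructed once even when several threads make that first call concurrently.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> g_constructions{0};  // counts constructor runs

struct Tracked {
    Tracked() { ++g_constructions; }
};

Tracked& instance()
{
    // Not initialized at program start, only when control first passes
    // through this declaration. Since C++11 concurrent first calls are
    // synchronized by the implementation.
    static Tracked t;
    return t;
}

// Demo: n threads call instance() at roughly the same time.
int constructions_after_concurrent_calls(int n)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { (void)instance(); });
    for (auto& t : threads)
        t.join();
    assert(&instance() == &instance());  // always the same object
    return g_constructions.load();
}
```

Before the first call to instance() the counter is still zero; afterwards it stays at one no matter how many threads were involved.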

Bonita Montero

Sep 7, 2023, 2:24:33 PM

On 07.09.2023 at 19:27, Scott Lurndal wrote:

> So tell us, on which operating systems will there be more
> than one thread running when application static objects are
> initialized (which happens generally before the application
> 'main' function is called, and thus before the application
> has a chance to create any threads)?

I've shown with my code that each static object gets its own mutex
with MSVC, libstdc++ and libc++. Here it is again in a simplified
version:

#include <chrono>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

using namespace std;

int main()
{
    struct SleepAtInitialize
    {
        SleepAtInitialize()
        {
            this_thread::sleep_for( 1s );
        }
    };

    // Calls fn.operator()<0>(), fn.operator()<1>(), ... once per index.
    auto unroll = []<size_t ... Indices>( index_sequence<Indices ...>, auto fn )
    {
        ((fn.template operator ()<Indices>()), ...);
    };
    constexpr unsigned N_THREADS = 10;
    vector<jthread> threads;
    threads.reserve( N_THREADS );
    unroll( make_index_sequence<N_THREADS>(),
        [&]<unsigned Thread>()
        {
            threads.emplace_back(
                [&]<unsigned IObj>( integral_constant<unsigned, IObj> )
                {
                    // One static per instantiation of this lambda template.
                    static SleepAtInitialize guard;
                }, integral_constant<unsigned, Thread>() );
        } );
}

The code runs for about one second with all three implementations,
so there's a mutex per statically initialized object.

Chris M. Thomasson

Sep 7, 2023, 3:06:47 PM

No. A simple hash of a pointer into an index works out okay, not too
slow at all. Fwiw, check this out, tell me what you think:

https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/wwYQCG2hAwAJ

It is a quick and crude example simulation of one way to do it. The hash
lock table is created before program logic is executed.

Any thoughts?

Also, we are talking about a slow path wrt DCL.

Chris M. Thomasson

Sep 7, 2023, 3:20:56 PM

A simple global hash lock scheme is where we can hash addresses directly
into a static locking table. The lock table is created _before_ any
program logic is executed.

https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/wwYQCG2hAwAJ
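
A rough sketch of the address-hashed lock table idea described above. This is not the code behind the link. The table size kLockCount, the mixing step in lock_index and the helper names lock_for and locked_increment are all assumptions chosen for illustration. It also assumes std::mutex has a constexpr default constructor, as the standard requires, so the table needs no dynamic initialization.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

constexpr std::size_t kLockCount = 64;  // fixed table size (assumed)

// Global table with static storage duration, ready before any program
// logic runs because std::mutex is constexpr-constructible.
std::array<std::mutex, kLockCount> g_lock_table;

// Maps an address to a slot in the table.
std::size_t lock_index(const void* address)
{
    auto bits = reinterpret_cast<std::uintptr_t>(address);
    bits ^= bits >> 9;                  // cheap mixing, illustrative only
    return static_cast<std::size_t>(bits % kLockCount);
}

std::mutex& lock_for(const void* address)
{
    return g_lock_table[lock_index(address)];
}

// Usage example: guard an arbitrary object by locking "its" table slot.
int locked_increment(int& counter)
{
    std::lock_guard<std::mutex> guard(lock_for(&counter));
    assert(lock_index(&counter) < kLockCount);
    return ++counter;
}
```

Different objects may hash to the same slot. That only adds contention on the slow path; it does not affect correctness as long as no thread tries to hold two slots at once.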

> Also, some global object could start up a thread in its constructor.

YIKES! Shit. I have had to debug other people's code that did this. Many
points of error... One was a rather common peach of a bug. The
constructor would create a thread that would in turn call into a virtual
function and start using the object before its constructor had
completed. A massive race condition, nasty ones!

Bonita Montero

Sep 7, 2023, 3:27:59 PM

On 07.09.2023 at 21:06, Chris M. Thomasson wrote:

> No. A simple hash of a pointer into an index works out okay, not

> too slow at all. Fwiw, check this out, tell me what you think.

Then the number of mutexes would be fixed.

Chris M. Thomasson

Sep 7, 2023, 3:41:22 PM

Yup. It uses address based hashing into the fixed mutex table. It can be
used to simulate atomics when:

https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free

is false.

Richard Damon

Sep 7, 2023, 3:52:16 PM

Yes, such a thread needs to understand that the system isn't fully
configured. And yes, a base class creating a thread needs that thread to
understand that the object isn't fully created yet and do something to
handle that issue (which likely requires some help from the most derived
class).

Chris M. Thomasson

Sep 7, 2023, 3:59:07 PM

I actually got to a point where I said no creating threads in
constructors! I have seen too many problems. Create the object _first_,
then expose it to a thread, or create a thread that works with the
_fully_ constructed object. The types of race conditions I have had to
deal with wrt objects creating threads in constructors have tended to be
rather nasty in nature...

Bonita Montero

Sep 7, 2023, 4:00:18 PM

On 07.09.2023 at 21:41, Chris M. Thomasson wrote:

> Yup. It uses address based hashing into the fixed mutex table.

I guess that's not used at all in the runtimes.

Richard Damon

Sep 7, 2023, 4:31:57 PM

I have ONE class that does this, but it is for a somewhat specific
embedded environment with deterministic scheduling and just a single
core, so the thread is created with a lower priority than the creator so
it can't start running until the creator blocks, which it isn't allowed
to do until the constructor finishes. It also is normally used early in
the program before the "OS" is turned on, so the created thread can't
actually run until the OS is turned on.

But, for the more general case where you can't control that, (under more
"normal" conditions on the typical "powerful" computer), yes, that would
be a risky thing to be doing.

Scott Lurndal

Sep 7, 2023, 5:13:41 PM

The race condition is pretty easy to resolve. Turn thread initialization
into a two-phase operation.

When we have an object that requires its own thread, it will inherit
from a base class called c_thread. The constructor for c_thread will
create the thread which eventually executes the object's 'run' function. The
pthread_create call starts the thread executing at the static function
c_thread::run(void *obj), where the argument is a pointer to the object
itself.

That static function sets the process name (prctl(PR_SET_NAME)) and
then waits on a pthread_cond_t (thread_initialized) declared as a data member of
c_thread.

Meanwhile, the derived object constructor runs in the original thread after the c_thread
constructor returns and initializes the object. Once the object is fully
initialized (which may happen in the constructor, or may happen
at some later point in time during program startup), the c_thread::run
method condition variable (thread_initialized) is signaled and the
static ::run function transfers control to the derived class virtual ::run
function.

/**
 * Static pthread start function. Invokes thread specific run virtual
 * function.
 *
 * @param arg The class object pointer passed in from the pthread_create.
 */
void *
c_thread::run(void *arg)
{
    c_thread *tp = (c_thread *)arg;
    int diag;

    pthread_mutex_lock(&tp->t_threadlock);
    while (!tp->t_thread_initialized) {
        pthread_cond_wait(&tp->t_ready, &tp->t_threadlock);
    }
    pthread_mutex_unlock(&tp->t_threadlock);

    diag = prctl(PR_SET_NAME, tp->t_thread_name, NULL, NULL, NULL, NULL);
    if (diag == -1) {
        tp->t_logger->log("Unable to set thread name: %s\n", strerror(errno));
    }

    tp->t_running = true;
    tp->run();
    tp->t_running = false;

    return NULL;
}
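
The same two-phase idea can be sketched portably with std::thread and std::condition_variable instead of pthreads. This is an analogue written for illustration, not the c_thread class above: two_phase_thread, mark_initialized, stop_or_join, counter_task and run_two_phase_demo are hypothetical names, and the sketch assumes the most derived class calls mark_initialized() last in its constructor and stop_or_join() first in its destructor.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class two_phase_thread
{
public:
    // Phase one: the thread is created here but parks until released.
    two_phase_thread() : worker_([this] { entry(); }) {}
    virtual ~two_phase_thread() { stop_or_join(); }

    // Phase two: called once the most derived object is fully built.
    void mark_initialized() { set_state(state::go); }

    // Cancels a still-parked thread, then joins. Safe to call repeatedly.
    void stop_or_join()
    {
        set_state(state::cancelled);  // no effect if already released
        if (worker_.joinable())
            worker_.join();
    }

protected:
    virtual void run() = 0;

private:
    enum class state { waiting, go, cancelled };

    void set_state(state s)
    {
        {
            std::lock_guard<std::mutex> guard(lock_);
            if (state_ != state::waiting)
                return;
            state_ = s;
        }
        ready_.notify_one();
    }

    void entry()
    {
        std::unique_lock<std::mutex> guard(lock_);
        ready_.wait(guard, [this] { return state_ != state::waiting; });
        const bool go = (state_ == state::go);
        guard.unlock();
        if (go)
            run();  // virtual call is safe: construction has finished
    }

    std::mutex lock_;
    std::condition_variable ready_;
    state state_ = state::waiting;
    std::thread worker_;  // declared last so the members above exist first
};

struct counter_task : two_phase_thread
{
    int value = 0;
    counter_task() { value = 41; mark_initialized(); }
    ~counter_task() override { stop_or_join(); }
    void run() override { ++value; }
};

int run_two_phase_demo()
{
    counter_task task;
    task.stop_or_join();  // wait until run() has finished
    assert(task.value == 42);
    return task.value;
}
```

The cancelled state is an addition of this sketch so that an object destroyed before release does not leave its thread waiting forever.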

Pavel

Sep 7, 2023, 8:54:07 PM

Bonita Montero wrote:
> On 07.09.2023 at 16:44, Pavel wrote:
>
>> How so? Waiters will wait for the time of initialization.
>> The initializing thread will be yielded ...
>
> Initializing is usually much faster than a whole timeslice,
> so yielding would be unacceptable. That's just a stupid idea.

Mutex makes thread waiting for initialization not runnable. Yield leaves
it runnable. Therefore, waking up a thread that acquired the mutex is
actually slower than the time needed by thread that yielded to get
running again. This is regardless of whether initializing is slow or fast.

>
>> try_lock in a loop with a sleep or yield wouldn't fail. ...
>
> Do you really think someone would accept spinning with that ?

I don't have to think about this because I know many do.

> This means that an initializing thread which is scheduled away
> while holding the mutex might keep other threads spinning for
> a long time.

No, it can be scheduled away only momentarily because the waiting
threads will relinquish the CPU instantly after being scheduled.

>
>> Regardless, your argument assumes too much C++-morphism in the
>> implementation whereas the implementation can use any platform
>> -specific approach available to it. E.g. pthread_once on Linux
>> does not fail if given valid arguments ...
>
> pthread_once could be implemented with a single central semaphore

pthread_once takes a statically-initialized pthread_once_t "once
control" that the calling code must supply. It is against the purpose of
pthread_once to wait for a single central semaphore. On the contrary,
the goal is to let the client code control the concurrency by selecting
how many "once controls" they want to use.

> for
> all operations and if the implementers know that the synchronization
> itself doesn't fail and the semphore is pre-allocated by the runtime
> it's possible to survive that synchronization without error.

correct. My point exactly.

See above for why pthread_once implementation should also take one
second with the individual pthread_once_t once controls per object.

>
>> Correct, the above was wrong. But initialization can catch and try
>> again. ...
>
> That a bumbler solution.

Mutex locking does not fail for no reason. In practice on Linux, if the
mutex is in committed memory, is properly prioritized and
permissioned, and is non-robust (all of which one should assume to be
true for a mutex used by a C++ implementation for initialization), I am
unaware of any reason why non-timed locking may fail other than
deadlock detection. C++ behavior for the scenario when static
initialization deadlocks is unspecified so the above solution is no
better or worse than any other.

Bonita Montero

Sep 8, 2023, 1:37:24 AM

On 08.09.2023 at 02:53, Pavel wrote:

> Bonita Montero wrote:

>> so yielding would be unacceptable. That's just a stupid idea.

> Mutex makes thread waiting for initialization not runnable. Yield leaves
> it runnable. Therefore, waking up a thread that acquired the mutex is
> actually slower than the time needed by thread that yielded to get
> running again. This is regardless of whether initializing is slow or fast.

However, using sth. like yield is unacceptably slow.

> No, it can be scheduled away only momentarily because the waiting
> threads will relinquish the CPU instantly after being scheduled.

The thread might be scheduled away many timeslices - that's
unacceptable.

> See above for why pthread_once implementation should also take one
> second with the individual pthread_once_t once controls per object.

As you can see from my source the common runtimes use a mutex per
object. This creation may fail.

> mutex locking does not fail for no reason. ...

It's like when a memory allocation would fail and you would
spin until it doesn't fail again. That's also unacceptable.

Marcel Mueller

Sep 8, 2023, 12:56:04 PM

On 07.09.23 at 18:39, Bonita Montero wrote:

> On 07.09.2023 at 16:44, Pavel wrote:
>
>> How so? Waiters will wait for the time of initialization.
>> The initializing thread will be yielded ...
>
> Initializing is usually much faster than a whole timeslice,
> so yielding would be unacceptable. That's just a stupid idea.

You do not lose a whole time slice.
That is only true if all available CPU cores are already at 100% load.
In this case you likely lose some time slices anyway.

Normally the spin lock has only the overhead of the additional context
switches. This turns into some short delay if all CPU cores are at full
load, which is of course intended.

> Do you really think someone would accept spinning with that ?
> This means that an initializing thread which is scheduled away
> while holding the mutex might keep other threads spinning for
> a long time.

This is almost impossible since the spinning thread will block the CPU
for only one additional context switch. So the maximum impact is the
number of spinning threads vs. the number of working threads. Assuming
that the working threads likely do not call yield all the time, they get
much more resources than in this worst case scenario.

Marcel

Pavel

Sep 8, 2023, 2:55:03 PM

Bonita Montero wrote:
> On 08.09.2023 at 02:53, Pavel wrote:
>
>> Bonita Montero wrote:
>
>>> so yielding would be unacceptable. That's just a stupid idea.
>
>> Mutex makes thread waiting for initialization not runnable. Yield
>> leaves it runnable. Therefore, waking up a thread that acquired the
>> mutex is actually slower than the time needed by thread that yielded
>> to get running again. This is regardless of whether initializing is
>> slow or fast.
>
> However, using sth. like yield is unacceptably slow.

As per the above, yield-based waiting for initialization is not slower
than mutex-based. Hence, this "unacceptably slow" is nonsense.

>
>> No, it can be scheduled away only momentarily because the waiting
>> threads will relinquish the CPU instantly after being scheduled.
>
> The thread might be scheduled away many timeslices -

A thread that waits for the contended mutex is, upon locking the mutex,
scheduled identically to yield so no difference with mutex

> that's
> unacceptable.
As per the above, this "unacceptable" is nonsense, too.

>
>
>> See above for why pthread_once implementation should also take one
>> second with the individual pthread_once_t once controls per object.
>
> As you can see from my source the common runtimes use a mutex per
> object.

On the contrary, I can see in the glibc source code that pthread_once for
Linux (which is most common) does not use a mutex. There is no "creation";
pthread_once_t is an int.

> This creation may fail.
as per the above, there is nothing to fail unless someone decides to
allocate the int in free memory -- which is not done for static
initialization.

>
>> mutex locking does not fail for no reason. ...
>
> It's like when a memory allocation would fail and you would
> spin until it doesn't fail again.

as per the above, there is nothing to fail unless someone decides to
allocate the int in free memory -- which is not done for static
initialization.

> That's also unacceptable.
As per the above, this "unacceptable" is nonsense, too.

Chris M. Thomasson

Sep 8, 2023, 4:20:21 PM

Actually, one way that works pretty good, aka not make me freak out,
lol, wrt creating a thread in a constructor is to create a little proxy
object. Something along the lines of:

template<typename T>
struct thread_proxy
{
    T m_object;
    std::thread m_thread;
};

We can launch the thread in the constructor of thread_proxy<T>

The thread calls a function in T called thread_entry or whatever. This
ensures that T is 100% _fully_ constructed before the thread is created
and starts working with the damn thing. Imvvvho, this is a lot better
than creating the thread directly in T's constructor.
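
Filling in the proxy idea as a runnable sketch: the forwarding constructor, the joining destructor, the join() helper, the worker type and run_proxy_demo are all additions made for illustration, and thread_entry is just the placeholder name used in the text above.

```cpp
#include <cassert>
#include <thread>
#include <utility>

template <typename T>
struct thread_proxy
{
    T m_object;            // constructed first (declaration order)
    std::thread m_thread;  // started only after m_object is complete

    template <typename... Args>
    explicit thread_proxy(Args&&... args)
        : m_object(std::forward<Args>(args)...),
          m_thread([this] { m_object.thread_entry(); })
    {}

    void join()
    {
        if (m_thread.joinable())
            m_thread.join();
    }

    // Join before m_object is destroyed (members die in reverse order,
    // but the thread must have stopped using m_object by then).
    ~thread_proxy() { join(); }
};

// Hypothetical payload type with the entry point the proxy expects.
struct worker
{
    int base;
    int result = 0;
    explicit worker(int b) : base(b) {}
    void thread_entry() { result = base + 1; }
};

int run_proxy_demo()
{
    thread_proxy<worker> proxy(41);
    proxy.join();                   // thread_entry() has finished
    assert(proxy.m_object.result == 42);
    return proxy.m_object.result;
}
```

Because members are initialized in declaration order, the thread can never observe a partially constructed T, which is the property the proxy is meant to guarantee.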

Actually, I need to get back into threads for some rendering work of mine.

Chris M. Thomasson

Sep 8, 2023, 4:26:19 PM

On 9/6/2023 7:54 PM, Bonita Montero wrote:

> On 07.09.2023 at 00:44, Pavel wrote:
>> Bonita Montero wrote:

>>> With C++11 static local or global data is initialized thread-safe.
>>> This is usually done with double checked locking (DCL). DCL needs
>>> a flag along with the static data which shows if the data already
>>> is initialized and a mutex which guards the initialization.
>

>> Not needed. A test-and-set instruction on a flag -- that is itself
>> constant-initialized -- is sufficient.
>

> Using only one flag would require spin-locking. However, spin-locking
> is not possible in userspace because a thread holding a spinlock could
> keep other threads spinning for a long time. Therefore, there is no
> getting around a solution with a mutex. And creation and mutex
> synchronization may fail.
>

Limited spin locking in user space is fine. Think about it for a
moment... Think of adaptive mutex logic... They will try to spin wait a
couple of times on a contended state before using the kernel to actually
wait. No problem with that, right?
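
A simplified sketch of the "spin a few times, then block" logic mentioned above. Real adaptive mutexes are implemented inside the threading library and usually spin with a CPU pause instruction. Here the class adaptive_lock, the limit kSpinLimit and the use of try_lock plus yield on top of std::mutex are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class adaptive_lock
{
public:
    void lock()
    {
        // Bounded user-space phase: try a few times without blocking.
        for (int i = 0; i < kSpinLimit; ++i) {
            if (mutex_.try_lock())
                return;
            std::this_thread::yield();
        }
        // Still contended: fall back to a blocking (kernel) wait.
        mutex_.lock();
    }

    void unlock() { mutex_.unlock(); }

private:
    static constexpr int kSpinLimit = 100;  // arbitrary, illustrative
    std::mutex mutex_;
};

// Demo: several threads increment a shared counter under the lock.
int count_with_adaptive_lock(int n_threads, int increments)
{
    adaptive_lock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back([&] {
            for (int k = 0; k < increments; ++k) {
                std::lock_guard<adaptive_lock> guard(lock);
                ++counter;
            }
        });
    for (auto& t : threads)
        t.join();
    assert(counter == n_threads * increments);
    return counter;
}
```

The spin phase is bounded, so a holder that gets scheduled away costs the waiters at most kSpinLimit attempts before they block.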

Chris M. Thomasson

Sep 8, 2023, 4:26:35 PM

Why not?

Chris M. Thomasson

Sep 8, 2023, 4:29:21 PM

On 9/7/2023 10:37 PM, Bonita Montero wrote:
> On 08.09.2023 at 02:53, Pavel wrote:
>
>> Bonita Montero wrote:
>
>>> so yielding would be unacceptable. That's just a stupid idea.
>
>> Mutex makes thread waiting for initialization not runnable. Yield
>> leaves it runnable. Therefore, waking up a thread that acquired the
>> mutex is actually slower than the time needed by thread that yielded
>> to get running again. This is regardless of whether initializing is
>> slow or fast.
>
> However, using sth. like yield is unacceptably slow.

Huh? Think of an adaptive mutex for a moment... :^)

Pavel

Sep 8, 2023, 8:15:14 PM

I think it's not needed in general for static initialization purpose.

This is because, if our design choice for our "C++ compiler" is to use a
dedicated mutex (or once_flag control) per a static variable that
requires dynamic initialization, nothing prevents our compiler from
allocating a static instance of such mutex or control next to the static
variable.

Because mutex and once_flag can be constant-initialized (at least in
POSIX, notably Linux), the code that dynamically initializes a variable
can directly use the address of its respective constant-initialized
mutex or control, as in the following example:

code in C++ source file:

static type1 var1 = InitVar1();

**pseudo-code** actually generated by the compiler:

pthread_once_t __var1_once_control = PTHREAD_ONCE_INIT; /* generated by the compiler */

static type1 var1 = InitVar1(); /* this is compiled, **in pseudo-code**, to something like

pthread_once( &__var1_once_control, [&var1]() { var1 = InitVar1(); } );
*/
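
The same idea can be sketched in standard C++ with `std::once_flag` and `std::call_once` in place of `pthread_once`. This is not what any particular compiler emits; `type1`, `InitVar1`, `var1_storage`, `var1_once`, `var1_ptr` and `get_var1` are hypothetical names for the illustration. The point is that `std::once_flag` has a constexpr constructor, so the control object sits next to the variable and needs no run-time creation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>
#include <thread>
#include <vector>

struct type1 { int value; };

std::atomic<int> g_initCalls{ 0 };   // counts how often the initializer ran

type1 InitVar1()
{
    ++g_initCalls;
    return type1{ 42 };
}

// What a compiler could place next to the variable: raw storage plus a
// constant-initialized control object.
alignas(type1) unsigned char var1_storage[sizeof(type1)];
std::once_flag var1_once;
type1 *var1_ptr = nullptr;

type1 &get_var1()
{
    // Exactly one caller runs the lambda; the others wait until it is done.
    std::call_once( var1_once,
        [] { var1_ptr = ::new( var1_storage ) type1( InitVar1() ); } );
    return *var1_ptr;
}

// Demo: many threads race for the first access.
int run_once_demo( unsigned nThreads )
{
    std::vector<std::thread> threads;
    for( unsigned t = 0; t != nThreads; ++t )
        threads.emplace_back( [] { assert( get_var1().value == 42 ); } );
    for( std::thread &th : threads )
        th.join();
    return g_initCalls.load();
}
```

However many threads race in `run_once_demo`, the initializer should run once.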

Chris M. Thomasson

unread,

Sep 8, 2023, 9:05:29 PM9/8/23

to

pthread_once is perfectly fine. Just wondering, are you creating the
POSIX impl directly here?

Chris M. Thomasson

unread,

Sep 8, 2023, 9:10:58 PM9/8/23

to

Think of how you would directly implement pthread_once... There are many
different ways.

Chris M. Thomasson

unread,

Sep 8, 2023, 9:23:00 PM9/8/23

to

On 9/7/2023 2:55 AM, Paavo Helde wrote:

> On 06.09.2023 at 21:15, Bonita Montero wrote:
>> With C++11 static local or global data is initialized thread-safe.
>> This is usually done with double checked locking (DCL). DCL needs
>> a flag along with the static data which shows if the data already
>> is initialized and a mutex which guards the initialization.

>> I assumed that an implementation simply would use a central mutex
>> for all static data objects since it includes a kernel semaphore,
>> which is a costly resource.
>> To find out if this is true I wrote the below application:
> [...]
>> SleepAtInitialize<IObj> is instantiated only once per thread with this
>> code. So the threads don't share a central object. If there would be
>> a central mutex used the above code would run for about 10s. But the
>> code does run about one second,
>
> There is a note in the standard: "[Note: This definition permits
> initialization of a sequence of ordered variables concurrently with
> another sequence. —end note]"
>
>> i.e. there are individual mutexes per
>> each instantiation of SleepAtInitialize<>. I.e. the creation of the
>> mutex is also done with the static initialization. The mutex constructor
>> is noexcept, so mutexes always must be created on their first use. This
>> is done while the DCL-locked creation of the static object. So at last
>> static initialization should be declared to throw a system_error. But
>> I can't find anything about that in the standard.
>
> Some debugging with VS2022 seems to indicate it is using a Windows
> critical section for thread-safe statics initialization.
> EnterCriticalSection() does not return any error code and of course does
> not throw any C++ exceptions either, so it is supposed to never fail.

If it does fail, then some rather radical shit has hit the fan.

>
> Yes, it's true it can throw a Windows structured exception
> EXCEPTION_POSSIBLE_DEADLOCK (after 30 days by default). But this would
> be considered as a fault in the program. This is what the C++ standard
> says about deadlocks (again a footnote):
>
> "The implementation must not introduce any deadlock around execution of
> the initializer. Deadlocks might still be caused by the program logic;
> the implementation need only avoid deadlocks due to its own
> synchronization operations."
>
> So I gather that in case the thread-safe static init synchronization
> fails, there must be a bug in the implementation. No C++ exceptions
> would be thrown anyway.

Right. The implementation must get it right using fine-grained locking,
amortized table locking, or some other means. The compiler and POSIX
work together like a system. Iirc, I saw another way that was
lock-free, but allowed a thread to create an object, then might have to
delete it because it was not the first one to be installed in the sense
of being visible to other threads. This was decades ago, iirc it used
CAS. No mutex.

Chris M. Thomasson

unread,

Sep 8, 2023, 9:46:14 PM9/8/23

to

Iirc it was something like:

static foo* g_foo = nullptr;

foo* l_foo = g_foo; // atomic load

if (! l_foo)
{
    foo* t_foo = new foo;
    foo* cmp = nullptr;

    // CAS would update the comparand on failure.
    // Take special note of that...

    // mb release

    if (! CAS(&g_foo, &cmp, t_foo)) // atomic rmw
    {
        // failed!!! well, shit happens!
        delete t_foo;
        t_foo = cmp;
        // mb acquire
    }
}

else
{
    // mb acquire
}

l_foo->bar();

Bonita Montero

unread,

Sep 8, 2023, 11:26:56 PM9/8/23

to

On 08.09.2023 at 22:26, Chris M. Thomasson wrote:

> Why not?

Because there are simpler ideas that work.

Bonita Montero

unread,

Sep 8, 2023, 11:45:11 PM9/8/23

to

On 08.09.2023 at 20:54, Pavel wrote:

> As per the above yield-based waiting for initialization is not slower
> than mutex-based. Hence, this "unacceptably slow" is nonsense.

Of course it is slower because you wait N timeslices.

> A thread that waits for the contended mutex is, upon locking the mutex,
> scheduled identically to yield so no difference with mutex

No, a thread waiting for a mutex doesn't spin when the thread holding
the spinlock is scheduled away but it is sleeping inside the kernel.
That's more efficient.

> On the opposite, I can see in glibc source code that pthread_once for
> linux (which is most common) does not use mutex. There is no "creation",
> pthread_once_t is an int.

pthread_once uses a mutex internally for sure.

Paavo Helde

unread,

Sep 9, 2023, 10:55:09 AM9/9/23

to

On 09.09.2023 at 06:44, Bonita Montero wrote:
>
> pthread_once uses a mutex internally for sure.
>

pthread_once source code can be found by google:
https://elixir.bootlin.com/glibc/latest/source/nptl/pthread_once.c

From there I can see that it calls atomic_load_acquire(). Only if this
fails, will it call __pthread_once_slow() in the slow branch, which uses
futexes. Some other googling reveals that: "a futex ("Fast Userspace
Mutex") isn't a mutex at all"
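
The fast-path / slow-path split described here can be sketched with a plain atomic control word. This is not glibc's code: `once_sketch` and the three state constants are hypothetical, exception handling is left out (an initializer that throws would leave waiters stuck), and `std::this_thread::yield()` only stands in for the futex wait that the real slow path is said to use.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

constexpr int ONCE_NEW = 0;       // initializer not started
constexpr int ONCE_RUNNING = 1;   // some thread is running the initializer
constexpr int ONCE_DONE = 2;      // initializer finished

// The control word is just an int, so nothing has to be created at run time.
template<class Fn>
void once_sketch( std::atomic<int> &ctl, Fn fn )
{
    // Fast path: a single acquire load once initialization has finished.
    if( ctl.load( std::memory_order_acquire ) == ONCE_DONE )
        return;
    int expected = ONCE_NEW;
    if( ctl.compare_exchange_strong( expected, ONCE_RUNNING,
            std::memory_order_acquire, std::memory_order_acquire ) )
    {
        fn();   // this thread won the race and initializes
        ctl.store( ONCE_DONE, std::memory_order_release );
        return;
    }
    // Slow path: another thread is initializing. yield() is a portable
    // stand-in for blocking on a futex until the state changes.
    while( ctl.load( std::memory_order_acquire ) != ONCE_DONE )
        std::this_thread::yield();
}

// Demo: count how often the initializer runs when threads race.
int run_once_sketch_demo( unsigned nThreads )
{
    std::atomic<int> ctl{ ONCE_NEW };
    std::atomic<int> calls{ 0 };
    std::vector<std::thread> threads;
    for( unsigned t = 0; t != nThreads; ++t )
        threads.emplace_back( [&] { once_sketch( ctl, [&] { ++calls; } ); } );
    for( std::thread &th : threads )
        th.join();
    return calls.load();
}
```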

Bonita Montero

unread,

Sep 9, 2023, 1:26:47 PM9/9/23

to

On 09.09.2023 at 16:54, Paavo Helde wrote:

> From there I can see that it calls atomic_load_acquire(). Only if this
> fails, will it call __pthread_once_slow() in the slow branch, which uses
> futexes. Some other googling reveals that: "a futex ("Fast Userspace
> Mutex") isn't a mutex at all"

Doesn't matter since the standard should honor static initialization
that fails because of synchronization woes. And as I've shown with
my code MSVC, libstdc++ and libc++ have a mutex per static
initialization, so pthread_once doesn't apply here.

Pavel

unread,

Sep 9, 2023, 1:43:03 PM9/9/23

to

If Microsoft implemented standard wrongly, it's Microsoft's failure, not
the standard's.

Pavel

unread,

Sep 9, 2023, 1:44:45 PM9/9/23

to

I am not creating, just demonstrating in pseudo-code how Linux
implementation could work.

Pavel

unread,

Sep 9, 2023, 2:00:52 PM9/9/23

to

You are correct, there are. Fortunately, I don't have to reinvent this
particular wheel as the implementation source code is available for
everyone to read. On Linux, Futex is a natural choice.

Other platforms can have theirs. I did not program for Windows for long
time but I would at least try to implement call_once with a critical
section (whose creation, contrary to OP's insinuations, cannot fail
since Windows Vista).

Chris M. Thomasson

unread,

Sep 9, 2023, 2:41:51 PM9/9/23

to

Iirc, Windows has a futex like "thing". I cannot remember the API right
now. Will get back to you.

Chris M. Thomasson

unread,

Sep 9, 2023, 2:45:51 PM9/9/23

to

Not exactly sure how to parse that, but sure. As long as it works.

Chris M. Thomasson

unread,

Sep 9, 2023, 2:48:58 PM9/9/23

to

On 9/8/2023 8:44 PM, Bonita Montero wrote:
> On 08.09.2023 at 20:54, Pavel wrote:
>
>> As per the above yield-based waiting for initialization is not slower
>> than mutex-based. Hence, this "unacceptably slow" is nonsense.
>
> Of course it is slower because you wait N timeslices.
>
>> A thread that waits for the contended mutex is, upon locking the
>> mutex, scheduled identically to yield so no difference with mutex
>
> No, a thread waiting for a mutex doesn't spin when the thread holding
> the spinlock is scheduled away but it is sleeping inside the kernel.
> That's more efficient.

So, you must really dislike adaptive mutex designs, right? If so, why?
They can spin several times before waiting in the kernel... Really slow
path, so to speak. Btw, do you think that adaptive mutexes are really bad?
If so, why?

Chris M. Thomasson

unread,

Sep 9, 2023, 2:49:56 PM9/9/23

to

Huh? pthread_once totally applies here. Windows has futex like abilities.

Bonita Montero

unread,

Sep 9, 2023, 3:15:49 PM9/9/23

to

On 09.09.2023 at 19:42, Pavel wrote:

> If Microsoft implemented standard wrongly, it's Microsoft's failure, not
> the standard's.

The standard doesn't say anything about the issue I've shown.

Bonita Montero

unread,

Sep 9, 2023, 3:17:35 PM9/9/23

to

On 09.09.2023 at 20:49, Chris M. Thomasson wrote:

> Huh? pthread_once totally applies here. Windows has futex like abilities.

It doesn't matter what Posix or Win32 says. The standard should
honour that static initialization might fail because of creating
a synchronization primitive or synchronizing on them. The standard
shouldn't require that this never fails.

Chris M. Thomasson

unread,

Sep 9, 2023, 3:20:22 PM9/9/23

to

On 9/9/2023 12:17 PM, Bonita Montero wrote:
> On 09.09.2023 at 20:49, Chris M. Thomasson wrote:
>
>> Huh? pthread_once totally applies here. Windows has futex like abilities.
>
> It doesn't matter what Posix or Win32 says.

Are you sure about that?

> The standard should
> honour that static initialization might fail because of creating
> a synchronization primitive or synchronizing on them. The standard
> shouldn't require that this never fails.
>

Might fail? For what reasons?

Chris M. Thomasson

unread,

Sep 9, 2023, 3:21:37 PM9/9/23

to

On 9/9/2023 12:17 PM, Bonita Montero wrote:

Are you talking about a user object that is statically initialized
failing, throwing an exception, calling exit, or something?

Chris M. Thomasson

unread,

Sep 9, 2023, 3:49:45 PM9/9/23

to

// God damn it! There is a bug right here.
// I forgot to set l_foo to t_foo.

SHIT!

l_foo = t_foo;

God damn it!

Chris M. Thomasson

unread,

Sep 9, 2023, 3:53:30 PM9/9/23

to

On 9/7/2023 2:55 AM, Paavo Helde wrote:

[...]

I should create a new thread for this, but this is my corrected
pseudo-code. I forgot to set l_foo to t_foo in my previous code. So
sorry about that nonsense.

static foo* g_foo = nullptr;

foo* l_foo = g_foo; // atomic load

if (! l_foo)
{
    foo* t_foo = new foo;
    foo* cmp = nullptr;

    // CAS would update the comparand on failure.
    // Take special note of that...

    // mb release

    if (! CAS(&g_foo, &cmp, t_foo)) // atomic rmw
    {
        // failed!!! well, shit happens!
        delete t_foo;
        t_foo = cmp;
        // mb acquire
    }

    // I forgot to do this in my prior pseudo code.
    // ARGH!!! ;^o

    l_foo = t_foo; // damn it, sorry again.
}

else
{
    // mb acquire
}

l_foo->bar();
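
For comparison, the pseudo-code maps onto `std::atomic` roughly like this. It is a sketch under assumptions: `foo`, its `live` counter, `get_foo` and `run_cas_demo` are made up for the illustration, and the installed object is deliberately never deleted. Unlike a thread-safe static, several threads may construct a `foo`; only one object is ever published.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct foo
{
    static std::atomic<int> live;   // number of foo objects currently alive
    foo() { ++live; }
    ~foo() { --live; }
    int bar() const { return 7; }
};
std::atomic<int> foo::live{ 0 };

std::atomic<foo *> g_foo{ nullptr };

foo *get_foo()
{
    foo *l_foo = g_foo.load( std::memory_order_acquire );
    if( !l_foo )
    {
        foo *t_foo = new foo;
        foo *cmp = nullptr;
        // On failure compare_exchange_strong writes the current value of
        // g_foo into cmp, i.e. the object installed by the winning thread.
        if( g_foo.compare_exchange_strong( cmp, t_foo,
                std::memory_order_acq_rel, std::memory_order_acquire ) )
            l_foo = t_foo;   // our object was installed
        else
        {
            delete t_foo;    // lost the race: throw our copy away
            l_foo = cmp;
        }
    }
    return l_foo;
}

// Demo: after all racing threads finish, exactly one foo is still alive.
int run_cas_demo( unsigned nThreads )
{
    std::vector<std::thread> threads;
    for( unsigned t = 0; t != nThreads; ++t )
        threads.emplace_back( [] { assert( get_foo()->bar() == 7 ); } );
    for( std::thread &th : threads )
        th.join();
    return foo::live.load();
}
```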

Bonita Montero

unread,

Sep 9, 2023, 3:55:18 PM9/9/23

to

On 09.09.2023 at 21:20, Chris M. Thomasson wrote:

>> It doesn't matter what Posix or Win32 says.

> Are you sure about that?

The standard is operating system agnostic.

> Might fail? For what reasons?

F.e. because there's no more non-pageable memory for a semaphore.

Chris M. Thomasson

unread,

Sep 9, 2023, 3:59:42 PM9/9/23

to

An address based hash into a table based approach works by creating
everything up front. If this fails, then the user program logic is not
even executed at all.
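
A minimal sketch of such a table, assuming a fixed size of 64 and hypothetical names (`g_lockTable`, `lock_for`, `LazyInt`, `get_value`). All locks exist before any user code needs one, so nothing is created at first use. A real implementation would mix the address bits better and would have to cope with one initializer triggering another that maps to the same slot, which this sketch ignores.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>

constexpr std::size_t TABLE_SIZE = 64;   // arbitrary size for the sketch

// Created up front, before any guarded object is touched.
std::array<std::mutex, TABLE_SIZE> g_lockTable;

// Map the address of the guarded object to one of the pre-created locks.
std::mutex &lock_for( void const *addr )
{
    std::size_t h = std::hash<void const *>()( addr );
    return g_lockTable[h % TABLE_SIZE];
}

// Hypothetical lazily initialized object guarded by the table.
struct LazyInt
{
    bool ready = false;
    int value = 0;
};

int get_value( LazyInt &obj )
{
    std::lock_guard<std::mutex> guard( lock_for( &obj ) );
    if( !obj.ready )
    {
        obj.value = 42;   // the "expensive" initialization
        obj.ready = true;
    }
    return obj.value;
}
```

Two different objects may share a slot; that costs some concurrency but not correctness, as long as initializers do not nest on the same slot.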

Chris M. Thomasson

unread,

Sep 9, 2023, 4:00:43 PM9/9/23

to

Just one way.

Richard Damon

unread,

Sep 9, 2023, 4:38:07 PM9/9/23

to

The Standard defines the final results that must happen (or
possibilities that can happen in some cases).

If the implementation doesn't meet that result, it is non-conforming and
could be said to have implemented things "wrongly"

Richard Damon

unread,

Sep 9, 2023, 4:41:49 PM9/9/23

to

Why?

Unless you can show that NO method can be created that meet the
requirements, using an insufficient method is just incorrect.

Since the use of a single synchronization primitive meets the
requirement, (and a system that can't generate at least one is
insufficient to meet the standard), any system that can't meet the
requirement is just incorrect.

Chris M. Thomasson

unread,

Sep 9, 2023, 10:30:46 PM9/9/23

to

Windows has:

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-initonceexecuteonce

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-initonceinitialize

This is part of their "futex" API:

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress

Bonita Montero

unread,

Sep 10, 2023, 4:57:47 AM9/10/23

to

On 09.09.2023 at 22:37, Richard Damon wrote:

> The Standard defines the final results that must happen
> (or possibilities that can happen in some cases).

The standard should also consider side effects as in this example
when a static initialization would fail because of a synchronization
error.

Bonita Montero

unread,

Sep 10, 2023, 4:59:07 AM9/10/23

to

Do you want to follow in Amine's footsteps ?

Bonita Montero

unread,

Sep 10, 2023, 4:59:50 AM9/10/23

to

On 09.09.2023 at 22:41, Richard Damon wrote:

> Why?

Because the standard should honour that kernel-synch might fail
on static initialization.

Richard Damon

unread,

Sep 10, 2023, 9:51:26 AM9/10/23

to

You don't seem to understand that since it CAN be done in a way that
always works, any method that doesn't always work is just WRONG and
non-conforming.

I guess you think that 1 + 1 must be allowed to be 1 in case of
"synchronization errors".

Static Initialization has DEFINED behavior, and that behavior must be
honored, and the implementation must do what is needed to make that happen.

Methods have been shown to do it (perhaps less performant than this in
some conditions), so it is possible, so you can't say the Standard is
asking for impossible behavior, thus any "optimizations" that don't meet
that defined behavior are just WRONG.

Saying something wrong must be made right is just WRONG, so you are
WRONG in your claim.

PERIOD.

Bonita Montero

unread,

Sep 10, 2023, 11:46:31 AM9/10/23

to

On 10.09.2023 at 15:51, Richard Damon wrote:

> You don't seem to understand that since it CAN be done in a way that
> always works, ...

A standard shouldn't mandate that this never fails.

> Static Initialization has DEFINED behavior, and that behavior must be
> honored, and the implementation must do what is needed to make that happen.

Locking errors during static initialization are unspecified; whether
they might occur or not on a specific operating system doesn't matter.

Paavo Helde

unread,

Sep 10, 2023, 12:09:11 PM9/10/23

to

On 10.09.2023 at 18:46, Bonita Montero wrote:
> On 10.09.2023 at 15:51, Richard Damon wrote:
>
>> You don't seem to understand that since it CAN be done in a way that
>> always works, ...
>
> A standard shouldn't mandate that this never fails.

So suppose the standard committee agrees with you and adds a sentence to
the standard:

Dynamic initialization of a non-local variable with static storage
duration can fail even if its initializing function does not throw, in
which case the behavior is ...undefined... / ..unspecified... /
...calling std::terminate() after unspecified time...

Which variant do you prefer? And how would this help the programmer who
needs to write programs in C++? Should they just avoid creating any
static variables at all, in order to not trigger this failure scenario?

Bonita Montero

unread,

Sep 10, 2023, 12:16:13 PM9/10/23

to

On 10.09.2023 at 18:08, Paavo Helde wrote:

> Which variant do you prefer? And how would this help the programmer who
> needs to write programs in C++? Should they just avoid creating any
> static variables at all, in order to not trigger this failure scenario?

It should be handled like synchronization on a std::mutex,
which can fail with a system_error. If this would happen
the object would remain uninitialized.

Richard Damon

unread,

Sep 10, 2023, 12:32:44 PM9/10/23

to

On 9/10/23 8:46 AM, Bonita Montero wrote:
> On 10.09.2023 at 15:51, Richard Damon wrote:
>
>> You don't seem to understand that since it CAN be done in a way that
>> always works, ...
>
> A standard shouldn't mandate that this never fails.

You WANT a standard that says even if your program is written totally
correct, the implementation can decide that it will just make it run wrong?

>
>> Static Initialization has DEFINED behavior, and that behavior must be
>> honored, and the implementation must do what is needed to make that happen.
>
> Locking errors during static initialization are unspecified; whether
> they might occur or not on a specific operating system doesn't matter.
>

Since it has been shown that with a simpler locking method, you can
ALWAYS succeed (or die with a resource error before starting user code),
not meeting the requirement is just an error in the implementation.

PERIOD.

ANY operating system that can correctly run multi-threaded code can
handle that. If it can't, it can't actually handle multi-threaded code.

If an "optimization" can cause failures that can't otherwise occur, and
are not allowed by the standard, then that "optimization" isn't allowed
(except by switching to a non-conforming mode).

You don't seem to understand what a "Standard" is.

Chris M. Thomasson

unread,

Sep 10, 2023, 2:06:15 PM9/10/23

to

What do you mean by that comment?

Chris M. Thomasson

unread,

Sep 10, 2023, 2:08:58 PM9/10/23

to

On 9/10/2023 1:58 AM, Bonita Montero wrote:

Iirc, some systems have a large table of mutexes or semaphores created
up front in the kernel itself. They use them for contended futexes to
wait on. A futex is address based, and when they have to block that
index the address into said table. Have you ever used them before? Also,
think of Windows keyed events.

Chris M. Thomasson

unread,

Sep 10, 2023, 2:10:11 PM9/10/23

to

Strange answer. Humm...

Chris M. Thomasson

unread,

Sep 10, 2023, 2:12:19 PM9/10/23

to

On 9/10/2023 1:59 AM, Bonita Montero wrote:

So, are you suggesting that the C++ standard should be tightly
integrated with a given kernel impl? Humm... Strange. POSIX aside for a
moment...

Chris M. Thomasson

unread,

Sep 10, 2023, 2:13:07 PM9/10/23

to

What error?

Chris M. Thomasson

unread,

Sep 10, 2023, 2:14:22 PM9/10/23

to

then what?

Chris M. Thomasson

unread,

Sep 10, 2023, 2:14:31 PM9/10/23

to

Ditto.

Paavo Helde

unread,

Sep 10, 2023, 2:39:15 PM9/10/23

to

Meaning that a program can never access any static variable, because
accessing of an uninitialized object would be UB. Great job!

Pavel

unread,

Sep 10, 2023, 5:59:17 PM9/10/23

to

Thanks, this does seem to be a simple user-friendly API that is most
appropriate to implement static initialization as in C++.

Pavel

unread,

Sep 10, 2023, 6:05:57 PM9/10/23

to

Bonita Montero wrote:
> On 09.09.2023 at 19:42, Pavel wrote:
>
>> If Microsoft implemented standard wrongly, it's Microsoft's failure,
>> not the standard's.
>
> The standard doesn't say anything to the issue I've shown.
>

You did not show any issue, you just guessed what the implementation
could be and assumed there could be an issue.

After looking at InitOnceExecuteOnce API pointed out by Chris I think it
is very likely that your guess on implementation was wrong.

Pavel

unread,

Sep 10, 2023, 6:51:07 PM9/10/23

to

Richard Damon wrote:
> On 9/10/23 1:57 AM, Bonita Montero wrote:
>> On 09.09.2023 at 22:37, Richard Damon wrote:
>>
>>> The Standard defines the final results that must happen
>>> (or possibilities that can happen in some cases).
>>
>> The standard should also consider side effects as in this example
>> when a static initialization would fail because of a synchronization
>> error.
>>
> You don't seem to understand that since it CAN be done in a way that
> always works, any method that doesn't always work is just WRONG and
> non-conforming.
>
> I guess you think that 1 + 1 must be allowed to be 1 in case of
> "synchronization errors".
>
> Static Initialization has DEFINED behavior, and that behavior must be
> honored, and the implementation must do what is needed to make that happen.
>
> Methods have been shown to do it (perhaps less performant than this in
> some conditions),

Actually, I think there should be always a better-performing method to
implement C++ static initialization than using windows or pthread mutex
-- e.g. using a specialized control like those used for call_once
implementation -- at least on UNIX or Windows.

This is because IMHO mutex API is a textbook case of feature bloat on
both platforms.

E.g. in standard UNIX (former POSIX), you can have recursive / robust /
shared / errorcheck mutex (not all mutually exclusive) and who knows
what else. pthread_mutex_lock shall be able to deal with any
combination and the feature-check code makes (at least the current)
implementation quite messy, with multiple dispatches on the same condition.

On Windows, the number of options is less; but, to compensate for that
:-), a mutex is always recursive plus wait functions have to know how to
deal with inter-process mutexes because any handle passed to them can
turn out to be of an inter-process mutex (or even not a mutex at all).
Therefore, I do not think their use for static initialization (that does
not require either recursive or inter-process lock) even could be optimal.

Bonita Montero

unread,

Sep 10, 2023, 10:03:05 PM9/10/23

to

On 10.09.2023 at 20:38, Paavo Helde wrote:

> Meaning that a program can never access any static variable, because
> accessing of an uninitialized object would be UB. Great job!

Exactly the same thing can happen when locking a mutex,
which can also throw system_error. Do you think it is
unnecessary that a mutex can throw this exception ?

Richard Damon

unread,

Sep 10, 2023, 11:34:15 PM9/10/23

to

Why does the Mutex throw "system_error" if it has already been
constructed and not misused?

Chris M. Thomasson

unread,

Sep 11, 2023, 12:42:39 AM9/11/23

to

I might be missing your point here. Humm... Even though POSIX has all of
these different types of mutexes, we can use them by organization. So,
wrt windows, an intra-process mutex can be a CRITICAL_SECTION. Or an
inter-process as a mutex HANDLE.

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createmutexexa

Wrt the inter-process route we may need to deal with WAIT_ABANDONED, or
EOWNERDEAD wrt POSIX. So we create different types:

struct intra_process_mutex
{
    CRITICAL_SECTION m_os_lock;
};

struct inter_process_mutex
{
    HANDLE m_os_lock;
};

struct inter_process_mutex_robust
{
    HANDLE m_os_lock;
};

These simple types can be rather easy to reason about. Wrt POSIX, the
different types would be:

struct intra_process_mutex
{
    pthread_mutex_t m_os_lock;
};

struct inter_process_mutex
{
    pthread_mutex_t m_os_lock;
};

struct inter_process_mutex_robust
{
    pthread_mutex_t m_os_lock;
};

The point is trying to organize things a bit here.

Sound somewhat decent?

Chris M. Thomasson

unread,

Sep 11, 2023, 12:51:02 AM9/11/23

to

Read about what can happen to the windows "futex" during periods of hard
stress on the OS, low memory, most likely very low non-paged memory,
"almost" getting to a blue screen type of shit! damn, anyway:
___________________________
Note WaitOnAddress is guaranteed to return when the address is
signaled, but it is also allowed to return for other reasons. For this
reason, after WaitOnAddress returns the caller should compare the new
value with the original undesired value to confirm that the value has
actually changed. For example, the following circumstances can result in
waking the thread early:
Low memory conditions
A previous wake on the same address was abandoned
Executing code on a checked build of the operating system
___________________________

Take special note "Low memory conditions"...

I bet their call once logic is using their "futex" logic. Humm... Anyone
have any insight into the windows possible internal impl? I want to read
sysinternals magazine again... I used to have it delivered to my house. ;^)

https://learn.microsoft.com/en-us/sysinternals/

;^)

Bonita Montero

unread,

Sep 11, 2023, 9:44:30 AM9/11/23

to

On 10.09.2023 at 20:38, Paavo Helde wrote:

> Meaning that a program can never access any static variable, because
> accessing of an uninitialized object would be UB. Great job!

That's the same as when the static object's constructor throws,
i.e. the object will get another try if you get again to the
initialization. And if an exception is thrown the context of
the definition is left to the next exception handler that you
won't operate on a non initialized object.
I don't see where's the problem here.

Bonita Montero

unread,

Sep 11, 2023, 9:49:28 AM9/11/23

to

On 11.09.2023 at 15:44, Bonita Montero wrote:

> That's the same as when the static object's constructor throws,
> i.e. the object will get another try if you get again to the
> initialization. And if an exception is thrown the context of
> the definition is left to the next exception handler that you
> won't operate on a non initialized object.
> I don't see where's the problem here.

Look at this code:

#include <iostream>

using namespace std;

int main()
{
    struct WillReInit
    {
        WillReInit()
        {
            throw "hello world";
        }
    };
    for( ; ; )
        try
        {
            static WillReInit wi;
        }
        catch( char const *str )
        {
            cout << str << endl;
        }
}

Things would be similar if the synchronization would throw, i.e. wi
would be initialized over and over. So there won't be any accessible
uninitialized object.
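
A bounded variant makes the retry visible without looping forever. In this sketch `FailsTwice`, `g_attempts` and `try_init` are hypothetical names: the constructor throws on its first two runs and succeeds on the third, after which the static stays initialized.

```cpp
#include <cassert>
#include <stdexcept>

int g_attempts = 0;   // how many times the constructor has been entered

struct FailsTwice
{
    FailsTwice()
    {
        if( ++g_attempts <= 2 )
            throw std::runtime_error( "initialization failed" );
    }
};

// Returns true once the function-local static has been constructed.
bool try_init()
{
    try
    {
        // If the constructor exits via an exception the initialization is
        // not complete and is attempted again on the next call.
        static FailsTwice obj;
        (void)obj;
        return true;
    }
    catch( std::runtime_error const & )
    {
        return false;
    }
}
```

The first two calls to `try_init()` return false, the third returns true, and later calls no longer run the constructor, so `g_attempts` stays at 3.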

Bonita Montero

unread,

Sep 11, 2023, 10:29:40 AM9/11/23

to

On 10.09.2023 at 20:12, Chris M. Thomasson wrote:

> So, are you suggesting that the C++ standard should be tightly
> integrated with a given kernel impl? Humm... Strange. POSIX
> aside for a moment...

Not more than it is integrated to the operating system through
throwing system_error, thereby having an operating-system-dependent
error code which is translated by the operating system (internally
through strerror() or FormatMessage()).
So it would be most appropriate to throw system_error with an
operating-system-dependent error code and system_category as
the translator - just like std::mutex does.

Paavo Helde

unread,

Sep 11, 2023, 11:20:53 AM9/11/23

to

Good, finally some code! Alas, this is not very relevant as it demoes a
local static object, my post which you replied to started with "Dynamic
initialization of a non-local variable with static storage duration..."

Even for local objects, what's the point? If the issue could be resolved
by simply retrying the operation, the C++ implementation could easily do
the same internally and avoid the issue. If the issue cannot be resolved
by a simple retry, then what code should a programmer write, to get the
static initialized? Especially considering that the issue is
non-reproducible and thus cannot be played out or tested. Writing
untestable code to work around non-existing errors is not a healthy way
to spend one's time IMO.

Bonita Montero

unread,

Sep 11, 2023, 12:34:43 PM9/11/23

to

On 11.09.2023 at 17:20, Paavo Helde wrote:

> Good, finally some code! Alas, this is not very relevant as it demoes a
> local static object, my post which you replied to started with "Dynamic
> initialization of a non-local variable with static storage duration..."

Then you misunderstood me and talked nonsense. Of course I didn't talk
about global objects, because exceptions make no sense because they
can't be caught in any functional context. It goes without saying.

> Even for local objects, what's the point?

> If the issue could be resolved by simply retrying the operation, ...

If you think spinning would make sense, then you could handle most
system_error errors like that. But that's not how it's designed.

> Especially considering that the issue is non-reproducible and thus
> cannot be played out or tested.

A failing mutex isn't reproducible either; nevertheless there are
system_error exceptions which could be thrown while synchronizing.

Chris M. Thomasson

unread,

Sep 11, 2023, 3:41:50 PM9/11/23

to

What if somebody writing C++ wants to be abstract and try to avoid a
particular systems idiosyncrasies?

if std::mutex throws, then shit hit the fan. Not good.

Bonita Montero

unread,

Sep 11, 2023, 10:28:58 PM9/11/23

to

On 11.09.2023 at 21:41, Chris M. Thomasson wrote:

> What if somebody writing C++ wants to be abstract
> and try to avoid a particular systems idiosyncrasies?

Then he would have to ignore specific error codes and
just take the what() in the system_error object.
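
A sketch of what that could look like in portable code. `failing_operation` and `describe_failure` are hypothetical stand-ins; nothing here is thrown by a real static initialization. The handler compares against a generic `std::errc` condition instead of a platform-specific number and otherwise just keeps the text from what().

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical stand-in for an operation whose synchronization fails.
void failing_operation()
{
    throw std::system_error(
        std::make_error_code( std::errc::resource_unavailable_try_again ),
        "lock failed" );
}

// Portable handling: test a generic condition, report the what() text.
std::string describe_failure()
{
    try
    {
        failing_operation();
    }
    catch( std::system_error const &e )
    {
        if( e.code() == std::errc::resource_unavailable_try_again )
            return std::string( "retryable: " ) + e.what();
        return e.what();
    }
    return "no error";
}
```

The exact wording of what() is implementation-defined, so code should treat it as a message for logs and not parse it.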

Pavel

unread,

Sep 11, 2023, 11:59:04 PM9/11/23

to

The point is that, even assuming that all types of mutices are very well
organized in the implementation, some dispatch of control still has to
occur when one calls "pthread_mutex_lock" because it is a single entry
point for locking a mutex with any combination of features. Given that
locking of a non-contended mutex is supposed to be (and in fact is, on
current Linux) very cheap (but not necessarily same-cheap if the mutex
is recursive :-) ), the relative cost of this dispatch is not guaranteed
to be negligible (even if that dispatch is implemented via a single
extra level of indirection).

Further, I am unsure if these features actually *can* be very well
organized because not all features of mutices are mutually exclusive (no
pun intended) so a single extra level of indirection might not be enough.
As an example, mutexattr type can be one of

PTHREAD_MUTEX_NORMAL PTHREAD_MUTEX_ERRORCHECK PTHREAD_MUTEX_RECURSIVE
PTHREAD_MUTEX_DEFAULT

but each of these can be additionally either robust or non-robust (and,
pthread_mutex_lock is supposed to behave differently for some
combinations of type and robustness). Unless of course we dispatch to a
cartesian product of options (which is not impossible, but see below).

And, lastly, the above are all hypotheticals and the actual code (see
e.g. https://codebrowser.dev/glibc/glibc/nptl/pthread_mutex_lock.c.html
) is slightly messier than the best organized implementation possible.
From practical perspective, I would expect that someone implementing a
thread-safe static initialization in a C++ compiler has their mouth full
with the tasks other than improving the existing pthread_mutex (or
pthread_once) implementation as a distraction; at most (I would
speculate), they would take a quick (~5 min) look into the API and the
available code of the existing established synchronization primitives
and decide which one is likely to better fit their spec and be faster
as-is to use to implement that pesky initialization spec. In this
scenario, I would guess that pthread_once_t would win the beauty contest
and get selected.

And, sure enough, (likely due to no feature bloat) the *existing*
pthread_once implementation code is much cleaner (see
https://codebrowser.dev/glibc/glibc/nptl/pthread_once.c.html) and at
first glance should be measurably faster.

I guess, instead of guessing, (no pun intended again), we could just
disassemble some static initialization and see how it's done in, say,
gcc. Maybe later...

Chris M. Thomasson

unread,

Sep 12, 2023, 1:40:07 AM9/12/23

to

what() = shit hit the fan...

Bonita Montero

unread,

Sep 12, 2023, 5:45:12 AM9/12/23

to

You seem to be psychotic, but I'll describe it anyway. If you have
a system_error and it is created with a system_category the error
code is converted through strerror() (or the thread-safe variant)
on Posix and with FormatMessage() in Windows.
With MSVC there was a bug which I filed in forum for MSVC that
system_category() does not honor the current thread's locale
when translating the error code with FormatMessage(). They
said they'll fix that and actually it was fixed with the next
update.

Chris M. Thomasson

unread,

Sep 12, 2023, 2:33:38 PM9/12/23

to

On 9/12/2023 2:44 AM, Bonita Montero wrote:
> On 12.09.2023 at 07:39, Chris M. Thomasson wrote:
>
>> On 9/11/2023 7:28 PM, Bonita Montero wrote:
>
>>> Then he would have to ignore specific error codes and
>>> just take the what() in the system_error object.
>
>> what() = shit hit the fan...
>
> You seem to be psychotic, but I'll describe it anyway.

If a static initialization fails, things have gone horribly wrong.
What's your big beef with static initialization?

Bonita Montero

unread,

Sep 12, 2023, 2:43:32 PM9/12/23

to

On 12.09.2023 at 20:33, Chris M. Thomasson wrote:
> On 9/12/2023 2:44 AM, Bonita Montero wrote:
>> On 12.09.2023 at 07:39, Chris M. Thomasson wrote:
>>
>>> On 9/11/2023 7:28 PM, Bonita Montero wrote:
>>
>>>> Then he would have to ignore specific error codes and
>>>> just take the what() in the system_error object.
>>
>>> what() = shit hit the fan...
>>
>> You seem to be psychotic, but I'll describe it anyway.
>
> If a static initialization fails, things have gone horribly wrong.
> What's your big beef with static initialization?

At this point we weren't talking about static initialization but about
the properties of system_error. In any case, static initialization is
less likely to fail because of a synchronization error than because
the constructor of the static object throws. As I've shown with my code
in this thread, the object is then left uninitialized, and if control
reaches the initialization again it gets another chance.
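
The retry behaviour can be reproduced with a small single-threaded sketch. It is not the code posted earlier in the thread; Flaky, g_attempts, instance and attempts_until_success are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>

int g_attempts = 0; // counts constructor attempts

// Hypothetical type whose constructor fails on the first attempt only.
struct Flaky {
    Flaky() {
        if (++g_attempts == 1)
            throw std::runtime_error("first attempt fails");
    }
};

// Function-local static: considered initialized only once the constructor
// has completed without throwing.
Flaky &instance() {
    static Flaky obj;
    return obj;
}

// Returns how many constructor attempts were needed until instance() succeeded.
int attempts_until_success() {
    for (;;) {
        try {
            instance();
            return g_attempts;
        } catch (const std::runtime_error &) {
            // The object is still uninitialized; the next call tries again.
        }
    }
}
```

The first call to instance() throws, the second one constructs the object, and later calls no longer run the constructor.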

Chris M. Thomasson

Sep 12, 2023, 6:55:45 PM

You mean where you wrote:

"The mutex constructor
is noexcept, so mutexes always must be created on their first use."

Well... Why do you state that mutexes must always be created on their
first use? Hummm. Just because you say so, does not make it true at all.

Chris M. Thomasson

Sep 12, 2023, 6:56:45 PM

The subject is static initialization, and the sync that the impl
uses to get that does not have to follow your rules at all.

Bonita Montero

Sep 13, 2023, 5:53:32 AM

Am 13.09.2023 um 00:55 schrieb Chris M. Thomasson:

> Well... Why do you state that mutexes must always be created on their
> first use? Hummm. Just because you say so, does not make it true at all.

Of course, because the mutex constructor is noexcept. The interesting
thing is that a situation can arise where several threads simultaneously
try to create the binary semaphore associated with the mutex
and only one ultimately assigns its semaphore to the mutex. When
the other threads notice this afterwards, they have to return their
semaphores to the kernel. Since the mutex constructor is noexcept
there is no other way for an implementation to handle this.
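
The scheme described here (create on first use, one thread wins, the others give their resource back) can be sketched with a compare-and-swap. This only illustrates the argument and makes no claim about how any standard library implements std::mutex; FakeSemaphore stands in for a kernel object, and all names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Stand-in for a kernel object such as a semaphore; counts live instances.
struct FakeSemaphore {
    static std::atomic<int> live;
    FakeSemaphore() { ++live; }
    ~FakeSemaphore() { --live; }
};
std::atomic<int> FakeSemaphore::live{0};

struct LazyHolder {
    std::atomic<FakeSemaphore *> sem{nullptr}; // nothing acquired at construction

    LazyHolder() noexcept = default;
    ~LazyHolder() { delete sem.load(); }

    // Creates the resource on first use; racing threads may each create one,
    // but only one pointer is installed.
    FakeSemaphore *get() {
        FakeSemaphore *cur = sem.load(std::memory_order_acquire);
        if (cur)
            return cur;
        FakeSemaphore *fresh = new FakeSemaphore; // this step may fail, the constructor cannot
        if (sem.compare_exchange_strong(cur, fresh,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire))
            return fresh;
        delete fresh; // lost the race: give the resource back
        return cur;   // cur now holds the winner's pointer
    }
};

// Lets n threads race on get() and returns how many semaphores survive.
int live_after_race(unsigned n) {
    LazyHolder holder;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i)
        threads.emplace_back([&holder] { holder.get(); });
    for (auto &t : threads)
        t.join();
    return FakeSemaphore::live.load();
}
```

However many threads race, exactly one FakeSemaphore remains attached to the holder afterwards.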

Chris M. Thomasson

Sep 13, 2023, 7:24:05 PM

How do you think futexes work when they are completely address-based?
Nothing to create. I guess you are not that familiar with them. So be it.
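
To make "nothing to create" concrete, here is a minimal Linux-only sketch of an address-based lock in the well-known three-state futex style. It is an illustration under assumptions, not the code used by glibc or by any compiler's static-initialization guard; FutexMutex and counted_increments are hypothetical names, and the cast from std::atomic<int> to int is assumed to be valid on the target platform.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <type_traits>
#include <vector>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// State word: 0 = unlocked, 1 = locked, 2 = locked with possible waiters.
class FutexMutex {
    std::atomic<int> state_{0}; // a plain word; no kernel object is created

    static_assert(sizeof(std::atomic<int>) == sizeof(int),
                  "sketch assumes atomic<int> has the layout of int");

    int *word() { return reinterpret_cast<int *>(&state_); }

    // Sleeps only if the word still holds the expected value.
    void wait(int expected) {
        syscall(SYS_futex, word(), FUTEX_WAIT_PRIVATE, expected, nullptr, nullptr, 0);
    }
    void wake_one() {
        syscall(SYS_futex, word(), FUTEX_WAKE_PRIVATE, 1, nullptr, nullptr, 0);
    }

public:
    void lock() {
        int c = 0;
        if (state_.compare_exchange_strong(c, 1, std::memory_order_acquire))
            return; // uncontended path: no system call at all
        if (c != 2)
            c = state_.exchange(2, std::memory_order_acquire);
        while (c != 0) {
            wait(2);
            c = state_.exchange(2, std::memory_order_acquire);
        }
    }
    void unlock() {
        if (state_.fetch_sub(1, std::memory_order_release) != 1) {
            state_.store(0, std::memory_order_release);
            wake_one(); // only contended unlocks enter the kernel
        }
    }
};

// Construction cannot fail because there is nothing to allocate.
static_assert(std::is_nothrow_default_constructible<FutexMutex>::value,
              "constructing the lock acquires no resource");

// Increments a shared counter under the lock from several threads.
long counted_increments(unsigned n_threads, unsigned per_thread) {
    FutexMutex m;
    long counter = 0;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n_threads; ++t)
        threads.emplace_back([&m, &counter, per_thread] {
            for (unsigned i = 0; i < per_thread; ++i) {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    for (auto &t : threads)
        t.join();
    return counter;
}
```

The kernel only keys its wait queue on the address of the word while a thread is actually sleeping, so neither construction nor the uncontended lock and unlock paths involve the kernel.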

Bonita Montero

Sep 14, 2023, 5:11:01 AM

Am 14.09.2023 um 01:23 schrieb Chris M. Thomasson:

> How do you think futexes work when they are completely address-based?
> Nothing to create. I guess you are not that familiar with them. So be it.

Why should a futex be used for static initialization, where the
mutex is used only once by the thread that initializes the
object, there's not much likelihood of contention, and even
if there is, the contention happens only once per competing thread?