It claims that function statics are by no means initialized safely. A
bit more digging turned up a specific patch that was applied to GCC to
support thread-safe initialization: http://gcc.gnu.org/ml/gcc-patches/2004-08/msg01598.html
Upon reaching this platform-dependent crossroads I decided that my own
testing might be the way to figure out whether the platforms we need to
support use thread-safe static initialization. I came up with the
following template to create a chain of function statics:
template<unsigned int i>
unsigned int TriangleNumberTmplt()
{
    static unsigned int x = i + TriangleNumberTmplt<i-1>();
    return x;
}

template<>
unsigned int TriangleNumberTmplt<1>()
{
    static unsigned int x = 1;
    return x;
}
Which should work with the following assertion:
TriangleNumberTmplt<X>() == (X*X - X)/2
The full test is here: http://rafb.net/p/ktRVaY44.html
I can convert it to a more portable threads implementation if someone
pushes me to.
The test passes on the following systems:
CentOS 3 - x86 - g++3.2
CentOS 4 - x64 - g++3.4
Ubuntu 7.10 - x64 - g++4.1
WinXP - x86 - msvc 7
WinXP - x86 - msvc 8
Is my test good enough to catch this race condition? Do these systems
actually support thread-safe function static initialization?
Thanks,
Brian
> Is my test good enough to catch this race condition?
Not unless you test on every possible combination of hardware,
compiler versions, and threading libraries (both in development and
deployment).
> Do these systems
> actually support thread-safe function static initialization?
It's illegal under POSIX. But it's also inherently impossible to
support because the semantics are not defined.
For example, consider a function that instantiates two statics. Some
would argue it's only thread-safe if the two statics are initialized
atomically (and it's easy to create code that breaks if this is not
the case). But the C++ standard permits that to deadlock (and it's
easy to create code that breaks if this is the case).
If the platform internally uses a lock, it must deal with all the
consequences of using such a lock. These include priority inversion,
lock order issues (suppose the constructor for the static itself
constructs statics), and deadlock.
So how can the compiler know which form of thread-safe static
initialization is the one you are counting on?
I argued against the GCC support for "thread safe statics" and showed
several cases where the GCC code does the wrong thing and breaks code
that conforms to both the C++ and POSIX pthreads standards.
I consider GCC's use of allegedly thread-safe statics when POSIX
pthreads support is selected to be at best a pessimization and at
worst a bug.
DS
> > Is my test good enough to catch this race condition?
> Not unless you test on every possible combination of hardware,
> compiler versions, and threading libraries (both in development and
> deployment).
> Do these systems
> actually support thread-safe function static initialization?
> It's illegal under POSIX. But it's also inherently impossible to
> support because the semantics are not defined.
[...]
> If the platform internally uses a lock, it must deal with all the
> consequences of using such a lock. These includes priority inversion,
> lock order issues (suppose the constructor for the static itself
> constructs statics), and deadlock.
Not if the lock is recursive.
[...]
One more thing, the lock needs to be singular. In other words, there needs
to be a single lock for all singletons to use. If this is not the case, then
locking order violations will occur.
> [...]
Here is example implementation:
http://groups.google.com/group/comp.programming.threads/msg/c5c68a037d98452e
What do you think David? AFAICT, it's immune from deadlocking even in the
presence of nested singletons in multiple ctors called from multiple
threads.
>> [...]
> I talked to a GCC guy a while back about function statics. He assured
> me they were initialized in a thread-safe manner.
> The full test is here: http://rafb.net/p/ktRVaY44.html
>
> Is my test good enough to catch this race condition? Do these systems
> actually support thread-safe function static initialization?
Your test uses integers. A better test is to use objects with a
constructor: you can then use a lock in the constructor (correctly
initialized elsewhere, e.g. with PTHREAD_MUTEX_INITIALIZER at global
scope) to record the constructor calls in a thread-safe manner. You
will find that many compilers will currently call the constructor
twice (or more) under heavy load, or proceed past the call whilst the
constructor is still in progress.
Note that this all changes in C++0x: initialization of local statics
will be required to be thread-safe, but with undefined behaviour if
the initialization of a local static reenters the same function (and
thus reaches the declaration of the local static being initialized)
recursively.
Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
> One more thing, the lock needs to be singular. In other words, there needs
> to be a single lock for all singletons to use. If this is not the case, then
> locking order violations will occur.
That's another point. It both needs to be singular and cannot be
singular. You explained why it needs to be singular. But here's why it
cannot be singular:
Thread A is constructing a singleton, blocks on an application lock.
Thread B holds that application lock, goes to construct a singleton.
Oops.
DS
> Here is example implementation:
>
> http://groups.google.com/group/comp.programming.threads/msg/c5c68a037...
>
> What do you think David? AFAICT, its immune from deadlocking even in the
> presence of nested singletons in multiple ctors called from multiple
> threads.
The problem is not the implementation, the problem is the semantics.
You cannot get the implementation right until the semantics are agreed
upon. Personally, I don't think there can be sensible semantics for
this and I believe that POSIX got it right by requiring the
application to place locks where and when it needs them.
DS
> Oops.
I don't really see a way around that! Oh well, shi% happens.
Ouch.
The mutex needs to be singular for a specific type; however, in this case it
can't be singular for a specific type. It seems like this deadlock could
happen in any initialize-once routine, even POSIX pthread_once:
pthread_once_t once = PTHREAD_ONCE_INIT;
pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;

void init(void) {
I1: pthread_mutex_lock(&app_mutex);
I2: pthread_mutex_unlock(&app_mutex);
}

void* thread_a(void* s) {
A1: pthread_once(&once, init);
    return 0;
}

void* thread_b(void* s) {
B1: pthread_mutex_lock(&app_mutex);
B2: pthread_once(&once, init);
B3: pthread_mutex_unlock(&app_mutex);
    return 0;
}
if the execution sequence happens to go:
B1 - thread B has app_mutex
A1
A1::I1 - thread A blocks in init routine on app_mutex
B2 - waits for thread A to finish init routine == DEADLOCK!
Bang!
Here is a program that illustrates the problem nicely:
__________________________________________________________________
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t g_mtx = PTHREAD_MUTEX_INITIALIZER;
static sem_t g_in_init;

void init(void) {
    sem_post(&g_in_init);
    puts("Thread waiting in init routine for `g_mtx'...");
    pthread_mutex_lock(&g_mtx);
    pthread_mutex_unlock(&g_mtx);
}

void* thread(void* state) {
    pthread_once(&g_once, init);
    return 0;
}

int main() {
    pthread_t tid;
    sem_init(&g_in_init, 0, 0);
    pthread_mutex_lock(&g_mtx);
    puts("Main thread has `g_mtx'...");
    pthread_create(&tid, NULL, thread, NULL);
    sem_wait(&g_in_init);
    puts("Main thread waiting for init routine to finish...");
    pthread_once(&g_once, init);
    pthread_mutex_unlock(&g_mtx);
    pthread_join(tid, NULL);
    sem_destroy(&g_in_init);
    return 0;
}
__________________________________________________________________
OUCH! There is no way around this. You're 100% correct that the semantics need
to be worked out. Perhaps there should be a guideline that says you cannot
take a mutex in the `init()' routine. Or perhaps you cannot hold a mutex
during a call to `pthread_once()'. This would translate to C++ in that you
cannot take a mutex in the ctor of an object that may be used as a
singleton. Or you cannot hold a mutex before you make a call into the
`instance()' function of a singleton.
Yes, this will deadlock. It's just a problem of waiting for something
whilst holding a lock, if the thing you are waiting for might need to
acquire the lock you hold.
The same applies with mutexes in general --- don't acquire a second
mutex whilst holding one in case another thread tries to acquire them
in the opposite order. It also applies to joining threads --- don't
join a thread whilst holding a mutex if that thread could try and
acquire the mutex.
I think I'd go for not acquiring locks in the once routine, as you
shouldn't need them too often, but IMHO the general guideline is
enough.
> OUCH! There is no way around this. Your 100% correct that the semantics need
> to be worked out. Perhaps there should be a guideline that says you cannot
> take a mutex in the `init()' routine. Or perhaps you cannot hold a mutex
> during a call to `pthread_once()'. This would translate to C++ in that you
> cannot take a mutex in the ctor of an object that may be used as a
> singleton. Or you cannot hold a mutex before you make a call into the
> `instance()' function of a singleton.
>
> What do you think David?
I think the attempt to graft automatic locking onto C++ is disastrous.
I argued against it. I still consider this a bug in GCC. The
automatic locks will break code that complies with both the C++ and
POSIX standards.
You are right that these same issues could be created by a programmer
error using pthread_once or even regular mutexes. We all deal with
problems like lock ordering issues, deadlock, and the like.
But we take these things into account when we design our locking
hierarchy and place our lock acquisition and release codes. We
carefully place each lock or unlock operation to ensure that all the
operations between them are safe to perform while holding that
particular lock. We choose whether to use two different locks or the
same lock for things, we choose the order in which we acquire those
locks, we choose what order to release them in and what code to place
between those two releases.
The attempt to make locking automatic works in some cases and doesn't
work in others. It largely works with standard I/O streams. It largely
works with malloc and free. Those automatic locks were added as part
of a sensible standardization process.
However, what GCC does simply doesn't work. It breaks code that
complies with C++ and POSIX. It was done as part of a standardization
process, but one that was largely not focused on thread-safety issues
(the x86 C++ ABI standardization, if memory serves me).
This is a bug in GCC and in the x86 C++ ABI. I've reported it. The GCC
developers stand by it.
Here's one example:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20099
DS
> On Jul 30, 12:20 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
>> OUCH! There is no way around this. Your 100% correct that the semantics need
>> to be worked out. Perhaps there should be a guideline that says you cannot
>> take a mutex in the `init()' routine. Or perhaps you cannot hold a mutex
>> during a call to `pthread_once()'. This would translate to C++ in that you
>> cannot take a mutex in the ctor of an object that may be used as a
>> singleton. Or you cannot hold a mutex before you make a call into the
>> `instance()' function of a singleton.
>>
>> What do you think David?
>
> I think the attempt to graft automatic locking onto C++ is disastrous.
> I argued against it. I still consider this a bug in GCC. The
> automatic locks will break code that complies with both the C++ and
> POSIX standards.
I think you are seriously overstating your case. Given your link
below, I presume you are referring to the thread-safe initialization
of local statics that is required by the C++0x standard?
It is up to the compiler to ensure this does not cause problems on its
own. Obviously, if you hold a lock when you reach the declaration of
such a variable, and that lock is required by the constructor of the
variable then you have deadlock, but this is a problem even if you are
the one and only thread to call this function.
Initialization of separate local statics need not and should not be
synchronized, so it is perfectly acceptable for thread A to take a
lock and initialize a local static x whilst thread B initializes a
local static y that needs to acquire that same lock: the
initialization of x will complete (whether or not it started before
the initialization of y), thread A will release its lock, then the
constructor of y can acquire the lock and the initialization of y can
complete.
If that doesn't work with GCC, it's a bug in their implementation, not
a bug in the feature itself.
> I think you are seriously overstating your case. Given your link
> below, I presume you are referring to the thread-safe initialization
> of local statics that is required by the C++0x standard?
Not only. Also the thread-safe initialization of local statics that
GCC implemented. However, I think the thread-safe initialization of
local statics in the C++0x standard is a bad idea because it conflicts
badly with POSIX. If the C++0x standard were written on a clean slate,
you could argue that it made a reasonable choice. But the slate is not
clean, there's already a threading standard.
> It is up to the compiler to ensure this does not cause problems on its
> own. Obviously, if you hold a lock when you reach the declaration of
> such a variable, and that lock is required by the constructor of the
> variable then you have deadlock, but this is a problem even if you are
> the one and only thread to call this function.
The problem with the compiler magically adding synchronization code is
that synchronization code is very specific to exactly what you're
doing. If you know that one and only one thread calls the code, then
the extra synchronization is pure wasted overhead. If you know you
need the synchronization, adding it yourself is likewise trivial.
POSIX already requires such synchronization. So POSIX-compliant code
will already have it. This makes the extra synchronization a pure
overhead for POSIX code.
> Initialization of separate local statics need not and should not by
> synchronized, so it is perfectly acceptable for thread A to take a
> lock and initialize a local static x whilst thread B initializes a
> local static y that needs to acquire that same lock: the
> initialization of x will complete (whether or not it started before
> the initialization of y), thread A will release its lock, then the
> constructor of y can acquire the lock and the initialization of y can
> complete.
Except that you can then deadlock if thread A has user-level mutex 1
and needs implementation magic mutex alpha while thread B has
implementation magic mutex alpha and needs user-level mutex 1. Code
has to know what locks it holds in order to be written sanely, and it
needs precise control over where that lock is acquired and where it is
released. When it holds multiple locks, it needs precise control over
how they're acquired and released and what goes in-between releases of
multiple locks.
> If that doesn't work with GCC, it's a bug in their implementation, not
> a bug in the feature itself.
The primary bug in the feature itself is that it's a pure
pessimization for POSIX code.
DS
I could not agree more. A long, long time ago I had a "quarrel" here
about the idiocy of burying synchronization code down and away from the
control of the programmer.
In MT, synchronization is just a confusing word for "coordination", and
"coordination" means "communication". As for living beings,
communication is part of the highest computational functions, and is to
be performed after and above the bulk of inner computational tasks.
The search for "transparent thread safety" can be fascinating, and may
allow for elegant theoretical solutions to parallel problems. But it
actually hides complexity that need not be hidden, as the "parallel
problems" that can be solved more elegantly with opaque synchronization
primitives are rarely real parallel programs. What I mean here is
that parallel programming goes a bit beyond solving theoretical
problems, and is much more about building complex programs out of
interacting, mutually stateless subprograms that perform real-world tasks.
Worse: even in the case of the "more elegant theoretical solution" of
basic parallel problems, the actual code needed to enforce those
closed-box concurrency solutions is as complex as the unmanaged kind, or
worse (the pure pessimization pointed out by David).
When David says "But we take these things into account when we design
our locking hierarchy and place our lock acquisition and release codes",
etc., he's not just saying that threading code must be carefully
hand-crafted; he's saying that thread synchronization should be
exposed and lifted into the upper logical layers of the program's agents,
and used for explicit coordination.
IMHO, this attempt at language-driven thread-safety in C++ arose both
from overestimating a set of theoretical studies advocating for
abstraction of parallelism, and from the apparent difficulty of
thinking of programs as interacting agents, each state-blind with
respect to the others; a difficulty that led those
developers into thinking that "the other way" would have been better.
But "simple to think out" is not always "better".
BTW, at the time this issue gained momentum, I filed a bug against GCC
too, and a complaint with the ABI group about this point, with absolutely no reply.
GN.
>
> I think the attempt to graft automatic locking onto C++ is disastrous.
> I argued against it.
And let me add that, at the time, while David's arguments were
intrinsically correct and untouched by any critique, the ABI group's
response was literally "Well, there's no interest in this point, let's do
as the HP compiler does"; on this "closure", GCC went "Oh well, I
feel safer this way so shut up". And the other guys arguing against David's
position were like "Oh, well, if the GCC guys said it..."
How scientific.
Giancarlo.
> >> If the platform internally uses a lock, it must deal with all the
> >> consequences of using such a lock. These includes priority inversion,
> >> lock order issues (suppose the constructor for the static itself
> >> constructs statics), and deadlock.
>
> > Not if the lock is recursive.
>
> One more thing, the lock needs to be singular. In other words, there needs
> to be a single lock for all singletons to use. If this is not the case, then
> locking order violations will occur.
Can you provide an example where deadlock will occur if the lock is not
singular?
Dmitriy V'jukov
> In MT, synchronization is just a confusing word for "coordination", and
> "coordination" means "communication". As for living beings,
> communication is part of the highest computational functions, and is to
> be performed after and above the bulk of inner computational tasks.
Do you want to say that ALL locking is a highest computational function?
What about, for example, the lock in a memory allocator? Does it need to
be performed above the "inner computational tasks"?
Dmitriy V'jukov
This would have been much better if the first post had been stated more
clearly ... as it is, in explaining the problem the casual reader is led
through an unclear problem statement, some template specialization
(does this highlight the problem or obfuscate it?), incorrect math, and
a dead URL.
It goes a long way when you get an accurate problem statement with a
test case, just like a good bug report ... but I won't grumble too
much; the problem is a real one and enough to make me curious. This
thread has been a good read ...
The problem: static local variable initialization is not thread-safe.
As stated by the original poster, static local variable initialization
is not considered thread-safe. GCC has some hooks that attempt to
resolve this, but some consider the implementation unclean
(as already explained).
----
A slight aside is that functions with static variables are considered
non-reentrant. An article from IBM developerWorks ( http://www.ibm.com/developerworks/linux/library/l-reent.html
) states that:
A reentrant function:
* Does not hold static data over successive calls
* Does not return a pointer to static data; all data is provided
by the caller of the function
* Uses local data or ensures protection of global data by making a
local copy of it
* Must not call any non-reentrant functions
----
The original poster presents some test code:
// A template function that computes the sum of the sequence of
// natural numbers from 1 up to i.
// The result is called a triangular number because that many
// objects can be arranged to form an equilateral triangle.
template<unsigned int i>
unsigned int TriangleNumberTmplt()
{
    static unsigned int x = i + TriangleNumberTmplt<i-1>();
    return x;
}

// The template function above is specialized below for
// the base case of 1. This stops all further recursion.
template<>
unsigned int TriangleNumberTmplt<1>()
{
    static unsigned int x = 1;
    return x;
}
A triangular number satisfies:
    triNum = ((n^2)+n)/2
It has the property that that many objects can form an
equilateral triangle ... just like the balls on a pool
table.
So:
    5+4+3+2+1 = 15 = ((5^2)+5)/2 = (25+5)/2 = 30/2 = 15
    6+5+4+3+2+1 = 21 = ((6^2)+6)/2 = (36+6)/2 = 42/2 = 21
Now, those who consider the book "Beautiful Code" by Andy Oram and Greg
Wilson their Bible would ask: "Is this code an elegant way to test
this problem?"
I would like to see the test code, and more interesting still would be a
case where the test fails. I imagine if you were doing this on an RTOS you
could run into a situation where it breaks ...
A contrived example if all singletons could map to either internal mutex a
or b:
mutex singleton_locks[2];

void lock(singleton* _this) {
    mutex_lock(&singleton_locks[hashptr(_this)]);
}

void unlock(singleton* _this) {
    mutex_unlock(&singleton_locks[hashptr(_this)]);
}
1 - thread a locks singleton internal mutex a
2 - thread b locks singleton internal mutex b
3 - thread a constructs another singleton which uses internal mutex b
4 - thread b constructs another singleton which uses internal mutex a
or if all singletons used a mutex per-type:
1 - thread a locks singleton of type a
2 - thread b locks singleton of type b
3 - thread a tries to lock singleton of type b
4 - thread b tries to lock singleton of type a
Well, I don't think the last example is valid because it implies a recursive
singleton dependency between type a and type b. So, forget the last one.
Humm...
Or what about a thread-safe queue... IMHO, sometimes it's quite convenient
to know that queue operations are automatically synchronized.
> "Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
> news:cb9f4fc2-7816-4d0b...@c58g2000hsc.googlegroups.com...
On Jul 30, 07:48, "Chris Thomasson" <x...@xxx.xxx> wrote:
>
>> > >> If the platform internally uses a lock, it must deal with all the
>> > >> consequences of using such a lock. These includes priority inversion,
>> > >> lock order issues (suppose the constructor for the static itself
>> > >> constructs statics), and deadlock.
>> >
>> > > Not if the lock is recursive.
>> >
>> > One more thing, the lock needs to be singular. In other words,
>> there > needs
>> > to be a single lock for all singletons to use. If this is not the
>> case, > then
>> > locking order violations will occur.
>
>> Can you provide example when deadlock will occur if lock is not
>> singular?
>
> A contrived example if all singletons could map to either internal
> mutex a or b:
That says "don't share a mutex between singletons" to me. You could,
for example, have a mutex per singleton. This wouldn't suffer from
this problem unless there was interdependency between the singletons,
so the initialization of singleton A required singleton B, but the
initialization of singleton B required singleton A. This is just plain
circular dependency, and is broken anyway.
Basically, yes.
> You could,
> for example, have a mutex per singleton. This wouldn't suffer from
> this problem unless there was interdependency between the singletons,
> so the initialization of singleton A required singleton B, but the
> initialization of singleton B required singleton A. This is just plain
> circular dependency, and is broken anyway.
Right. My last example showed that. It would be endlessly recursive as well,
analogous to:
struct foo {
    foo() {
        static foo f;
    }
};
Well, I was _wrong_ when I said the mutex needs to be singular!
;^(...
A mutex per-type would work fine.
> In MT, synchronization is just a confusing word for "coordination", and
> "coordination" means "communication". As for living beings,
> communication is part of the highest computational functions, and is to
> be performed after and above the bulk of inner computational tasks.
Sometimes it is important to express the synchronization at the user
level. Sometimes it is not. I think that correctly initializing local
statics (even in the presence of threads) is one of those areas where
the user shouldn't have to worry. Getting it right with plain POSIX
mutexes is also not as simple as it might be, and can actually be a
pessimization over what the compiler can do.
> IMHO, this try at a language-driven thread-safety in C++ was tried both
> by overestimating a set of theoretical studies advocating for
> abstraction of parallelism on one side, and by the apparent difficulty
> to think programs as interacting agents each one state-blind with
> respect to the others on the other; difficulty that leaded those
> developers into thinking that "the other way" would have been
> better.
As one of those actively involved in drafting the thread support for
C++0x, I believe you are mistaken. The thread support in the C++
standard is designed in the light of what people actually do when
writing multi-threaded code, with input from a number of people from a
wide variety of backgrounds. In part, the support is designed to make
it easier for people to write correct code. Thread-safe initialization
of local statics is one of those areas.
> But "simple to think out" is not always "better".
Nobody said it was.
> On Jul 31, 12:27 am, Anthony Williams <anthony....@gmail.com> wrote:
>
>> I think you are seriously overstating your case. Given your link
>> below, I presume you are referring to the thread-safe initialization
>> of local statics that is required by the C++0x standard?
>
> Not only. Also the thread-safe initialization of local statics that
> GCC implemented. However, I think the thread-safe initialization of
> local statics in the C++0x standard is a bad idea because it conflicts
> badly with POSIX. If the C++0x standard were writing on a clean slate,
> you could argue that it made a reasonable choice. But the slate is not
> clean, there's already a threading standard.
Firstly, the POSIX threading standard only applies to POSIX platforms,
and not to the large number of other systems that do not support
POSIX.
Also, the C++0x standard does not conflict with POSIX, it builds on
it. Some things that are undefined under the POSIX C standard are
defined behaviour under C++0x. There are new compiler facilities and
new library facilities, and the C++0x standard defines how they
interact in the face of threads.
There is also a POSIX-C++ working group currently working on designing
a new POSIX-C++ binding which specifies how the new C++0x standard
interacts with those parts of POSIX not currently covered by C++0x
(such as networking).
>> It is up to the compiler to ensure this does not cause problems on its
>> own. Obviously, if you hold a lock when you reach the declaration of
>> such a variable, and that lock is required by the constructor of the
>> variable then you have deadlock, but this is a problem even if you are
>> the one and only thread to call this function.
>
> The problem with the compiler magically adding synchronization code is
> that synchronization code is very specific to exactly what you're
> doing. If you know that one and only one thread calls the code, then
> the extra synchronization is pure wasted overhead. If you know you
> need the synchronization, adding it yourself is likewise trivial.
The overhead is (or should be) minimal. It is hard to write portable
POSIX code with as little overhead as the code the compiler can generate.
> POSIX already requires such synchronization. So POSIX-compliant code
> will already have it. This makes the extra synchronization a pure
> overhead for POSIX code.
C++0x is a new standard, which introduces better ways of doing things
in some cases. Yes, this means that old code might now be superfluous,
but the combination of POSIX threads and C++ has always been platform
and compiler-specific: POSIX doesn't acknowledge the existence of C++,
and C++ prior to C++0x doesn't acknowledge the existence of
multithreaded code.
>> Initialization of separate local statics need not and should not by
>> synchronized, so it is perfectly acceptable for thread A to take a
>> lock and initialize a local static x whilst thread B initializes a
>> local static y that needs to acquire that same lock: the
>> initialization of x will complete (whether or not it started before
>> the initialization of y), thread A will release its lock, then the
>> constructor of y can acquire the lock and the initialization of y can
>> complete.
>
> Except that you can then deadlock if thread A has user-level mutex 1
> and needs implementation magic mutex alpha while thread B has
> implementation magic mutex alpha and needs user-level mutex 1. Code
> has to know what locks it holds in order to be written sanely, and it
> needs precise control over where that lock is acquired and where it is
> released. When it holds multiple locks, it needs precise control over
> how they're acquired and released and what goes in-between releases of
> multiple locks.
Having a single global "magic mutex alpha" that is locked whilst
initializing a local static is a bad design. There are better ways of
doing it, such as not holding the lock whilst the local static is
being initialized (just before and after to synchronize the state),
having a separate mutex per local-static instance, and even just using
atomic operations to do the synchronization (which you can't do with
plain POSIX).
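One possible shape of such a per-static guard, sketched here with the std::atomic/std::mutex syntax that C++0x later standardized (all names are mine, and the int stands in for a real constructed object):

```cpp
#include <atomic>
#include <mutex>

// Hypothetical per-static guard: an atomic flag for the fast path,
// a mutex taken only during first-time initialization.
static std::atomic<bool> g_done(false);
static std::mutex g_guard;
static int* g_obj = 0;

int& get() {
    if (!g_done.load(std::memory_order_acquire)) {   // fast path once initialized
        std::lock_guard<std::mutex> lk(g_guard);
        if (!g_done.load(std::memory_order_relaxed)) {
            g_obj = new int(7);                      // stands in for the ctor
            g_done.store(true, std::memory_order_release);
        }
    }
    return *g_obj;
}
```

Note that this simple variant still holds the guard mutex while the stand-in "constructor" runs; the lock-avoiding variants Anthony mentions require more machinery.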
>> If that doesn't work with GCC, it's a bug in their implementation, not
>> a bug in the feature itself.
>
> The primary bug in the feature itself is that it's a pure
> pessimization for POSIX code.
In many cases the compiler will be able to generate BETTER code than
you can write in plain POSIX C. For example, the compiler can make use
of atomic operations to reduce or eliminate the need for a lock. If
you write it in portable POSIX C code, you cannot use atomic
operations, and must use a lock or pthread_once.
Using pthread_once to initialize something that requires parameters is
a real pain, and requires jumping through rather too many hoops for my
liking (e.g. storing the parameters in thread-local storage, or in
separate static data protected by a mutex).
If you use a mutex to protect the initialization, then you have to
lock the mutex *every time* through the function, which is an
unnecessary pessimization once the static has been initialized. You
cannot even check a flag to see if it has been initialized
(double-checked locking anyone?) since this is not allowed by POSIX.
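The mutex-every-call pattern being criticized looks roughly like this (the int payload is purely illustrative):

```cpp
#include <pthread.h>

static pthread_mutex_t g_mtx = PTHREAD_MUTEX_INITIALIZER;
static int* g_value = 0;

// Checking g_value before taking the lock (double-checked locking)
// is not permitted by POSIX, so the lock must be taken on every call.
int get_value() {
    pthread_mutex_lock(&g_mtx);   // paid on every call, forever
    if (!g_value)
        g_value = new int(42);    // the one-time initialization
    int v = *g_value;
    pthread_mutex_unlock(&g_mtx);
    return v;
}
```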
Well, not holding the "magic" lock during the initialization can still
deadlock if the user does not rigorously follow one of the guidelines:
1. Never take a lock in the initialization procedure.
2. Never hold a lock while you call the initialization procedure.
I think I agree with you and Dmitriy in that guideline 1 should be
sufficient, and be the most flexible.
Another pattern I have seen does something like this:
___________________________________________________________
template<typename T>
struct singleton {
    static atomic_ptr<T>
    instance() {
        static atomic_ptr<T> gptr;
        local_ptr<T> lptr(gptr);
        if (! lptr) {
            local_ptr<T> optr(new T());
            if (gptr.cas(lptr, optr)) {
                return optr;
            }
            return gptr;
        }
        return lptr;
    }
};
___________________________________________________________
which works with reference counted pointers that provide strong
thread-safety and a compare-and-swap operation, such as Joe Seigh's excellent
atomic_ptr:
http://atomic-ptr-plus.sourceforge.net
I need to think about it some more, but AFAICT the following deadlock which
David explains here:
http://groups.google.com/group/comp.programming.threads/msg/bfb9bc1f2517614e
cannot happen with the code above because there is no locking or waiting
involved with the singleton.
[...]
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:CnBkk.7746$KZ....@newsfe03.iad...
[...]
> Another pattern I have seen does something like this:
>
> ___________________________________________________________
> template<typename T>
> struct singleton {
>     static atomic_ptr<T>
>     instance() {
>         static atomic_ptr<T> gptr;
>         local_ptr<T> lptr(gptr);
>         if (! lptr) {
>             local_ptr<T> optr(new T());
>             if (gptr.cas(lptr, optr)) {
>                 return optr;
>             }
>             return gptr;
>         }
>         return lptr;
>     }
> };
> ___________________________________________________________
>
> which works with reference counted pointers that provide strong
> thread-safety and a compare-and-swap operation, such as Joe Seigh's
> excellent atomic_ptr:
>
> http://atomic-ptr-plus.sourceforge.net
[...]
With the total lock-freedom comes a major caveat: The object may be
constructed and destroyed several times if multiple threads concurrently hit
the singleton for the _first_ time.
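The same race sketched with std::atomic rather than atomic_ptr (illustrative, not Joe Seigh's code): every thread that loses the compare-and-swap must destroy the instance it speculatively constructed.

```cpp
#include <atomic>

struct Node { int tag = 1; };

std::atomic<Node*> g_node{nullptr};

Node* node_instance() {
    Node* p = g_node.load(std::memory_order_acquire);
    if (!p) {
        Node* mine = new Node();                    // speculative construction
        if (g_node.compare_exchange_strong(p, mine,
                std::memory_order_acq_rel,
                std::memory_order_acquire)) {
            return mine;                            // we published our instance
        }
        delete mine;    // lost the race: our extra instance is destroyed here
        return p;       // compare_exchange updated p to the winner's pointer
    }
    return p;
}
```

So the constructor must be safe to run, and the destructor safe to run immediately afterwards, on instances that are never observed by anyone.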
> >> A contrived example if all singletons could map to either internal
> >> mutex a or b:
>
> > That says "don't share a mutex between singletons" to me.
>
> Basically, yes.
>
> > You could,
> > for example, have a mutex per singleton. This wouldn't suffer from
> > this problem unless there was interdependency between the singletons,
> > so the initialization of singleton A required singleton B, but the
> > initialization of singleton B required singleton A. This is just plain
> > circular dependency, and is broken anyway.
>
> Right. My last example showed that. It would be endlessly recursive as well,
> analogous to:
>
> struct foo {
>     foo() {
>         static foo f;
>     }
> };
>
> Well, I was _wrong_ when I said the mutex needs to be singular!
>
> ;^(...
>
> A mutex per-type would work fine.
So, maybe all is not lost yet, and one can construct a really bullet-
proof multi-threaded singleton without any user-visible caveats,
possible deadlocks, etc., like the internal synchronization in a memory
allocator. The user just has to know that he CAN execute malloc/free
from different threads simultaneously. Period.
And the internal implementation can be changed over time under the
hood. For example, a straightforward mutex-based implementation can be
changed to DCL. Or, for example, on platforms where load-acquire is
implied (x86) the implementation can use one global pointer, and on
platforms where load-acquire has a cost (Itanium) the implementation
can cache the pointer in thread-local storage.
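The TLS-caching variant might be sketched like this (hypothetical names; C++11 thread_local is used for illustration):

```cpp
#include <atomic>

struct Config { int level = 3; };

std::atomic<Config*> g_config{nullptr};

Config* get_config() {
    thread_local Config* cached = nullptr;  // per-thread cache: no fence needed
    if (cached) return cached;
    Config* p = g_config.load(std::memory_order_acquire);
    if (!p) {
        Config* mine = new Config();
        if (g_config.compare_exchange_strong(p, mine))
            p = mine;
        else
            delete mine;                    // another thread initialized it first
    }
    cached = p;     // pay the acquire load at most once per thread
    return cached;
}
```

Callers never see which strategy is in use, which is the flexibility being argued for.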
I think such flexibility is important, but it is impossible if the
once-initialization is brought up to the user level.
Dmitriy V'jukov
> > > In MT, synchronization is just a confusing word for "coordination", and
> > > "coordination" means "communication". As for living beings,
> > > communication is part of the highest computational functions, and is to
> > > be performed after and above the bulk of inner computational tasks.
> > Do you want to say that ALL locking is highest computational function?
> > What about, for example, lock in memory allocator? Is it needed to be
> > performed above "inner computational tasks"?
>
> Or what about a thread-safe queue... IMHO, sometimes, its quite convenient
> to know that queue operations are automatically synchronized.
Especially taking into consideration that the allocator/queue can be
implemented with a global lock, or fine-grained locking, or using thread-
local storage, or lock-free techniques, etc.
If we consider only POSIX, then yes, there is little difference
whether the mutex is acquired/released inside the component or outside
the component (and even here we have some problems with fine-grained
locking). But if we consider the whole diversity of synchronization
methods, it's just impossible to bring synchronization up to the user
level. Advanced, highly optimized lock-free algorithms are not what you
want to see spread all over your application ;)
But note that this doesn't say where synchronization MUST be; it only
says that synchronization CANNOT be at the user level (in some cases).
Dmitriy V'jukov
There is a "problem" with a singleton. It involves a callback into
user-code. A user can use malloc/free without worrying about any
multi-threading issues because malloc/free provide their own
synchronization, and more importantly, they do not involve calling into
user-defined code.
AFAICT, the callback into the user-code can lead to rather nasty problems
if one of the guidelines is not explicitly followed:
1. Never take a lock in the initialization procedure.
2. Never hold a lock while you call the initialization procedure.
[...]
> And internal implementation can be changed over the time under the
> hood. For example, straightforward mutex-based implementation can be
> changed to DCL. Or for example, on platform where load-acquire is
> implied (x86) implementation can use one global pointer, and on the
> platform where load-acquire have cost (Itanium) implementation can
> cache pointer in thread-local storage.
> I think such flexibility is important, but impossible if once
> initialization brought to user level.
Right. The only caveat is that there is a callback into the user-level.
However, if the user follows the rules, everything will work just fine.
> There is a "problem" with a singleton. It involves a callback into
> user-code. A user can use malloc/free without worrying about any
> multi-threading issues because malloc/free provide their own
> synchronization, and more importantly, they do not involve calling into
> user-defined code.
>
> AFAICT, the callback into the user-code can lead to rather nasty problems if
> one of the guidelines are not explicitly followed:
>
> 1. Never take a lock in the initialization procedure.
>
> 2. Never hold a lock while you call the initialization procedure.
Ok. I am foolish.
I was thinking that if the implementation uses a mutex per singleton,
then there can't be a deadlock, so it's a bullet-proof solution... But
a deadlock is still possible, even if we use just one singleton. And
this deadlock is unavoidable.
Dmitriy V'jukov
> Do you want to say that ALL locking is highest computational function?
> What about, for example, lock in memory allocator? Is it needed to be
> performed above "inner computational tasks"?
Good: The 'malloc' and 'free' functions internally manipulate a shared
structure. The person who wrote them has dealt with that. The person
who uses them doesn't have to know or care about this. If I had to
wrap them in mutexes myself, I might acquire them too early or release
them too late, which would be a pessimization. (For example, if
free(NULL) is a no-op, 'free' can make it one without ever acquiring
any mutexes. The caller cannot.)
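The free(NULL) point as a sketch (my_free and the lock counter are hypothetical, for illustration only):

```cpp
#include <cstdlib>
#include <mutex>

std::mutex g_heap_mtx;
int g_lock_count = 0;   // instrumentation: how often the heap lock was taken

void my_free(void* p) {
    if (p == nullptr) return;   // no-op path: the mutex is never acquired
    std::lock_guard<std::mutex> lk(g_heap_mtx); // shared state touched only here
    ++g_lock_count;
    std::free(p);
}
```

A caller forced to wrap every call in its own external mutex would pay for the lock even on the NULL path; only the implementor can elide it.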
Bad: I'm writing 'my_malloc' and 'my_free' functions. They internally
have a very clever scheme so that they rarely manipulate any shared
structures. But the compiler realizes that I'm implementing memory
allocation functions and 'helpfully' wraps them in a big mutex for me.
See the difference? I'm perfectly capable of wrapping my functions in
a big mutex if that's what I want. But I can also acquire the mutex
late and release it early if I'm clever. I can use multiple mutexes to
reduce contention if I'm clever. And I know it's there, so I can deal
with priority inversion, lock acquisition order, and other issues.
DS
You are talking about GCC's automatic synchronization of local
static's initialization. Right? I'm not arguing about that. I'm
arguing about Giancarlo Niccolai's statement that locking always must
be at very high level, and if locking is at low level or hidden, then
it's always bad.
Dmitriy V'jukov
> You are talking about GCC's automatic synchronization of local
> static's initialization. Right? I'm not arguing about that. I'm
> arguing about Giancarlo Niccolai's statement that locking always must
> be at very high level, and if locking is at low level or hidden, then
> it's always bad.
I guess he can respond for himself, but I can't find any comments he
made in this thread that could be understood in that way. In fact,
it's generally a good idea to push the locks as low as possible in the
code. This way, they're not held longer than necessary and the code
that needs to manage a particular lock is as self-contained as
possible.
This minimizes the complexity of the lock hierarchy. You don't
normally need to think about the 'malloc' internal mutex(es) when you
design an application lock hierarchy because that lock is self-
contained, having been 'pushed down' into the internals of the memory-
management code.
Not only would it be a pain to have to acquire a lock to call
'malloc', but the caller would have no choice but to hold the lock
through the entire function invocation and return. And with an
efficient 'malloc', function invocation and return can be 30+% of the
cost of 'malloc'.
There aren't really hard and fast rules about locking. I'm sure you
could come up with realistic cases where it would be better if you
could hold the lock across two calls to 'malloc' then to have to drop
and reacquire the lock. Engineering and software design is always full
of trade-offs.
DS
Each one in its own context. David has already been asked a similar
question and gave a competent reply.
Saying ALL and ALWAYS is ...well... always dumb. For sure, the
things you write in your program are not the things that you want to
have from a standard library. I don't want automatic locks to be added
to my program, where I want to control where coordination takes place,
but this is another "level of existence" when compared to standard (and
opaque) library functions.
However... if you really want to dig into it... well, there are
lock-free allocators, and they are usually preferable.
Giancarlo.
>
> As one of those actively involved in drafting the thread support for
> C++0x, I believe you are mistaken. The thread support in the C++
> standard is designed in the light of what people actually do when
> writing multi-threaded code, with input from a number of people from a
> wide variety of backgrounds. In part, the support is designed to make
> it easier for people to write correct code. Thread-safe initialization
> of local statics is one of those areas.
>
A lot of people doing a thing doesn't mean the thing should be done
that way. This was the design principle behind the first JAVA threading
too, AFAIK, and even Microsoft produced an MT library which is not
exactly shining, at least according to the esteemed opinion of the
people in this group.
That non-shining support often required (or, more often, lured)
developers into writing poor MT code. If questioned, I would maintain
that the vast majority of MT code "out in the wild" is far from correct,
and constant breakage demonstrates it. Even the Mars explorer's OS had a
priority inversion error which blocked the kid for a long while
(till a complete reboot of the OS with a patched version without that
priority scheduling was delivered).
Sorry, but "what people actually do" is far from being a reasonable
basis for multithreading programming system design, and the most evident
flaws in Java and the first MS SDKs were introduced exactly because of
trying to "make it easier for people to write correct code".
I am sorry, but one system where "it's harder to shoot yourself in the
foot, but when you do you blow off your whole leg" was already enough
for me.
So, I respectfully believe the one who's mistaken is not me.
Bests,
Giancarlo Niccolai.
> > Do you want to say that ALL locking is highest computational function?
> > What about, for example, lock in memory allocator? Is it needed to be
> > performed above "inner computational tasks"?
>
> Each one in its own context. David has already been asked a similar
> question and gave a competent reply.
>
> Saying ALL and ALWAYS is ...well... always dumb. For a sure thing, the
> things you write in your program are not the things that you want to
> have from a standard library. I don't want automatic locks to be added
> to my program, where I want to control where coordination takes place,
> but this is another "level of existence" when compared to standard (and
> opaque) library functions.
Ok. So some forms of "transparent thread safety" are Ok.
By "ALL and ALWAYS" I mean ALL classes of situations, not all
particular situations. Things like malloc/queues are a whole class of
situations.
Dmitriy V'jukov
Sorry, I think I need to clarify, because the context of the
conversation has changed.
I was referring to the point that malloc() inner lock and automatic lock
in static initializers should be considered "a similar problem" or "the
same thing", to which David competently replied.
With "each one in its own context" I mean that synchronization for
malloc(), at malloc() level, happens at its highest logical level and
becomes opaque (abstract, virtually nonexistent) at higher levels.
Static initializer guards in user code cannot be considered the same
thing by any means; they happen at the lowest possible logical level of
the target application, and even below and beyond the control scope of
the code they target. THAT, imho, is the problem, and THAT is
what I meant by synchronization that must be understood as
coordination.
This is orthogonal to the previous consideration of David, I mean, it's
the same thing seen from a different point of view:
http://groups.google.com/group/comp.programming.threads/msg/2f5c410e5debab41
(And again, in the case of malloc, there are the lock free solutions...)
Bests,
Giancarlo.
> The automatic locks will break code that complies with both the
> C++ and POSIX standards.
Do note that reentering the function while a local static is being
initialized is undefined behavior in C++98/03. (More precisely,
allowing control to pass the variable's definition while the
variable is being initialized.)
I can't think of an example which deadlocks under the new rules and
has defined behavior under the old ones.
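A contrived example of such re-entry (hypothetical): if f(true) is the very first call, initializing x re-enters f and passes the definition of x while it is still being initialized, which is UB under C++98/03 and a self-deadlock under the new rules.

```cpp
// Calling f(false) first is fine; calling f(true) first re-enters the
// initialization of x: UB in C++98/03, deadlock under thread-safe statics.
int f(bool recurse) {
    static int x = recurse ? f(false) + 1 : 1;
    return x;
}
```

Either way the program was already broken, which supports the claim that no correct old program deadlocks under the new rules.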
Forget deadlocks. The new rules do introduce pessimization and add
unneeded synchronization to otherwise correct existing programs.
Untagged (classic) function statics should remain unsynchronized. Adding
another class of synchronized function statics is totally okay. Replacing
one with the other is not okay.
regards,
alexander.
> Forget deadlocks. The new rules do introduce pessimization and add
> unneeded synchronization to otherwise correct existing programs.
void f()
{
    // static X x;
    static char __x[ sizeof(X) ]; // properly aligned
    static atomic<X*> __px = 0;
    X * __p = __px.load( ddacq );
    if( !__p )
    {
        // slow path
    }
    // use *__p as x
}
The above is a clear pessimization compared with

void f()
{
    // static X x;
    static char __x[ sizeof(X) ]; // properly aligned
    static bool __fx = 0; // statically initialized flag
    if( !__fx )
    {
        // init
    }
    // use *__x as x
}
regards,
alexander.