
atomically thread-safe Meyers singleton impl...


Chris Thomasson

Jul 30, 2008, 6:28:25 AM
Here is the FIXED version of my atomically thread-safe singleton
implementation using pthreads, x86, MSVC and the double-checked locking
pattern with some error checking omitted for brevity:
__________________________________________________________________
#include <cstdio>
#include <cassert>
#include <cstdlib>
#include <pthread.h>

#if ! defined(_MSC_VER)
# error MSVC REQUIRED FOR NOW!
#elif (_MSC_VER > 1300)
using namespace std;
#endif


class mutex_guard {
    pthread_mutex_t* const m_mtx;

public:
    mutex_guard(pthread_mutex_t* const mtx)
        : m_mtx(mtx) {
        pthread_mutex_lock(m_mtx);
        printf("pthread_mutex_lock(%p);\n", (void*)m_mtx);
    }

    ~mutex_guard() throw() {
        printf("pthread_mutex_unlock(%p);\n", (void*)m_mtx);
        pthread_mutex_unlock(m_mtx);
    }
};


namespace atomic {
    __declspec(naked)
    static void*
    ldptr_acq(void* volatile*) {
        _asm {
            MOV EAX, [ESP + 4]
            MOV EAX, [EAX]
            RET
        }
    }

    __declspec(naked)
    static void*
    stptr_rel(void* volatile*, void* const) {
        _asm {
            MOV ECX, [ESP + 4]
            MOV EAX, [ESP + 8]
            MOV [ECX], EAX
            RET
        }
    }
}


#if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER)
static pthread_mutex_t singleton_mtx =
    PTHREAD_RECURSIVE_MUTEX_INITIALIZER;
#else
static pthread_mutex_t* volatile singleton_mtx_ptr = NULL;
static pthread_mutex_t singleton_mtx;

static void
singleton_mutex_static_init_destroy() {
    assert(singleton_mtx_ptr == &singleton_mtx);
    pthread_mutex_destroy(&singleton_mtx);
    printf("pthread_mutex_destroy(%p);\n", (void*)&singleton_mtx);
}
#endif


static pthread_mutex_t*
singleton_mutex_static_init() {
    pthread_mutex_t* mtx;
#if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER)
    mtx = &singleton_mtx;
#else
    mtx = (pthread_mutex_t*)atomic::ldptr_acq(
        (void* volatile*)&singleton_mtx_ptr
    );
    if (! mtx) {
        static pthread_mutex_t this_mtx_sentinel =
            PTHREAD_MUTEX_INITIALIZER;
        mutex_guard lock(&this_mtx_sentinel);
        if (! (mtx = singleton_mtx_ptr)) {
            pthread_mutexattr_t mattr;
            pthread_mutexattr_init(&mattr);
            pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_RECURSIVE);
            pthread_mutex_init(&singleton_mtx, &mattr);
            pthread_mutexattr_destroy(&mattr);
            atexit(singleton_mutex_static_init_destroy);
            mtx = (pthread_mutex_t*)atomic::stptr_rel(
                (void* volatile*)&singleton_mtx_ptr, &singleton_mtx
            );
            printf("pthread_mutex_init(%p);\n", (void*)mtx);
        }
    }
#endif
    assert(mtx);
    return mtx;
}


template<typename T>
struct singleton {
    static T* instance() {
        static T* volatile this_ptr = NULL;
        T* ptr = (T*)atomic::ldptr_acq((void* volatile*)&this_ptr);
        if (! ptr) {
            mutex_guard lock(singleton_mutex_static_init());
            if (! (ptr = this_ptr)) {
                static T this_instance;
                ptr = (T*)atomic::stptr_rel(
                    (void* volatile*)&this_ptr, &this_instance
                );
            }
        }
        assert(ptr);
        return ptr;
    }
};


struct foo {
    foo() {
        printf("(%p)->foo::foo();\n", (void*)this);
    }

    ~foo() throw() {
        printf("(%p)->foo::~foo();\n", (void*)this);
    }
};


struct foo1 {
    foo1() {
        foo* ptr1 = singleton<foo>::instance();
        foo* ptr2 = singleton<foo>::instance();
        foo* ptr3 = singleton<foo>::instance();
        assert(ptr1 == ptr2 && ptr2 == ptr3);
        printf("(%p)->foo1::foo1();\n", (void*)this);
    }

    ~foo1() throw() {
        printf("(%p)->foo1::~foo1();\n", (void*)this);
    }
};


struct foo2 {
    foo2() {
        printf("(%p)->foo2::foo2();\n", (void*)this);
    }

    ~foo2() throw() {
        printf("(%p)->foo2::~foo2();\n", (void*)this);
    }
};


int main() {
    foo1* ptr1 = singleton<foo1>::instance();
    foo1* ptr2 = singleton<foo1>::instance();
    foo1* ptr3 = singleton<foo1>::instance();
    foo2* ptr11 = singleton<foo2>::instance();
    foo2* ptr22 = singleton<foo2>::instance();
    foo2* ptr33 = singleton<foo2>::instance();
    assert(ptr1 == ptr2 && ptr2 == ptr3);
    assert(ptr11 == ptr22 && ptr22 == ptr33);
    return 0;
}
__________________________________________________________________


I think this is about as good as I can do. It uses a single recursive mutex
as a guard for the singleton slow-path. This is needed because a singleton
can contain other singletons in their ctors. The pthreads-win32 library
features a `PTHREAD_RECURSIVE_MUTEX_INITIALIZER' definition which statically
initializes a recursive mutex. However, I don't think that this is standard.
Therefore, the code will automatically compensate for this if it is not
defined. This means that this singleton will work even if threads are
created before main. Also, it should be rather trivial to convert this over
to GCC and Linux. All you would need to do is create the atomic functions
in AT&T inline assembler syntax.
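
For example, an untested sketch of those two functions in GCC's AT&T
inline assembler syntax (32-bit x86; it leans on x86's implicit
acquire/release ordering for aligned loads and stores, plus a "memory"
clobber as a compiler barrier) could look something like this:
__________________________________________________________________
namespace atomic {
    static void*
    ldptr_acq(void* volatile* p) {
        void* v;
        __asm__ __volatile__(
            "movl (%1), %0"     /* plain load; acquire on x86 */
            : "=r" (v)
            : "r" (p)
            : "memory"
        );
        return v;
    }

    static void*
    stptr_rel(void* volatile* p, void* const v) {
        __asm__ __volatile__(
            "movl %1, (%0)"     /* plain store; release on x86 */
            :
            : "r" (p), "r" (v)
            : "memory"
        );
        return v;
    }
}
__________________________________________________________________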


Any thoughts on this approach?

I think the only way to break this would be to do something extremely stupid
like:

struct foo {
    foo() {
        foo* f = singleton<foo>::instance();
    }
};


which would be analogous to doing:

struct foo {
    foo() {
        static foo f;
    }
};


For now, AFAICT this thread-safe singleton is looking fairly bullet-proof.
Humm...


P.S.

Here is the BROKEN version:

http://groups.google.com/group/comp.lang.c++.moderated/msg/270cd69180dbe8af
(damn it!)


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Mathias Gaunard

Jul 30, 2008, 4:52:47 PM
On Jul 30, 12:28, "Chris Thomasson" <x...@xxx.xxx> wrote:
> Here is the FIXED version of my atomically thread-safe singleton
> implementation using pthreads, x86, MSVC and the double-checked locking
> pattern with some error checking omitted for brevity:

<snip complex code />

Isn't this already extremely simple and thread-safe?

Singleton& instance()
{
    static Singleton singleton;
    return singleton;
}

The next C++ standard says that it is thread-safe, and I suppose good
existing implementations already do it.

Joshua...@gmail.com

Jul 30, 2008, 6:43:54 PM
On Jul 30, 1:52 pm, Mathias Gaunard <loufo...@gmail.com> wrote:
> On Jul 30, 12:28, "Chris Thomasson" <x...@xxx.xxx> wrote:
>
> > Here is the FIXED version of my atomically thread-safe singleton
> > implementation using pthreads, x86, MSVC and the double-checked locking
> > pattern with some error checking omitted for brevity:
>
> <snip complex code />
>
> Isn't this already extremely simple and thread-safe?
>
> Singleton& instance()
> {
> static Singleton singleton;
> return singleton;
>
> }
>
> The next C++ standard says that it is thread-safe, and I suppose good
> existing implementations already do it.

Does it? I'm not sure I like this. I'll have to look at it. What if my
program is single threaded? Or what if I know that the singleton will
only be used during the single threaded portion of my program? I will
have to pay the overhead of locking with the above example code? The
static keyword now implies a lock? This is very much against the
design goal of C++ of not paying for things you don't use.

Moreover, I don't know anything, but I would be greatly surprised if
most of the c++ compilers currently guarantee the semantics you
suggest.

Anthony Williams

Jul 30, 2008, 11:57:53 PM
Joshua...@gmail.com writes:

> On Jul 30, 1:52 pm, Mathias Gaunard <loufo...@gmail.com> wrote:
>> On Jul 30, 12:28, "Chris Thomasson" <x...@xxx.xxx> wrote:
>>
>> > Here is the FIXED version of my atomically thread-safe singleton
>> > implementation using pthreads, x86, MSVC and the double-checked locking
>> > pattern with some error checking omitted for brevity:
>>
>> <snip complex code />
>>
>> Isn't this already extremely simple and thread-safe?
>>
>> Singleton& instance()
>> {
>> static Singleton singleton;
>> return singleton;
>>
>> }
>>
>> The next C++ standard says that it is thread-safe, and I suppose good
>> existing implementations already do it.
>
> Does it? I'm not sure I like this. I'll have to look at it. What if my
> program is single threaded? Or what if I know that the singleton will
> only be used during the single threaded portion of my program? I will
> have to pay the overhead of locking with the above example code? The
> static keyword now implies a lock? This is very much against the
> design goal of C++ of not paying for things you don't use.

Yes, this is required to be thread-safe in C++0x. It does not
necessarily imply a lock, especially if only one thread ever calls the
function.

> Moreover, I don't know anything, but I would be greatly surprised if
> most of the c++ compilers currently guarantee the semantics you
> suggest.

The only current compiler I am aware of that provides this guarantee
is g++, and only if you provide the correct options (though these
might be default in some installations).

Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Mathias Gaunard

Jul 31, 2008, 12:28:20 AM
On Jul 31, 00:43, JoshuaMaur...@gmail.com wrote:

> Does it? I'm not sure I like this. I'll have to look at it.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2513.html


> What if my
> program is single threaded? Or what if I know that the singleton will
> only be used during the single threaded portion of my program? I will
> have to pay the overhead of locking with the above example code?

I suppose a nice compiler would give you a flag to disable it.


> This is very much against the
> design goal of C++ of not paying for things you don't use.

The thing is, it's not possible to have construction/destruction
really work otherwise.


> Moreover, I don't know anything, but I would be greatly surprised if
> most of the c++ compilers currently guarantee the semantics you
> suggest.

I didn't say most. GCC does (albeit not in the way recommended by the
standard).
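
Roughly, and much simplified, g++ wraps the dynamic initialization of a
function-local static in the Itanium C++ ABI guard calls (this is only an
illustration of the shape of the generated code, not the code itself; the
behaviour can be switched off with -fno-threadsafe-statics):

#include <new>   // placement new

struct Widget { Widget() {} };   // hypothetical example type

// 64-bit guard variable; only its first byte is the "initialized" flag.
extern "C" int  __cxa_guard_acquire(long long*);
extern "C" void __cxa_guard_release(long long*);

Widget& instance() {
    // approximately what "static Widget w;" expands to:
    static long long guard = 0;
    static char storage[sizeof(Widget)];    // alignment ignored here

    if (!*(volatile char*)&guard) {         // fast path: already constructed?
        if (__cxa_guard_acquire(&guard)) {  // non-zero => we must initialize
            new (storage) Widget();         // run the constructor exactly once
            __cxa_guard_release(&guard);    // mark as initialized
        }
    }
    return *reinterpret_cast<Widget*>(storage);
}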

Joshua...@gmail.com

Jul 31, 2008, 7:13:00 AM
On Jul 30, 9:28 pm, Mathias Gaunard <loufo...@gmail.com> wrote:
> > What if my
> > program is single threaded? Or what if I know that the singleton will
> > only be used during the single threaded portion of my program? I will
> > have to pay the overhead of locking with the above example code?
>
> I suppose a nice compiler would give you a flag to disable it.
>
> > This is very much against the
> > design goal of C++ of not paying for things you don't use.
>
> The thing is, it's not possible to have construction/destruction
> really work otherwise.

What about the status quo? It works alright now. I wouldn't be opposed
to a new construct, a thread-safe static which guarantees at most one
construction, but I don't want all statics to possibly incur this
penalty because the optimizer cannot determine if the first call needs
to be locked or not.

Yes, it's a hole in the rules defining lifetimes of objects. It's a
hole I can live with.

I'd imagine the general case would be the compiler adding the lock
because it cannot determine if the first call will occur safely or
not. Then again, I don't write optimizing compilers, so I could be
wrong. Let's take the scenario I have off the top of my head. I have
my main thread, I spawn a couple more, I then call the function
containing my static variable from thread X. I know that, for my
program, no currently existing thread besides thread X will access
this static, so I do not need any locking.

This may or may not be a trivial speed cost. For the class singleton
interface T * getInstance(), this locking could be adding on
significant overhead.

Moreover, it just irks me that I would have to jump through several
hoops to construct an object in static storage and not lock to do it
while abandoning some type safety. (I imagine I could just have a
global long array of the appropriate size, then at first use I could
placement new my object into it.)
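
Something like this, I mean (an illustrative sketch only; the names are
made up, alignment is handled crudely via the long array, destruction is
ignored, and it is only safe if the first call is known to happen while
the program is still single-threaded):

#include <new>   // placement new

struct MyType { int value; MyType() : value(42) {} };  // hypothetical type

// Raw storage and a plain pointer; no locking anywhere.
static long storage[(sizeof(MyType) + sizeof(long) - 1) / sizeof(long)];
static MyType* instance_ptr = 0;

MyType* get_instance() {
    // Fine only if the first call happens before other threads can
    // reach this function (e.g. during single-threaded startup).
    if (!instance_ptr) {
        instance_ptr = new (storage) MyType();
    }
    return instance_ptr;
}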

Alberto Ganesh Barbati

Jul 31, 2008, 10:28:20 AM
Joshua...@gmail.com wrote:

> On Jul 30, 9:28 pm, Mathias Gaunard <loufo...@gmail.com> wrote:
>>> What if my
>>> program is single threaded? Or what if I know that the singleton will
>>> only be used during the single threaded portion of my program? I will
>>> have to pay the overhead of locking with the above example code?
>> I suppose a nice compiler would give you a flag to disable it.
>>
>>> This is very much against the
>>> design goal of C++ of not paying for things you don't use.
>> The thing is, it's not possible to have construction/destruction
>> really work otherwise.
>
> What about the status quo? It works alright now. I wouldn't be opposed
> to a new construct, a thread-safe static which guarantees at most one
> construction, but I don't want all statics to possibly incur this
> penalty because the optimizer cannot determine if the first call needs
> to be locked or not.

If you had read paper N2513, you would have noticed that it includes a
portable algorithm that does not incur any significant performance cost.
With such an algorithm, an objection based only on the presumption of an
added cost loses strength considerably.

> Yes, it's a hole in the rules defining lifetimes of objects. It's a
> hole I can live with.

I don't.

> I'd imagine the general case would be the compiler adding the lock
> because it cannot determine if the first call will occur safely or
> not. Then again, I don't write optimizing compilers, so I could be
> wrong. Let's take the scenario I have off the top of my head. I have
> my main thread, I spawn a couple more, I then call the function
> containing my static variable from thread X. I know that, for my
> program, no currently existing thread besides thread X will access
> this static, so I do not need any locking.

The draft clearly states "The implementation shall not introduce any
locking around execution of the initializer".

Ganesh

Anthony Williams

Jul 31, 2008, 10:28:21 AM
Joshua...@gmail.com writes:

> I'd imagine the general case would be the compiler adding the lock
> because it cannot determine if the first call will occur safely or
> not. Then again, I don't write optimizing compilers, so I could be
> wrong. Let's take the scenario I have off the top of my head. I have
> my main thread, I spawn a couple more, I then call the function
> containing my static variable from thread X. I know that, for my
> program, no currently existing thread besides thread X will access
> this static, so I do not need any locking.
>
> This may or may not be a trivial speed cost. For the class singleton
> interface T * getInstance(), this locking could be adding on
> significant overhead.

I believe you are overstating the case. Even if it does have to
generate a "lock", because the compiler can't prove that only one
thread can run the initialization, there shouldn't be much cost. The
"is this initialized" test should be a simple atomic read (rather than
a plain read for the single-threaded case), and only if that fails do
we have to worry about locking. Even then, a simple atomic
compare_exchange operation can check whether this is the first thread
to enter the initialization, or whether another thread is already
processing it. If there is only one thread (and so no contention),
this is generally a low-cost operation.
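
As a rough sketch of that shape, written with C++0x-style atomics purely
for illustration (a real implementation lives inside the compiler and
runtime, and would block rather than spin):

#include <atomic>
#include <new>

struct Widget { Widget() {} };   // hypothetical example type

static std::atomic<int> guard(0);        // 0 = not started, 1 = busy, 2 = done
static char storage[sizeof(Widget)];     // alignment ignored here

Widget& instance() {
    if (guard.load(std::memory_order_acquire) != 2) {    // one atomic read
        int expected = 0;
        if (guard.compare_exchange_strong(expected, 1,
                                          std::memory_order_acq_rel)) {
            new (storage) Widget();                       // we won: construct
            guard.store(2, std::memory_order_release);    // publish "done"
        } else {
            // another thread is initializing; wait for it
            while (guard.load(std::memory_order_acquire) != 2) {}
        }
    }
    return *reinterpret_cast<Widget*>(storage);
}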

> Moreover, it just irks me that I would have to jump through several
> hoops to construct an object in static storage and not lock to do it
> while abandoning some type safety. (I imagine I could just have a
> global long array of the appropriate size, then at first use I could
> placement new my object into it.)

Doing that would strike me as premature optimization.

Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL


Joshua...@gmail.com

Jul 31, 2008, 9:55:11 PM
On Jul 31, 7:28 am, Alberto Ganesh Barbati <AlbertoBarb...@libero.it>
wrote:
> JoshuaMaur...@gmail.com wrote:

> > On Jul 30, 9:28 pm, Mathias Gaunard <loufo...@gmail.com> wrote:
> >>> What if my
> >>> program is single threaded? Or what if I know that the singleton will
> >>> only be used during the single threaded portion of my program? I will
> >>> have to pay the overhead of locking with the above example code?
> >> I suppose a nice compiler would give you a flag to disable it.
>
> >>> This is very much against the
> >>> design goal of C++ of not paying for things you don't use.
> >> The thing is, it's not possible to have construction/destruction
> >> really work otherwise.
>
> > What about the status quo? It works alright now. I wouldn't be opposed
> > to a new construct, a thread-safe static which guarantees at most one
> > construction, but I don't want all statics to possibly incur this
> > penalty because the optimizer cannot determine if the first call needs
> > to be locked or not.
>
> If you had read paper N2513, you would have noticed that it includes a
> portable algorithm that does not incur any significant performance cost.
> With such algorithm, an objection based only on the presumption of an
> added cost loses strength considerably.

I read the paper, and it seems that one can do cool things with
thread-specific storage. However, there is still some overhead, minor as
it may be. They exchange locking on each access for some thread-specific
storage.
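
As I understand the general idea (this is only my reading of it, not the
paper's exact algorithm, and the names here are invented), it looks
something like this with POSIX thread-specific storage:

#include <pthread.h>

static pthread_key_t seen_key;
static pthread_once_t seen_key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t init_mtx = PTHREAD_MUTEX_INITIALIZER;
static bool initialized = false;              // protected by init_mtx

static void make_key() { pthread_key_create(&seen_key, 0); }

void ensure_initialized(void (*init)()) {
    pthread_once(&seen_key_once, make_key);
    if (pthread_getspecific(seen_key)) {
        return;                               // this thread already saw it: no locking
    }
    pthread_mutex_lock(&init_mtx);            // slow path, at most once per thread
    if (!initialized) {
        init();                               // one-time initializer
        initialized = true;
    }
    pthread_mutex_unlock(&init_mtx);
    pthread_setspecific(seen_key, (void*)1);  // remember for this thread
}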

Silly question: do all systems that support threading also support
thread-specific storage? How is this typically implemented? A direct
link from the current thread to the storage? Or a hashmap mapping
threads to thread-specific storage?

I imagine it's not as simple as extra space reserved on the stack.
That could not work, because a dynamically loaded library might have
thread-specific storage of its own, and you don't know how much
thread-specific storage is required until you load the library, which
happens after some of your threads have been created.

Does that leave us with just a hashmap implementation for
thread-specific storage? If so, I would count that as significant
overhead.

Perhaps it's done by each thread holding a hidden pointer to a linked
list of storage. Each dynamically loaded DLL would just add another
node to this list for all threads, but again, traversing this list
would be significant overhead.

Is there a software approach I don't see?

Is there some hardware magic?
