N2444/Boost: fast_pthread_once and atomicity w/ regard to threads

bach...@gmail.com

Apr 25, 2008, 5:43:19 PM

Dear C++ memory model experts,

yesterday I skimmed through the sources of the latest Boost.Thread
library, in particular the POSIX Threads implementation of
boost::call_once(). It has changed fundamentally from earlier versions.
The current implementation cites the ISO/IEC JTC1 SC22 WG21 document N2444
(<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2444.html>)
and uses an algorithm by Mike Burrows published there.

After reading, in particular, Meyers / Alexandrescu: "C++ and the Perils
of Double-Checked Locking"
(<http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf>) and, for
Java, Bacon et al.: "The 'Double-Checked Locking is Broken' Declaration"
(<http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html>),
I want to ask the following:

1. Several discussion threads have pointed out that sig_atomic_t (or
another integral type, as in the Boost implementation) is atomic with
regard to signals but not with regard to threads. So my question is:
Can the [atomicity] and [atomicity2] requirements really be fulfilled
portably on POSIX Threads systems? And do we need these requirements
at all for the proof?

2. I think one argument is missing from the correctness reasoning laid
out by Burrows: "_fast_pthread_once_per_thread_epoch" (line 2) implies a
memory barrier. He mentions that the second
"pthread_mutex_lock()" (line 8) is assumed to provide release
consistency. So in total we end up with the two barriers declared
necessary on page 12 (the multiprocessor section) of the Meyers /
Alexandrescu paper. Or am I missing something here?

So, I believe the code is correct, but as the correctness of such a
central piece of software is vital, someone - including me - should
take the time to contribute to such a "mental exercise" (Burrows).

Cheers,
Philipp.


Anthony Williams

Apr 26, 2008, 5:32:22 AM

bach...@gmail.com writes:

> Dear C++ memory model experts,
>
> yesterday I skimmed through the sources of the latest Boost.Thread
> library, in particular the POSIX Threads implementation of
> boost::call_once(). It has changed fundamentally from earlier versions.
> The current implementation cites the ISO/IEC JTC1 SC22 WG21 document N2444
> (<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2444.html>)
> and uses an algorithm by Mike Burrows published there.
>
> After reading, in particular, Meyers / Alexandrescu: "C++ and the Perils
> of Double-Checked Locking"
> (<http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf>) and, for
> Java, Bacon et al.: "The 'Double-Checked Locking is Broken' Declaration"
> (<http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html>),
> I want to ask the following:
>
> 1. Several discussion threads have pointed out that sig_atomic_t (or
> another integral type, as in the Boost implementation) is atomic with
> regard to signals but not with regard to threads. So my question is:
> Can the [atomicity] and [atomicity2] requirements really be fulfilled
> portably on POSIX Threads systems? And do we need these requirements
> at all for the proof?

They are not guaranteed by POSIX. However, every architecture I am aware
of has some data type that can be read as a single unit without risk of
tearing. On some systems that's only a byte, but on the platforms
supported by Boost, a suitably aligned int satisfies that condition.
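
As an aside for readers with a C++11 compiler: the "no tearing"
property can be asked for explicitly instead of being assumed. The
following is only a sketch under that assumption; the helper names
int_is_tear_free, read_flag and write_flag are made up for illustration
and appear in neither Boost nor N2444.

```cpp
#include <atomic>

// Hypothetical helper: reports whether an int-sized atomic is handled
// by plain hardware loads and stores (no hidden lock) on this platform.
inline bool int_is_tear_free() {
    std::atomic<int> probe(0);
    return probe.is_lock_free();
}

// Relaxed operations request atomicity only: the value is never torn,
// but no ordering with other memory accesses and no barrier is implied.
inline int read_flag(const std::atomic<int>& flag) {
    return flag.load(std::memory_order_relaxed);
}

inline void write_flag(std::atomic<int>& flag, int value) {
    flag.store(value, std::memory_order_relaxed);
}
```

A relaxed load of this kind corresponds to what the [atomicity]
requirements ask for: a whole old value or a whole new value, with no
ordering guarantee on top.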

> 2. I think one argument is missing from the correctness reasoning laid
> out by Burrows: "_fast_pthread_once_per_thread_epoch" (line 2) implies a
> memory barrier. He mentions that the second
> "pthread_mutex_lock()" (line 8) is assumed to provide release
> consistency. So in total we end up with the two barriers declared
> necessary on page 12 (the multiprocessor section) of the Meyers /
> Alexandrescu paper. Or am I missing something here?

Line 2 doesn't imply a barrier. In fact, that's the whole point of the
algorithm: the "fast path" has no barriers. If a thread has seen this
once_flag be initialized already, or has seen a once_flag that was initialized
after this one, no locking is required. Since all the once_flags are protected
with the same mutex, this is acceptable.
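
To make the fast path concrete, here is a minimal sketch of the epoch
idea, assuming C++11. It is not the Boost or N2444 source: all names
(once_flag_sketch, call_once_sketch, the sketch namespace) are
hypothetical, the flag is a std::atomic read with a relaxed load so that
the unprotected read is well-defined in C++11, and the initializer runs
while the mutex is held, which is a simplification (it serializes all
initializations, and an initializer that calls call_once_sketch again
would deadlock).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical flag type: 0 means "not initialized yet"; otherwise it
// holds the value of the global epoch at the time of initialization.
struct once_flag_sketch {
    std::atomic<std::uintmax_t> epoch{0};
};

namespace sketch {
std::mutex epoch_mutex;                      // one mutex shared by all flags
std::uintmax_t global_epoch = UINTMAX_MAX;   // counts down; guarded by epoch_mutex
thread_local std::uintmax_t thread_epoch = UINTMAX_MAX;  // private to each thread
}

template <typename F>
void call_once_sketch(once_flag_sketch& flag, F f) {
    // Fast path: one relaxed load and one thread-local read, no lock and
    // no barrier. If this flag's epoch is at least as large as the epoch
    // this thread recorded the last time it held the mutex, the flag was
    // initialized before that point and its effects are already visible.
    std::uintmax_t const seen = flag.epoch.load(std::memory_order_relaxed);
    if (seen >= sketch::thread_epoch) {
        return;
    }

    // Slow path: all synchronization comes from the shared mutex.
    std::lock_guard<std::mutex> lock(sketch::epoch_mutex);
    if (flag.epoch.load(std::memory_order_relaxed) == 0) {
        f();  // if f throws, the flag stays 0 and a later call retries
        assert(sketch::global_epoch > 1);  // keep 0 reserved for "uninitialized"
        flag.epoch.store(--sketch::global_epoch, std::memory_order_relaxed);
    }
    // Everything initialized up to now has an epoch >= global_epoch.
    sketch::thread_epoch = sketch::global_epoch;
}
```

In this sketch a thread that sees an initialized flag for the first time
still takes the mutex once, and that one lock acquisition covers every
flag initialized before it.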

> So, I believe the code is correct, but as the correctness of such a
> central piece of software is vital, someone - including me - should
> take the time to contribute to such a "mental exercise" (Burrows).

I believe the proof in N2444 is sufficient.

Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Chris Thomasson

Apr 26, 2008, 5:30:54 AM

[added comp.programming.threads; is that okay?]

{ The clc++m moderator is seeing only clc++m. -mod }

<bach...@gmail.com> wrote in message
news:b9a4b2f7-95f8-4a51...@f36g2000hsa.googlegroups.com...

> Dear C++ memory model experts,
>
> yesterday I skimmed through the sources of the latest Boost.Thread
> library, in particular the POSIX Threads implementation of
> boost::call_once(). It has changed fundamentally from earlier versions.
> The current implementation cites the ISO/IEC JTC1 SC22 WG21 document N2444
> (<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2444.html>)
> and uses an algorithm by Mike Burrows published there.

[...]

AFAICT, this is not "100%" POSIX compliant. The rule is simple: you
cannot read or write a memory location that might be updated by another
thread without holding a lock. BTW, there are ways to implement a lock
without using a memory barrier:

http://blogs.sun.com/dave/resource/Asymmetric-Dekker-Synchronization.txt

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/22b2736484af3ca6

You use an asymmetric model, and elect a "dominant" thread. Also, I am not
sure if this is anything all that new:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/98815a326299c723

Basically, you can do it like this:
<crude code-sketch>
________________________________________________________________
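#include <pthread.h>
#include <cstddef>

// Returns the single shared T, constructing it on first use.
// Each thread takes the lock exactly once; afterwards it only
// checks its own thread-local flag.
template<typename T>
static T* once() {
  static T* g_object = NULL;            // shared; written under g_lock
  static __thread bool g_flag = false;  // per-thread (compiler extension)
  static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
  if (! g_flag) {
    pthread_mutex_lock(&g_lock);
    if (! g_object) {
      T* l_object;
      try {
        l_object = new T;
      } catch(...) {
        // construction failed: unlock, leave g_flag unset, retry later
        pthread_mutex_unlock(&g_lock);
        throw;
      }
      g_object = l_object;
    }
    pthread_mutex_unlock(&g_lock);
    g_flag = true;  // this thread has synchronized with the creator
  }
  return g_object;
}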
________________________________________________________________

Alexander Terekhov

Apr 26, 2008, 6:27:08 PM

It doesn't conform to the current POSIX (which doesn't allow thread
races on any thread-shared "memory location"). I read Burrows's solution
as a non-conforming optimization of the classic DCSI-TLS, aimed at
reducing the number of thread-specific (thread-local) variables to a
single one instead of having one thread-specific flag per pthread_once_t
instance (a solution which is conforming under the current XBD 4.10...
modulo the POSIX-undefined term "memory location" :-) ). BTW, regarding
"_fast_pthread_once_per_thread_epoch" (line 2), I don't think that there
is a need for that "if" (including the fetch to x) given the precondition
for entering the slow path.
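
For comparison, the "one thread-specific flag per instance" variant
could look roughly like the sketch below. It is an illustration only:
the tls_once type and both functions are hypothetical names, the struct
must be set up with tls_once_init() before any thread uses it, and func
is assumed not to throw.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical once-instance carrying its own thread-specific key.
struct tls_once {
    pthread_key_t   key;   // per-thread "already synchronized" marker
    pthread_mutex_t lock;  // protects 'done'
    bool            done;  // true once func has run
};

// Must run before any thread calls tls_once_call() on this instance.
inline void tls_once_init(tls_once& o) {
    int rc = pthread_key_create(&o.key, NULL);
    assert(rc == 0);
    rc = pthread_mutex_init(&o.lock, NULL);
    assert(rc == 0);
    (void)rc;
    o.done = false;
}

inline void tls_once_call(tls_once& o, void (*func)()) {
    // Only thread-private state is read here, so no shared memory
    // location is accessed outside the lock.
    if (pthread_getspecific(o.key) != NULL) {
        return;
    }
    pthread_mutex_lock(&o.lock);
    if (!o.done) {
        func();
        o.done = true;
    }
    pthread_mutex_unlock(&o.lock);
    // Any non-null value works as the marker for this thread.
    pthread_setspecific(o.key, &o);
}
```

The cost is one thread-specific key per instance, and keys are a limited
resource, which is what the single per-thread epoch avoids.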

regards,
alexander.