
scoped static singleton question


John Calcote

Feb 13, 2003, 4:04:46 PM
I am working on a bit of code right now that has brought a question to
mind regarding the C++ standard. Let me preface my remarks by saying
that I know the standard doesn't deal with multi-threading issues,
yet they exist.

Here's a bit of code:

int GetTimeZoneOffset(void)
{
    time_t lts = 0;
    static int ltsOffset = -int(mktime(gmtime(&lts)));
    return ltsOffset;
}

My original concern was that multiple threads would contend with one
another when attempting to initialize the static variable ltsOffset,
possibly causing corruption.

This brought to mind the fact that this is a VERY common way of
managing singletons:

class MySingleton
{
public:
    static MySingleton *Instance(void);
    // ...
};

MySingleton *MySingleton::Instance(void)
{
    static MySingleton *sp = new MySingleton;
    return sp;
}

What happens here in the context of multiple threads, where
MySingleton's constructor could be quite complex and take some time to
execute? If several threads enter MySingleton::Instance at the same
time, which one actually constructs the singleton? Furthermore, do
losing threads back up against the static "sp" until it's fully
initialized? Man, what a nightmare!

John

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Hyman Rosen

Feb 14, 2003, 1:37:07 AM
John Calcote wrote:
> What happens here in the context of multiple threads

That is up to your implementation. If it is intended to be
friendly to multithreaded programming, then it will protect
the static initialization with a proper lock so that things
will work correctly. If not, then you will not be able to
use this method for singletons if there is a chance of
contention.

Bjorn Roald

Feb 15, 2003, 9:57:00 PM
John Calcote wrote:

> class MySingleton
> {
> public:
> static MySingleton *Instance(void);
> // ...
> };
>
> MySingleton *MySingleton::Instance(void)
> {
> static MySingleton *sp = new MySingleton;
> return sp;
> }
>
> What happens here in the context of multiple threads, where
> MySingleton's constructor could be quite complex and take some time to
> execute? If several threads enter MySingleton::Instance at the same
> time, which one actually constructs the singleton? Furthermore, do
> losing threads back up against the static "sp" until it's fully
> initialized? Man, what a nightmare!

yep,

I use this pattern.
Note, I have not tested the exact code typed below...

*** header ****


class MySingleton
{
public:
    static MySingleton *Instance(void);
    // ...

private:
    static MySingleton* m_Instance;
    Mutex m_Mutex;
};

*** source file *****
MySingleton* MySingleton::m_Instance = 0;

MySingleton *MySingleton::Instance(void)
{
    if(!m_Instance)
    {
        // concurrent early calls may get here
        Guard guard(m_Mutex); // locking mutex in Guard constructor
        if(!m_Instance) // only one past this
            m_Instance = new MySingleton;
    } // ~Guard() is called here, unlocking the mutex
    return m_Instance;
}
Alexander Terekhov

Feb 17, 2003, 12:08:34 PM

John Calcote wrote:
>
> I am working on a bit of code right now that has brought a question to
> mind regarding the C++ standard. Let me preface my remarks by saying
> that I know the standard doesn't deal with multi-threading issues,
> yet, they exist.

Well said.

>
> Here's a bit of code:
>
> int GetTimeZoneOffset(void)
> {
> time_t lts = 0;
> static int ltsOffset = -int(mktime(gmtime(&lts)));
> return ltsOffset;
> }
>
> My original concern was that multiple threads would contend with one
> another when attempting to initialize the static variable ltsOffset,
> possibly causing corruption.

That's correct [unless your threaded C++ implementation(s) "just
happen" to use something along the lines of "a little bit busted"
Itanic-ABI-thing].

>
> This brought to mind the fact that this is a VERY common way of

> managing singletons: ....

http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-group-l&id=4862
(Subject: Re: pthread_once() et al and standards philosophy)

Ahh... well, <http://tinyurl.com/4xw6> <http://tinyurl.com/4xwf>.

regards,
alexander.

--
http://groups.google.com/groups?threadm=c29b5e33.0201300100.2beba77c%40posting.google.com
(Subject: Re: thread Safe singleton)

Christoph Rabel

Feb 17, 2003, 12:09:35 PM
John Calcote wrote:
> This brought to mind the fact that this is a VERY common way of
> managing singletons:
>
> class MySingleton
> {
> public:
> static MySingleton *Instance(void);
> // ...
> };
>
> MySingleton *MySingleton::Instance(void)
> {
> static MySingleton *sp = new MySingleton;
> return sp;
> }
>
> What happens here in the context of multiple threads, where
> MySingleton's constructor could be quite complex and take some time to
> execute? If several threads enter MySingleton::Instance at the same
> time, which one actually constructs the singleton? Furthermore, do
> losing threads back up against the static "sp" until it's fully
> initialized? Man, what a nightmare!

A common solution is this:

class MySingleton
{
private:
    static MySingleton * pInstance;
    ...
};


MySingleton *MySingleton::Instance(void)
{
    if(!pInstance)
    {
        LOCK(); // Do some MT-locking here
        if (!pInstance)
            pInstance = new MySingleton;
        UNLOCK();
    }
    return pInstance;
}

LOCK() and UNLOCK() must be provided by you. Maybe a mutex
or something like that. Depends on your problem.

odie

Randy Maddox

Feb 17, 2003, 12:11:26 PM
jcal...@novell.com (John Calcote) wrote in message news:<abb28ce2.03021...@posting.google.com>...

> I am working on a bit of code right now that has brought a question to
> mind regarding the C++ standard. Let me preface my remarks by saying
> that I know the standard doesn't deal with multi-threading issues,
> yet, they exist.
>
> Here's a bit of code:
>
> int GetTimeZoneOffset(void)
> {
> time_t lts = 0;
> static int ltsOffset = -int(mktime(gmtime(&lts)));
> return ltsOffset;
> }

No need for static here, which as you observe may cause problems.
Just return the value directly instead.

time_t lts = 0;

return -int(mktime(gmtime(&lts)));

Assuming that mktime() and gmtime() are reentrant this should be OK.
If not, you have more work to do. :-)

>
> My original concern was that multiple threads would contend with one
> another when attempting to initialize the static variable ltsOffset,
> possibly causing corruption.
>
> This brought to mind the fact that this is a VERY common way of
> managing singletons:
>
> class MySingleton
> {
> public:
> static MySingleton *Instance(void);
> // ...
> };
>
> MySingleton *MySingleton::Instance(void)
> {
> static MySingleton *sp = new MySingleton;
> return sp;
> }
>
> What happens here in the context of multiple threads, where
> MySingleton's constructor could be quite complex and take some time to
> execute? If several threads enter MySingleton::Instance at the same
> time, which one actually constructs the singleton? Furthermore, do
> losing threads back up against the static "sp" until it's fully
> initialized? Man, what a nightmare!
>
> John
>

For the singleton you need to do something along these lines:

MySingleton *MySingleton::Instance(void)
{
    static MySingleton *sp = 0;

    if(!sp)
    {
        // lock against multiple threads here using mutex, critical section
        // or whatever is appropriate
        lock();

        // double check to see if sp is still 0; it might not be if you had
        // to wait to get the lock while another thread initialized sp
        if(!sp)
        {
            sp = new MySingleton;
        }

        unlock(); // whatever unlock() corresponds to lock()
    }

    return sp;
}

Of course this also needs exception handling to be safe, but this is
the basic concept.

Hope it works for you.

Randy.

Philippe Mori

Feb 17, 2003, 12:21:13 PM

Since m_Mutex is not static, doesn't that mean two objects would not
use the same mutex? In any case, it should not compile, because
Instance is a static method. So the mutex must be static, and it must
be constructed before the singleton object... This is fine as long as
start-up is single-threaded (at least on Windows) and Instance is not
called before the static mutex has been initialized (if the object is
never created at start-up, then it is all right, but otherwise you may
need another solution).

Scott Meyers

May 2, 2003, 5:30:29 PM
On February 14, Hyman Rosen wrote:

John Calcote wrote:
> What happens here in the context of multiple threads

That is up to your implementation. If it is intended to be
friendly to multithreaded programming, then it will protect
the static initialization with a proper lock so that things
will work correctly. If not, then you will not be able to
use this method for singletons if there is a chance of
contention.

I know that this isn't really a standards question, but because this issue was
raised here and because I plan to post another followup in this thread, I'll ask
this here: which C++ implementations can be made to generate thread-safe code to
initialize static objects inside functions? I've often heard that some
implementations do it. I'd like to know which ones.

Thanks,

Scott

Scott Meyers

May 2, 2003, 10:12:48 PM
On Mon, 17 Feb 2003 17:09:35 +0000 (UTC), Christoph Rabel wrote:
> MySingleton *MySingleton::Instance(void)
> {
> if(!pInstance)
> {
> LOCK(); // Do some MT-locking here
> if (!pInstance)
> pInstance = new MySingleton;
> UNLOCK();
> return pInstance;
> }

This is the double-checked locking pattern. I recently drafted an article on
this topic for CUJ. As I sit here in a pool of my own blood based on the
feedback I got from pre-pub reviewers, I feel compelled to offer the following
observation: there is, as far as I know, no way to make this work on a reliable
and portable basis.

The best treatment of this topic that I know of is "The 'Double-Checked Locking
is Broken' Declaration"
(http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html). I
suggest you not fall into the trap I did in assuming that its focus on Java
implies that it doesn't really apply to C++. It does. My favorite paragraph
from that document is this:

There are lots of reasons it doesn't work. The first couple of reasons we'll
describe are more obvious. After understanding those, you may be tempted to
try to devise a way to "fix" the double-checked locking idiom. Your fixes will
not work: there are more subtle reasons why your fix won't work. Understand
those reasons, come up with a better fix, and it still won't work, because
there are even more subtle reasons.

Lots of very smart people have spent lots of time looking at this. There is no
way to make it work without requiring each thread that accesses the helper
object to perform synchronization.

As an example of one of the "more obvious" reasons why it doesn't work, consider
this line from the above code:

pInstance = new MySingleton;

Three things must happen here:
1. Allocate enough memory to hold a MySingleton object.
2. Construct a MySingleton in the memory.
3. Make pInstance point to the object.

In general, they don't have to happen in this order. Consider the following
translation. This isn't code a human is likely to write, but it is a valid
translation on the part of the compiler under certain circumstances (e.g., when
static analysis reveals that the MySingleton constructor cannot throw):

pInstance =                              // 3
    operator new(sizeof(MySingleton));   // 1
new (pInstance) MySingleton;             // 2

If we plop this into the original function, we get this:

> MySingleton *MySingleton::Instance(void)
> {
> if(!pInstance) // Line 1


> {
> LOCK(); // Do some MT-locking here
> if (!pInstance)
pInstance =

operator new(sizeof(MySingleton)); // Line 2
new (pInstance) MySingleton;
> UNLOCK();
> return pInstance;
> }

So consider this sequence of events:
- Thread A enters MySingleton::Instance, executes through Line 2, and is
suspended.
- Thread B enters MySingleton::Instance, executes Line 1, sees that pInstance
is non-null, and returns. It then merrily dereferences the pointer, thus
referring to memory that does not yet hold an object.

If there's a portable way to avoid this problem in the presence of aggressive
optimizing compilers, I'd love to know about it.

Scott

E. Mark Ping

May 3, 2003, 3:53:19 PM
In article <MPG.191cc4c34...@news.hevanet.com>,

Scott Meyers <Use...@aristeia.com> wrote:
>The best treatment of this topic that I know of is "The
>'Double-Checked Locking is Broken' Declaration"
>(http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html).
>I suggest you not fall into the trap I did in assuming that its focus
>on Java implies that it doesn't really apply to C++.

Fascinating. However, the treatment of 'volatile' is insufficient as
I read the C++ standard (but granted, I'm no expert). Modifying the
code to read:

MySingleton *MySingleton::Instance(void)
{
    if(!pInstance)
    {
        LOCK(); // Do some MT-locking here
        if (!pInstance)
        {
            volatile MySingleton* helper;
            helper = new MySingleton;
            pInstance = helper;
        }
        UNLOCK();
    }
    return pInstance;
}

1.9/11 says in part:
"The least requirements on a conforming implementation are:
- At sequence points, volatile objects are stable in the sense that
previous evaluations are complete and subsequent evaluations have not
yet occurred."

I would expect that even the most aggressive optimizer would not be
able to assign to pInstance from helper until after helper has been
completely assigned and created.
--
Mark Ping
ema...@soda.CSUA.Berkeley.EDU

Alexander Terekhov

May 3, 2003, 3:54:03 PM

Scott Meyers wrote:

[... about DCI (double-checked initialization, not locking) ...]

> > MySingleton *MySingleton::Instance(void)
> > {
> > if(!pInstance) // Line 1
> > {
> > LOCK(); // Do some MT-locking here
> > if (!pInstance)
> pInstance =
> operator new(sizeof(MySingleton)); // Line 2
> new (pInstance) MySingleton;
> > UNLOCK();
> > return pInstance;
> > }
>
> So consider this sequence of events:
> - Thread A enters MySingleton::Instance, executes through Line 2, and is
> suspended.
> - Thread B enters MySingleton::Instance, executes Line 1, sees that pInstance
> is non-null, and returns. It then merrily dereferences the pointer, thus
> referring to memory that does not yet hold an object.
>
> If there's a portable way to avoid this problem in the presence of aggressive
> optimizing compilers, I'd love to know about it.

pthread_once() aside for a moment, a fully "portable" way to avoid
this problem is to use the thread specific storage (TSD/TLS/__thread
or whatever you call it). Here is a compact illustration in Java:

class Singleton {
    private static Singleton theInstance;

    private static final ThreadLocal tlsInstance =
        new ThreadLocal() {
            protected synchronized Object initialValue() {
                if (theInstance == null)
                    theInstance = new Singleton();
                return theInstance;
            }
        };

    public static Singleton getInstance() {
        return (Singleton)tlsInstance.get();
    }
}

Now, and the "right" way to avoid this problem in C++ (well,
explicitly synchronized static locals aside for a moment) is to use
something along the lines of "once" template:

http://tinyurl.com/7w7r


(Subject: Re: pthread_once() et al and standards philosophy)

regards,
alexander.

Jon Biggar

May 3, 2003, 4:20:48 PM
Scott Meyers wrote:

> So consider this sequence of events:
> - Thread A enters MySingleton::Instance, executes through Line 2, and is
> suspended.
> - Thread B enters MySingleton::Instance, executes Line 1, sees that pInstance
> is non-null, and returns. It then merrily dereferences the pointer, thus
> referring to memory that does not yet hold an object.
>
> If there's a portable way to avoid this problem in the presence of aggressive
> optimizing compilers, I'd love to know about it.

Making pInstance volatile, assigning the result of the new to a local
variable and then copying to pInstance would help, but I'm sure there
are still other subtle difficulties...


--
Jon Biggar
Floorboard Software
j...@floorboard.com
j...@biggar.org

Alexander Terekhov

May 4, 2003, 12:29:41 AM

Scott Meyers wrote:
[...]

> I know that this isn't really a standards question, but because this issue was
> raised here and because I plan to post another followup in this thread, I'll ask
> this here: which C++ implementations can be made to generate thread-safe code to
> initialize static objects inside functions? I've often heard that some
> implementations do it. I'd like to know which ones.

A rather popular "Itanium C++ ABI" project specified {the impl-of}
thread-safe static locals. The "only"(*) problem is that they
"simply forgot" to also add a synch keyword or something like that.
Some details can be found here:

http://tinyurl.com/awkc


(Subject: Re: thread Safe singleton)

regards,
alexander.

(*) http://tinyurl.com/awkr
(Subject: Re: Threadsafe Singletons (again!))

--
http://tinyurl.com/awkd
(Subject: Re: C++0x)

Scott Meyers

May 4, 2003, 3:06:14 PM
On Sat, 3 May 2003 20:20:48 +0000 (UTC), Jon Biggar wrote:
> Making pInstance volatile, assigning the result of the new to a local
> variable and then copying to pInstance would help, but I'm sure there
> are still other subtle difficulties...

Remember that good compilers do extensive dataflow analysis, and they
eliminate intermediate variables that are unnecessary. Some do this across
function call boundaries, even for non-inline functions defined in separate
translation units.

Scott

Scott Meyers

May 4, 2003, 3:07:12 PM
On Sat, 3 May 2003 19:53:19 +0000 (UTC), E. Mark Ping wrote:
> MySingleton *MySingleton::Instance(void)
> {
> if(!pInstance)
> {
> LOCK(); // Do some MT-locking here
> if (!pInstance)
> {
> volatile MySingleton* helper;
> helper = new MySingleton;
> pInstance = helper;
> }
> UNLOCK();
> return pInstance;
> }
>
> I would expect that even the most aggressive optimizer would not be
> able to assign to pInstance from helper until after helper has been
> completely assigned and created.

Do you really want helper to be non-volatile and what it points to to be
volatile? Anyway, I don't think it matters. What makes you think that a
compiler can't use the as-if rule to eliminate helper completely and
generate the same code I originally posted, simply treating pInstance as
volatile in the region where helper would have existed?

As I wrote in a posting on this same topic in clcm:

Declaring pInstance volatile will force reads of that variable to come
from memory and writes to that variable to go to memory, but what we need
here is a way to say that pInstance should not be written until the
Singleton has been constructed. That is, we need to tell the compiler to
respect a temporal ordering that is stricter than the as-if rule. As far
as I know, there is no way to do that. Certainly volatile doesn't do it.

BTW, if somebody can tell me how to unify the threads on this newsgroup and
clcm (unfortunately, they have different subjects, because, unfortunately,
I'm a dope), I'd be pleased to learn what it is.

Scott

KIM Seungbeom

May 4, 2003, 4:45:23 PM
ema...@soda.csua.berkeley.edu (E. Mark Ping) wrote in message news:<b8vgkn$f7v$1...@agate.berkeley.edu>...

>
> MySingleton *MySingleton::Instance(void)
> {
> if(!pInstance)
> {
> LOCK(); // Do some MT-locking here
> if (!pInstance)
> {
> volatile MySingleton* helper;
> helper = new MySingleton;
> pInstance = helper;
> }
> UNLOCK();
> return pInstance;
> }

Shouldn't the volatile keyword apply to the pointer itself?

MySingleton* volatile helper;


helper = new MySingleton;
pInstance = helper;

--
KIM Seungbeom <musi...@bawi.org>

Dyre Tjeldvoll

May 4, 2003, 4:45:33 PM
Use...@aristeia.com (Scott Meyers) writes:

> If there's a portable way to avoid this problem in the presence of aggressive
> optimizing compilers, I'd love to know about it.

A simpler solution, using volatile, has already been presented. I
don't claim that my "solution" will solve the problem, but it
would be interesting to know why it fails...

#include <iostream>

// I don't have an MT-setup available
#define LOCK()
#define UNLOCK()

struct Singleton {
    int dummy; // Just to be absolutely sure...
    static Singleton *pInstance;

    // I assume that this pointer assignment is atomic
    void useMe() { ++dummy; pInstance = this; }

    static Singleton* instance() {
        if (!pInstance) {
            LOCK();
            if (!pInstance) {
                Singleton *tmp = new Singleton;
                // The compiler can't invoke useMe before tmp points to a
                // fully constructed object, can it?
                tmp->useMe();
            }
            UNLOCK();
        }
        return pInstance;
    }
};

--
dt

Scott Meyers

May 4, 2003, 4:45:33 PM
On Sat, 3 May 2003 19:54:03 +0000 (UTC), Alexander Terekhov wrote:
> pthread_once() aside for a moment, a fully "portable" way to avoid
> this problem is to use the thread specific storage (TSD/TLS/__thread
> or whatever you call it). Here is a compact illustration in Java:
>
> class Singleton {
> private static Singleton theInstance;
>
> private static final ThreadLocal tlsInstance =
> new ThreadLocal() {
> protected synchronized Object initialValue() {
> if (theInstance == null)
> theInstance = new Singleton();
> return theInstance;
> }
> };
>
> public static Singleton getInstance() {
> return (Singleton)tlsInstance.get();
> }
> }

If I read this properly (I don't know Java, sorry), there is no
double-checked locking. So it would seem that this solution boils down to
"don't use double-checked locking, and do use some non-standard mechanism
for thread-local storage." Is that an accurate summary?

> Now, and the "right" way to avoid this problem in C++ (well,
> explicitly synchronized static locals aside for a moment) is to use
> something along the lines of "once" template:

I took a look at
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/boost/boost/libs/thread/src/once.cpp?
rev=HEAD&content-type=text/vnd.viewcvs-markup
(is there a better reference?), and for pthreads, it appears that
double-checked locking is not employed. For WinThreads, it looks like it
is, but look at this part:

if (compare_exchange(&flag, 1, 1) == 0) // 2nd check
{
func(); // invoke "once" func
InterlockedExchange(&flag, 1); // set "called" bit
}

Again, assuming an aggressive optimizing compiler that can see through
function pointers and across function call boundaries (e.g., via
full-program optimization, which is available on at least two compilers I
know -- Intel's (in general) and Microsoft's (when generating managed
code)) what prevents a compiler from using the as-if rule to reorder the
block so that func is called after the call to InterlockedExchange?

Scott

Hyman Rosen

May 4, 2003, 5:07:26 PM
Scott Meyers wrote:
> what prevents a compiler from using the as-if rule to reorder the
> block so that func is called after the call to InterlockedExchange?

Perhaps it recognizes it as a synchronization mechanism (because it
has been told to do so)?

James Kanze

May 4, 2003, 5:48:07 PM
Use...@aristeia.com (Scott Meyers) writes:

|> I took a look at
|> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/boost/boost/libs/thread/src/once.cpp?
|> rev=HEAD&content-type=text/vnd.viewcvs-markup (is there a better
|> reference?), and for pthreads, it appears that double-checked
|> locking is not employed. For WinThreads, it looks like it is, but
|> look at this part:

|> if (compare_exchange(&flag, 1, 1) == 0) // 2nd check
|> {
|> func(); // invoke "once" func
|> InterlockedExchange(&flag, 1); // set "called" bit
|> }

|> Again, assuming an aggressive optimizing compiler that can see
|> through function pointers and across function call boundaries
|> (e.g., via full-program optimization, which is available on at
|> least two compilers I know -- Intel's (in general) and Microsoft's
|> (when generating managed code)) what prevents a compiler from
|> using the as-if rule to reorder the block so that func is called
|> after the call to InterLockedExchange?

I'm not sure about the exact code above, but I have seen some attempts
using assembler code (which presumably, the compiler optimizer will
consider inviolable, and will also not do things like code motion
around). Some tricks are available for certain architectures; I know,
for example, how to get guaranteed results on a Sparc. Note, however,
that guaranteed results involve a memory barrier instruction; the few
benchmarks I've run suggest that this isn't much faster than just
acquiring the lock (on a Sparc under Solaris 2.8, at least).

--
James Kanze mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France Tel. +33 1 41 89 80 93

James Kanze

May 4, 2003, 9:09:39 PM
Use...@aristeia.com (Scott Meyers) writes:

|> On Mon, 17 Feb 2003 17:09:35 +0000 (UTC), Christoph Rabel wrote:

|> > MySingleton *MySingleton::Instance(void)
|> > {
|> > if(!pInstance)
|> > {
|> > LOCK(); // Do some MT-locking here
|> > if (!pInstance)
|> > pInstance = new MySingleton;
|> > UNLOCK();
|> return pInstance;
|> > }

|> This is the double-checked locking pattern. I recently drafted an
|> article on this topic for CUJ. As I sit here in a pool of my own
|> blood based on the feedback I got from pre-pub reviewers, I feel
|> compelled to offer the following observation: there is, as far as
|> I know, no way to make this work on a reliable and portable basis.

Come now, I thought we had a long discussion about this a couple of
years ago in clc++m.

|> The best treatment of this topic that I know of is "The
|> 'Double-Checked Locking is Broken' Declaration"
|> (http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html).
|> I suggest you not fall into the trap I did in assuming that its
|> focus on Java implies that it doesn't really apply to C++. It
|> does. My favorite paragraph from that document is this:

That's my reference as well. (I believe that I posted the link in the
previous discussion.)

|> pInstance = new MySingleton;

In practice, just putting the new in a function in a separate
compilation unit is sufficient, even if the standard doesn't guarantee
it. Wrapping the call to the separate function with a second mutex
works every time.

For this problem. It doesn't solve any of the others.

And the problem isn't just theoretical. There are actual machines on
which it fails. (Alphas, from what I understand; the proposed Intel
64-bit architecture; I think even some execution models on newer
Sparcs.)

--
James Kanze mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France Tel. +33 1 41 89 80 93

---

James Kanze

May 4, 2003, 9:09:48 PM

The exact semantics of volatile (what an access consists of) are
implementation defined. Modern multithreaded implementations vary,
but most that I know of do NOT wrap volatile accesses with a memory
barrier. For them, an access means only that the read or write cycle
has taken place on the CPU bus (which doesn't in any way mean that it
is necessarily visible to another processor).

Generally speaking, volatile was designed to solve a certain set of
problems. Multithreading wasn't one of them -- the problems involving
multithreading on a modern multiprocessor machine simply didn't exist
back then. You might be able to read more into the actual words than
was meant, but compiler implementers on modern machines usually
haven't.

--
James Kanze mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France Tel. +33 1 41 89 80 93

---

Scott Meyers

May 5, 2003, 12:58:41 AM
On Sun, 4 May 2003 20:45:33 +0000 (UTC), Dyre Tjeldvoll wrote:
> struct Singleton {
> int dummy; // Just to be absolutely sure...
> static Singleton *pInstance;
> // I assume that this pointer assignment is atomic
> void useMe() { ++dummy; pInstance = this; }
> static Singleton* instance() {
> if (!pInstance) {
> LOCK();
> if (!pInstance) {
> Singleton *tmp = new Singleton;
> // The compiler can't invoke useMe before tmp points to a
> // fully constructed object, can it?
> tmp->useMe();

No, but it can inline operator new, the Singleton constructor, and useMe,
then reorder the resulting set of instructions so that the as-if rule (or,
as Doug Lea more accurately terms it, the as-if-serial rule) is obeyed. In
that case, pInstance might not get set until after useMe has been called,
because the compiler doesn't need to get to dummy's memory via pInstance.

Scott

E. Mark Ping

May 5, 2003, 12:58:42 AM
In article <MPG.191ef55a7...@news.hevanet.com>,

Scott Meyers <Use...@aristeia.com> wrote:
>Anyway, I don't think it matters. What makes you think that a
>compiler can't use the as-if rule to eliminate helper completely and
>generate the same code I originally posted, simply treating pInstance as
>volatile in the region where helper would have existed?

I would have expected that that was the *point* of volatile--to tell
the compiler not to do something like that. Given the portion of the
standard I cited, I would expect such an optimization to be
non-conforming.
--
Mark Ping
ema...@soda.CSUA.Berkeley.EDU

Scott Meyers

May 5, 2003, 1:30:05 AM

===================================== MODERATOR'S COMMENT:
While there is considerable danger of this drifting off-topic,
multi-threaded code is under consideration as C++ evolves, and so there
is scope for topical responses on this subject. Discussion of what might
be useful in portable memory barriers, for example, might be topical.


===================================== END OF MODERATOR'S COMMENT


On Sun, 4 May 2003 21:48:07 +0000 (UTC), James Kanze wrote:
> around). Some tricks are available for certain architectures; I know,
> for example, how to get guaranteed results on a Sparc. Note, however,
> that guaranteed results involve a memory barrier instruction; the few

I was under the impression that memory barriers applied only to
multiprocessor machines. I understand how they solve cache coherency
problems. Do memory barriers also apply to single-processor machines,
where cache coherency is not an issue? If so, what are the semantics of
memory barriers on single-processor machines running multiple threads?

I realize that this is drifting further and further from standard C++, so
if the moderators want to boot me to clcm, I'll understand.

Scott

James Kanze

May 5, 2003, 11:23:57 AM
Use...@aristeia.com (Scott Meyers) wrote in message
news:<MPG.191f7213e...@news.hevanet.com>...

> On Sun, 4 May 2003 21:48:07 +0000 (UTC), James Kanze wrote:
> > around). Some tricks are available for certain architectures; I
> > know, for example, how to get guaranteed results on a Sparc. Note,
> > however, that guaranteed results involve a memory barrier
> > instruction; the few

> I was under the impression that memory barriers applied only to
> multiprocessor machines. I understand how they solve cache coherency
> problems. Do memory barriers also apply to single-processor machines,
> where cache coherency is not an issue? If so, what are the semantics
> of memory barriers on single-processor machines running multiple
> threads?

In theory (and as far as I know, in fact), you never need a memory
barrier in a single processor machine. The problem is that most of the
time, you don't know whether you will be running on a multiprocessor
machine or not.

Within the OS itself, of course, different rules hold, since things like
disk controllers access the main memory in the same way another
processor would.

--
James Kanze GABI Software mailto:ka...@gabi-soft.fr


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

Alexander Terekhov

unread,
May 5, 2003, 2:33:47 PM5/5/03
to

Scott Meyers wrote:
>
> On Sat, 3 May 2003 19:54:03 +0000 (UTC), Alexander Terekhov wrote:
> > pthread_once() aside for a moment, a fully "portable" way to avoid
> > this problem is to use the thread specific storage (TSD/TLS/__thread
> > or whatever you call it). Here is a compact illustration in Java:
> >
> > class Singleton {
> > private static Singleton theInstance;
> >
> > private static final ThreadLocal tlsInstance =
> > new ThreadLocal() {
> > protected synchronized Object initialValue() {
> > if (theInstance == null)
> > theInstance = new Singleton();
> > return theInstance;
> > }
> > };
> >
> > public static Singleton getInstance() {
> > return (Singleton)tlsInstance.get();
> > }
> > }
>
> If I read this properly (I don't know Java, sorry), there is no
> double-checked locking. So it would seem that this solution boils down to
> "don't use double-checked locking, and do use some non-standard mechanism
> for thread-local storage." Is that an accurate summary?

Well, it IS the double-checking initialization "pattern" -- the
first check is done on the thread-specific variable and the second
one is done on the 'global' variable. The C/Win illustration can
be found here:

http://groups.yahoo.com/group/boost/message/15442
(Subject: Re: Boost.Threads - once functions)
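For readers who don't have Java handy, the same thread-specific double-check
can be sketched in C++. This is only an illustration: LOCK/UNLOCK are
hypothetical placeholders (a real build would expand them to a mutex), and
thread_local stands in for whatever TSD/TLS/__thread mechanism the platform
actually provides:

```cpp
#include <cassert>

// Placeholder lock macros, in the spirit of other snippets in this
// thread; a real build would expand them to a mutex lock/unlock.
#define LOCK()
#define UNLOCK()

class Singleton {
public:
    static Singleton* instance() {
        // First check: a thread-specific pointer, read with no locking.
        // thread_local stands in here for __thread/TSD.
        static thread_local Singleton* tlsInstance = 0;
        if (!tlsInstance) {
            // Second check: the shared pointer, examined under the lock.
            LOCK();
            if (!theInstance)
                theInstance = new Singleton;
            tlsInstance = theInstance;
            UNLOCK();
        }
        return tlsInstance;
    }
private:
    Singleton() {}
    static Singleton* theInstance;
};

Singleton* Singleton::theInstance = 0;
```

Each thread pays for the lock at most once; after that its own
thread-specific pointer answers the first check without touching shared
state.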

>
> > Now, and the "right" way to avoid this problem in C++ (well,
> > explicitly synchronized static locals aside for a moment) is to use
> > something along the lines of "once" template:
>
> I took a look at
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/boost/boost/libs/thread/src/once.cpp?
> rev=HEAD&content-type=text/vnd.viewcvs-markup
> (is there a better reference?), and for pthreads, it appears that
> double-checked locking is not employed.

Sort of. Well, more info about it can be found in these threads:

<http://tinyurl.com/4xw6> <http://tinyurl.com/4xwf>

> For WinThreads, it looks like it
> is, but look at this part:
>
> if (compare_exchange(&flag, 1, 1) == 0) // 2nd check
> {
> func(); // invoke "once" func
> InterlockedExchange(&flag, 1); // set "called" bit
> }
>
> Again, assuming an aggressive optimizing compiler that can see through
> function pointers and across function call boundaries (e.g., via
> full-program optimization, which is available on at least two compilers I
> know -- Intel's (in general) and Microsoft's (when generating managed
> code)) what prevents a compiler from using the as-if rule to reorder the
> block so that func is called after the call to InterLockedExchange?

The memory synchronization semantics of the Interlocked stuff sort
of prevents it. Microsoft has recently attempted to "clarify" their
Interlocked stuff with respect to memory synch. They've even
introduced acquire and release versions. The non-acq/rel stuff
imposes a bidirectional "FULL-STOP" reordering constraint (it ought
to hold both "software"/compiler-wise and MP-hardware-wise) with the
semantics of a LOAD-ACQUIRE+STORE-RELEASE memory barrier.

BTW, you might want to take a look at:

http://terekhov.de/pthread_refcount_t/draft-edits.txt
http://terekhov.de/pthread_refcount_t/poor-man/beta2/prefcnt.h
http://terekhov.de/pthread_refcount_t/poor-man/beta2/prefcnt.c

regards,
alexander.

Alexander Terekhov

unread,
May 5, 2003, 2:33:49 PM5/5/03
to

Scott Meyers wrote:
[...]

> I was under the impression that memory barriers applied only to
> multiprocessor machines.

That's true if you're concerned with "ordinal" memory only and don't
care about memory-mapped I/O space and devices. You (especially if
you have something to do with the upcoming <iohw.h> for plain C[1]
and "<hardware>" for C++[2] ;-) ) might want to take a look at:

http://www.ibm.com/servers/esdd/articles/powerpc.html
(PowerPC storage model...)

regards,
alexander.

[1] http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n972.pdf
[2] http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1430.pdf

--
http://www.cs.umd.edu/~pugh/java/memoryModel/archive/1220.html
http://www.cs.umd.edu/~pugh/java/memoryModel/archive/1222.html
(Subject: JavaMemoryModel: Cookbook: barriers)

James Kanze

unread,
May 5, 2003, 2:33:58 PM5/5/03
to
Dyre.Tj...@sun.com (Dyre Tjeldvoll) wrote in message
news:<x1o3k7d6...@xiao.norway.sun.com>...
> Use...@aristeia.com (Scott Meyers) writes:

What is your point? In your code, tmp is a local variable, invisible
anywhere else. Depending on the rest of the code, the compiler might
not even generate it. But I don't see what you are trying to achieve.
Where do you assign to pInstance? If you have simply forgotten a
pInstance = tmp ;
after useMe, what does this change? The compiler is still free to make
the assignment before.

And of course, there is nothing here which ensures the cache consistency
of a processor which finds pInstance non-null in the first if. If the
memory in the Singleton happens to be in its cache, even in a state
corresponding to before the constructor has been run, there is nothing
to ensure that it will be reread.
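The alternative these objections point at is to give up the unlocked first
check entirely and take the lock on every call. A minimal sketch, again with
hypothetical LOCK/UNLOCK placeholders standing in for a real mutex:

```cpp
#include <cassert>

// Placeholder lock macros, as in other snippets in this thread; a real
// build would expand them to a mutex acquire/release.
#define LOCK()
#define UNLOCK()

class MySingleton {
public:
    static MySingleton* Instance();
private:
    MySingleton() {}
    static MySingleton* pInstance;
};

MySingleton* MySingleton::pInstance = 0;

// Take the lock on every call. Slower than double-checked locking, but
// a real lock's memory synchronization then covers both the pointer
// and the object it points to, on any number of processors.
MySingleton* MySingleton::Instance() {
    LOCK();
    if (!pInstance)
        pInstance = new MySingleton;
    MySingleton* p = pInstance;
    UNLOCK();
    return p;
}
```

The cost is one lock round-trip per call, which is exactly what
double-checked locking tries (and, per this thread, fails portably) to
avoid.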

--
James Kanze GABI Software mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

---

Scott Meyers

unread,
May 5, 2003, 2:34:19 PM5/5/03
to
On Mon, 5 May 2003 15:23:57 +0000 (UTC), James Kanze wrote:
> In theory (and as far as I know, in fact), you never need a memory
> barrier in a single processor machine.

That's my understanding, too. My belief is that memory barriers are
necessary for solving this kind of problem in the general case (because the
general case includes multiprocessor systems), but they are not sufficient
(because the general case also includes uniprocessor systems).

Scott

Francis Glassborow

unread,
May 5, 2003, 2:34:59 PM5/5/03
to

===================================== MODERATOR'S COMMENT:
The [buzz]word you want is "hyperthreading".


===================================== END OF MODERATOR'S COMMENT

In article <d6651fb6.03050...@posting.google.com>, James
Kanze <ka...@gabi-soft.de> writes


>In theory (and as far as I know, in fact), you never need a memory
>barrier in a single processor machine. The problem is that most of the
>time, you don't know whether you will be running on a multiprocessor
>machine or not.

Even on the latest Intel CPUs? Where I believe they have some clever
mechanism to make one processor look like two?


--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Andrew F. Vesper

unread,
May 5, 2003, 10:37:58 PM5/5/03
to
James Kanze wrote:

> In theory (and as far as I know, in fact), you never need a memory
> barrier in a single processor machine.

Digital's (1) Alpha (2) processors often had a small "write cache"
inside the CPU that buffered up one cache line. This could turn
the code:

a[3] = 1;
a[2] = 2;
a[1] = 3;
a[0] = 4;

into one write to the cache, which could then write to the
actual memory in the opposite order. When writing a device
driver for a graphics board, we had to analyze the order of
writes and throw in memory barriers occasionally.
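The barrier placement in such a driver can be sketched like this. Everything
here is hypothetical: wmb() is a stand-in for the Alpha write-barrier
instruction (a no-op below so the sketch compiles anywhere), and the array
plays the role of memory-mapped device registers:

```cpp
#include <cassert>

// Hypothetical write-barrier macro; in a real Alpha driver it would
// expand to the wmb instruction. A no-op here so the sketch builds.
#define wmb() ((void)0)

// Pretend device registers: a[1..3] carry the command payload,
// a[0] is the doorbell word the device polls.
void post_command(volatile int* a) {
    a[3] = 1;
    a[2] = 2;
    a[1] = 3;
    wmb();    // drain the CPU's write buffer before ringing the doorbell
    a[0] = 4; // must become visible to the device last
}
```

Without the barrier, the write cache described above could emit the four
stores in the opposite order, and the device could see the doorbell before
the payload.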

However, I do agree that a standard-conforming ;-) program
would not be able to tell that the writes were re-ordered.

(1) When there was a Digital.
(2) When there was an Alpha processor from Digital.


--
Andy V (OpenGL Alpha Geek)
"In order to make progress, one must leave the door to the unknown ajar."
Richard P. Feynman, quoted by Jagdish Mehra in _The Beat of a Different Drum_.

Dyre Tjeldvoll

unread,
May 7, 2003, 12:25:51 AM5/7/03
to
Use...@aristeia.com (Scott Meyers) writes:

> On Sun, 4 May 2003 20:45:33 +0000 (UTC), Dyre Tjeldvoll wrote:
> > struct Singleton {
> > int dummy; // Just to be absolutely sure...
> > static Singleton *pInstance;
> > // I assume that this pointer assignment is atomic
> > void useMe() { ++dummy; pInstance = this; }
> > static Singleton* instance() {
> > if (!pInstance) {
> > LOCK();
> > if (!pInstance) {
> > Singleton *tmp = new Singleton;
> > // The compiler can't invoke useMe before tmp points to a
> > // fully constructed object, can it?
> > tmp->useMe();
>
> No, but it can inline operator new, the Singleton constructor, and useMe,
> then reorder the resulting set of instructions so that the as-if rule (or,
> as Doug Lea more accurately terms it, the as-if-serial rule) is
> obeyed.

It seemed too simple... :)

I'm sorry, I don't want to be a pain, but I really don't understand
the following sentence:

> In that case, pInstance might not get set until after useMe has been
> called,

Why does that matter? Does it matter when it gets assigned, as long as
it happens after the object is constructed? Or did you mean that it
still could happen before the call to the constructor and/or operator new?

> because the compiler doesn't need to get to dummy's memory
> via pInstance.

OK, but... maybe I need to look at this from a different angle,
because I put dummy in there to ensure that the this-pointer was
dereferenced inside useMe()...

Maybe I should just shut up and read about the as-if-serial rule.

--
dt

Dyre Tjeldvoll

unread,
May 7, 2003, 5:33:20 AM5/7/03
to
ka...@gabi-soft.de (James Kanze) writes:

> > #include <iostream>
> > // I don't have an MT-setup available
> > #define LOCK()
> > #define UNLOCK()
>
> > struct Singleton {
> > int dummy; // Just to be absolutely sure...
> > static Singleton *pInstance;
> > // I assume that this pointer assignment is atomic
> > void useMe() { ++dummy; pInstance = this; }
> > static Singleton* instance() {
> > if (!pInstance) {
> > LOCK();
> > if (!pInstance) {
> > Singleton *tmp = new Singleton;
> > // The compiler can't invoke useMe before tmp points to a
> > // fully constructed object, can it?
> > tmp->useMe();
> > //
> > }
> > UNLOCK();
> > }
> > return pInstance;
> > }
> > };

First of all, thank you for taking the time to look at my post :)

> What is your point?

That I thought it would be safe to make the assignment to pInstance inside
useMe() (a non-static member function). Which it isn't, as Scott
pointed out...

> In your code, tmp is a local variable, invisible
> anywhere else. Depending on the rest of the code, the compiler might
> not even generate it. But I don't see what you are trying to acheve.
> Where do you assign to pInstance? If you have simply forgotten a
> pInstance = tmp ;
> after useMe, what does this change? The compiler is still free to make
> the assignment before.

See comment above.

> And of course, there is nothing here which ensures the cache consistency
> of a processor which finds pInstance non-null in the first if. If the
> memory in the Singleton happens to be in its cache, even in a state
> corresponding to before the constructor has been run, there is nothing
> to ensure that it will be reread.

Agreed, but I understood Scott's question to be about the specific
compiler optimization, and not the double checking pattern in
general... maybe I was wrong.

--
dt

James Kanze

unread,
May 7, 2003, 5:34:10 AM5/7/03
to
Use...@aristeia.com (Scott Meyers) wrote in message
news:<MPG.192030b3f...@news.hevanet.com>...

> On Mon, 5 May 2003 15:23:57 +0000 (UTC), James Kanze wrote:
> > In theory (and as far as I know, in fact), you never need a memory
> > barrier in a single processor machine.

> That's my understanding, too. My belief is that memory barriers are
> necessary for solving this kind of problem in the general case
> (because the general case includes multiprocessor systems), but they
> are not sufficient (because the general case also includes
> uniprocessor systems).

I'm not quite sure what you are getting at with necessary but not
sufficient. Obviously, you need more than just write barriers -- if
nothing else, you have to put the right things between the barriers:-).
Generally, you will also need to take steps to ensure certain operations
are atomic. However, all of these are necessary regardless of the
number of processors; memory barriers are necessary in addition to, and
not instead of, when you have more than one processor.

More or less. It is possible to implement multiprocessor architectures
where they aren't necessary, and it is probably possible to implement a
single processor architecture where they are. And of course, exactly
what is meant by a memory barrier depends on the processor -- from what
I understand, Intel architectures generate one automatically (in the
hardware) before and after an instruction with a lock prefix, for
example, whereas Sparcs have a special separate instruction (and various
modes, some of which are write through).

In the end, about the only answer which is sure to be correct is that it
depends on the implementation.

--
James Kanze GABI Software mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

---

Scott Meyers

unread,
May 7, 2003, 1:54:28 PM5/7/03
to
On Mon, 5 May 2003 18:33:49 +0000 (UTC), Alexander Terekhov wrote:
> Scott Meyers wrote:
> [...]
> > I was under the impression that memory barriers applied only to
> > multiprocessor machines.
>
> That's true if you're concerned with "ordinal" memory only and don't
> care about memory-mapped I/O space and devices.

Which suggests that memory barriers are not helpful for solving the general
problem of double-checked locking in a threaded environment.

Scott

Alexander Terekhov

unread,
May 7, 2003, 6:08:52 PM5/7/03
to

Scott Meyers wrote:
>
> On Mon, 5 May 2003 18:33:49 +0000 (UTC), Alexander Terekhov wrote:
> > Scott Meyers wrote:
> > [...]
> > > I was under the impression that memory barriers applied only to
> > > multiprocessor machines.
> >
> > That's true if you're concerned with "ordinal" memory only and don't
> > care about memory-mapped I/O space and devices.
>
> Which suggests that memory barriers are not helpful for solving the general
> problem of double-checked locking in a threaded environment.

Let me put it this way: once you solve all problems with respect to
atomicity and compiler-induced reordering, you don't need to worry
about hardware-induced reordering on a uniprocessor... if you're
concerned with "ordinal" memory only and don't care about
memory-mapped I/O space and devices.

Well, I do believe that "memory barriers" (embedded into atomic<>
stuff) COULD be very, very helpful for solving all sorts of
synchronization problems. The only problem I see is that reaching
consensus wouldn't be an easy task, I'm afraid.

Consider the following "illustration"... also an initialization
pattern, but I wouldn't call it DCI (or DCL-if-you-so-like-it):

atomic<stuff*> instance_ptr = ATOMIC_INITIALIZER(0); // static

stuff & instance() {
stuff * ptr;
if (0 == (ptr = instance_ptr.load_ddrmb())) {
ptr = new stuff();
if (!instance_ptr.attempt_update_wmb(ptr, 0)) { // too late
delete ptr;
if (0 == (ptr = instance_ptr.load_ddrmb()))
abort();
}
else { // only one thread can reach here
static deleter<stuff> cleanup(ptr);
}
}
return *ptr;
}

Well,

http://google.com/groups?threadm=3E60CF71.9784884F%40web.de
(Subject: Re: Acquire/Release memory synchronization....)

regards,
alexander.

Mirek Fidler

unread,
May 8, 2003, 2:56:33 PM5/8/03
to
> On Mon, 17 Feb 2003 17:09:35 +0000 (UTC), Christoph Rabel wrote:
> > MySingleton *MySingleton::Instance(void)
> > {
> > if(!pInstance)
> > {
> > LOCK(); // Do some MT-locking here
> > if (!pInstance)
> > pInstance = new MySingleton;
> > UNLOCK();
> > return sp;
> > }
>
> This is the double-checked locking pattern. I recently drafted an
> article on this topic for CUJ. As I sit here in a pool of my own blood
> based on the feedback I got from pre-pub reviewers, I feel compelled to
> offer the following observation: there is, as far as I know, no way to
> make this work on a reliable and portable basis.

Seems to me that the result of this whole discussion thread is:

- there is no portable way to do it
- that is why something like a static singleton or similar tool should
be part of the standard library

Now the interesting topic could be what such a tool might look like...
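One possible shape for such a library tool, sketched under the assumption
that the library supplies real lock primitives (the LOCK/UNLOCK macros
below are hypothetical no-op placeholders, as elsewhere in this thread):

```cpp
#include <cassert>

// Hypothetical lock primitives a real library version would supply.
#define LOCK()
#define UNLOCK()

// A library-provided singleton holder: user code never spells out the
// check-and-create logic itself, so the library can make that logic as
// safe as the platform allows.
template<class T>
class singleton {
public:
    static T& instance() {
        LOCK();
        if (!pInstance)
            pInstance = new T;
        T* p = pInstance;
        UNLOCK();
        return *p;
    }
private:
    static T* pInstance;
};

template<class T> T* singleton<T>::pInstance = 0;

// Example payload type.
struct Config { int verbosity; Config() : verbosity(1) {} };
```

The point of putting it in the library is precisely that the unportable
parts (barriers, atomic loads, once-semantics) get hidden behind one
audited implementation per platform.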

Mirek

Scott Meyers

unread,
May 9, 2003, 1:19:42 PM5/9/03
to
On Wed, 7 May 2003 09:34:10 +0000 (UTC), James Kanze wrote:
> I'm not quite sure what you are getting at with necessary but not
> sufficient. Obviously, you need more than just write barriers -- if
> nothing else, you have to put the right things between the barriers:-).

What I meant was that the general problem includes multiple threads on a
single processor (as in my original post) and multiple threads across
multiple processors. I agree that memory barriers are necessary to make
things work in the multiprocessor scenario. However, I believe that in the
uniprocessor scenario, memory barriers don't help at all.

Scott

Scott Meyers

unread,
May 9, 2003, 1:19:53 PM5/9/03
to
On Wed, 7 May 2003 04:25:51 +0000 (UTC), Dyre Tjeldvoll wrote:
> Use...@aristeia.com (Scott Meyers) writes:
>
> > On Sun, 4 May 2003 20:45:33 +0000 (UTC), Dyre Tjeldvoll wrote:
> > > struct Singleton {
> > > int dummy; // Just to be absolutely sure...
> > > static Singleton *pInstance;
> > > // I assume that this pointer assignment is atomic
> > > void useMe() { ++dummy; pInstance = this; }
> > > static Singleton* instance() {
> > > if (!pInstance) {
> > > LOCK();
> > > if (!pInstance) {
> > > Singleton *tmp = new Singleton;
> > > // The compiler can't invoke useMe before tmp points to a
> > > // fully constructed object, can it?
> > > tmp->useMe();
> >
> I'm sorry, I don't want to be a pain, but I really don't understand
> the following sentence:
>
> > In that case, pInstance might not get set until after useMe has been
> > called,

I don't blame you. It's a lousy sentence. What I meant was that, under
some conditions (e.g., when we know that the Singleton constructor cannot
throw), the compiler can inline operator new, the Singleton constructor,
and useMe, do flow analysis to determine that tmp is unnecessary, then
reorder things to generate code that looks basically like this:

pInstance = operator new(sizeof(Singleton));
new (pInstance) Singleton;
++pInstance->dummy;

Scott

Alexander Terekhov

unread,
May 9, 2003, 1:20:49 PM5/9/03
to

Mirek Fidler wrote:
[...]

> Now the interesting topic could be what such a tool might look like...

New keywords aside for a moment, such a tool might look like
"__attribute__((thread-shared))" or something like that. For example:

int you_name_it() {
static int i __attribute__((thread-shared)) = calculate(/*...*/);
return i;
}

regards,
alexander.

--
http://groups.google.com/groups?threadm=3EB82EA0.F40E66C4%40web.de
(Subject: __attribute__((cleanup(function)) versus try/finally)

James Dennett

unread,
May 9, 2003, 2:34:08 PM5/9/03
to
Scott Meyers wrote:
> On Wed, 7 May 2003 09:34:10 +0000 (UTC), James Kanze wrote:
>
>>I'm not quite sure what you are getting at with necessary but not
>>sufficient. Obviously, you need more than just write barriers -- if
>>nothing else, you have to put the right things between the barriers:-).
>
>
> What I meant was that the general problem includes multiple threads on a
> single processor (as in my original post) and multiple threads across
> multiple processors. I agree that memory barriers are necessary to make
> things work in the multiprocessor scenario. However, I believe that in the
> uniprocessor scenario, memory barriers don't help at all.

There are (at least) two layers that might re-order memory
accesses: hardware and software.

On uniprocessor machines, the hardware at least provides
the illusion that it is not reordering memory accesses, in
all cases I'm aware of. For multiprocessor machines this
doesn't hold.

In software, compilers rearrange memory accesses. Under
current C++ rules, they are free to do so under the "as if"
rule -- and the "as if" rule applies if a single-threaded
application couldn't tell the difference, because that is
all ISO C++ recognizes to date.

The term "memory barrier" is most often applied to the
hardware level, but can also apply to software; in software,
it is a boundary which inhibits the motion of memory
accesses.

If a future C++ standard has support for multi-threaded
environments, as I hope will be the case, it might well
specify something equivalent to a memory barrier -- which
would be required to apply both to the compiler's memory
access rearrangements and to the hardware, where applicable.

Languages such as Java (and AFAIK, C#) specify higher-level
support for ensuring visibility of writes between threads
without exposing memory barrier type functionality directly.
Given that a goal of C++ is to leave no room for a lower
level language above assembler, it's possible that C++ should
expose some more of the details.

As an aside: I searched for some time through the documentation
for a compiler on an embedded platform to find its rules for
visibility of writes to variables between tasks, but to no avail.
When I contacted technical support, and found a person who could
understand the question, the answer was that the compiler treated
any (non-inline?) function call as a memory barrier, because that
is what embedded programmers expect. He did also accept that the
documentation should have mentioned something to this effect.
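That convention can be sketched as follows. compiler_barrier is a
hypothetical helper; for the effect described, it would have to live in a
translation unit the compiler cannot see into (it is defined in-file below
only so the sketch links):

```cpp
#include <cassert>

int shared_data = 0;
int ready = 0;

// Hypothetical opaque helper. Under the convention described above, a
// non-inline call acts as a compiler-level memory barrier: the compiler
// must assume the callee may read or write any global, so it cannot
// move the surrounding stores across the call.
void compiler_barrier();

void publish(int value) {
    shared_data = value;   // must be complete before the flag is set
    compiler_barrier();
    ready = 1;             // consumers test this flag
}

// Defined here only so this single-file sketch links; in real use it
// would be in a separate translation unit (or the compiler may inline
// it and see that it does nothing).
void compiler_barrier() {}
```

Note this constrains only the compiler; it does nothing about hardware
reordering, which is why it is at best a uniprocessor-and-embedded
convention rather than a portable guarantee.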

-- James.

James Kanze

unread,
May 11, 2003, 3:23:07 PM5/11/03
to
Use...@aristeia.com (Scott Meyers) writes:

|> On Wed, 7 May 2003 09:34:10 +0000 (UTC), James Kanze wrote:
|> > I'm not quite sure what you are getting at with necessary but
|> > not sufficient. Obviously, you need more than just write
|> > barriers -- if nothing else, you have to put the right things
|> > between the barriers:-).

|> What I meant was that the general problem includes multiple
|> threads on a single processor (as in my original post) and
|> multiple threads across multiple processors. I agree that memory
|> barriers are necessary to make things work in the multiprocessor
|> scenario. However, I believe that in the uniprocessor scenario,
|> memory barriers don't help at all.

They don't normally change anything. But how on earth are you
supposed to limit your application to single processor setups? Most
systems today support multi-processor set-ups, and an awful lot of
servers actually run them. Even if you are writing client code, you
have to consider that some people will run the client code on the
server as well (and if it isn't multi-processor today, it might be
tomorrow).

In general, as well, I believe it is impossible to specify that all of
the threads of your process must run on a single processor. At least,
not while still gaining the advantages of multi-threading.

--
James Kanze mailto:ka...@gabi-soft.fr


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

11 rue de Rambouillet, 78460 Chevreuse, France Tel. +33 1 41 89 80 93

---

Alexander Terekhov

unread,
May 13, 2003, 11:11:51 AM5/13/03
to

James Dennett wrote:
[...]

> There are (at least) two layers that might re-order memory
> accesses: hardware and software.
>
> On uniprocessor machines, the hardware at least provides
> the illusion that it is not reordering memory accesses, in
> all cases I'm aware of.

When was the last time you wrote a device driver? ;-)

> For multiprocessor machines this
> doesn't hold.
>
> In software, compilers rearrange memory accesses. Under
> current C++ rules, they are free to do so under the "as if"
> rule -- and the "as if" rule applies if a single-threaded
> application couldn't tell the difference, because that is
> all ISO C++ recognizes to date.
>
> The term "memory barrier" is most often applied to the
> hardware level, but can also apply to software; in software,
> it is a boundary which inhibits the motion of memory
> accesses.
>
> If a future C++ standard has support for multi-threaded
> environments, as I hope will be the case, it might well
> specify something equivalent to a memory barrier -- which
> would be required to apply both to the compiler's memory
> access rearrangements and to the hardware, where applicable.
>
> Languages such as Java (and AFAIK, C#) specify higher-level
> support for ensuring visibility of writes between threads
> without exposing memory barrier type functionality directly.
> Given that a goal of C++ is to leave no room for a lower
> level language above assembler, it's possible that C++ should
> expose some more of the details.

Well said.

<Forward Inline> (don't miss "/*std::*/atomic<std::size_t>" ;-) )

-------- Original Message --------
Newsgroups: comp.programming.threads
Subject: Re: Atomic ops on Intel: do they sync?

Attila Feher wrote:
[...]
> Basically I rather do reference counted stuff ...

http://terekhov.de/pthread_refcount_t/draft-edits.txt
http://terekhov.de/pthread_refcount_t/poor-man/beta2/prefcnt.h

Now,

#include <cerrno> // for EDOM below
#include <cassert>
#include <cstddef>

#define __STDC_LIMIT_MACROS // see C99 std
#include <stdint.h> // see C99 std

#define PTHREAD_REFCOUNT_MAX SIZE_MAX
#define PTHREAD_REFCOUNT_DROPPED_TO_ZERO EDOM // for now
#define PTHREAD_REFCOUNT_INITIALIZER(N) { N }

struct pthread_refcount_t_ {
/*std::*/atomic<std::size_t> atomic;
};

typedef struct pthread_refcount_t_ pthread_refcount_t;

int pthread_refcount_getvalue(
pthread_refcount_t * refcount
, std::size_t * value_ptr
)
{
*value_ptr = refcount->atomic.load(); // Naked
return 0;
}

int pthread_refcount_setvalue(
pthread_refcount_t * refcount
, std::size_t value
)
{
refcount->atomic.store(value); // Naked
return 0;
}

int pthread_refcount_increment(
pthread_refcount_t * refcount
)
{
std::size_t val;
do {
val = refcount->atomic.load(); // Naked
assert(PTHREAD_REFCOUNT_MAX > val);
} while (!refcount->atomic.attempt_update(val, val+1)); // Naked
return 0;
}

int pthread_refcount_add(
pthread_refcount_t * refcount
, std::size_t value
)
{
if (PTHREAD_REFCOUNT_MAX < value) return ERANGE;
std::size_t val, max = PTHREAD_REFCOUNT_MAX - value;
do {
val = refcount->atomic.load(); // Naked
if (max < val) return ERANGE;
} while (!refcount->atomic.attempt_update(val, val+value)); // Naked
return 0;
}

int pthread_refcount_increment_positive(
pthread_refcount_t * refcount
)
{
std::size_t val;
do {
val = refcount->atomic.load(); // Naked
if (!val) return PTHREAD_REFCOUNT_DROPPED_TO_ZERO;
assert(PTHREAD_REFCOUNT_MAX > val);
} while (!refcount->atomic.attempt_update(val, val+1)); // Naked
return 0;
}

int pthread_refcount_add_to_positive(
pthread_refcount_t * refcount
, std::size_t value
)
{
if (PTHREAD_REFCOUNT_MAX < value) return ERANGE;
std::size_t val, max = PTHREAD_REFCOUNT_MAX - value;
do {
val = refcount->atomic.load(); // Naked
if (!val) return PTHREAD_REFCOUNT_DROPPED_TO_ZERO;
if (max < val) return ERANGE;
} while (!refcount->atomic.attempt_update(val, val+value)); // Naked
return 0;
}

int pthread_refcount_decrement_acqmsync(
pthread_refcount_t * refcount
)
{
std::size_t val;
do {
val = refcount->atomic.load(); // Naked
assert(val);
if (1 == val) {
refcount->atomic.store_acq(0); // Acquire
return PTHREAD_REFCOUNT_DROPPED_TO_ZERO;
}
} while (!refcount->atomic.attempt_update(val, val-1)); // Naked
return 0;
}

int pthread_refcount_decrement_relmsync(
pthread_refcount_t * refcount
)
{
std::size_t val;
do {
val = refcount->atomic.load(); // Naked
assert(val);
if (1 == val) {
refcount->atomic.store(0); // Naked
return PTHREAD_REFCOUNT_DROPPED_TO_ZERO;
}
} while (!refcount->atomic.attempt_update_rel(val, val-1)); // Release
return 0;
}

int pthread_refcount_decrement(
pthread_refcount_t * refcount
)
{
std::size_t val;
do {
val = refcount->atomic.load(); // Naked
assert(val);
if (1 == val) {
refcount->atomic.store_acq(0); // Acquire
return PTHREAD_REFCOUNT_DROPPED_TO_ZERO;
}
} while (!refcount->atomic.attempt_update_rel(val, val-1)); // Release
return 0;
}

int pthread_refcount_subtract_acqmsync(
pthread_refcount_t * refcount
, std::size_t value
)
{
if (PTHREAD_REFCOUNT_MAX < value) return ERANGE;
std::size_t val;
do {
val = refcount->atomic.load(); // Naked
if (value > val) return ERANGE;
if (value == val) {
refcount->atomic.store_acq(0); // Acquire
return PTHREAD_REFCOUNT_DROPPED_TO_ZERO;
}
} while (!refcount->atomic.attempt_update(val, val-value)); // Naked
return 0;
}

int pthread_refcount_subtract_relmsync(
pthread_refcount_t * refcount
, std::size_t value
)
{
if (PTHREAD_REFCOUNT_MAX < value) return ERANGE;
std::size_t val;
do {
val = refcount->atomic.load(); // Naked
if (value > val) return ERANGE;
if (value == val) {
refcount->atomic.store(0); // Naked
return PTHREAD_REFCOUNT_DROPPED_TO_ZERO;
}
} while (!refcount->atomic.attempt_update_rel(val, val-value)); // Release
return 0;
}

int pthread_refcount_subtract(
pthread_refcount_t * refcount
, std::size_t value
)
{
if (PTHREAD_REFCOUNT_MAX < value) return ERANGE;
std::size_t val;
do {
val = refcount->atomic.load(); // Naked
if (value > val) return ERANGE;
if (value == val) {
refcount->atomic.store_acq(0); // Acquire
return PTHREAD_REFCOUNT_DROPPED_TO_ZERO;
}
} while (!refcount->atomic.attempt_update_rel(val, val-value)); // Release
return 0;
}

Bug-reports/suggestions/objections/whatever are quite welcome. ;-)

regards,
alexander.

Ken Hagan

unread,
May 14, 2003, 1:22:27 PM5/14/03
to
> James Dennett wrote:
> [...]

>> On uniprocessor machines, the hardware at least provides
>> the illusion that it is not reordering memory accesses, in
>> all cases I'm aware of.

Alexander Terekhov wrote:
>
> When was the last time you wrote a device driver? ;-)

By which I presume you mean that some hardware device might be
able to detect (and malfunction because of) the actual order in
which memory is accessed, yes?

OK, that's a fair caveat. "If you are writing a device driver,
then the device may raise many of the same ordering issues as
a second processor. Both can see things that the CPU hides from
its own instruction stream."

Nick Thurn

unread,
May 15, 2003, 12:45:55 PM5/15/03
to

"Scott Meyers" <Use...@aristeia.com> wrote in message
news:MPG.191ef5f7e...@news.hevanet.com...
> On Sat, 3 May 2003 20:20:48 +0000 (UTC), Jon Biggar wrote:
> > Making pInstance volitile, assigning the result of the new to a local
> > variable and then copying to pInstance would help, but I'm sure there
> > are still other subtle difficulties...
>
> Remember that good compilers do extensive dataflow analysis, and they
> eliminate intermediate variables that are unnecessary. Some do this
> across function call boundaries, even for non-inline functions defined
> in separate translation units.
>

So presumably this won't work either?

MySingleton* MySingleton::MakeInstance()
{
LOCK();
static MySingleton* s = new MySingleton;
UNLOCK();
return s;
}

MySingleton *MySingleton::Instance()
{
static MySingleton* sp = MakeInstance();
return sp;
}

What is required for the language to become predictable in this case?

cheers
Nick

Balog Pal

unread,
May 16, 2003, 1:22:53 PM5/16/03
to

""Nick Thurn"" <thu...@bigpond.com> wrote in message
news:q9Lwa.34822$1s1.5...@newsfeeds.bigpond.com...

> MySingleton* MySingleton::MakeInstance()
> {
> LOCK();
> static MySingleton* s = new MySingleton;
> UNLOCK();
> return s;
> }
>
> MySingleton *MySingleton::Instance()
> {
> static MySingleton* sp = MakeInstance();
> return sp;
> }

That's not a bit different from what was presented before.
The init of sp is not mutexed, and that is the unsafe part.

Multiple threads can see sp as not yet inited and start the init
procedure. So you'll have multiple singletons constructed, and somewhat
later all but one of the pointers to them lost. And some of the first
callers can get different instances.
If access to the pointer is not atomic, you may even get a broken
pointer value.

Paul

Nick Thurn

unread,
May 17, 2003, 10:33:39 PM5/17/03
to

""Balog Pal"" <pa...@lib.hu> wrote in message
news:3ec5...@andromeda.datanet.hu...

>
> ""Nick Thurn"" <thu...@bigpond.com> wrote in message
> news:q9Lwa.34822$1s1.5...@newsfeeds.bigpond.com...
>
> > MySingleton* MySingleton::MakeInstance()
> > {
> > LOCK();
> > static MySingleton* s = new MySingleton;
^^^^^

> > UNLOCK();
> > return s;
> > }
> >
> > MySingleton *MySingleton::Instance()
> > {
> > static MySingleton* sp = MakeInstance();
> > return sp;
> > }
>
> That's not a bit different from what was presented before.
> Init of sp is not mutexed, so that is the not safe thing.
>
So the issue is that despite the locking around the actual creation
in MakeInstance the allocated raw memory may still be passed back
to sp in Instance *prior* to construction?

Two things spring to mind:

First:


MySingleton* MySingleton::MakeInstance()
{
    LOCK();
    static MySingleton* s = new MySingleton;

    MySingleton* alias = s;
    alias->SomeFunctionOrOther();

    UNLOCK();
    return alias;
}

but then I'm no guru (as has been proved multiple times before).

or Second:
Set up an initialisation chain that runs *prior* to spawning multiple
threads.
Not a solution but maybe it would work ;-).

> Multiple threads can see sp as not yet inited and start the init procedure.
> So you'll have multiple singletons constructed, and somewhat later all but
> one pointers to them lost. And some of the first callers can get different
> instances.
>

That shouldn't matter, as my naive view is that only one singleton would be
constructed, hence the worst case is your point below.

> If access to the pointer is not atomic you may even get a broken pointer
> value.
>

Bit of a show stopper I guess. Third solution - don't write MT code - works
for me ;-)

cheers
Nick

Balog Pal

unread,
May 19, 2003, 1:38:37 PM5/19/03
to
""Nick Thurn"" <thu...@bigpond.com> wrote in message
news:Ivgxa.35778$1s1.5...@newsfeeds.bigpond.com...

> > > MySingleton* MySingleton::MakeInstance()
> > > {
> > > LOCK();
> > > static MySingleton* s = new MySingleton;
> ^^^^^
> > > UNLOCK();
> > > return s;
> > > }
> > >
> > > MySingleton *MySingleton::Instance()
> > > {
> > > static MySingleton* sp = MakeInstance();
> > > return sp;
> > > }
> >
> > That's not a bit different from what was presented before.
> > Init of sp is not mutexed, so that is the not safe thing.
> >
> So the issue is that despite the locking around the actual creation
> in MakeInstance the allocated raw memory may still be passed back
> to sp in Instance *prior* to construction?

Consider how a block-scoped static is implemented by a compiler (logically --
the actual implementation may be different, but this is a legal and
used-in-practice solution):

>MySingleton *MySingleton::Instance()
>{
>static MySingleton* sp = MakeInstance();
>return sp;
>}

becomes:

static int $MySingleton$Instance = 0; // zero-inited statically
static MySingleton* $MySingleton$sp; // initial value arbitrary

MySingleton *MySingleton::Instance()
{
    // line: static MySingleton* sp = MakeInstance();
    if ($MySingleton$Instance == 0)
    {   //**1
        $MySingleton$sp = /* 2 */ MakeInstance(); // actual ctor call and/or init expression here
        //**3
        $MySingleton$Instance = 1;
    }
    // line: return sp;
    return $MySingleton$sp;
}

You can see it's no different from the original solution with the object
itself. The MakeInstance call is mutexed inside, but the call to it is not,
so it can happen multiple times.
Suppose this thread is preempted at 1, 2 or 3. Then another thread executes
this function, and yet another. How MakeInstance is implemented is
completely irrelevant.

> or Second:
> Setup an initialisation chain that runs *prior* to spawning multiple
> threads.
> Not a solution but maybe it would work ;-).

Sure, that is the proposed resolution for the cases where it is
possible/reasonable.

However, one branch of the cases concentrates on the "if needed" part of the
singleton creation, where the aim is to eliminate the construction if no one
ever comes to use an instance. Preconstruction completely defeats that aim.
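[Ed.: Nick's second option above -- forcing the construction to happen
before any threads exist -- can be sketched like this. The
InitBeforeThreads name and the modern spelling are ours, purely
illustrative:]

```cpp
class MySingleton
{
public:
    static MySingleton* Instance()
    {
        // The local static is only safe because InitBeforeThreads()
        // guarantees the first call happens while still single-threaded.
        static MySingleton* sp = new MySingleton;
        return sp;
    }

    // Call once from main(), before any thread is spawned.
    static void InitBeforeThreads() { Instance(); }
};
```

The cost, as noted above, is that the singleton is constructed even if no
thread ever uses it.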

> That shouldn't matter as my naive view is that only one singleton would be
> constructed hence worst case is your point below.

If threading comes into the picture, please do humankind a favor and put
aside the naive views. People have already died as a consequence of
undetected race conditions in software controlling radiation dosage.

> > If access to the pointer is not atomic you may even get a broken pointer
> > value.
> >
> Bit of a show stopper I guess.

That at least manifests clearly, while a situation where some pieces of
software use distinct instances of a supposed singleton is more subtle.

As with most threading-related problems, this one needs a collision to
surface: multiple threads must actually try something at the same moment.
That's hard to produce in a test environment; it can be next to impossible.

You must deal with all these problems through careful design; a
trial-and-error approach, even if backed up with a test suite, will not do.

> Third solution - don't write MT code - works for me ;-)

That's my general suggestion: don't do anything you don't understand well
enough to do safely. Unfortunately many programmers think writing threads
is cool and sexy, and when frameworks make it easy to launch threads they
start threads -- even for situations with no parallelism.

Paul

Nick Thurn

unread,
May 20, 2003, 5:03:22 PM5/20/03
to

""Balog Pal"" <pa...@lib.hu> wrote in message
news:3ec7...@andromeda.datanet.hu...

> ""Nick Thurn"" <thu...@bigpond.com> wrote in message
> news:Ivgxa.35778$1s1.5...@newsfeeds.bigpond.com...
> Consider how a blocked static is implemented by a compiler (logically --
> static int $MySingleton$Instance = 0; // zero-inited statically
> static MySingleton* $MySingleton$sp; // initial value arbitrary
>
> MySingleton *MySingleton::Instance()
> {
> //line static MySingleton* sp = MakeInstance();
> if($MySingleton$Instance == 0)
> { //**1
> $MySingleton$sp = /* 2 */ MakeInstance(); // actual ctor call and/or init expression here
> //**3
> $MySingleton$Instance = 1;
> }
> //line return sp;
> return $MySingleton$sp;
> }
>
> You can see it's not different from the original solution with the object
> itself. The makeinstance call is mutexed inside, but its call is not, so it
> can happen multiple times.
> Consider this thread is perrmpted at 1 or 2 or 3. Then another thread
> executes this function, and yet another. How MakeInstance is implemented
> is completely irrelevant.
>
Hi Paul,

Thanks for the explanation.
Now in the spirit of "never say die" could you explain why this is flawed?

static bool sp_ok_1 = false;
static bool sp_ok_2 = false;
static MySingleton* sp = 0;

MySingleton *MySingleton::Instance()
{
    if (sp_ok_1 == false)
    {
        LOCK();
        if (sp_ok_2 == false)
        {
            sp = new MySingleton;
            sp_ok_2 = true;
            sp_ok_1 = true;
        }
        UNLOCK();
    }
    return sp;
}

my assumptions are that:
- the pointer sp is never inspected hence backwash of raw
memory during construction is not an issue
- bools (or an equivalent eg chars) should be atomically
assignable
- sp_ok_2 is always protected by a mutex so cannot be in an
undefined state when checked
- sp_ok_1 and sp will be synced due to the lock acquisition
if sp_ok_1 is seen to be false but the thread does not
win the lock first

assuming this is also a bad solution - why is it so?

cheers
Nick

Hyman Rosen

unread,
May 21, 2003, 1:19:42 PM5/21/03
to
Nick Thurn wrote:
> assuming this is also a bad solution - why is it so?

For the same reason as always. The hardware may reorder the memory
writes such that sp_ok_1 is found to be true before sp has a valid
value.

When multiple threads need to access a shared resource, they must
use the defined API for doing so. No other solution can work.
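[Ed.: in later C++, the "defined API" for exactly this one-time
initialisation problem is std::call_once -- C++11, so not available to the
posters here. A sketch, not their code:]

```cpp
#include <mutex>  // std::call_once, std::once_flag

class MySingleton
{
public:
    static MySingleton* Instance()
    {
        // call_once runs the initializer exactly once; every other caller
        // blocks until it finishes and then sees the completed write.
        std::call_once(flag_, [] { sp_ = new MySingleton; });
        return sp_;
    }

private:
    static std::once_flag flag_;
    static MySingleton* sp_;
};

std::once_flag MySingleton::flag_;
MySingleton* MySingleton::sp_ = nullptr;
```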

James Kanze

unread,
May 21, 2003, 1:21:31 PM5/21/03
to
thu...@bigpond.com ("Nick Thurn") wrote in message
news:<2Hvya.38825$1s1.5...@newsfeeds.bigpond.com>...

> Thanks for the explaination.


> Now in the spirit of "never say die" could you explain why this is
> flawed?

Simple. It is no different from the above.

> static bool sp_ok_1 = false;
> static bool sp_ok_2 = false;
> static MySingleton* sp = 0;

> MySingleton *MySingleton::Instance()
> {
> if (sp_ok_1 == false)
> {
> LOCK();
> if (sp_ok_2 == false)
> {
> sp = new MySingleton;
> sp_ok_2 = true;
> sp_ok_1 = true;
> }
> UNLOCK();
> }
> return sp;
> }

> my assumptions are that:
> - the pointer sp is never inspected hence backwash of raw
> memory during construction is not an issue

The pointer sp is read in the return statement. There is absolutely
nothing to ensure the ordering of the writes in the if block, nor that
of any writes in the constructor of MySingleton.

> - bools (or an equivalent eg chars) should be atomically
> assignable

Formally, not guaranteed, but practically, this seems a safe assumption.

> - sp_ok_2 is always protected by a mutex so cannot be in an
> undefined state when checked

That's true.

> - sp_ok_1 and sp will be synced due to the lock acquisition
> if sp_ok_1 is seen to be false but the thread does not
> win the lock first

Two problems:

- Even in the simplest environments, sp_ok_1 can be written before sp
  (or any of the other data). The result is that MySingleton::Instance
  can return a NULL pointer (or, if sp has been written but the writes
  in the constructor haven't taken place, a pointer to a partially
  constructed object).

- In more complex environments, you can have the problem that even if
one processor has written everything in the same order, another
processor may read in a different order, and find the results of the
assignment to sp_ok_1, but not of the other writes. Same results as
above.

The Posix standard says that if more than one thread can access a
variable at a time, and any one thread may modify it, the behavior is
undefined. Since there is nothing to prevent a second thread from
accessing sp_ok_1 (and as a result sp) while a first thread is writing
them, the code has undefined behavior.

My assumption is that code which the standard says has undefined
behavior is not guaranteed to work.

In practice, it doesn't work on some systems.
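[Ed.: the version that does satisfy the Posix rule Kanze cites is the one
in which every access synchronises, not just the first. A sketch, with a
C++11 std::mutex standing in for a Posix mutex:]

```cpp
#include <mutex>

class MySingleton
{
public:
    static MySingleton* Instance()
    {
        // Every caller takes the lock, so no thread ever reads sp_
        // while another thread may be writing it.
        std::lock_guard<std::mutex> lock(mtx_);
        if (sp_ == nullptr)
            sp_ = new MySingleton;
        return sp_;
    }

private:
    static std::mutex mtx_;
    static MySingleton* sp_;
};

std::mutex MySingleton::mtx_;
MySingleton* MySingleton::sp_ = nullptr;
```

The price is a lock acquisition on every call -- exactly what
double-checked locking tries, unsuccessfully here, to avoid.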

--
James Kanze GABI Software mailto:ka...@gabi-soft.fr


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

11 rue de Rambouillet, 78460 Chevreuse, France, Tél. : +33 (0)1 30 23 45 16

---

Ben Hutchings

unread,
May 21, 2003, 1:22:11 PM5/21/03
to
In article <2Hvya.38825$1s1.5...@newsfeeds.bigpond.com>,
"Nick Thurn" wrote:
<snip>

> Now in the spirit of "never say die" could you explain why this is
> flawed?
>
> static bool sp_ok_1 = false;
> static bool sp_ok_2 = false;
> static MySingleton* sp = 0;
>
> MySingleton *MySingleton::Instance()
> {
> if (sp_ok_1 == false)
> {
> LOCK();
> if (sp_ok_2 == false)
> {
> sp = new MySingleton;
> sp_ok_2 = true;
> sp_ok_1 = true;
> }
> UNLOCK();
> }
> return sp;
> }
>
> my assumptions are that:
> - the pointer sp is never inspected hence backwash of raw
> memory during construction is not an issue
> - bools (or an equivalent eg chars) should be atomically
> assignable
> - sp_ok_2 is always protected by a mutex so cannot be in an
> undefined state when checked

But it's a bool, and you're already assuming bools can be assigned
atomically.

> - sp_ok_1 and sp will be synced due to the lock acquisition
> if sp_ok_1 is seen to be false but the thread does not
> win the lock first

But there is no synchronisation if sp_ok_1 is true when the first
test is done.

> assuming this is also a bad solution - why is it so?

I don't think you grasp the problem at all. Did you really read
the web page that Scott Meyers referred to? The assignment to
sp_ok_1 can be visible to threads running on other processors
before the assignment to sp or the initialisation of any part of
the MySingleton instance. LOCK and UNLOCK should form memory
barriers but will have no effect on ordering of memory accesses
between them.

The implementation - either compiler or processor - could
legitimately perform the memory accesses performed by the code
you offered in the order you might expect from this alternate
code:

MySingleton *MySingleton::Instance()
{
    MySingleton * local_sp = sp;

    if (sp_ok_1 == false)
    {
        LOCK();
        if (sp_ok_2 == false)
        {
            sp_ok_1 = true;
            sp_ok_2 = true;
            void * buf = operator new(sizeof(MySingleton));
            sp = local_sp = static_cast<MySingleton*>(buf);
            new (buf) MySingleton;
        }
        UNLOCK();
    }
    return local_sp;
}

It is essential to add read and write memory barriers to
prevent such reordering. See <http://www.cs.umd.edu/~pugh/java/
memoryModel/DoubleCheckedLocking.html#explicitMemoryBarriers>
for a solution that really works.
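[Ed.: for reference, the barrier-based fix that page describes comes out
roughly like this in C++11 atomics, where acquire/release ordering plays
the role of the explicit read and write barriers -- a sketch in a dialect
the 2003 posters did not yet have:]

```cpp
#include <atomic>
#include <mutex>

class MySingleton
{
public:
    static MySingleton* Instance()
    {
        // Acquire load: if we see a non-null pointer, we also see every
        // write the constructing thread made before its release store,
        // including the constructor's effects.
        MySingleton* p = sp_.load(std::memory_order_acquire);
        if (p == nullptr)
        {
            std::lock_guard<std::mutex> lock(mtx_);
            p = sp_.load(std::memory_order_relaxed);  // re-check under the lock
            if (p == nullptr)
            {
                p = new MySingleton;
                // Release store: publishes the fully constructed object.
                sp_.store(p, std::memory_order_release);
            }
        }
        return p;
    }

private:
    static std::atomic<MySingleton*> sp_;
    static std::mutex mtx_;
};

std::atomic<MySingleton*> MySingleton::sp_{nullptr};
std::mutex MySingleton::mtx_;
```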

James Dennett

unread,
May 21, 2003, 3:27:18 PM5/21/03
to

A thread that sees sp_ok_1 != false might see
sp == 0 -- the thread may not ever have locked
any mutex, and might see the write to sp_ok_1
before it sees the write to sp.

With a memory barrier, you could ensure that the
write to sp was complete (and visible to all other
threads) before modifying sp_ok_1. Without at least
that, it seems that there is no guarantee.

-- James.

Balog Pal

unread,
May 21, 2003, 3:27:26 PM5/21/03
to
""Nick Thurn"" <thu...@bigpond.com> wrote in message
news:2Hvya.38825$1s1.5...@newsfeeds.bigpond.com...

> Thanks for the explaination.
> Now in the spirit of "never say die" could you explain why this is flawed?
>
> static bool sp_ok_1 = false;
> static bool sp_ok_2 = false;
> static MySingleton* sp = 0;
>
> MySingleton *MySingleton::Instance()
> {
> if (sp_ok_1 == false)
> {
> LOCK();
> if (sp_ok_2 == false)
> {
> sp = new MySingleton;
> sp_ok_2 = true;
> sp_ok_1 = true;
> }
> UNLOCK();
> }
> return sp;
> }

We had endless discussions of that. :) Look for double-checked locking (DCL)
and similar subjects here and on c.l.c++.moderated for details.

You have several assumptions that are not met in practice. In this
particular solution you assume the assignments will happen in that order.
However, the compiler is allowed to assign sp_ok_1 before sp_ok_2 inside the
locked region, and even before sp. Then the other thread will see sp_ok_1
early and fail.
After you fix that using volatiles the compiler will be constrained, but the
processor can still rearrange the order on some machines. All that is
guaranteed is that at the point of unlock every write has made it "out".
Another fix could insert another lock between sp_ok_2 and sp_ok_1, making
them ordered.

And you still face a problem on the _reader_ thread, where you go without
any lock or memory barrier, and reads can be reordered to fetch sp before
sp_ok_1 (on systems using relaxed memory ordering; someone mentioned the
Alpha as an example).

Paul

Alexander Terekhov

unread,
May 21, 2003, 5:03:49 PM5/21/03
to

Ben Hutchings wrote:
[...]

> It is essential to add read and write memory barriers to

Nah, it's essential to add "asymmetrical" *hoist-load* and
*sink-store* barriers... or TSD/TLS.

> prevent such reordering. See <http://www.cs.umd.edu/~pugh/java/
> memoryModel/DoubleCheckedLocking.html#explicitMemoryBarriers>
> for a solution that really works.

And as for C++... A couple of DCSI** (MBR/TLS) solutions (plus
DCCI one) can be found here: <http://tinyurl.com/cbwh> (Subject:
Re: Double-checked locking and memory barriers).

regards,
alexander.

**) DCSI stands for double-checked serialized initialization. Two
variations are currently known: DCSI-MBR and DCSI-TSD (TLS).
There's also another DC-"pattern" -- double-checked concurrent
initialization [DCCI]; using atomic<> with barriers and "CAS".

P.S. Implementation provided DCSI (pthread_once(), synchronized
static locals, once_call<>) can be done with membars or TSD.
Compiler barriers aside for a moment, hardware membars aren't
needed on UP (or when all "suspect" threads are bound/running on
the same processor with the "creator" thread). TSD COULD be faster
(for MP-safe DCSI). Even if you have a fully portable atomic<>
with barriers, there's just no portable interfaces that would
allow you to create a PORTABLE "customized implementation" that
would take into account all these considerations and optimize
"dynamically", for example. This means that you'll NEVER beat
"pthread_once()"... using DCCI for concurrent "idempotent" inits
(with a single winner) aside for a moment.

Nick Thurn

unread,
May 22, 2003, 2:39:07 PM5/22/03
to

""Nick Thurn"" <thu...@bigpond.com> wrote in message
news:2Hvya.38825$1s1.5...@newsfeeds.bigpond.com...

>
> assuming this is also a bad solution - why is it so?
>
Thanks to all who responded - I now understand the issue.
I guess the simple rule is: no lock, no share.

Vishal

unread,
Jun 13, 2003, 8:30:20 PM6/13/03
to
If a compiler does not optimize through translation units, then the following
solution is likely to work (will it?):

class Singleton {

    static Singleton* pInstance;
    static Singleton* p2;

    static Singleton* instance()
    {
        if (!pInstance)
        {
            Lock();
            if (!pInstance)
            {
                p2 = new Singleton();
                global_fun(); // in a different translation unit
                pInstance = p2;
            }
            Unlock();
        }
        return pInstance;
    }
};

My assumption is that the compiler will not reorder the "pInstance = p2"
assignment past the call to global_fun(), which, for all the compiler knows,
could depend on the values of both pInstance and p2. (Or, along the same
lines, passing a pointer to a function as an argument to instance() and
invoking it at the point where global_fun() is called.)

But if the linker does optimizations like inlining across translation units
(as MS's C++ 7.0 does, to my knowledge), then this will not work. However,
using a compiler-specific switch, such as __declspec(noinline) on
global_fun() (which according to the documentation actually applies to
member functions, but something along those lines), it can be made to work.

Vishal

Scott Meyers wrote:

> On Mon, 17 Feb 2003 17:09:35 +0000 (UTC), Christoph Rabel wrote:
> > MySingleton *MySingleton::Instance(void)
> > {
> > if(!pInstance)
> > {
> > LOCK(); // Do some MT-locking here
> > if (!pInstance)

> > pInstance = new MySingleton;
> > UNLOCK();


> > return sp;
> > }
>
> This is the double-checked locking pattern. I recently drafted an article on
> this topic for CUJ. As I sit here in a pool of my own blood based on the
> feedback I got from pre-pub reviewers, I feel compelled to offer the following
> observation: there is, as far as I know, no way to make this work on a reliable
> and portable basis.
>

> The best treatment of this topic that I know of is "The 'Double-Checked Locking
> is Broken' Declaration"
> (http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html). I
> suggest you not fall into the trap I did in assuming that its focus on Java
> implies that it doesn't really apply to C++. It does. My favorite paragraph
> from that document is this:
>
> There are lots of reasons it doesn't work. The first couple of reasons we'll
> describe are more obvious. After understanding those, you may be tempted to
> try to devise a way to "fix" the double-checked locking idiom. Your fixes will
> not work: there are more subtle reasons why your fix won't work. Understand
> those reasons, come up with a better fix, and it still won't work, because
> there are even more subtle reasons.
>
> Lots of very smart people have spent lots of time looking at this. There is no
> way to make it work without requiring each thread that accesses the helper
> object to perform synchronization.
>
> As an example of one of the "more obvious" reasons why it doesn't work, consider
> this line from the above code:
>
> pInstance = new MySingleton;
>
> Three things must happen here:
> 1. Allocate enough memory to hold a MySingleton object.
> 2. Construct a MySingleton in the memory.
> 3. Make pInstance point to the object.
>
> In general, they don't have to happen in this order. Consider the following
> translation. This isn't code a human is likely to write, but it is a valid
> translation on the part of the compiler under certain circumstances (e.g., when
> static analysis reveals that the MySingleton constructor cannot throw):
>
> pInstance = // 3
> operator new(sizeof(MySingleton)); // 1
> new (pInstance) MySingleton; // 2
>
> If we plop this into the original function, we get this:
>
> > MySingleton *MySingleton::Instance(void)
> > {
> > if(!pInstance) // Line 1


> > {
> > LOCK(); // Do some MT-locking here
> > if (!pInstance)
> pInstance =

> operator new(sizeof(MySingleton)); // Line 2
> new (pInstance) MySingleton;
> > UNLOCK();
> > return sp;
> > }
>
> So consider this sequence of events:
> - Thread A enters MySingleton::Instance, executes through Line 2, and is
> suspended.
> - Thread B enters MySingleton::Instance, executes Line 1, sees that pInstance
> is non-null, and returns. It then merrily dereferences the pointer, thus
> referring to memory that does not yet hold an object.
>
> If there's a portable way to avoid this problem in the presence of aggressive
> optimzing compilers, I'd love to know about it.
>
> Scott

Jon Biggar

unread,
Jun 16, 2003, 3:39:45 PM6/16/03
to
Vishal wrote:
> If a compiler does not optimize through translation units, then the following
> solution is likely to work (will it?):
>
> class Singleton {
>
> static Singleton* pInstance;
> static Singleton* p2;
>
> static Singleton* instance()
> {
> if(!pInstance)
> {
> Lock();
> if(!pInstance)
> {
> p2 = new Singleton();
> global_fun(); //in a different translation unit.
> pInstance = p2;
> }
> Unlock();
> }
> return pInstance;
> }
> };
>
> My assumption is that the compiler will not reorder "pInstance = p2" assignment
> because of a call to global_fun() which could have something to do with the values
> of both pInstance and p2.(Or along the same lines, passing a pointer to a function
> argument to instance() which is invoked at the place where global_fun() is present)

Actually, the compiler could still optimize this incorrectly, because
you declared pInstance & p2 to be static, so it knows that another
translation unit can't fiddle with them.

--
Jon Biggar
Floorboard Software
j...@floorboard.com
j...@biggar.org

Vishal

unread,
Jun 16, 2003, 9:39:34 PM6/16/03
to
Jon Biggar wrote:

> Actually, the compiler could still optimize this incorrectly, because
> you declared pInstance & p2 to be static, so it knows that another
> translation unit can't fiddle with them.

If you notice, pInstance and p2 are declared in class scope (static data members) and
hence have external linkage.

Thanks,
Vishal

Lawrence Rust

unread,
Jun 19, 2003, 9:50:55 PM6/19/03
to
Nick Thurn wrote...
[snip]

> Now in the spirit of "never say die" could you explain why this is flawed?
>
> static bool sp_ok_1 = false;
> static bool sp_ok_2 = false;
> static MySingleton* sp = 0;
>
> MySingleton *MySingleton::Instance()
> {
> if (sp_ok_1 == false)
> {
> LOCK();
> if (sp_ok_2 == false)
> {
> sp = new MySingleton;
> sp_ok_2 = true;
> sp_ok_1 = true;
> }
> UNLOCK();
> }
> return sp;
> }
>
> my assumptions are that:
> - the pointer sp is never inspected hence backwash of raw
> memory during construction is not an issue
> - bools (or an equivalent eg chars) should be atomically
> assignable
> - sp_ok_2 is always protected by a mutex so cannot be in an
> undefined state when checked
> - sp_ok_1 and sp will be synced due to the lock acquisition
> if sp_ok_1 is seen to be false but the thread does not
> win the lock first
>
> assuming this is also a bad solution - why is it so?

There are at least two flaws:

1. The C++ spec provides no guarantees about the atomicity of assignment of
any type in the presence of threads. Therefore the line "if (sp_ok_1 ==
false)" could result in undefined behaviour if sp_ok_1 is in the process of
being written.

2. Neither sp_ok_1 nor sp_ok_2 is declared volatile, so an optimiser could
(assuming single-threaded operation) optimise them both away, leaving a test
on sp alone.

Also there is a potential problem with cache coherency on SMP systems that
don't implement bus snooping: sp_ok_1 is read outside the lock but written
inside it. This could result in one processor having a stale false value
cached and hence repeatedly acquiring the lock.

Note that the use of sp_ok_2 is superfluous and could be replaced by a test
of sp since they are both protected by the lock.

-- Lawrence Rust, Software Systems, www.softsystem.co.uk

Lawrence Rust

unread,
Jun 20, 2003, 1:38:52 AM6/20/03
to
"Nick Thurn" wrote...
[snip]

> > MySingleton *MySingleton::Instance()
> > {
> > static MySingleton* sp = MakeInstance();
> > return sp;
> > }
[snip]

> Two things spring to mind:
>
> First:
> MySingleton* MySingleton::MakeInstance()
> {
> LOCK();
> static MySingleton* s = new MySingleton;
>
> MySingleton* alias = s;
> alias->SomeFunctionOrOther();
>
> UNLOCK();
> return alias;
> }
>
> but then I'm no guru (as has been proved multiple times before).

Again this is not thread safe, since the init of sp is not mutexed. And
again, optimisation by flow analysis could remove the call to
SomeFunctionOrOther(), and hence alias could be optimised away.

It strikes me that the problem with implementing double-checked locking in
C++ occurs because the language doesn't provide thread-safe atomic
assignment. This can be avoided by using platform-specific atomic read and
atomic write operations that provide the necessary sequencing and memory
coherency:

struct atomic_t; // Platform specific type
void* ATOMIC_READ( const atomic_t*);
void ATOMIC_WRITE( const void*, atomic_t*);

then:

MySingleton *MySingleton::Instance(void)
{
    static atomic_t sp;

    if ( !ATOMIC_READ( &sp) )
    {
        LOCK();
        if ( !ATOMIC_READ( &sp) )
            ATOMIC_WRITE( new MySingleton, &sp);
        UNLOCK();
    }

    return static_cast< MySingleton*>( ATOMIC_READ( &sp) );
}

In most implementations ATOMIC_READ/WRITE could be implemented as simple
load/store ops. However, because of their definition they cannot be
optimised away, and they guarantee sequencing. A better implementation
could use templates.
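[Ed.: these primitives map naturally onto C++11's std::atomic. A minimal
sketch of the two operations used above -- the names are kept, the
signatures adjusted slightly since std::atomic<void*> stores a non-const
pointer:]

```cpp
#include <atomic>

typedef std::atomic<void*> atomic_t;  // "platform specific" in the original

inline void* ATOMIC_READ(const atomic_t* a)
{
    // Acquire load: cannot be optimised away, and later reads in this
    // thread are not reordered before it.
    return a->load(std::memory_order_acquire);
}

inline void ATOMIC_WRITE(void* v, atomic_t* a)
{
    // Release store: all earlier writes in this thread become visible
    // to any thread whose acquire load sees this value.
    a->store(v, std::memory_order_release);
}
```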

I have found that in a MT environment these and the related primitives
decrement, increment, add, exchange and compareAndExchange are invaluable.

-- Lawrence Rust, Software Systems, www.softsystem.co.uk

---

0 new messages