
MP safe Singleton using double-checked locking


David Holmes

May 28, 1998

A recent thread (on Synchronously stopping ..) ended up discussing the use
of the double-checked locking pattern and it was observed by Dave Butenhof
that the way the pattern was being used was potentially unsafe on an MP
system. This post clarifies a point that I made (and which was easily
overlooked) that it is indeed possible (and not difficult or burdensome) to
correctly use double-checked locking on MP systems.

Cheers,
David
-------------------------

The double-checked locking pattern is an optimisation to avoid locking
overhead in cases where the locking appears to be needed only once. However
it has been shown (Dave B.) that on a multi-processor system the use of
double-checked locking may not guarantee correct results when accessing the
'object' protected by the double-check due to a lack of memory coherency
between multiple threads.

The problem is most easily demonstrated using the Singleton pattern as an
example. We use a Java example (with a straightforward mapping to C++ and
POSIX - notwithstanding the issue of how to initialise the mutex ;-))

public class Singleton {
    private static Singleton ref = null;

    private int a;                  // some instance data
    private Singleton(){ a = 1;}    // initialisation of instance data

    // static methods synchronize on class object
    public static Singleton instance(){
        // use double-checked locking
        if ( ref == null) {
            synchronized(Singleton.class){ // use the class object's lock
                if (ref == null) ref = new Singleton();
            }
        }
        return ref;
    }

    // instance methods synchronize on 'this'
    public synchronized int getA(){ return a; }
    public synchronized void setA(int newA){ a = newA; }
}

Although this appears to be correctly synchronized it has a problem.
Consider this. Thread A sees the reference as null, acquires the lock,
creates the object, gets a reference back from the constructor, stores the
reference into the shared reference field and then unlocks the lock. At the
point of unlock all initialised fields and the shared reference must be
visible to all other threads before any subsequent memory changes caused by
this thread can occur. The problem is that there is nothing to stop the
shared reference from appearing in main memory *before* the newly
initialised fields. Neither Java locks, nor POSIX mutexes control the
ordering of writes within a critical section as seen from another thread
(without using the same mutex/lock). Thus thread B may see the shared
reference as non-null and use the object, but in the process will access
uninitialised fields of that object.

The only way for thread B to be guaranteed to see all the values written by
thread A is to force thread B to wait until A's memory writes all appear in
main memory - and the only way that can happen is if thread B tries to
acquire the same lock that thread A used for initialisation purposes. So
this example of a Singleton can fail. However with a slight change we can
make everything work correctly.

public class Singleton {
    // define one lock object to be used for all locking
    private static Object lock = new Object();
    private static Singleton ref = null;

    private int a;
    private Singleton(){ a = 1;}

    // static methods synchronize on 'lock'
    public static Singleton instance(){
        if ( ref == null) {
            synchronized(lock){
                if (ref == null) ref = new Singleton();
            }
        }
        return ref;
    }

    // instance methods synchronize on 'lock'
    public int getA(){ synchronized(lock){ return a;} }
    public void setA(int newA){ synchronized(lock){a = newA;} }
}

Now thread B can not access the instance fields until initialisation is
complete and all memory writes by thread A appear in main memory.

Thus people can continue to use double-checked locking for initialisation
and access to Singletons. Note however that all accesses to the Singleton's
data must be synchronised, otherwise the problem can return. For example we may
define a get() method to return a valid but possibly stale value and thus
assume that it is not necessary to synchronise it. However if a thread uses
that method first then it would again be possible for an uninitialised
value to be returned.
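For concreteness, the corrected version above can be written out as a single compilable class; this is just a sketch of the same code (the class name SafeSingleton is mine, not from the original post):

```java
// A compilable rendering of the corrected Singleton above. All access --
// creation and data access -- goes through the same lock object, so a
// reader cannot touch 'a' until the writer's unlock has made it visible.
public class SafeSingleton {
    private static final Object lock = new Object();
    private static SafeSingleton ref = null;

    private int a;                       // some instance data
    private SafeSingleton() { a = 1; }   // initialisation of instance data

    public static SafeSingleton instance() {
        if (ref == null) {               // first, unsynchronised check
            synchronized (lock) {
                if (ref == null) ref = new SafeSingleton(); // second check
            }
        }
        return ref;
    }

    // instance methods synchronize on the same 'lock'
    public int getA() { synchronized (lock) { return a; } }
    public void setA(int newA) { synchronized (lock) { a = newA; } }
}
```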


Raj Calisa

May 28, 1998

>
> public class Singleton {
>     // define one lock object to be used for all locking
>     private static Object lock = new Object();
>     private static Singleton ref = null;
>
>     private int a;
>     private Singleton(){ a = 1;}
>
>     // static methods synchronize on 'lock'
>     public static Singleton instance(){
>         if ( ref == null) {
>             synchronized(lock){
>                 if (ref == null) ref = new Singleton();
>             }
>         }
>         return ref;
>     }
>
>     // instance methods synchronize on 'lock'
>     public int getA(){ synchronized(lock){ return a;} }
>     public void setA(int newA){ synchronized(lock){a = newA;} }
> }
>

Even though the above code solves the MP-unsafe semantics of
double-checked locking, it introduces the overhead of locking and
unlocking for accessing data members in the singleton object. This might
not be desirable if (for example) the object's data is read-only.

I have an alternate solution which requires the use of an additional
mutex and a variable to indicate if it is safe to access the singleton
object's data members.

class Mutex { } ;        // Assume usual semantics
class MutexLocker { } ;  // Assume usual semantics

class Singleton {

private:
    static Singleton *theInstance ;
    static int safe ;
    static Mutex mInstance,
                 mSafe ;

public :
    static Singleton *Instance() {

        if ( !theInstance ) {
            MutexLocker l( mInstance ) ;
            if ( !theInstance ) {
                theInstance = new Singleton() ;
                MutexLocker l2( mSafe ) ;  // renamed to avoid shadowing 'l'
                safe = 1 ;
            }
        }

        if ( safe ) {
            // Safe to use singleton
            return theInstance ;
        } else {
            // Unsafe - hence lock the mutex before using.
            MutexLocker l( mInstance ) ;
            return theInstance ;
        }
        /*NOTREACHED*/
        return theInstance ;
    }
} ;

The above code ensures that all data members of the object are
initialized before 'safe' is set to 1.

Any improvements/corrections welcome.

rajanish...

Niall Smart

May 28, 1998

David Holmes wrote:

> public class Singleton {
>     private static Object lock = new Object();
>     private static Singleton ref = null;
>
>     private int a;
>     private Singleton(){ a = 1;}
>
>     public static Singleton instance(){
>         if ( ref == null) {
>             synchronized(lock){
>                 if (ref == null) ref = new Singleton();
>             }
>         }
>         return ref;
>     }
>
>     public int getA(){ synchronized(lock){ return a;} }
>     public void setA(int newA){ synchronized(lock){a = newA;} }
> }

Why do you have to create a different lock in the second implementation?
Can't you just use Singleton.class?

Niall

Joe Seigh

May 28, 1998

This pattern is just an application of a more general
pattern for atomically updating a data structure by making
a new copy and atomically swapping a pointer to it. A lot
of "wait-free" programs use this pattern.

writer thread:
    x.a = y;    // update or create some data structure
    ;           // memory barrier needed here
    q = x;      // update a global pointer to point to
                // new or updated data structure


reader thread:
    z = q;      // get pointer to object
    ;           // memory barrier needed here
    w = z.a;    // access object

The memory barriers are needed if you have an architecture that allows
out of order memory accesses.

The memory barrier for the writer thread in this case is easy.
Just use the memory barrier that is part of the mutex that is
used to coordinate the creation of the object, thusly.

Something q;
Something q2;

if (q == null) {    // writer part
    synchronized(lock) {
        if (q2 == null) {
            q2 = new Something();
        }
    }   // memory barrier from unlock here to
        // force stores from initialization to complete
        // before making new object visible by
        // updating q. q2 at this point is always the
        // address of the new object.

    q = q2;     // there's a very small window where you
                // might get a few extra redundant stores
                // but they won't hurt anything. There's
                // a way around this if it bothers anyone.
}
else {      // reader part
    ...     // memory barrier required here
}
// access object pointed to by q


The reader part is a little more tricky. The only memory barriers
that are portable are built into mutexes so you'd have to use a
lock if you wanted portable code. There is one thing though. It's
arguable whether there are any architectures that have prescient like
out of order fetching, i.e. fetching the contents of a data object
before the pointer to that object is fetched. If you have

p = q;
x = p.a;

can the fetch of "p.a" occur before the fetch of "q" from memory?
With regards to java, while the java architecture does not forbid
this kind of behavior and it's possible to implement a jvm which
would exhibit this behavior, Sun's java support doesn't think it
is likely. Their event handling via the AWTEventMulticaster uses
the paradigm from the beginning of this posting and it doesn't
use any memory barriers at all. And they've said they're not
going to fix it. So you might consider skipping the use of a
memory barrier for the reader part if you are using java or if
you are using a processor where this type of out of order fetch
is impossible. With java, at least you'd have company if things
broke.

It should also be mentioned that the global pointer to this singleton
object should be initialized before the other threads are started.
When you start a thread the memory states are synchronized. If you
initialize the global pointer after the other threads are started
then you have to come up with some synchronization mechanism to let
the other threads see its initial state, which could obviate most of
the advantage of using this pattern. If you are using pthreads,
this would be the place to perform mutex initialization.
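As a sketch only, here is the writer part above rendered as a compilable Java class. The class names Published and Something are mine; Java's synchronized unlock supplies the writer-side barrier, and the reader-side barrier question remains exactly as discussed in the post:

```java
// Sketch of the publish-by-pointer-swap pattern: q2 is only ever touched
// under the lock; q is the unsynchronised "published" pointer.
public class Published {
    // 'Something' stands in for the data structure being built and swapped in.
    static class Something { int a = 1; }

    private static final Object lock = new Object();
    private static Something q;    // published pointer, read without the lock
    private static Something q2;   // canonical pointer, written under the lock

    public static Something get() {
        if (q == null) {                     // writer part
            synchronized (lock) {
                if (q2 == null) {
                    q2 = new Something();
                }
            } // the unlock acts as the writer-side memory barrier: stores
              // from initialization complete before q is updated below
            q = q2;  // racing threads may store q redundantly; harmless,
                     // since q2 always holds the address of the one object
        }
        // reader part: a portable reader-side barrier would go here
        return q;
    }
}
```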

Joe Seigh

David Holmes

May 29, 1998

Niall Smart <nj...@doc.ic.ac.uk> wrote in article
<356D5CEE...@doc.ic.ac.uk>...

> Why do you have to create a different lock in the second implementation?
> Can't you just use Singleton.class?

Sure, you can use any object as the lock as long as the same object is
used.

David

David Holmes

May 29, 1998

Raj Calisa <Raj.C...@dsto.defence.gov.au> wrote in article
<356D0484...@dsto.defence.gov.au>...

> The above code ensures that all data members of the object are
> initialized before the 'safe' is set to 1.

Nice variant. Thanks.

David

Dave Butenhof

Jun 1, 1998

Raj Calisa wrote:

> I have an alternate solution which requires the use of an additional
> mutex and a variable to indicate if it is safe to access the singleton
> object's data members.

[...]

> if ( !theInstance ) {
>     MutexLocker l( mInstance ) ;
>     if ( !theInstance ) {
>         theInstance = new Singleton() ;
>         MutexLocker l( mSafe ) ;
>         safe = 1 ;
>     }
> }
>
> if ( safe ) {
>     // Safe to use singleton
>     return theInstance ;
> } else {
>     // Unsafe - hence lock the mutex before using.
>     MutexLocker l( mInstance ) ;
>     return theInstance ;
> }
> /*NOTREACHED*/
> return theInstance ;
> }
> } ;
>
>

> The above code ensures that all data members of the object are
> initialized before the 'safe' is set to 1.

Sorry, but, big problem here. Just like the original data and its pointer,
the read and write order of "safe" is indeterminate unless synchronized.
Yes, "safe" is set to 1 only after the data is initialized. But then, you
can just as easily make sure that the pointer is set after the data is
initialized. None of that means anything unless you've ensured proper data
visibility ordering for the reader; which you haven't done.

In other words, you haven't solved the problem, just added extra overhead
and complication.

You need a MEMORY BARRIER, not an extra data reference. The only portable
way to generate a memory barrier is to use explicit POSIX synchronization,
such as a mutex. So we're right back where we started.

Please, let's stop this flood of optimistic "gee, I've solved the problem".
Chances are, you haven't. So that leaves me with the painful decision of
whether to spend my time to debunk yet another attempt, or to ignore it and
risk that people will get themselves into trouble by assuming that an
unchallenged "solution" must actually be correct. Yeah, "caveat reader" and
all that, but, still, this doesn't seem a productive use of anyone's time.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation bute...@zko.dec.com |
| 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----------------[ Better Living Through Concurrency ]----------------/


John Hickin

Jun 1, 1998

>You need a MEMORY BARRIER, not an extra data reference. The only portable
>way to generate a memory barrier is to use explicit POSIX synchronization,
>such as a mutex. So we're right back where we started.

The code could be modified to store the instance pointer in a thread-specific
key. Each reading thread would have to acquire the mutex once to synchronize
and thereafter no mutex locking would be needed. I hesitate to recommend this
for two reasons:

1. Thread-specific keys are a limited resource,
2. Looking up thread-specific data might be as slow as a mutex.

Any comments on item 2? I'd like to know how expensive this type of lookup is.
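A sketch of this idea in Java terms, using ThreadLocal as the analogue of a POSIX thread-specific key. The class name TsdSingleton is mine, and whether the thread-local lookup actually beats a mutex is exactly the open question in item 2:

```java
// Each thread pays for the lock (and its memory barrier) once, caches the
// reference in thread-specific storage, and thereafter reads its own copy
// without any synchronisation.
public class TsdSingleton {
    private static final Object lock = new Object();
    private static TsdSingleton ref = null;

    // per-thread cache of the singleton reference
    private static final ThreadLocal<TsdSingleton> cache = new ThreadLocal<>();

    private final int a;
    private TsdSingleton() { a = 1; }

    public static TsdSingleton instance() {
        TsdSingleton local = cache.get();
        if (local == null) {
            synchronized (lock) {        // one synchronised first access
                if (ref == null) ref = new TsdSingleton();
                local = ref;
            }
            cache.set(local);            // later calls by this thread skip the lock
        }
        return local;
    }

    public int getA() { return a; }
}
```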

--
John D. Hickin Nortel Technology, Montreal, Quebec
(514) 765-7924 hic...@nortel.ca


David Holmes

Jun 1, 1998

Dave Butenhof <bute...@zko.dec.com> wrote in article
<3572A4C3...@zko.dec.com>...

> Sorry, but, big problem here. Just like the original data and its
> pointer, the read and write order of "safe" is indeterminate unless
> synchronized. Yes, "safe" is set to 1 only after the data is
> initialized. But then, you can just as easily make sure that the
> pointer is set after the data is initialized. None of that means
> anything unless you've ensured proper data visibility ordering for the
> reader; which you haven't done.

Dave is right as usual. Just to clarify for those who haven't yet seen the
problem. theInstance, the fields/data of the instance and the 'safe' flag
can all be sitting in the write buffer waiting to go to main memory and
they can be in any order. Thus if theInstance and 'safe' get written first
a reader thread can still see uninitialised data.

Sorry that I missed that Raj.



> You need a MEMORY BARRIER, not an extra data reference. The only
> portable way to generate a memory barrier is to use explicit POSIX
> synchronization, such as a mutex. So we're right back where we started.

I don't see how a memory barrier per-se can fix this problem. The Mutexes
used above provide the memory barrier for the writer. The reader must be
prevented from accessing the memory locations concerned until after they
have been written to main memory. In other words you have to force the
reader to wait until the writer memory updates are completed. The only way
to make a thread wait in this context is using a Mutex - but it's not the
memory barrier that fixes the problem; it's the forcing of atomicity over a
sequence of memory writes.



> Please, let's stop this flood of optimistic "gee, I've solved the
> problem".

The version I showed did solve the problem at the expense of needing
synchronised accessor methods. Access to the data is protected by the same
'mutex' that controls initialisation - the release of the mutex by the
writer ensures that the values are written to main memory before the mutex
is free, thus the reader can't get the mutex until the data is in memory.

> Chances are, you haven't. So that leaves me with the painful decision
> of whether to spend my time to debunk yet another attempt, or to ignore
> it and risk that people will get themselves into trouble by assuming
> that an unchallenged "solution" must actually be correct. Yeah, "caveat
> reader" and all that, but, still, this doesn't seem a productive use of
> anyone's time.

On the contrary this is productive because it educates people as to what
approaches are wrong. Yes that means that you always need to correct the
incorrect, but until this all appears in a nice neat text that everybody
reads you will never stop that (hopefully this example will appear in the
next version of Dave's book ;) ). The lazy optimisation example we have
been using (and there are worse variants of it) is used a lot in practice
- which means that a lot of people need to be educated about this. Your
input is invaluable Dave so please don't stop.

David

Raj Calisa

Jun 2, 1998

> Sorry, but, big problem here. Just like the original data and its pointer,
> the read and write order of "safe" is indeterminate unless synchronized.
> Yes, "safe" is set to 1 only after the data is initialized. But then, you
> can just as easily make sure that the pointer is set after the data is
> initialized. None of that means anything unless you've ensured proper data
> visibility ordering for the reader; which you haven't done.
>
Sorry, I don't understand. Maybe I am missing something.

Since there is a memory barrier between 'safe = 1' and 'theInstance =
...' I thought that whenever the reader sees that safe = 1, it means
that theInstance as well as the data members are initialized. As I
understand it, there may be reordering of writes among theInstance and
the object's data members, and 'safe = 1' is done only after that
because of the presence of 'MutexLocker l(mSafe)'.

raj...

David Holmes

Jun 2, 1998

Raj Calisa <Raj.C...@dsto.defence.gov.au> wrote in article
<3573536E...@dsto.defence.gov.au>...

> Sorry, I don't understand. Maybe I am missing something.
>
> Since there is a memory barrier between 'safe = 1' and 'theInstance =
> ..." I thought that when ever the reader sees that safe = 1, it means
that
> theInstance as well as the data members are initialized.

I seem to be playing ping-pong with my opinions here, which isn't helping
anything. My only excuse is I've been thinking in Java terms rather than
POSIX - no excuse really. (With Java values are only guaranteed to move to
main memory when you release a synchronisation lock - but the lock
implementation still needs memory barriers at lock and unlock.)

I agree with Raj. When the mSafe mutex is locked a memory barrier is issued
which places initialisation of theInstance and initialisation of the data
on the far side of the barrier. A reader thread could see theInstance
before the data but when they check 'safe' it must be false and so they
lock mSafe. If 'safe' is true and visible then because it is on the
opposite side of the barrier theInstance and the data must have been
initialised.

If you see my other post you'll see that we don't really need the safe flag
- simply appropriate placement of the second mutex to force a memory
barrier between initialisation of the data and initialisation of
theInstance.

David

David Holmes

Jun 2, 1998

[ Replacement posting correcting a coding error ]

David Holmes <dho...@mri.mq.edu.au> wrote in article
<01bd8db5$ae402bc0$1bf56f89@dholmes>...


> I don't see how a memory barrier per-se can fix this problem.

Yes I do - I've been discussing this in the context of Java elsewhere for
the past week. My apologies for being dense this morning.

The original problem can be solved if a memory barrier is placed at the end
of the object constructor before the returned reference/pointer is written
out to main memory. This way if a reader thread can see the reference then
it must be able to see the data that was previously written.

This leads to a simple solution:

if (instance == null){
    lock(&mutex1);
    if (instance == null) { // THIS IS THE CORRECTION
        Singleton *temp;
        lock(&mutex2);
        temp = new Singleton();
        unlock(&mutex2); // memory barrier after initialisation
        instance = temp;
    }
    unlock(&mutex1);
}
return instance;

Now the accessor methods don't need to be synchronised.
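In Java terms the same construction might be sketched with two lock objects standing in for the two mutexes. The class name TwoLockSingleton is mine, and the safety of the unsynchronised accessor is precisely the point under debate in this thread:

```java
// Two-lock construction: the inner synchronized block ends with an unlock
// (a memory barrier) *between* construction of the object and publication
// of the pointer.
public class TwoLockSingleton {
    private static final Object mutex1 = new Object();
    private static final Object mutex2 = new Object();
    private static TwoLockSingleton instance = null;

    private final int a;
    private TwoLockSingleton() { a = 1; }

    public static TwoLockSingleton instance() {
        if (instance == null) {
            synchronized (mutex1) {
                if (instance == null) {     // THIS IS THE CORRECTION
                    TwoLockSingleton temp;
                    synchronized (mutex2) {
                        temp = new TwoLockSingleton();
                    } // unlock of mutex2: barrier after initialisation,
                      // before the pointer is published below
                    instance = temp;
                }
            }
        }
        return instance;
    }

    public int getA() { return a; } // unsynchronised accessor, per the post
}
```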

David

Nanbor Wang

Jun 2, 1998, to Douglas C. Schmidt

Hi David (H.),

Allow me to jump in the discussion.

> > The original problem can be solved if a memory barrier is placed at
> > the end of the object constructor before the returned
> > reference/pointer is written out to main memory. This way if a
> > reader thread can see the reference then it must be able to see the
> > data that was previously written.
>
> > This leads to a simple solution:
> >
> >     if (instance == null){
> >         lock(&mutex1);
> >         if (instance == null) { // THIS IS THE CORRECTION
> >             Singleton *temp;
> >             lock(&mutex2);
> >             temp = new Singleton();
> >             unlock(&mutex2); // memory barrier after initialisation
> >             instance = temp;
> >         }
> >         unlock(&mutex1);
> >     }
> >     return instance;
> >
> > Now the accessor methods don't need to be synchronised.

The only problem I can see here is that you assume the write in
"instance = temp" is atomic, which may not be the case on some
platforms. Therefore, a reader thread might read a partially written
pointer.

I would propose a variant of double-checked locking,

(Both <singleton> and <initialized> are static and initialized to zero.)

if (! initialized)
{
    mutex.lock ();

    if (singleton == null)
        singleton = new Singleton ();
    else
        initialized = true;

    mutex.unlock ();
}

return singleton;

This should ensure that when initialized changes state, singleton is
visible to all other threads, although it requires one extra lock
acquisition compared with the original double-checked locking pattern.

Does this make sense?

Thanks,

Nanbor

Bas de Bakker

Jun 2, 1998

Nanbor Wang <nan...@cs.wustl.edu> writes:

> (Both <singleton> and <initialized> are static and initialized to zero.)
>
> if (! initialized)
> {
>     mutex.lock ();
>
>     if (singleton == null)
>         singleton = new Singleton ();
>     else
>         initialized = true;
>
>     mutex.unlock ();
> }
>
> return singleton;
>
> This should ensure that when initialized changes state, singleton is
> visible to all other threads. Althought it requires one extra locking
> compared with the original double-checked locking pattern.
>
> Does this make sense?

No, or at least not enough. A reading thread could read 'singleton'
at the start of this function and put it in a register while it is
still NULL. Then it could read 'initialized', which has now been set
to true by another thread.

Regards,
Bas de Bakker.

Per Schröder

Jun 2, 1998

In article <3573536E...@dsto.defence.gov.au>, Raj Calisa <Raj.C...@dsto.defence.gov.au> wrote:
>Sorry, I don't understand. Maybe I am missing something.
>
>Since there is a memory barrier between 'safe = 1' and 'theInstance =
>...' I thought that whenever the reader sees that safe = 1, it means
>that theInstance as well as the data members are initialized. As I
>understand there may be reordering of writes among theInstance and the
>object's data members and 'safe = 1' is done only after that because of
>the presence of 'MutexLocker l(mSafe)'.
>
>raj...
The reads may also be "reordered".
You need a memory barrier after the reader sees that safe == 1.

If you want to go outside the mutex model of programming, you need to be
VERY careful that you understand the memory model completely for your
hardware. You need to study the architecture manual for your specific
hardware, and follow the guidelines accurately. Note that you can not (in
general) _test_ your code for correctness; it may execute "correctly" on
your chip, only to fail miserably when you upgrade to the next chip (in the
same architecture), move to a SMP machine, or use a new cache/memory bus
(or whatever).

For portable and easy programming, stick with the mutex model.

In my opinion, the best description of a memory model can be found in
chapter 5 of the Alpha Architecture Handbook at:

http://ftp.digital.com/pub/Digital/info/semiconductor/literature/alphahb2.pdf


--
/Per

Achim Gratz

Jun 2, 1998

p...@mimer.nospam.se (Per Schröder) writes:

> In my opinion, the best description of a memory model can be found in

> chapter 5 of the Alpha Architecture Handbook at [...]

For a slightly different (and a bit easier to understand, IMHO)
presentation of formal memory models see the SPARC Architecture
Manual, V9, Appendix D. The terminology is based on a Xerox PARC
Report (CSL-91-11). Most notably, once you have a formal
specification (including things like the effect of a context switch),
you are able to prove or disprove certain assertions about your code.
The SPARC Architecture Manual contains example output of a program to
enumerate all possible outcomes of certain code sequences. Similar
techniques are used to prove that security protocols have or don't
have certain features.


Achim Gratz.

--+<[ It's the small pleasures that make life so miserable. ]>+--
WWW: http://www.inf.tu-dresden.de/~ag7/{english/}
E-Mail: gr...@ite.inf.tu-dresden.de
Phone: +49 351 463 - 8325

Nanbor Wang

Jun 2, 1998

Ah, yes. I guess the best way to solve this is to instantiate the
singleton beforehand. But that's kind of lame. ;(

Thanks,

Nanbor

David Holmes

Jun 2, 1998

Bas de Bakker <fa...@pc0067.pica.nl> wrote in article
<jbg1ho6...@pc0067.pica.nl>...

> Nanbor Wang <nan...@cs.wustl.edu> writes:
> >
> > if (! initialized)
> > {
> >     mutex.lock ();
> >
> >     if (singleton == null)
> >         singleton = new Singleton ();
> >     else
> >         initialized = true;
> >
> >     mutex.unlock ();
> > }
> >
> > return singleton;
> >
>
> No, or at least not enough. A reading thread could read 'singleton'
> at the start of this function and put it in a register while it is
> still NULL. Then it could read 'initialized', which has now been set
> to true by another thread.

I don't see how. Each thread will read the value of 'initialized' and may
store it in a register if it likes. If it is false then it will lock the
mutex and read 'singleton'. If singleton is null it creates a new instance;
otherwise it sets initialised. Both 'initialised' and 'singleton' are only
read once and singleton is read when the mutex is held.

David

David Holmes

Jun 2, 1998

Nanbor Wang <nan...@cs.wustl.edu> wrote in article
<ovpbtsc...@lambada.cs.wustl.edu>...

> The only problem I can see here is that you assume the write in
> "instance = temp" is atomic, which may not be the case on some
> platforms. Therefore, a reader thread might read a partially written
> pointer.

The fact that instance can be read/written atomically is a requirement for
the use of this pattern. This was stated in the original thread that was
discussing this and is described as part of the double-checked locking
pattern.


> if (! initialized)
> {
>     mutex.lock ();
>
>     if (singleton == null)
>         singleton = new Singleton ();
>     else
>         initialized = true;
>
>     mutex.unlock ();
> }
>
> return singleton;

In this situation you achieve the same effect by forcing at least two
threads to acquire the mutex. The first use forces singleton and the data
in singleton to be visible before initialised; the second use sets
initialised.
You use one mutex and two threads, my version used two mutexes and one
thread. The effect is the same: we ensure that if a reader sees that the
singleton is 'initialised' then it will also see the singleton's data.
David

David Holmes

Jun 2, 1998

Per Schröder <p...@mimer.nospam.se> wrote in article
<6l0roh$b...@sparc1.mimer.se>...

> The reads may also be "reordered".
> You need a memory barrier after the reader sees that safe == 1.

Can you elaborate on what you see as the problem?

David

Raj Calisa

Jun 3, 1998

Per Schröder wrote:
> >be reordering
> >of writes among theInstance and the object's data members and 'safe = 1'
> >is done
> >only after that because of the presence of 'MutexLocker l(mSafe)'.
> >

> The reads may also be "reordered".
How does it matter? Can you illustrate the problem with an example?


raj...

Dave Butenhof

Jun 3, 1998

David Holmes wrote:

> The original problem can be solved if a memory barrier is placed at the end
> of the object constructor before the returned reference/pointer is written
> out to main memory. This way if a reader thread can see the reference then
> it must be able to see the data that was previously written.

You can't expect coherency unless both parties cooperate. Locking the mutex on
the writer side will provide the necessary memory barriers there. However, you
CANNOT, EVER, UNDER ANY CIRCUMSTANCES safely read the results from another
thread without (at least) a memory barrier on that side, as well.

> This leads to a simple solution:
>
> if (instance == null){
>     lock(&mutex1);
>     if (instance == null) { // THIS IS THE CORRECTION
>         Singleton *temp;
>         lock(&mutex2);
>         temp = new Singleton();
>         unlock(&mutex2); // memory barrier after initialisation
>         instance = temp;
>     }
>     unlock(&mutex1);
> }

else MemoryBarrier ();

> return instance;
>
> Now the accessor methods don't need to be synchronised.

With the addition of the memory barrier, and assuming you don't consider this
to be "synchronization", the statement is correct. However, without the memory
barrier on the READER, it may still see the pointer before it can see the data
to which the pointer should point.

Your second mutex in the writer is unnecessary. You're using it to get a
memory barrier between setting the data and the pointer; but that barrier
won't help anyone since it only applies to the order in which that processor
will write data. It doesn't affect the order in which any other processor may
READ data. It's sufficient to use a single mutex, which will give the normal
guarantee that all data written while the mutex was locked shall have been
written prior to the unlock.

Yes, your earlier solution of using a mutex on the reader side will work, but,
since a mutex lock and unlock each include a barrier, plus a lot more, a
single barrier will always be cheaper. On the other hand, of course, using a
mutex results in portable code, whereas the meaning and implementation of a
memory barrier isn't at all portable.

Joe Seigh

Jun 3, 1998

If the memory model allows fetches to be out of order then the fetches
from the singleton may have occurred before the fetch of the flag
indicating that the singleton was properly initialized. If that
may have occurred then we can't make any inference as to whether
the singleton had actually been initialized before the fetches happened.
The logic would be
    "a" precedes "c"    (per store barrier)
    "b" precedes "d"    (possible out of order fetch)
    "c" precedes "d"    (if flag is observed as set)
where
    "a" is the stores initializing the singleton
    "b" is the fetches from the singleton
    "c" is the store of the flag
    "d" is the fetch of the flag

We can't say that "a" necessarily precedes "b". We need a fetch
memory barrier between the fetch of the flag and the fetches from
the singleton, which would give us instead
"d" precedes "b" (ordered fetches)

Now we have what we want
"a" precedes "c" precedes "d" precedes "b"
or "a" precedes "b"

and thus say definitively that initialization of the singleton
completed before any fetches from it occurred.
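The same chain of relations, written out compactly:

```latex
% "a" = stores initializing the singleton, "b" = fetches from it,
% "c" = store of the flag, "d" = fetch of the flag.
a \prec c \quad \text{(store barrier)} \qquad
c \prec d \quad \text{(flag observed as set)} \qquad
d \prec b \quad \text{(fetch barrier)}
\;\Longrightarrow\;
a \prec c \prec d \prec b
\;\Longrightarrow\;
a \prec b
```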

Joe Seigh

David Holmes

Jun 3, 1998

Dave Butenhof <bute...@zko.dec.com> wrote in article
<357550E6...@zko.dec.com>...

> You can't expect coherency unless both parties cooperate. Locking the
> mutex on the writer side will provide the necessary memory barriers
> there. However, you CANNOT, EVER, UNDER ANY CIRCUMSTANCES safely read
> the results from another thread without (at least) a memory barrier on
> that side, as well.

As a general statement I agree, but in the case where the read of the data
relies on the read of the pointer, i.e. the address of the data to be read
is formed by using the pointer, then I disagree.



> With the addition of the memory barrier, and assuming you don't consider
> this to be "synchronization", the statement is correct. However, without
> the memory barrier on the READER, it may still see the pointer before it
> can see the data to which the pointer should point.

First disagreement. The memory barriers on the writer side ensure that if
the pointer is in memory then the data is in memory. If we freeze the
system and inspect memory at the completion of the final memory barrier
then we will see the pointer and the data. If we don't then the memory
barriers have failed. This is what happens to memory but it isn't
necessarily the way other threads see things.

On the reader side we can argue that reads can occur in any order so that
a read of the data may be done before a read of the pointer. However any
read reordering must maintain basic program consistency - which I'll
explain. Let's assume that a reader tries to read data before reading the
pointer. In that case how does the reader get to the data? It must use some
value for pointer. The only values it can use are null - the original
value, or some non-null value set by a writer thread. If the pointer is
null then obviously it is impossible to read the data, but if the pointer
was null then the program flow will detect that and go through the
locking/initialise/unlock sequence and consequently force the data to be
read via the correct pointer. As the reading of the data must occur after
reading the pointer and as the data was guaranteed written before the
pointer, this scenario works fine. If the pre-read of data uses the
non-null value of pointer then again because it sees the non-null pointer
value it must see the data which was written before that.



> Your second mutex in the writer is unnecessary. You're using it to get a
> memory barrier between setting the data and the pointer; but that barrier
> won't help anyone since it only applies to the order in which that
> processor will write data. It doesn't affect the order in which any other
> processor may READ data. It's sufficient to use a single mutex, which will
> give the normal guarantee that all data written while the mutex was locked
> shall have been written prior to the unlock.

The second mutex/memory barrier is necessary. The second barrier forces the
data to be visible in memory before the value of the pointer. Without it
the pointer could appear in main memory before the data values. A reader
could then see the non-null pointer and even with your reader-side memory
barrier it could then read data which still hadn't been written by the
other processor.

David

David Holmes

Jun 3, 1998, 3:00:00 AM

David Holmes <dho...@mri.mq.edu.au> wrote in article
<01bd8e7b$917c5c90$1bf56f89@dholmes>...

> I don't see how. Each thread will read the value of 'initialized' and may
> store it in a register if it likes. If it is false then it will lock the
> mutex and read 'singleton'. If singleton is null it creates a new
> instance; otherwise it sets initialised. Both 'initialised' and
> 'singleton' are only read once and singleton is read when the mutex is
> held.

Now I do. It is possible for the read of singleton to occur before the read
of the initialised flag. Thus singleton could be read while it is null and
the flag is not set. By the time the flag is read the flag could now be set
and thus the program will return null.

The problem in this case is a disassociation between the flag used to
signal initialisation and the pointer itself. In Raj's example this problem
can not occur, nor do I believe it can occur in my example.

David

David Holmes

Jun 3, 1998, 3:00:00 AM

David Holmes <dho...@mri.mq.edu.au> wrote in article
<01bd8e7b$a3eaaf30$1bf56f89@dholmes>...

> The effect is the same: we ensure that if a reader sees that the
> singleton is 'initialised' then it will also see the singleton's data.

This has now been shown not to be true due to the possibility of
'singleton' being read before 'initialised'.

David

David Holmes

Jun 3, 1998, 3:00:00 AM

Joe Seigh <se...@bose.com> wrote in article <1998Jun...@bose.com>...

> If the memory model allows fetches to be out of order then the fetches
> from the singleton may have occurred before the fetch of the flag
> indicating that the singleton was properly initialized. If that
> may have occurred then we can't make any inference as to whether
> the singleton had actually been initialized before the fetches happened.

Having gone through this in detail I now agree that there is a potential
read ordering problem in Raj's example.

We have three things being written in Raj's example:
- theInstance pointer
- the 'safe' flag
- the data (which will be accessed after executing instance() )

The use of mutexes on the writer side ensures that theInstance and the data
will appear in memory before the 'safe' flag.

On the reader side we have to look at how reads could be ordered and work
out the possible interactions.

As I've said in another post, before the data can be read some value of
theInstance must be used to calculate the address of the data. That value
must either be null or a non-null value written by another thread. If the
value is null we cannot read the data and the program logic dictates that
we go through the locking sequence with its barriers that will force the
data to be read later. If theInstance is non-null then it is possible that
the data is not yet in memory and we could read invalid data.

However we must now consider the 'safe' flag. Regardless of when it is read
this flag is either true or false. If it is false then there is no problem
because we must go through the locking sequence and the memory barriers
imply that the early read of the data is not valid and so the data must be
read again. If the flag is true then either we read it before theInstance
and the data, or we read it after. If we read it before and found it true
then the memory barriers in the writer guarantee that theInstance and the
data were in memory before 'safe' and so we read the correct data. If
'safe' was read after and is true it doesn't imply anything about the value
of theInstance and the data. Thus we have a problem.

The problem occurs when we read theInstance followed by the data, followed
by the 'safe' flag. We initially read the correct value for theInstance but
when we read the data it has not yet been written by the other processor
and we get invalid data. By the time we read the 'safe' flag the writer has
completed and 'safe' is now true and so we access invalid data.

Now it may be that such out-of-order reads can't occur across procedure
calls - I don't know. If they can then potentially a call:
Singleton::instance()->getData()
could return garbage.

I maintain that my version using two mutexes works correctly because of the
tight coupling between the pointer and the data pointed to.

At this point I have to concur with Dave B.'s sentiments. This process of
trying to optimise by removing locking is so fraught with danger that it is
best simply avoided. I hadn't appreciated the consequences of read
reorderings. I'm now older and wiser. And if my other assertions turn out
to be wrong I'll feel even older :)
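For POSIX code the standard already packages the safe alternative: pthread_once does the locking and memory synchronization internally, so the application never has to reason about barriers at all. A minimal sketch, with the type and function names assumed for illustration:

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct { int a; } Singleton;

static Singleton *instance;                      /* starts out NULL */
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void init_instance(void) {
    instance = malloc(sizeof *instance);
    instance->a = 1;                             /* initialise instance data */
}

Singleton *singleton_instance(void) {
    /* pthread_once guarantees init_instance runs exactly once, and that
       every thread returning from this call sees its effects */
    pthread_once(&once, init_instance);
    return instance;
}
```

Every caller pays the cost of the pthread_once call, but correctness is the implementation's problem, not the application's.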

Cheers,
David

David Holmes

Jun 4, 1998, 3:00:00 AM

David Holmes <dho...@mri.mq.edu.au> wrote in article
<01bd8f45$32d69140$1bf56f89@dholmes>...

> The problem in this case is a disassociation between the flag used to
> signal initialisation and the pointer itself. In Raj's example this
> problem can not occur, nor do I believe it can occur in my example.

Oops. It can occur in Raj's example.

David

Steve Watt

Jun 4, 1998, 3:00:00 AM

[ Maybe I'll save Dave B. a little bit of time... ]

In article <01bd8f43$b284cb70$1bf56f89@dholmes>,
David Holmes <dho...@mri.mq.edu.au> wrote:
>Dave Butenhof <bute...@zko.dec.com> wrote in article
><357550E6...@zko.dec.com>...
>> You can't expect coherency unless both parties cooperate. Locking the
>> mutex on the writer side will provide the necessary memory barriers there.

[ snip ]

> In that case how does the reader get to the data? It must use some
>value for pointer. The only values it can use are null - the original
>value, or some non-null value set by a writer thread. [ snip ]

Or some non-null value that was *PARTIALLY* set by a writer thread.
Consider the case where your code is running on an MP system with
64 bit processor/address space, but 32 bit alignment constraints. A
write of that pointer could require two cycles -- and they could
even (worst case) cross a cache line. So one of the writes goes
out, making the memory location non-NULL. Then the other processor
reads the pointer, finds an address, and dereferences it. *Poof!*

And I won't even begin to imagine an MP system with several 8088-ish
processors and a pthreads implementation. I've seen the former, but
it didn't have pthreads. Heck, it was *before* pthreads.
--
Steve Watt KD6GGD PP-ASEL-IA Packet: KD6GGD @ N0ARY.#NOCAL.CA.USA.NA
ICBM: 121W 56' 58.1" / 37N 20' 14.2" Internet: steve @ Watt.COM
Free time? There's no such thing. It just comes in varying prices...

Achim Gratz

Jun 4, 1998, 3:00:00 AM

"David Holmes" <dho...@mri.mq.edu.au> writes:

> First disagreement. The memory barriers on the writer side ensure
> that if the pointer is in memory then the data is in memory. If we
> freeze the system and inspect memory at the completion of the final
> memory barrier then we will see the pointer and the data. If we
> don't then the memory barriers have failed. This is what happens to
> memory but it isn't necessarily the way other threads see things.

[...]

Common misconception about memory barriers, it seems. The barrier
doesn't "force" the data to memory in the sense you seem to imply, but
merely ensures that any memory transaction issued on the same
processor after the barrier does not complete until all memory
transactions before the barrier have completed. Memory barriers thus
introduce an order to memory transactions (that includes some fine
print about visibility to other processors; SPARC has several types of
memory barriers, while Alpha has just one), think sequence points if
you're a C programmer.

> The second mutex/memory barrier is necessary. The second barrier
> forces the data to be visible in memory before the value of the
> pointer. Without it the pointer could appear in main memory before
> the data values. A reader could then see the non-null pointer and
> even with your reader-side memory barrier it could then read data

> which still hadn't been written by the other processor.

Your example still doesn't work if instance isn't written or read
atomically. It is never a good idea to use data values for
synchronisation and pointers are no exception. You need to do
something that doesn't depend on that assumption.

/**/ if (one_shot == 0) {
         lock(&mutex1);
         if (instance == null) {
             Singleton *temp;
             // using another mutex in lieu of a memory barrier
/**/         lock(&mutex2);
             temp = new Singleton();
             instance = temp;
/**/         unlock(&mutex2);
             // data and pointer may become visible to other threads,
             // but these block on mutex1 before trying to read anything
/**/         one_shot = -1;
             // data and pointer completely visible to other processors
             // before one_shot becomes visible (due to memory barrier),
             // one_shot may only be partly visible, but that's OK --
             // no one is ever going to touch it again
         }
         unlock(&mutex1);
         // threads blocked earlier on mutex1 see instance != 0 now,
         // everyone else doesn't care because one_shot is nonzero
/**/ }
     return instance;

It is essential that one_shot is never changed again and that any
nonzero value means "hands off", as the name hopefully implies. You
are still relying on a coherency protocol on top of a single address
space, so for aggressive distributed memory systems it still falls on
its face. From the explanations Dave Butenhof gave recently I think
that pthreads were never intended to work on such systems anyway (I'm
thinking about a system where data is pulled to a local address space
on demand - mutexes would need to keep a directory of "their" data in
this case).

Is this "optimization" worth all the hassle? I wouldn't think so.
Has anyone really identified a performance problem with the
"canonical" code? [remove the /**/ lines and all comments to arrive at
that version] Yeah, every time you need a new handle to the singleton
you lock a mutex. But it's guaranteed to work and you can go on to
solve real problems and if contention occurs at that point I'd really
like to see an example. The discussion has shown that it is all too
easy to get something wrong (I'm not even 100% sure it is correct yet)
and quite a number of programmers would feel compelled to misuse it in
inappropriate places. Remember, that's the prototype for a
"Heisenbug" (after Heisenberg's uncertainty principle: when you know
where it is, you don't know what it does -- when you know what it
does, you don't know where it is).
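For reference, the canonical version Achim describes (the /**/ lines and comments removed) comes out as follows in C with POSIX mutexes; the type and names here are assumed for illustration:

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct { int a; } Singleton;

static Singleton *instance;  /* starts out NULL */
static pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;

Singleton *get_instance(void) {
    pthread_mutex_lock(&mutex1);    /* every caller pays for the lock ... */
    if (instance == NULL) {
        instance = malloc(sizeof *instance);
        instance->a = 1;            /* initialise instance data */
    }
    pthread_mutex_unlock(&mutex1);  /* ... but the lock/unlock pair supplies
                                       all the memory synchronization needed */
    return instance;
}
```

This is the version that is "guaranteed to work" on any conforming pthreads implementation, at the cost of one uncontended lock per call.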

Joe Seigh

Jun 4, 1998, 3:00:00 AM

In article <01bd8f43$b284cb70$1bf56f89@dholmes>, "David Holmes" <dho...@mri.mq.edu.au> writes:
|> Dave Butenhof <bute...@zko.dec.com> wrote in article
|> <357550E6...@zko.dec.com>...
|> > You can't expect coherency unless both parties cooperate. Locking the
|> mutex on
|> > the writer side will provide the necessary memory barriers there.
|> However, you
|> > CANNOT, EVER, UNDER ANY CIRCUMSTANCES safely read the results from
|> another
|> > thread without (at least) a memory barrier on that side, as well.
|>
|> As a general statement I agree but in the case where the read of the data
|> relies on the read of the pointer ie. the address of the data to be read is
|> formed by using the pointer then I disagree.
|>
|> > With the addition of the memory barrier, and assuming you don't consider
|> this
|> > to be "synchronization", the statement is correct. However, without the
|> memory
|> > barrier on the READER, it may still see the pointer before it can see the
|> data
|> > to which the pointer should point.
|>
|> First disagreement. The memory barriers on the writer side ensure that if
|> the pointer is in memory then the data is in memory. If we freeze the
|> system and inspect memory at the completion of the final memory barrier
|> then we will see the pointer and the data. If we don't then the memory
|> barriers have failed. This is what happens to memory but it isn't
|> necessarily the way other threads see things.
|>
|> On the reader side we can argue that reads can occur in any order so that
|> a read of the data may be done before a read of the pointer. However any
|> read reordering must maintain basic program consistency - which I'll
|> explain. Let's assume that a reader tries to read data before reading the
|> pointer. In that case how does the reader get to the data? It must use some
|> value for pointer. The only values it can use are null - the original
|> value, or some non-null value set by a writer thread. If the pointer is
|> null then obviously it is impossible to read the data, but if the pointer
|> was null then the program flow will detect that and go through the
|> locking/initialise/unlock sequence and consequently force the data to be
|> read via the correct pointer. As the reading of the data must occur after
|> reading the pointer and as the data was guaranteed written before the
|> pointer, this scenario works fine. If the pre-read of data uses the
|> non-null value of pointer then again because it sees the non-null pointer
|> value it must see the data which was written before that.

I don't think it is safe to assume in the case of fetching a pointer
followed by the fetch of data referenced by the pointer, that the
actual fetches have to occur in that order. The argument for this is
based on the apparent belief that there can be only one way to
accomplish or implement this on a processor.

But unless the architecture explicitly and specifically states this
as a condition that would impose an order of memory fetches, you
can't assume this. Not unless you can prove no other implementation
is possible. You can't infer the general case from a specific
implementation or implementations.

Also, while there may be some architectures that do state that
pointer fetches impose an order on memory fetches (and I'd be interested
in hearing which ones do), Java isn't one of them. Specifically,
java allows (does not forbid) caching of memory fetches. So in the
case of the double-checked singleton, if an existing object was used
for the singleton and data from that object was in cache, that data
could be used even if it was stale.

You may argue that cache schemes don't return stale data. Most don't
but they aren't forbidden from doing that. The definition of a cache
is that it is undetectable, i.e. it can't violate architecture. As
long as the stale data is valid for the interval that the fetch was
allowed to occur in, that stale data is valid.

--
Joe Seigh

David Holmes

Jun 4, 1998, 3:00:00 AM

Steve Watt <st...@Watt.COM> wrote in article <Eu0L7...@Watt.COM>...

> [ Maybe I'll save Dave B. a little bit of time... ]

No you won't :)



> Or some non-null value that was *PARTIALLY* set by a writer thread.
> Consider the case where your code is running on an MP system with
> 64 bit processor/address space, but 32 bit alignment constraints.

For those who haven't followed this from the beginning we already require
the atomicity constraint on reading/writing whatever 'flag' variables we
use.

David

David Holmes

Jun 4, 1998, 3:00:00 AM

Achim Gratz <gr...@ite.inf.tu-dresden.de> wrote in article
<ck96xs...@ite127.inf.tu-dresden.de>...

> Common misconception about memory barriers, it seems. The barrier
> doesn't "force" the data to memory in the sense you seem to imply, but

My terminology was loose I'll grant but the effect is the same. The memory
barrier ensures that the data is written to main memory before the instance
pointer. Consequently if you inspect main memory and see the instance then
you must also be able to see the data. Now that doesn't force another
thread to see things that way but the rest of the post explained how that
was addressed.

> Your example still doesn't work if instance isn't written or read
> atomically.

As has been stated from the beginning use of the double-checked locking
pattern assumes/requires atomicity of the "flag". Maybe that's not a good
requirement to have but I didn't invent the pattern. :)

> Is this "optimization" worth all the hassle? I wouldn't think so.

Perhaps not. I didn't invent the optimisation but presumably those that did
considered a need for it.

My original post in this thread showed how you could optimise the locking
of the instance pointer provided you subsequently synchronised all accesses
to the data accessed via the instance.

Subsequent optimisations have been aimed at allowing you to remove the
synchronisation in the accessor methods when you are accessing immutable
data. It certainly appears to be very heavy handed to require a mutex to
access an immutable field if the only time the mutex is really needed is
the first time a thread accesses that field.

Having said that there is still a hole in my presented optimisation on some
systems - but that's another post.

David

David Holmes

Jun 4, 1998, 3:00:00 AM

Joe Seigh <se...@bose.com> wrote in article <1998Jun...@bose.com>...
> I don't think it is safe to assume in the case of fetching a pointer
> followed by the fetch of data referenced by the pointer, that the
> actual fetches have to occur in that order. The argument for this is
> based on the apparent belief that there can be only one way to
> accomplish or implement this on a processor.

They have to occur in that order in the sense that the address of the data
is found from the value of the pointer. Now if the pointer could have some
arbitrary value then this would be a problem but it can't. The pointer is
initialised to null and is subsequently set to some value. All threads must
either access the pointer as null or as the new value. If it's the former
then it can't access the data; if it's the latter then it would appear to be
okay. So while in the general case there could be problems, with the
restrictions on the pointer there are no ordering problems. Note that the
setting of the pointer to null is assumed to occur statically or at least
before the threads that will invoke instance() are created.

However there *is* a problem with what I proposed but it's not to do with
ordering, it's to do with visibility/caching as you later point out.

For the sake of the discussion lets call the pointer 'ref' and the data
'field' so that we access the data as ref->field.

On a system with hardware cache coherency when we read ref->field we must
read the initialised value based on the fact that we have the value of ref
and that the data is written to memory before ref. This is what I've
previously argued.

On a system without hardware cache coherency there could be a problem. At
some point we context switch to the thread of interest and the cache is
made coherent at that point by the context switch software. Eventually in
this thread we will get to the point where we want to read ref->field.
Suppose that ref->field is of a type that does not occupy an entire cache
line on this system. Now suppose that at some time prior to reading
ref->field we read 'x' which shares the same cache line as ref->field. With
no intervening synchronisation points no cache coherency will be enforced
thus it is possible that when we come to read ref->field we find that it is
cached *but* it is actually a pre-initialisation value that is cached.

This would seem to be a fairly big hole, but we can determine that it is
probably much smaller. If 'x' shares a cache line with ref->field then they
must be very close together in memory. Arguably you could say that if they
are that close together in memory then they would be part of the same
object. However if they were part of the same object then the only way to
read 'x' would be as ref->x - which means x must be initialised and
therefore so must field. If we assume that object memory layout is always
aligned on a cache boundary then it seems even more likely that 'x' would
have to be in the same object as field.

But I'm no compiler expert and I guess there are no guarantees about
alignment so potentially a system with software cache coherency could
present a problem. So in that context what I have proposed would fail - as
would Dave B.'s amendment unless his memoryBarrier() is also taking care of
cache coherency.

> Also, while there may be some architectures that do state that
> pointer fetches impose an order on memory fetchs (and I be interested
> in hearing which ones do) , Java isn't one of them. Specifically,
> java allows (does not forbid) caching of memory fetches. So in the
> case of the double-checked singleton, if an existing object was used
> for the singleton and data from that object was in cache, that data
> could be used even if it was stale.

Quite true. And it is this scenario that causes my example to break.



> You may argue that cache schemes don't return stale data.

No argument there. This discussion had centred so much on memory ordering
issues that I overlooked the caching/visibility issues.

My original example that started this thread is correct and valid, but
requires that all accesses to the data be synchronised. For accessing
immutable data this would seem to be an unnecessary overhead and subsequent
examples have tried to remove the need for synchronising these accessor
methods. Those attempts have all now been shown to be invalid in some
respect.

Cheers,
David

Dave Butenhof

Jun 8, 1998, 3:00:00 AM

David Holmes wrote:

> Dave Butenhof <bute...@zko.dec.com> wrote in article
> <357550E6...@zko.dec.com>...
> > You can't expect coherency unless both parties cooperate. Locking the
> > mutex on the writer side will provide the necessary memory barriers
> > there. However, you CANNOT, EVER, UNDER ANY CIRCUMSTANCES safely read
> > the results from another thread without (at least) a memory barrier on
> > that side, as well.
>
> As a general statement I agree but in the case where the read of the data
> relies on the read of the pointer ie. the address of the data to be read is
> formed by using the pointer then I disagree.

You can disagree all you want. You're still wrong.

> > With the addition of the memory barrier, and assuming you don't consider
> > this to be "synchronization", the statement is correct. However, without
> > the memory barrier on the READER, it may still see the pointer before it
> > can see the data to which the pointer should point.
>
> First disagreement. The memory barriers on the writer side ensure that if
> the pointer is in memory then the data is in memory. If we freeze the
> system and inspect memory at the completion of the final memory barrier
> then we will see the pointer and the data. If we don't then the memory
> barriers have failed. This is what happens to memory but it isn't
> necessarily the way other threads see things.

We're not talking about the order in which data gets TO memory, in this case.
We're talking about the order it gets FROM memory. As you say, "it isn't
necessarily the way other threads see things".

> On the reader side we can argue that reads can occur in any order so that
> a read of the data may be done before a read of the pointer. However any
> read reordering must maintain basic program consistency - which I'll
> explain. Let's assume that a reader tries to read data before reading the
> pointer. In that case how does the reader get to the data? It must use some
> value for pointer. The only values it can use are null - the original
> value, or some non-null value set by a writer thread. If the pointer is
> null then obviously it is impossible to read the data, but if the pointer
> was null then the program flow will detect that and go through the
> locking/initialise/unlock sequence and consequently force the data to be
> read via the correct pointer. As the reading of the data must occur after
> reading the pointer and as the data was guaranteed written before the
> pointer, this scenario works fine. If the pre-read of data uses the
> non-null value of pointer then again because it sees the non-null pointer
> value it must see the data which was written before that.

"Basic program consistency" works in the absence of correct memory
synchronization operations ONLY on a uniprocessor. Skipping the memory barrier
(or mutex) on the reader side will work fine on any uniprocessor. It will only
work on some multiprocessors.

You're welcome to break your programs in any way you see fit. Please stop
telling other people to break their programs, as well!

The fact that the reader will see either NULL or the written pointer, (though
it's true as long as there are no other possible values flying about in the
multiprocessor system), is irrelevant! When another processor reads the
pointer, that says nothing at all about what values it may see for other
memory.

The whole point is that the read of the data NEED NOT occur after reading the
pointer. A high-performance processor may very well decide to perform the memory
accesses out of order, for a wide variety of reasons, to avoid hardware
stalls. (Or rather, to minimize the impact of those stalls on the running
software.) The goal is to keep the program moving. Multiple instructions and
data accesses fly around independently, and get put back together based on
dependencies that the hardware can detect. It's "easy" to keep track of
register dependencies -- but data ordering dependencies, in a multiprocessor
program, must be explicitly declared using memory synchronization operations,
such as a memory barrier.

> > Your second mutex in the writer is unnecessary. You're using it to get a
> > memory barrier between setting the data and the pointer; but that barrier
> > won't help anyone since it only applies to the order in which that
> > processor will write data. It doesn't affect the order in which any other
> > processor may READ data. It's sufficient to use a single mutex, which
> > will give the normal guarantee that all data written while the mutex was
> > locked shall have been written prior to the unlock.
>

> The second mutex/memory barrier is necessary. The second barrier forces the
> data to be visible in memory before the value of the pointer. Without it
> the pointer could appear in main memory before the data values. A reader
> could then see the non-null pointer and even with your reader-side memory
> barrier it could then read data which still hadn't been written by the
> other processor.

The whole point is that the order of WRITE operations has nothing at all to do
with the order of READ operations. You've inserted an extra memory barrier
(actually, a set of extra memory barriers, plus additional costs) that won't
help anyone. You've over-protected the WRITE ordering, and done nothing about
the READ ordering.

Now, as I've been saying all along, to avoid a mutex on the read side, you
need a barrier between the read of the pointer and the read of the data. That
also implies a barrier on the writer between the write of the data and the
write of the pointer. It's perfectly reasonable to use an explicit memory
barrier rather than a mutex in both cases. My point, which your response
doesn't address, is that your extra mutex (which is overkill, since only a
barrier is required) doesn't help the read side. And without fixing the read
side, the extra overhead (of even a barrier, much less a mutex) on the writer
side is just wasted cycles.
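In later terminology the pairing Dave describes is a release store of the pointer matched by an acquire load of it. A sketch of double-checked locking using C11 atomics (which did not exist when this was written; names are illustrative), keeping the mutex only on the slow path:

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <pthread.h>

typedef struct { int a; } Singleton;

static _Atomic(Singleton *) instance;  /* starts out NULL */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

Singleton *get_instance(void) {
    /* reader-side barrier: acquire orders the pointer read before
       any subsequent reads of the data it points to */
    Singleton *p = atomic_load_explicit(&instance, memory_order_acquire);
    if (p == NULL) {
        pthread_mutex_lock(&lock);
        p = atomic_load_explicit(&instance, memory_order_relaxed);
        if (p == NULL) {
            p = malloc(sizeof *p);
            p->a = 1;  /* write the data first ... */
            /* writer-side barrier: release orders the data writes
               before the pointer store */
            atomic_store_explicit(&instance, p, memory_order_release);
        }
        pthread_mutex_unlock(&lock);
    }
    return p;
}
```

Both barriers are present, and neither is redundant: drop either one and the bug described in this thread reappears.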

Ron Hunsinger

Jun 8, 1998, 3:00:00 AM

In article <01bd933e$e550dfe0$1bf56f89@dholmes>, "David Holmes"
<dho...@mri.mq.edu.au> wrote:

> The reader thread could try to read the data before reading the pointer, but
> to read the data the program must first read the value of the pointer - as
> the data is at an offset from the address given by the pointer. Now
> hopefully you will agree that the address to read for the data can not be
> determined without having some value for the pointer.

I didn't want to repudiate this until reading the whole thread, in case
someone else pointed out the flaw in this assumption first.

And then I was heartened to see you point out the flaw yourself. The data
from the instance may have been read into the cache as a side effect of
accessing some other data that has nothing whatever to do with the
instance, beyond the pure accident of being located in the same cache line.
Notice that at this point, instance data has been read from memory WITHOUT
having accessed the pointer to the instance.

But two posts later in the thread, you've already forgotten.

There is no reason to believe that data that shares a cache line has any
logical relationship. Cache lines are often much larger than the alignment
requirements imposed by the compiler, so it is not at all unusual to have
multiple (unrelated) objects sharing a cache line. If nothing else, the
memory allocator will often put its own private data adjacent to the blocks
it allocates.

While it is true that data cannot be accessed through a pointer without
first accessing the pointer, it does not follow (in the absence of hardware
cache coherency) that the data so accessed is current as of the time the
pointer was placed in memory, unless there is an intervening memory barrier
in both the writer and the reader threads. Without the memory barrier in the
reader, a pointer recently fetched from memory can still point to stale
data in the cache.
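The cache-line point is easy to check mechanically: whether two addresses share a line depends only on their positions within line-sized blocks of the address space, not on any logical relationship between the objects. A small sketch (the 64-byte line size is an assumption; real hardware varies):

```c
#include <stdint.h>

/* assumed line size for illustration; real hardware varies */
#define CACHE_LINE 64

/* two addresses share a cache line iff they fall in the same
   CACHE_LINE-sized, CACHE_LINE-aligned block of the address space */
int same_cache_line(const void *p, const void *q) {
    return ((uintptr_t)p / CACHE_LINE) == ((uintptr_t)q / CACHE_LINE);
}
```

Adjacent heap blocks, or an allocator's private header and the block it precedes, frequently satisfy this predicate, which is all the stale-data scenario above needs.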

-Ron Hunsinger

David Holmes

Jun 9, 1998

Dave Butenhof <bute...@zko.dec.com> wrote in article
<357BB8E1...@zko.dec.com>...

> We're not talking about the order in which data gets TO memory, in this
> case. We're talking about the order it gets FROM memory. As you say, "it
> isn't necessarily the way other threads see things".

Once we have established the order in which things get TO memory we have
also constrained what can be seen when we read FROM memory. Or, to look at
it conversely, it won't matter how much we constrain the reader if we
haven't also constrained the writer.



> The fact that the reader will see either NULL or the written pointer
> (though it's true as long as there are no other possible values flying
> about in the multiprocessor system), is irrelevant! When another processor
> reads the pointer, that says nothing at all about what values it may see
> for other memory.

Let's consider this in two parts:

1. If we inspect physical memory, what will we see?

We have already established that if you inspect physical memory and see the
non-null pointer then you must see the initialised data due to the memory
barriers on the writer side. Without this any reader would be up the creek
regardless.

2. What will the reader thread see in memory?

The reader thread could try to read the data before reading the pointer,
but to read the data the program must first read the value of the pointer -
as the data is at an offset from the address given by the pointer. Now
hopefully you will agree that the address to read for the data can not be
determined without having some value for the pointer. The value of the
pointer will either be null or non-null as they are the only values the
pointer takes.

If it is null then the program logic dictates that the "instance == null"
branch of the program is taken, the synchronisation is used and the memory
barriers nullify any pre-read values for the data. If the pointer is
non-null then it must hold the correct address. If it holds the correct
address then the thread will pre-read the correct location of the data. By
virtue of the memory barriers on the writer side if the reader thread finds
the final value of the pointer in memory then it must find the final value
of the data (except of course for the cache coherency problem outlined -
which memory barriers don't fix).

The only way this can fail is if the hardware reads the null value of the
pointer and uses it to pre-read the data and then re-reads the pointer and
finds it no longer null but still uses the pre-read data. Given the data
dependency that exists here I don't believe such a sequence would be valid.
Do you know differently?

So for this special case of a pointer used to access data, where that
pointer can only be null or have its final value, then I believe that the
code given will work with respect to ordering. However it has been shown
that it can fail due to cache coherency problems so I wouldn't advise
anyone to use it.

> Now, as I've been saying all along, to avoid a mutex on the read side, you
> need a barrier between the read of the pointer and the read of the data.
> That also implies a barrier on the writer between the write of the data
> and the write of the pointer.

Are you saying that a barrier between reading the pointer and reading the
data implies the *need* for a barrier between writing the pointer and
writing the data, or that it will *cause* such a barrier? I presume the
former.

In an earlier post you said that the extra barrier between writing the data
and writing the pointer won't help anyone - my point is that it is in fact
needed. Going back to the original coding sequence using a single mutex,
the lack of a barrier between writing the data and writing the pointer can
allow a reader to see the final pointer value but still read uninitialised
data - *not* because the reader reads things out of order but because the
writer wrote them out of order. The second mutex provides that needed
barrier. As a mutex is the only POSIX way to achieve the required
ordering I showed the use of a mutex; further, only a mutex would ensure
cache coherency.

The bottom line is that all of the optimisations to remove locking have now
been shown to be wrong for one or more reasons and should not be used.
While some here have known this all along, and others repeated the mantra
of "always use the mutex" without knowing exactly why, these discussions
have raised and exposed all of the complex issues that make the mantra
true. These issues are not well enough discussed in the general literature
and there is a general ignorance concerning them. These discussions help to
alleviate this ignorance by explaining why these optimisations are wrong.
Given the widespread use of such optimisations it is apparent these
discussions are needed.

David

David Holmes

Jun 10, 1998

Ron Hunsinger <hns...@sirius.com> wrote in article
<hnsngr-ya0231800...@news.sirius.com>...

> But two posts later in the thread, you've already forgotten.

No, I haven't forgotten; I was just arguing a different point. I already
conceded that the approach was in fact broken. It was *why* it is broken
that I was arguing.

But I concede defeat. I am wrong because I can not make assumptions about
what any general MP system is and is not allowed to do. Although specific
systems may work in ways that agree with what I said, I can not assume that
this is so.

Thanks to all those who responded here and offline. As I said earlier this
*has* been educational and hopefully not just for me.

Just a reminder that the example that originally started this thread *is*
correct - it uses the mutex to access the data. Our attempts to find an
optimisation that allowed the data to be accessed without using a lock
(specifically for the case of immutable data) have been unsuccessful.
Perhaps if nothing else this demonstrates a need for (portable) memory
synchronisation primitives other than mutexes.

Cheers,
David

Jason Rosenberg

Jun 21, 1998

David Holmes wrote:
>
> Perhaps if nothing else this demonstrates a need for (portable) memory
> synchronisation primitives other than mutexes.
>

Well, in the pthread case, is there just one function needed:
pthread_memory_barrier()? Or are two primitives needed:
pthread_memory_read_barrier() and pthread_memory_write_barrier()?

I am still assuming that judicious use of the Interlocked atomic operations
under Windows provides the needed ordering. No?

Jason

Joe Seigh

Jun 22, 1998

Maybe is more like it. Microsoft doesn't document it, and win32.dll is one
of the dll's du jour. Also, Intel's SMP architecture isn't exactly cast
in concrete. Are you willing to bet that all those n-way Intel systems out
there all have SMP that behaves the same way?

--
Joe Seigh
