There is something in this article that puzzles me
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf

The article says the following code may not work correctly in a
multi-threaded environment, for two reasons. The code has volatile all
over the place to prevent the compiler from re-ordering code. The
idea is to avoid acquiring the (expensive) Lock every time you need to
access the singleton.
class Singleton {
public:
    static volatile Singleton* volatile instance();
    // ...
private:
    // ...
    static volatile Singleton* volatile pInstance;
};

// from the implementation file
volatile Singleton* volatile Singleton::pInstance = 0;

volatile Singleton* volatile Singleton::instance() {
    if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
            volatile Singleton* volatile temp = new volatile Singleton;
            pInstance = temp;
        }
    }
    return pInstance;
}
The first reason the article gives for why this code may fail in a
multi-threaded environment is on page 10:
<quote>
First, the Standard’s constraints on observable behavior are only for
an abstract machine defined by the Standard, and that abstract machine
has no notion of multiple threads of execution. As a result, though
the Standard prevents compilers from reordering reads and writes to
volatile data within a thread, it imposes no constraints at all on
such reorderings across threads. At least that’s how most compiler
implementers interpret things. As a result, in practice, many
compilers may generate thread-unsafe code from the source above.
<end quote>
I can't figure out what the above quoted text is getting at. Can
anyone explain? What does "re-ordering across threads" mean?
There are two issues here:
1. In accordance with the C++03 standard, the compiler optimizer is
completely free to move both pInstance reads to *before* the Lock
construction -- essentially negating the value of the mutex.
Practically speaking, many compilers don't do this because compilers
have customers and customers expect code like this to work. However,
the standard doesn't preclude it, so any dependence on this behavior
is by definition non-portable.
2. While 'volatile'-qualified variables mean that reads and writes
cannot be elided or re-ordered with respect to other volatile
accesses, there is no guarantee that (a) writes are atomic or (b)
writes by one thread are immediately visible to any other thread.
For example, CPU cores have data caches that must occasionally be
invalidated and refreshed, and C++03 makes no guarantees about how
those caches are maintained. As a result, in this code, more than one
Singleton could be constructed, as each thread entering the function
might find a stale value for pInstance in its CPU cache.
C++11 (ratified just last week) addresses both of these issues.
Compilers are no longer allowed to reorder read/write instructions
across std::mutex acquire/release boundaries (issue #1).
std::atomic<T> makes guarantees about atomicity and visibility of
write instructions (issue #2).
In fact, with C++11, all of the problems go away if you change the
declaration of pInstance to:
static std::atomic<Singleton*> pInstance;
*voilà*
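For concreteness, here's a minimal sketch of the whole thing in C++11
(I'm assuming a std::mutex in place of the article's Lock class; the
structure is otherwise the same):

#include <atomic>
#include <mutex>

class Singleton {
public:
    static Singleton* instance();
private:
    Singleton() {}
    static std::atomic<Singleton*> pInstance;
    static std::mutex mtx;   // stands in for the article's Lock
};

std::atomic<Singleton*> Singleton::pInstance(nullptr);
std::mutex Singleton::mtx;

Singleton* Singleton::instance() {
    Singleton* p = pInstance.load();   // sequentially consistent by default
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(mtx);
        p = pInstance.load();
        if (p == nullptr) {
            p = new Singleton;
            pInstance.store(p);        // publishes the fully constructed object
        }
    }
    return p;
}

The default (sequentially consistent) ordering is stronger than
strictly necessary here, but it's the simplest form that is correct.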
Mike
It means that within the bounds of C++98/03 you can't reliably use
threads. The reason we do, and have been doing for a long time, is
that threading libraries (like POSIX threads) make additional
guarantees. However, when you use DCLP you forgo those guarantees,
since you access a shared entity outside of the protected scope.
Generally speaking, you couldn't use DCLP portably and reliably at all.
One could still make it work if one bothered to write very
platform-specific code (for a specific compiler/OS/hardware),
verify that volatile does what you expect, and use the right
memory barriers. But as a general rule, you couldn't do it. Not
least because even the combination of volatile and memory barriers
is not guaranteed to give you the results you want.
In C++0x (or should we start calling it C++11?) you're perfectly
able to do this portably, using atomics and the correct ordering
directives.
HTH,
Andy.
Different threads may see writes to separate variables in different
orders, unless there is appropriate synchronization used.
e.g. thread A writes x=1, y=2. Threads B and C both read y and then x.
Thread B sees y==2, x==1. Thread C sees y==2, x==0.
Unless you have explicit synchronization, such as a mutex lock, this
is a valid outcome. Technically, this is undefined behaviour by the
C++11 standard unless x and y are atomic variables, and even then, if
the operations are memory_order_relaxed, this outcome is still
permitted.
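To make that concrete, here's the scenario sketched with C++11 relaxed
atomics (function names are just for illustration; using atomics keeps
the program well-defined, yet both outcomes above remain permitted):

#include <atomic>

std::atomic<int> x(0), y(0);

void writer() {                        // thread A
    x.store(1, std::memory_order_relaxed);
    y.store(2, std::memory_order_relaxed);
}

void reader(int& rx, int& ry) {        // threads B and C
    ry = y.load(std::memory_order_relaxed);
    rx = x.load(std::memory_order_relaxed);
}

With memory_order_relaxed there is no ordering between the two stores,
so thread B may see y==2, x==1 while thread C sees y==2, x==0. With
the default sequentially consistent ordering, a reader that sees y==2
is guaranteed to also see x==1.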
Anthony
--
Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/
just::thread C++0x thread library http://www.stdthread.co.uk
Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No.
5478976
on Wed Aug 17 2011, nospam <nospam-AT-nospam.nospam> wrote:
> There is something in this article that puzzles me
> http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
>
> The article says the following code may not work correctly in a
> multi-threaded environment for two reasons. The code has volatile all
> over the place to prevent the compiler from re-ordering code.
Even if volatile has that effect, the CPU is still allowed to "re-order
code" as long as the effects *observable in a single thread* aren't
changed.
> The idea is to avoid acquiring the (expensive) Lock every time you
> need to access the singleton.
Yes, an idiom which is well known to be very popular and very broken.
One way to look at this is that each thread may run on a separate
physical CPU or core, each one of which has its own cache. The caches
are only synchronized to one another by chance (when cache lines are
flushed to main memory and read back in elsewhere) and by explicit CPU
instructions (e.g. atomics and memory barriers) that are part of
threading primitives like mutex locks but are not generated in ordinary
C++ code. The result is that, without these explicit instructions, one
thread may not observe writes to a given memory location happening in
the same order as they are observed by another thread. That's what he
means by "re-ordering across threads."
Note: don't assume that just because you have only a single CPU or core
you are safe from these effects: compiler writers generally assume that
your code deserves no more protection from cross-thread confusion just
because your threads are running on a single core, and they don't go out
of their way to make sure you'll observe sensible effects unless you
correctly use the special CPU instructions to ensure that your threads
have a consistent view of the world.
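For illustration, here's a sketch of the mutex-locked discipline
mentioned above, in C++11 terms (taking the mutex is what emits the
atomic/barrier instructions on your behalf):

#include <mutex>

int x = 0, y = 0;
std::mutex m;

void writer() {                    // thread A
    std::lock_guard<std::mutex> lock(m);
    x = 1;
    y = 2;
}

void reader(int& rx, int& ry) {    // threads B and C
    std::lock_guard<std::mutex> lock(m);
    ry = y;
    rx = x;
}

Because every thread goes through the same mutex, a reader either sees
both writes or neither; a reader that sees y == 2 is guaranteed to see
x == 1 as well.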
Hope that helps.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
Sequentially consistent atomics (the C++11 default for atomic) are
overkill for a typical "double checked locking pattern" using a
pointer to a dynamically allocated object. C++11 has the better-suited
std::memory_order_consume/memory_order_release pair (albeit, frankly, I
just can't grok the associated std::kill_dependency() thing).
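For the pointer case I mean something roughly like this (a sketch
reusing the atomic declaration from Mike's post upthread; note that
implementations are allowed to, and typically do, treat
memory_order_consume as memory_order_acquire):

#include <atomic>
#include <mutex>

struct Singleton { /* ... */ };

std::atomic<Singleton*> pInstance(nullptr);
std::mutex mtx;

Singleton* instance() {
    // Fast path: any later dereference of p carries a data dependency
    // on this load, which is the ordering consume is meant to provide.
    Singleton* p = pInstance.load(std::memory_order_consume);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(mtx);
        p = pInstance.load(std::memory_order_relaxed); // the mutex orders this
        if (p == nullptr) {
            p = new Singleton;
            pInstance.store(p, std::memory_order_release);
        }
    }
    return p;
}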
Elsewhere in this thread...
Andy Venikov wrote:
[...]
> In the C++0x (or should we start calling it C++11 ?) ...
C++0xB sounds much uglier than C++11. :-)
regards,
alexander.
No, compiler reordering aside, no special CPU instructions regarding
memory ordering/barriers are needed if a multithreaded program is
restricted to run on a single core/uniprocessor (i.e. never more than
one thread physically running at the same time). On a uniprocessor,
all fence/barrier instructions can simply be ignored (they are not
needed). If you have contrary evidence, please point to it.
regards,
alexander.
--
> Dave Abrahams wrote:
> [...]
>> Note: don't assume that just because you have only a single CPU or core
>> you are safe from these effects: compiler writers generally assume that
>> your code deserves no more protection from cross-thread confusion just
>> because your threads are running on a single core, and they don't go out
>> of their way to make sure you'll observe sensible effects unless you
>> correctly use the special CPU instructions to ensure that your threads
>> have a consistent view of the world.
>
> No, compiler reordering aside, no special CPU instructions regarding
> memory ordering/barriers are needed if a multithreaded program is
> restricted to run on a single core/uniprocessor (i.e. never more than
> one thread physically running at the same time). On a uniprocessor,
> all fence/barrier instructions can simply be ignored (they are not
> needed). If you have contrary evidence, please point to it.
I don't; I think I misspoke. What I meant to say was "unless you
correctly use synchronization to ensure..." My point is that in
general, even on a uniprocessor you shouldn't consider it safe to modify
shared state without synchronization.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
>On Aug 17, 8:51 pm, nospam <nos...@nospam.nospam> wrote:
>> There is something in this article that puzzles me
>> http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
>>
>>
>> The first reason given for why this code may fail in a multi-threaded
>> environment is given on page 10
>> <quote>
>> First, the Standard's constraints on observable behavior are only for
>> an abstract machine defined by the Standard, and that abstract machine
>> has no notion of multiple threads of execution. As a result, though
>> the Standard prevents compilers from reordering reads and writes to
>> volatile data within a thread, it imposes no constraints at all on
>> such reorderings across threads. At least that's how most compiler
>> implementers interpret things. As a result, in practice, many
>> compilers may generate thread-unsafe code from the source above.
>> <end quote>
>>
>> I can't figure out what the above quoted text is getting at. Can
>> anyone explain? What does "re-ordering across threads" mean?
>
>Different threads may see writes to separate variables in different
>orders, unless there is appropriate synchronization used.
>
>e.g. thread A writes x=1, y=2. Threads B and C both read y and then x.
>Thread B sees y==2, x==1. Thread C sees y==2, x==0.
>
>Unless you have explicit synchronization, such as a mutex lock, this
>is a valid outcome. Technically, this is undefined behaviour by the
>C++11 standard unless x and y are atomic variables, and even then, if
>the operations are memory_order_relaxed, this outcome is still
>permitted.
OK, thanks. It seems to me that, based on what you're saying, the text
I'm quoting (above) from the article is at least confusing when it
says "many compilers may generate thread-unsafe code". I took this to
mean that some compiler might "accidentally" generate thread-safe code.
It's hard to imagine any compiler, even a thread-aware compiler,
automatically inserting the synchronization needed to make this code safe.