I keep hearing that volatile is useless from the
multithreading point of view... Though I completely
agree that the use of volatile cannot be sufficient
to guarantee thread-safety, my problem is that I
think it may be necessary in some cases (in addition
to proper synchronization).
What prevents the optimizer from assuming whatever it
wants to assume about the variable "shared" in the
following program:
int main()
{
    ....
    int shared = 1;
    ...
    // lots of code that doesn't touch shared
    // at this point, the compiler may assume that
    // shared is 1 -- however, with or without proper
    // synchronization, shared may have been changed
    // by another thread (a thread that may have been
    // created and handed a pointer or reference to
    // shared)
    ...
}
Am I missing something? The way I see it, shared would
have to be declared volatile -- that won't be sufficient,
but it seems necessary to guarantee thread-safety.
Why is it that I always hear that volatile and thread-
safety are two completely unrelated things?? (I do
understand that the naive eye may tend to believe that
volatile alone can guarantee thread-safety, and that
is a misconception ... But the typical reaction to
such a claim also seems exaggerated... Again, unless
I'm missing something?)
Thanks for any comments,
Carlos
--
I use a volatile atomic variable when I need to change it infrequently, but I
need to read it frequently. I still wrap the variable in a mutex to protect
changing the variable.
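As a sketch only: here is the same read-often/write-rarely pattern written with C11 <stdatomic.h>, which postdates this thread, in place of a volatile variable. The atomic load gives readers a genuine, untorn read without taking the mutex. Writers still serialize on a mutex, as described above. The names config_value, config_lock, config_update and config_read are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical value: changed infrequently, read frequently. */
static _Atomic int config_value = 0;
/* Writers serialize on this mutex so read-modify-write updates are not lost. */
static pthread_mutex_t config_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer path: hold the mutex, then publish with a release store. */
void config_update(int delta)
{
    pthread_mutex_lock(&config_lock);
    int old = atomic_load_explicit(&config_value, memory_order_relaxed);
    atomic_store_explicit(&config_value, old + delta, memory_order_release);
    pthread_mutex_unlock(&config_lock);
}

/* Reader path: no mutex; the acquire load is a real, untorn read. */
int config_read(void)
{
    return atomic_load_explicit(&config_value, memory_order_acquire);
}
```

Readers can call config_read() as often as they like. Only config_update() ever blocks.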
Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com
Also, in your example your shared variable was local to main (located on the
stack). Did you mean to make it global (with static storage duration)?
Anyway, later.
-geoff
"Carlos Moreno" <moreno_at_mo...@xx.xxx> wrote in message
news:3DA35E63...@xx.xxx...
>
> int main()
> {
> ....
>
> int shared = 1;
>
> ...
>
> lots of code that doesn't touch shared
>
> // at this point, the compiler may assume that
> // shared is 1 -- however, with or without proper
> // synchronization, shared may have been changed
> // by another thread (a thread that may have been
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> // created and handed a pointer or reference to
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> // shared)
^^^^^^^^^^^^^^^^^^
>
> ...
> }
>
> Am I missing something? The way I see it, shared would
> have to be declared volatile -- that won't be sufficient,
> but it seems necessary to guarantee thread-safety.
>
I don't believe a value may remain cached in a register once you have taken
its address.
hys
--
Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com
In order for another thread to have access to the variable, its
address must be taken, since it has automatic storage class.
If another thread then modifies it through that pointer, if the
program is to work correctly, it must be accessed with some
mutual exclusion taken to ensure memory visibility. The act
of locking a mutex (among other things) will, on a correct
POSIX implementation, flush any cached registers to memory.
>Am I missing something? The way I see it, shared would
>have to be declared volatile -- that won't be sufficient,
>but it seems necessary to guarantee thread-safety.
>
>Why is it that I always hear that volatile and thread-
>safety are two completely unrelated things?? (I do
>understand that the naive eye may tend to believe that
>volatile alone can guarantee thread-safety, and that
>is a misconception ... But the typical reaction to
>such a claim also seems exaggerated... Again, unless
>I'm missing something?)
Volatile cannot guarantee thread safety. It is also not required
for thread safety. Correct adherence by the application to the
POSIX memory visibility rules is *all* that is required for thread
safety.
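Here is a minimal sketch of this point, assuming a POSIX system. The variable has automatic storage and its address is handed to a new thread. The main thread reads it again only after pthread_join(), which is one of the functions POSIX lists as synchronizing memory. No volatile appears anywhere. The function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Worker: modifies the caller's variable through the pointer it was given. */
static void *set_to_42(void *arg)
{
    int *shared = arg;
    *shared = 42;
    return NULL;
}

/* Returns the value the creating thread observes after joining the worker. */
int run_join_example(void)
{
    int shared = 1;             /* automatic storage, as in the question */
    pthread_t tid;
    if (pthread_create(&tid, NULL, set_to_42, &shared) != 0)
        return -1;
    pthread_join(tid, NULL);    /* synchronizes memory: the write is visible */
    return shared;
}
```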
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9"
Internet: steve @ Watt.COM Whois: SW32
Free time? There's no such thing. It just comes in varying prices...
Not true. (IMHO)
> What prevents the optimizer from assuming whatever it
> wants to assume about the variable "shared" in the
> following program:
When you pass the address as a parameter to some function (e.g.
pthread_create), the compiler _will_ assume that the value of
``shared'' may have changed (if it doesn't know anything about that
other function or does no inter-procedural optimization).
> Am I missing something? The way I see it, shared would
> have to be declared volatile -- that won't be sufficient,
> but it seems necessary to guarantee thread-safety.
The fact is that ``shared'' may be modified in ways unknown to the
program (in this particular case by another thread). Then it _must_
be declared volatile.
> Why is it that I always hear that volatile and thread-
> safety are two completely unrelated things?? (I do
One can hear all sorts of things :) Multithreading is one way to
obtain volatile behavior - thus volatile and thread safety are not
unrelated.
(by "thread-safety" I mean "correct execution" for the various values
of "correct").
"Thread-safety" does not equal "mutual exclusion".
Think about optimistic concurrency control algorithms, which perform
an unlocked (atomic) read of some variable and, if the value is "right",
lock the variable and read it again. One would want to prevent the
compiler from caching the value in a register (or in some other more
"convenient" place than the variable's home location).
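The check, lock, re-check pattern described above might look like this in C11. Here an atomic load, rather than volatile, is the standard way to force the unlocked read to actually happen and not be torn. This is a sketch, and slot_owner, slot_lock and try_claim_slot are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical resource: a slot that one thread may claim; 0 means free. */
static _Atomic int slot_owner = 0;
static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;

bool try_claim_slot(int me)
{
    /* Optimistic unlocked read: cheap rejection when clearly taken. */
    if (atomic_load_explicit(&slot_owner, memory_order_acquire) != 0)
        return false;

    /* The value looked "right": lock and read again, since it may have changed. */
    bool claimed = false;
    pthread_mutex_lock(&slot_lock);
    if (atomic_load_explicit(&slot_owner, memory_order_relaxed) == 0) {
        atomic_store_explicit(&slot_owner, me, memory_order_release);
        claimed = true;
    }
    pthread_mutex_unlock(&slot_lock);
    return claimed;
}
```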
Another application: in practice most compilers (all?) perform
volatile variable accesses in the order specified in the source, i.e.
do not reorder them relative to one another, which, coupled with the
appropriate memory barrier operations, opens the way to yet another
class of algorithms.
~velco
I've written tons of multithreaded programs, all of which worked
as expected, without ever using volatile.
--
josh(at)intmain.net | http://intmain.net
CS @ College of Computing, Georgia Institute of Technology, Atlanta
532604 local keystrokes since last reboot, 37 days ago.
> What prevents the optimizer from assuming whatever it
> wants to assume about the variable "shared" in the
> following program:
>
> int main()
> {
> ....
>
> int shared = 1;
>
> ...
>
> lots of code that doesn't touch shared
>
> // at this point, the compiler may assume that
> // shared is 1 -- however, with or without proper
> // synchronization, shared may have been changed
> // by another thread (a thread that may have been
> // created and handed a pointer or reference to
> // shared)
>
> ...
> }
>
Nothing prevents the optimizer from making that assumption. It's not a C
example, but in Ada with GNAT you might even get a compiler warning
that "shared" could be declared constant if the scope was right. I
really *like* that warning since it sometimes points out bugs in the
code.
> Am I missing something? The way I see it, shared would
> have to be declared volatile -- that won't be sufficient,
> but it seems necessary to guarantee thread-safety.
>
Nope. You have a far better view of this than most readers.
> Why is it that I always hear that volatile and thread-
> safety are two completely unrelated things?? (I do
> understand that the naive eye may tend to believe that
> volatile alone can guarantee thread-safety, and that
> is a misconception ... But the typical reaction to
> such a claim also seems exaggerated... Again, unless
> I'm missing something?)
>
Volatile is not enough for thread safety for a variety of reasons:
- the system may in some cases write values "out of order". Some
versions of the Alpha have this behavior. So if you update a buffer and
then update the index, you need a memory barrier between the two
instructions for safe operation (to prevent the CPU from reordering
those two writes).
- other caches may exist that don't get flushed properly. I have a
special case in a system where caches on a card have to be flushed prior
to some read / write operations. In this case, the threads are on two
separate systems using memory "shared" across this special interface.
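Here is a sketch of the buffer-then-index case in C11 atomics, assuming a single producer and a single consumer. A release store to the index plays the role of the memory barrier between the two writes. The consumer's acquire load is the matching barrier on the other end. The struct and function names, and the capacity of 8, are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define QSIZE 8u   /* hypothetical capacity; a power of two */

/* Single-producer/single-consumer ring: fill a slot, then publish via head. */
struct spsc {
    int buf[QSIZE];
    _Atomic unsigned head;   /* written only by the producer */
    _Atomic unsigned tail;   /* written only by the consumer */
};

bool spsc_push(struct spsc *q, int v)
{
    unsigned h = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (h - t == QSIZE)
        return false;                        /* full */
    q->buf[h % QSIZE] = v;                   /* first: update the buffer */
    /* then: update the index; the release store keeps the buffer write
       from being reordered after it */
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return true;
}

bool spsc_pop(struct spsc *q, int *out)
{
    unsigned t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    /* acquire pairs with the producer's release store to head */
    unsigned h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (h == t)
        return false;                        /* empty */
    *out = q->buf[t % QSIZE];
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}
```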
Again, these are cases where the hardware design allows for some
"strange behavior" to occur so it can get the maximum performance out of
the system. Over 99% of the code works just fine in this environment -
the < 1% that does not must get fixed.
Other people have stated that they can get away without volatile. They
are correct for the specific system / compiler combination (and even
compiler switch setting) they are using, but not for the general case.
--Mark
> (by "thread-safety" I mean "correct execution" for the various values
> of "correct").
>
> "Thread-safety" does not equal "mutual exclusion".
>
> Think about optimistic concurrency control algorithms, which perform
> an unlocked (atomic) read of some variable and, if the value is "right",
> lock the variable and read it again. One would want to prevent the
> compiler from caching the value in a register (or in some other more
> "convenient" place than the variable's home location).
>
> Another application: in practice most compilers (all?) perform
> volatile variable accesses in the order specified in the source, i.e.
> do not reorder them relative to one another, which, coupled with the
> appropriate memory barrier operations, opens the way to yet another
> class of algorithms.
The definition of "volatile" is essentially as you describe, with some
limitations.
The C language defines a series of "sequence points" in the "abstract
language model" at which variable values must be consistent with language
rules. An optimizer is allowed substantial leeway in reordering or
eliminating sequence points to minimize loads and stores or other
computation. EXCEPT that operations involving a "volatile" variable must
conform to the sequence points defined in the abstract model: there is no
leeway for optimization or other modifications. Thus, all changes
previously made must be visible at each sequence point, and no subsequent
modifications may be visible at that point. (In other words, as C99 points
out explicitly, if a compiler exactly implements the language abstract
semantics at all sequence points then "volatile" is redundant.)
On a multiprocessor (which C does not recognize), "sequence points" can only
be reasonably interpreted to refer to the view of memory from that
particular processor. (Otherwise the abstract model becomes too expensive
to be useful.) Therefore, volatile may say nothing at all about the
interaction between two threads running in parallel on a multiprocessor.
On a high-performance modern SMP system, memory transactions are effectively
pipelined. A memory barrier does not "flush to memory", but rather inserts
barriers against reordering of operations in the memory pipeline. For this
to have any meaning across processors there must be a critical sequence on
EACH end of a transaction that's protected by appropriate memory barriers.
This protocol has no possible meaning for an isolated volatile variable,
and therefore cannot be applied.
The protocol can only be employed to protect the relationship between two
items; e.g., "if I assert this flag then this data has been written" paired
with "if I can see the flag is asserted, then I know the data is valid".
That's how a mutex works. The mutex is a "flag" with builtin barriers
designed to enforce the visibility (and exclusion) contract with data
manipulations that occur while holding the mutex. Making the data volatile
contributes nothing to this protocol, but inhibits possibly valuable
compiler optimizations within the code that holds the mutex, reducing
program efficiency to no (positive) end.
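Here is the flag-and-data protocol above, sketched with POSIX primitives. While holding the mutex, the writer stores the data and then sets the flag. The reader waits on a condition variable until it sees the flag, then reads the data under the same mutex. Neither field is volatile. struct msg and its helper functions are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message slot: "if the flag is set, the data is valid". */
struct msg {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool ready;   /* the flag */
    int  data;    /* the data the flag vouches for; not volatile */
};

static void msg_post(struct msg *m, int value)
{
    pthread_mutex_lock(&m->lock);
    m->data = value;      /* write the data ... */
    m->ready = true;      /* ... then assert the flag */
    pthread_cond_signal(&m->cond);
    pthread_mutex_unlock(&m->lock);
}

static int msg_wait(struct msg *m)
{
    pthread_mutex_lock(&m->lock);
    while (!m->ready)     /* re-check: wakeups may be spurious */
        pthread_cond_wait(&m->cond, &m->lock);
    int value = m->data;  /* flag seen under the mutex, so data is valid */
    pthread_mutex_unlock(&m->lock);
    return value;
}

static void *poster(void *arg)
{
    msg_post(arg, 99);
    return NULL;
}

/* Runs one posting thread and returns the value the waiter received. */
int run_msg_demo(void)
{
    struct msg m;
    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.cond, NULL);
    m.ready = false;
    m.data = 0;

    int got = -1;
    pthread_t tid;
    if (pthread_create(&tid, NULL, poster, &m) == 0) {
        got = msg_wait(&m);
        pthread_join(tid, NULL);
    }
    pthread_cond_destroy(&m.cond);
    pthread_mutex_destroy(&m.lock);
    return got;
}
```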
If you have a way to generate inline barriers (or on a machine that doesn't
require barriers), and you wish to build your own low-level protocol that
doesn't rely on synchronization (e.g., a mutex), then your compiler might
require that you use volatile -- but this is unspecified by either ANSI C
or POSIX. (That is, ANSI C doesn't recognize parallelism and therefore
doesn't apply, while POSIX applies no specific additional semantics to
"volatile".) So IF you need volatile, your code is inherently nonportable.
A corollary is that if you wish to write portable code, you have no need for
volatile. (Or at least, if you think you do have a need, it won't help you
any.)
In your case, trying to share (for unsynchronized read) a "volatile"
counter... OK. Fine. The use of volatile, portably, doesn't help; but as
long as you're not doing anything but "ticking" the counter, (not a lot of
room for optimization) it probably won't hurt. IF your variable is of a
size and alignment that the hardware can modify atomically, and IF the
compiler chooses the right instructions (this may be more likely with
volatile, statistically, but again is by no means required by any
standard), then the worst that can happen is that you'll read a stale
value. (Potentially an extremely stale value, unless there's some
synchronization that ensures memory visibility between the threads at some
regular interval.) If the above conditions are NOT true, then you may read
"corrupted" values through word tearing and related effects.
If that's acceptable, you're probably OK... but volatile isn't helping you.
In summary, you're right, you don't need SYNCHRONIZATION here. But you
probably do expect some level of VISIBILITY, while you're doing nothing to
portably ensure any visibility. What you get occurs by accident, either
because your machine has a sequential memory system or because you're
gaining the "accidental" benefit of something else on the system, such as
clock tick interrupts (which will tend to eventually synchronize memory
visibility across the processors).
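Suppose the goal is only a "ticking" counter whose readers tolerate stale values. C11, which postdates this thread, offers relaxed atomic operations for that case. They rule out the torn or "corrupted" reads described above without claiming any stronger visibility. This is a minimal sketch, and ticks, tick and read_ticks are hypothetical names.

```c
#include <stdatomic.h>

/* Hypothetical statistics counter, ticked by any thread. */
static atomic_ulong ticks = 0;

void tick(void)
{
    /* Atomic increment: no lost updates, no torn values. */
    atomic_fetch_add_explicit(&ticks, 1, memory_order_relaxed);
}

unsigned long read_ticks(void)
{
    /* Relaxed load: possibly stale, but never a "corrupted" value. */
    return atomic_load_explicit(&ticks, memory_order_relaxed);
}
```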
--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/
I think you got it kinda backwards... And, BTW, the C standard says:
"What constitutes an access to an object that has volatile-qualified
type is implementation-defined"
regards,
alexander.
> The fact is that ``shared'' may be modified in ways unknown to the
> program (in this particular case by another thread). Then it _must_
> be declared volatile.
You are absolutely 100% right, unless you're talking about POSIX
threads, in which case you are absolutely 100% wrong. If you're talking
about POSIX threads, the use of POSIX synchronization functions is both
necessary and sufficient.
DS
% Other people have stated that they can get away without volatile. They
% are correct for the specific system / compiler combination (and even
% compiler switch setting) they are using, but not for the general case.
To reiterate, it is definitely not required for any POSIX system. For
any other system, it really depends on what the system's stated
requirements are. You can't depend on the C standard because threading
is always an extension to that.
--
Patrick TJ McPhee
East York Canada
pt...@interlog.com
So?
Maybe you compiled without optimizations, maybe there were not enough
registers (as on IA-32), maybe you used global variables, which rarely get
registers ...
I've written tons of multithreaded programs, which didn't work without
using volatile.
Neither of these statements should be taken to imply that
 a) there does not exist a multithreaded program that requires
volatile, nor
 b) every multithreaded program requires volatile
but rather that:
 a) not every multithreaded program requires volatile
 b) there exists a multithreaded program that requires volatile
Trivial examples:
 a) optimistic unlocked reads - read; if the value is right, lock and
read again. The volatile qualifier is needed so the read is actually
performed; otherwise the compiler can just load the variable into a
register and use that register throughout the function.
 b) a "sensor" variable - a variable providing a stream of values,
where it is OK to read slightly stale data, but it is important
that the new data is also eventually read - again, volatile ensures the
actual read is performed.
 c) memory ordering - the CPU can reorder memory accesses, which is
prevented by memory barriers, but the compiler can reorder memory
accesses too, which is prevented by volatile.
Of course, the above assume that
"volatile int x; y = x;"
constitutes "an access to an object that has volatile-qualified type"
(ISO/IEC 9899:1999).
~velco
As an example, think what would happen if the ``q_size'' variable in
mthttpd were allocated in a register.
(I don't think the standard forbids it, and you can even force GCC
to do it by declaring it ``register int q_size asm ("esi");''. Note that
while the asm construct is a GCC extension, it merely forces the compiler
to do something not forbidden by the standard, thus one can pretend the
compiler did it all by itself :)
~velco
No, he was following the memory visibility rules for whatever platform(s)
he was working on. I am not aware of any platform that requires volatile
for thread safety.
Likewise, I have written several hundred non-trivial threaded programs,
on architectures ranging from microcontrollers to 64 bit processors,
numerous different compilers, and the highest optimization levels
possible.
Not once have I needed volatile for threading reasons.
>I've written tons of multithreaded programs, which didn't work without
>using volatile.
Then you were not on POSIX or Win32 platforms.
>Neither of these statements should be taken to imply that
> a) there does not exist a multithreaded program that requires
>volatile, nor
> b) every multithreaded program requires volatile
>
>but rather that:
> a) not every multithreaded program requires volatile
> b) there exists a multithreaded program that requires volatile
>Trivial examples:
>
> a) optimistic unlocked reads - read; if the value is right, lock and
>read again. The volatile qualifier is needed so the read is actually
>performed; otherwise the compiler can just load the variable into a
>register and use that register throughout the function.
This will not work on systems with weak memory ordering.
> b) a "sensor" variable - a variable providing a stream of values,
>where it is OK to read slightly stale data, but it is important
>that the new data is also eventually read - again, volatile ensures the
>actual read is performed.
If the variable's storage is in a device, it must be volatile, but that
has nothing to do with threads. If the variable is in another thread
that does nothing but compute new values for that variable, then you need
to obey the memory visibility rules that your platform sets out.
> c) memory ordering - the CPU can reorder memory accesses, which is
>prevented by memory barriers, but the compiler can reorder memory
>accesses too, which is prevented by volatile.
This is not an example. Using volatile in an attempt to achieve thread
safety *WILL NOT WORK* on all platforms. Period. Further, there is
always a way to do the same job (thread safety of data) without using
volatile.
False example.
>(I don't think the standard forbids it, and you can even force GCC
>to do it by declaring it ``register int q_size asm ("esi");''. Note that
>while the asm construct is a GCC extension, it merely forces the compiler
>to do something not forbidden by the standard, thus one can pretend the
>compiler did it all by itself :)
The compiler could put it into a register, if it knew how to make that
register available to all threads. mthttpd correctly follows the
memory visibility rules: All accesses to q_size are done while holding
a mutex.
The C language standard does not address threads, and thus things are
possible in it that are not possible under POSIX. POSIX requires
(roughly speaking -- read the standard for the gory details) that
modifications made to a memory object while holding a mutex will be
visible to any other thread that acquires that mutex afterward.
The compiler is free to keep q_size in a register up until the call to
pthread_mutex_unlock() or pthread_cond_wait(), in this particular example,
and that may well be a useful optimization. (But it's unimportant in
this exact bit of code.)
> >I've written tons of multithreaded programs, which didn't work without
> >using volatile.
>
> Then you were not on POSIX or Win32 platforms.
It might not have anything to do with threads. I think the only "usefully
defined" use for volatile in C is for variables modified in signal
handlers. You have to declare them as volatile, threads or no threads.
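Here is the signal-handler case mentioned above, as a minimal standard C sketch. The flag is declared volatile sig_atomic_t because the handler modifies it asynchronously. raise() delivers the signal synchronously and does not return until the handler has returned, so the flag is already set when the function reads it. got_signal, on_signal and signal_flag_demo are hypothetical names.

```c
#include <signal.h>

/* Modified asynchronously by the handler: must be volatile sig_atomic_t. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;          /* the only thing the handler does */
}

/* Installs the handler, raises the signal, and reports what it observed. */
int signal_flag_demo(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);
    return got_signal;       /* volatile forces a real load here */
}
```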
> >Trivial examples:
> >
> > a) optimistic unlocked reads - read; if the value is right, lock and
> >read again. The volatile qualifier is needed so the read is actually
> >performed; otherwise the compiler can just load the variable into a
> >register and use that register throughout the function.
>
> This will not work on systems with weak memory ordering.
Why not? I've inherited a certain program which uses those constructs a
lot and I'd like to prove it wrong, but I haven't been able to see a
problem, apart from the fact that it used int instead of sig_atomic_t.
The code goes like this:
int foo(...)
{
    static volatile sig_atomic_t initialized = 0;
    if (!initialized)
    {
        pthread_mutex_lock(...);
        if (!initialized)
        {
            initialize_me();
            initialized = 1;
        }
        pthread_mutex_unlock(...);
    }
    ...
}
Unless... the initialize_me function could store some values in memory. After
that we store 1 in initialized. When the next thread starts executing
this code, it might read 1 from initialized, but it won't necessarily
see all the values stored by initialize_me, because there's no
memory barrier in its code path.
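For comparison, POSIX provides pthread_once() for exactly this kind of one-time initialization, so no hand-rolled unlocked check is needed. The first call runs the init routine, later calls do not, and on return from pthread_once() the routine has completed. This sketch keeps the foo() and initialize_me() names from the code above. table_value, init_count and the demo driver are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int table_value = 0;   /* hypothetical data set up by initialize_me */
static int init_count = 0;    /* how many times initialize_me actually ran */

static void initialize_me(void)
{
    table_value = 123;
    init_count++;
}

int foo(void)
{
    /* Only the first caller runs initialize_me(); every caller returns
       only after it has completed. */
    pthread_once(&init_once, initialize_me);
    return table_value;
}

/* Hypothetical driver: calls foo() from several threads. */
static void *call_foo(void *arg)
{
    *(int *)arg = foo();
    return NULL;
}

int run_once_demo(void)
{
    enum { N = 4 };
    pthread_t tids[N];
    int results[N];
    int n = 0, ok = 0;
    while (n < N && pthread_create(&tids[n], NULL, call_foo, &results[n]) == 0)
        n++;
    for (int i = 0; i < n; i++) {
        pthread_join(tids[i], NULL);
        ok += (results[i] == 123);
    }
    return ok;   /* number of threads that saw the initialized value */
}
```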
--
.-. .-. I don't work here. I'm a consultant.
(_ \ / _)
| da...@willfork.com
|
However, I do have another doubt (well, kind of the same
doubt, but now in a more specific context).
Several people have pointed out that when using POSIX
threads, volatile is never necessary, and that proper
synchronization using the right POSIX threads facilities
is always sufficient.
Now, how can the compiler know that? Is the compiler
aware that when using pthreads it must disable any
optimizations surrounding shared variables access? As
I understand it, POSIX threads are kinda multiplatform
(there are pthreads libraries for Unix/Linux, but also
for Windows -- maybe for Mac and other OSes too?). So,
how can POSIX threads alone provide any guarantee about
something that seems completely in the compiler's hands?
I understand that when passing a pointer to another
function, the compiler must know that such function
*could* modify the value, so any assumption about its
value would be dropped. But that only applies to the
call to pthread_create, where the pointer is passed
such that the other thread is given access to "shared".
But in a situation like:
int shared = 0;
pthread_create ( ...... , &shared);
pthread_mutex_lock ( some_mutex .... );
shared = 2;
pthread_mutex_unlock ( ...... );
// lots of code that does NOT modify shared
// (but in the meantime, the other thread
// could have modified it, of course)
// at this point -- why is it that the calls
// to mutex_lock and unlock guarantee that
// the compiler will not do something wrong
// because it assumed that shared is 2?
// Or is that a wrong synchronization mechanism
// for this case?
Again, thanks for this great discussion! And thanks
in advance for any further comments on this POSIX
threads question.
Cheers,
Carlos
--
At this point, if 'shared' is accessed, it should be protected
with a mutex, since the data is shared and could have been written
to by another thread. Remember, you as a programmer are responsible
for this level of synchronization... when you add synchronization,
you're adding calls to functions that POSIX defines as memory
synchronization points, and the compiler then 'knows'
not to assume anything.
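Here is a sketch of Carlos's scenario with the mutex applied to every access, assuming POSIX threads. Since &shared has escaped to pthread_create() and pthread_mutex_lock() is an external call, the compiler cannot assume shared still holds 2 after the lock is taken. POSIX additionally requires the lock to make the other thread's write visible. Either 2 or 3 may be read, depending on which thread runs first. All names other than shared are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* The other thread: modifies shared while holding the same mutex. */
static void *other_thread(void *arg)
{
    int *shared = arg;
    pthread_mutex_lock(&shared_lock);
    *shared = 3;
    pthread_mutex_unlock(&shared_lock);
    return NULL;
}

/* Returns the value of shared observed "at this point"; no volatile. */
int carlos_scenario(void)
{
    int shared = 0;
    pthread_t tid;
    if (pthread_create(&tid, NULL, other_thread, &shared) != 0)
        return -1;

    pthread_mutex_lock(&shared_lock);
    shared = 2;
    pthread_mutex_unlock(&shared_lock);

    /* ... lots of code that does NOT modify shared ... */

    pthread_mutex_lock(&shared_lock);
    int seen = shared;          /* reloaded after the lock: 2 or 3 */
    pthread_mutex_unlock(&shared_lock);

    pthread_join(tid, NULL);    /* shared lives in this stack frame */
    return seen;
}
```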
I'll let the other, more frequent thread programmers out there
do a better job of explaining than I just did :-)
% Several people have pointed out that when using POSIX
% threads, volatile is never necessary, and that proper
% synchronization using the right POSIX threads facilities
% is always sufficient.
%
% Now, how can the compiler know that? Is the compiler
% aware that when using pthreads it must disable any
% optimizations surrounding shared variables access? As
It doesn't have to disable all optimisations, but the short
answer is that the compiler is part of the POSIX system, and
it has to do whatever it has to do in order to satisfy the
requirements.
+ (by "thread-safety" I mean "correct execution" for the various values
+ of "correct").
Thread-shared objects must be protected by mutexes whose operations
must
1) behave as super sequence points for thread-shared variables and
2) do more (i.e., invoke hardware-level barriers to reordering).
If all thread-shared variables have volatile-qualified types, we pay
a huge penalty in overhead, but #1 becomes redundant. Sigh.
However, the mutexes themselves involve thread-shared objects, i.e.,
the objects that represent the state of those mutexes. Obviously,
those state objects require the same treatment that C/C++ requires for
objects of volatile-qualified types, i.e., you can't lock a mutex and
then keep its dirty state variable in a register --- no other thread
will know that the mutex is locked. (Note, however, that mutexes
cannot be implemented in standard C/C++.)
Tom Payne
>
> Wow! This has been a great discussion! (well, for me
> anyway). I appreciate your comments and thoughts!
>
> However, I do have another doubt (well, kind of the same
> doubt, but now in a more specific context).
>
> Several people have pointed out that when using POSIX
> threads, volatile is never necessary, and that proper
> synchronization using the right POSIX threads facilities
> is always sufficient.
>
> Now, how can the compiler know that? Is the compiler
> aware that when using pthreads it must disable any
> optimizations surrounding shared variables access? As
> I understand it, POSIX threads are kinda multiplatform
> (there are pthreads libraries for Unix/Linux, but also
> for Windows -- maybe for MAC and other OS's too?). So,
> how can POSIX threads alone provide any guarantee about
> something that seems completely in the compiler's hands?
A C or C++ compiler cannot know that, because the language does not
address threading. The Posix libraries can know that. If you want
a compiler to know about threading issues you must use a language
with threading built into the syntax. The three languages that
most frequently come to my mind for this purpose are Java, C#,
and Ada.
Of those three, the most robust and complete threading model is
provided by Ada. Ada has robust locking mechanisms as well as a
couple of useful pragmas.
Pragma Atomic is used to specify that access (reads and writes)
must be indivisible. This can only be applied to objects no larger
than a "word" on the current hardware.
Pragma Volatile specifies that all accesses must be direct, not
through local copies of a variable.
These two pragmas are useful for shared data as long as you can
ensure there is no possible race condition between reader threads
and writer threads. In general this is very difficult to ensure without
some form of locking.
Jim Rogers
Who claimed you _always_ need volatile for threading reasons? Care
to read my message to the end?
> >I've written tons of multithreaded programs, which didn't work without
> >using volatile.
>
> Then you were not on POSIX or Win32 platforms.
Well, I was both on POSIX and Win32 platforms.
>
> >Neither of these statements should be taken to imply that
> > a) there does not exist a multithreaded program that requires
> >volatile, nor
> > b) every multithreaded program requires volatile
> >
> >but rather that:
> > a) not every multithreaded program requires volatile
> > b) there exists a multithreaded program that requires volatile
>
> >Trivial examples:
> >
> > a) optimistic unlocked reads - read; if the value is right, lock and
> >read again. The volatile qualifier is needed so the read is actually
> >performed; otherwise the compiler can just load the variable into a
> >register and use that register throughout the function.
>
> This will not work on systems with weak memory ordering.
Not true. Memory ordering is ensured by the implementations of the
lock/unlock functions. The point is that no memory ordering can help
you if the compiler does _not_ perform the read.
> > b) a "sensor" variable - a variable providing a stream of values,
> >where it is OK to read slightly stale data, but it is important
> >that the new data is also eventually read - again, volatile ensures the
> >actual read is performed.
>
> If the variable's storage is in a device, it must be volatile, but that
> has nothing to do with threads. If the variable is in another thread
> that does nothing but compute new values for that variable, then you need
> to obey the memory visibility rules that your platform sets out.
Define "memory visibility"?
On every platform I'm aware of, if a CPU performs an (atomic) memory
write, the value written is _eventually_ visible to other CPUs.
Moreover, on every platform I'm aware of, the sequence of values read
from a single memory location is a (not necessarily proper)
subsequence of the sequence of values written, i.e. no CPU can observe
values occurring in the opposite order.
These, along with atomicity of reads/writes, are sufficient for the
above examples to work. Note that I don't claim (and have _never_
claimed) they are necessary.
> > c) memory ordering - the CPU can reorder memory accesses, which is
> >prevented by memory barriers, but the compiler can reorder memory
> >accesses too, which is prevented by volatile.
>
> This is not an example.
This is not an argument.
> Using volatile in an attempt to achieve thread
> safety *WILL NOT WORK* on all platforms. Period.
This is not an argument either.
> Further, there is
> always a way to do the same job (thread safety of data) without using
> volatile.
Most probably. So, what?
Correctness first, but after all we want performance too, don't we?
On modern SMP architectures every write to concurrently accessed memory
is a potential bottleneck - including things like
pthread_mutex_lock/unlock.
Note that the same applies to pthread_rwlock_rdlock, as it performs
a memory write too. Your best bet, after making every effort to
have no concurrently accessed locations, is to perform mostly reads
there and avoid pthread_ synchronization functions (or any
sync functions, for that matter) like hell.
(Well, maybe I misunderstood the topic of the newsgroup, maybe it is
about bitching about standards instead of multithreaded programming in
the real world, so I'll go find the FAQ).
~velco
This is not a valid argument. You _can_ depend on the C standard for
the things specified in the C standard. The compiler is unaware of
threads, but this does not mean it will perform random violations of
the standard.
~velco
No. :)
>
> >(I don't think the standard forbids it, and you can even force GCC
> >to do it by declaring it ``register int q_size asm ("esi");''. Note that
> >while the asm construct is a GCC extension, it merely forces the compiler
> >to do something not forbidden by the standard, thus one can pretend the
> >compiler did it all by itself :)
>
> The compiler could put it into a register, if it knew how to make that
> register available to all threads.
The compiler is not entitled to reasoning about threads.
> mthttpd correctly follows the
> memory visibility rules: All accesses to q_size are done while holding
> a mutex.
Which will not work if q_size is in a register, and that _is_ possible
if the compiler determines that pthread_mutex_lock cannot modify it.
> The C language standard does not address threads, and thus things are
> possible in it that are not possible under POSIX.
POSIX does not invalidate the C standard. Features of the C standard
not specified, explicitly modified or forbidden by POSIX are by no
means invalidated.
> POSIX requires
> (roughly speaking -- read the standard for the gory details) that
> modifications made to a memory object while holding a mutex will be
> visible to any other thread that acquires that mutex afterward.
Yes. That POSIX requirement (IEEE Std. 1003.1-2001 [4.10 Memory
Synchronization]) indeed renders all of my examples non-conforming.
It is yet another topic for discussion whether this point should have
been included in the standard at all. I have yet to see a single system
where this is the _only_ way to obtain safe access to shared
variables. On the contrary, on the _majority_ of systems out there
this requirement is unnecessarily restrictive and severely damages
performance.
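The discipline described in the quoted text (every access to q_size made while holding a mutex) can be sketched as follows. This is a minimal illustration, not the mthttpd code: the producer threads, the per_thread count and run_producers() are hypothetical, and only standard POSIX calls are used. q_size is an ordinary variable; POSIX relies on the lock/unlock calls, not on volatile, for visibility.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names; q_size echoes the variable discussed above. */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static long q_size = 0; /* plain variable, no volatile */

static void *producer(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&q_lock);   /* synchronizes memory per POSIX */
        q_size++;                      /* every access happens under the lock */
        pthread_mutex_unlock(&q_lock); /* write is visible to the next locker */
    }
    return NULL;
}

/* Runs two producers and returns the final q_size, read under the lock. */
long run_producers(long per_thread)
{
    pthread_t a, b;
    int rc = pthread_create(&a, NULL, producer, &per_thread);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, producer, &per_thread);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    pthread_mutex_lock(&q_lock);
    long result = q_size;
    pthread_mutex_unlock(&q_lock);
    return result;
}
```

With two threads each adding 100000, the result is always 200000; drop the lock and lost updates become possible regardless of volatile.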
> The compiler is free to keep q_size in a register up until the call to
> pthread_mutex_unlock() or pthread_cond_wait(), in this particular example,
> and that may well be a useful optimization. (But it's unimportant in
> this exact bit of code.)
Why is it not allowed to keep it in a register across calls to
pthread_mutex_lock/unlock?
~velco
It will see them. There will be a memory barrier inside
pthread_mutex_unlock, and that barrier ensures that the writes in
``initialize_me()'' and to ``initialized'' are ordered before the
write that unlocks the mutex, i.e. no other CPU/thread can
see the mutex unlocked and still read stale values.
~velco
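The lazy-initialization case above can be written so that it relies only on the mutex. A minimal sketch: initialize_me and initialized are the names from the post, but their bodies and the table[] they guard are invented here for illustration. Every caller takes the lock before testing the flag, which is what makes the unlock-side barrier sufficient; a reader that tested initialized without the lock would be back to the double-checked locking problem discussed later in the thread. POSIX pthread_once() is the ready-made alternative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical data guarded by the flag. */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool initialized = false;
static int table[4];

static void initialize_me(void)
{
    for (int i = 0; i < 4; i++)
        table[i] = i * i;
}

/* The flag is read and written only under init_lock, so the unlock in the
 * initializing thread orders the writes to table[] and to initialized
 * before any later acquisition of init_lock. No volatile, no explicit
 * barrier in user code. */
int get_entry(int i)
{
    assert(i >= 0 && i < 4);
    pthread_mutex_lock(&init_lock);
    if (!initialized) {
        initialize_me();
        initialized = true;
    }
    int v = table[i];
    pthread_mutex_unlock(&init_lock);
    return v;
}
```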
Try this:
http://www.crhc.uiuc.edu/ece412/papers/models_tutorial.pdf
http://www.primenet.com/~jakubik/mpsafe/MultiprocessorSafe.pdf
regards,
alexander.
Stop the silly arguing and read (trying to understand) the stuff
you've been pointed to.
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
(The "Double-Checked Locking is Broken" Declaration)
http://groups.google.com/groups?selm=G3Q49.14%24Fr2.86101%40news.cpqcorp.net
(Subject: Re: DCL and code optimization and etc.)
regards,
alexander.
http://groups.google.com/groups?selm=3D81DFCF.D8FF9B50%40web.de
<copy&paste>
White Wolf wrote:
>
> Alexander Terekhov wrote:
> > Attila Feher wrote:
> >
> >>Alexander Terekhov wrote:
> >>[SNIP]
> >>
> >>>>That _is_ wrong. bool has to be sig_atomic_t to work - as long as only
> >>>>assignment and read is done. (Only those are atomic on sig_atomic_t.)
> >>>
> >>>http://groups.google.com/groups?threadm=3D4A8DDD.5A935D95%40web.de
> >>>(Subject: Re: "memory location")
> >>
> >>OK. Now I am puzzled. So far I was assured by several, that
> >>sig_atomic_t is "the stuff" which is A OK and safe to write from one
> >>thread and read from the other. I mean Thread A _only_ writes (OK,
> >>maybe reads but what for) and Thread B _only_ reads. Do you mean that
> >>this doesn't work?
> >
> >
> > Yes. And even if it WOULD work (i.e. atomicity), you'd still have
> > the problem of visibility w.r.t. dependent {mutable} data (if any).
>
> Now I am even more puzzled: why is it called sig_atomic_t if it isn't?
It (i.e. *static volatile sig_atomic_t*) IS "atomic" (and even
thread-safe ;-) ) with respect to ONE SINGLE thread that reads
and writes it AND signal handler(s) "interrupting" THAT thread.
IOW, it has really nothing to do with threads.
http://www.lysator.liu.se/c/rat/b.html#2-2-3
("2.2.3 Signals and interrupts", ANSI C89 Rationale)
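For contrast, here is the one use the C standard does bless: a static volatile sig_atomic_t written by a handler that interrupts the same thread that reads it. A minimal sketch; got_signal, on_sigint and signal_flag_demo are hypothetical names, and raise() is used so that the signal is delivered synchronously to the calling thread (raise does not return until the handler has).

```c
#include <assert.h>
#include <signal.h>

/* Static storage + volatile + sig_atomic_t, shared between ONE thread
 * and a signal handler interrupting that thread. */
static volatile sig_atomic_t got_signal = 0;

static void on_sigint(int sig)
{
    (void)sig;
    got_signal = 1; /* a single store to the flag, nothing more */
}

/* Installs the handler, raises SIGINT in this thread, and reports
 * whether the handler ran. */
int signal_flag_demo(void)
{
    void (*old)(int) = signal(SIGINT, on_sigint);
    assert(old != SIG_ERR);
    raise(SIGINT);
    int seen = got_signal;
    signal(SIGINT, old); /* restore the previous disposition */
    return seen;
}
```

Nothing here says anything about a second thread reading got_signal; that is outside what the standard promises.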
> Why do "big old names" say it _is_ safe to use it (well, from interrupt
> routines, but in an MT environment an interrupt can come from who knows where)?
>
> So one thing. If read and write _is_ atomic for this type, what is the
> problem? AFAIK (OK, did not read it) the standard asks for it. Then
> why can't I use it as (for example) a dirty flag? I look at my copy of
> sth (reading this flag) and if it is non-zero (integral type it is) then
> I know I have to update my copy (when I want to). Which will certainly
> involve some sort of locking - again if needed. Got it? Why can't I
> use this type in an MT environment for this? One thread writes (only!) the
> data, others may read it. Where can it go wrong? Please do not post
> links to scattered lengthy discussions - I am a simple man and I get
> confused easily.
>
> I have worked in Intel (OK, only 8086) assembly, still have the books
> about Z80 and some more HW design, so let's cut to the chase! What
> makes sig_atomic_t unsafe? I do not care about POSIX; that is
> something (a standard) made by pretty clever people in a way that I will
> not understand. However I understand the concept of the system bus,
> somewhat the cache (cash even more :-), control signals and the like.
> So where the heck can it go wrong, if the operation is atomic?
http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps
"5.2.1 Cache Coherence and Sequential Consistency
Several definitions for cache coherence (also referred to
as cache consistency) exist in the literature. The strongest
definitions treat the term virtually as a synonym for
sequential consistency. Other definitions impose
extremely relaxed ordering guarantees.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What does the programmer expect from the memory system to
ensure correct execution of this program fragment? One
important requirement is that the value read from the
data field within a dequeued record should be the same
as that written by P1 in that record. However, in many
commercial shared memory systems, it is possible for
processors to observe the old value of the data field
(i.e., the value prior to P1's write of the field),
leading to behavior different from the programmer's
expectations.
http://groups.google.com/groups?selm=c29b5e33.0202150632.6d579f24%40posting.google.com
(Subject: Re: Can multiple threads set a global variable simultaneously?)
regards,
alexander.
--
POSIX
"Applications shall ensure that access to any memory
location by more than one thread of control (threads
or processes) is restricted such that no thread of
control can read or modify a memory location while
another thread of control may be modifying it. Such
access is restricted using functions that synchronize
thread execution and also synchronize memory with
respect to other threads."
JAVA (revised Java volatiles aside)
"If two threads access a normal variable, and one
of those accesses is a write, then the program should
be synchronized so that the first access is visible to
the second access. When a thread T 1 acquires a lock
on/enters a monitor m that was previously held by
another thread T 2, all actions that were visible to
T 2 at the time it released the lock on m become
visible to T 1"
[...]
> It (i.e. *static volatile sig_atomic_t*) IS "atomic" (and even
> thread-safe ;-) ) with respect to ONE SINGLE thread that reads
> and writes it AND signal handler(s) "interrupting" THAT thread.
> IOW, it has really nothing to do with threads.
OK, you've forced me to go and read the C standard and I'm not thankful.
Let me modify the above:
Although C claims that an object of type sig_atomic_t "can be accessed as
an atomic entity, even in the presence of asynchronous interrupts", I've
seen compiler documentation which asks that variables modified from
signal handlers be declared as volatile if the user wants to use a certain
level of optimization.
My point was: <copy&paste>
well, i was under impression that "sig_atomic_t" alone
does not guarantee thread (or even signal) safety..
only the combination of _static_storage_duration_,
_volatile_ and _sig_atomic_t makes it safe.. and only
for signal handlers.. i could imagine an impl. which
would just disable signal delivery while accessing
"static volatile sig_atomic_t" variable (allocated
in some special storage region - for static volatiles
sig_atomic_t's only) or would do something else which
would NOT work with respect to threads.
or am i missing something?
---
> : What is needed is something similar to the Java memory model requirement
> : that values cannot "come out of thin air"
I don't think so. http://groups.google.com/groups?selm=3C9236F3.49C68326%40web.de
> (i.e. roughly speaking, a value
> : read from any variable must have been previously written to that variable,
> : with some additional ordering constraints). This has little or nothing to do
> : with the semantics of sig_atomic_t (or volatile), which the C99 Standard
> : only defines for single-threaded programs.
>
> Moreover, the standard only guarantees atomicity of writes by signal
> handlers to data
static data
> of type sig_atomic_t, and only when the object is
> also declared to be volatile. Objects of type sig_atomic_t are not
> guaranteed to be atomic in any other context.
AFAICS, it's even worse than that... in a multithreaded application that
happens to use asynchronous signals [vs. sigwait and/or SIGEV_THREAD delivery]
with static volatile sig_atomic_t vars you'd have to ensure that such signals
could only be "delivered" to a corresponding ONE SINGLE thread -- the one that
reads/writes a particular static volatile sig_atomic_t variable(s). You just
can't have such signal(s) delivered to any other thread.
regards,
alexander.
> Carlos Moreno <mor...@mochima.com> wrote:
>
>> // at this point -- why is it that the calls
>> // to mutex_lock and unlock guarantee that
>> // the compiler will not do something wrong
>> // because it assumed that shared is 2?
>>
>>
>
> At this point, if 'shared' is accessed, it should be protected
> with a mutex, since the data is shared and could have been written
> to by another thread. Remember, you as a programmer are responsible
> for this level of synchronization... when you add synchronization,
> you're adding cancelation points, and the compiler then 'knows'
> not to assume anything.
My doubt was (is?): how does the compiler know not to assume
anything?? If I need to access "shared", I would do:
pthread_mutex_lock ( &the_mutex );
int a = shared;
pthread_mutex_unlock ( &the_mutex );
Since the calls to lock and unlock do not involve taking the
address of "shared", how would the compiler know not to assume
that shared is 2? (well, whatever value was assigned originally
and never changed in this thread)
The next reply in this branch answers this question, saying
that the compiler itself is part of the POSIX system (that
is something that wasn't very clear in my mind -- I kept
thinking that you were simply talking about a threading
library called "POSIX threads", as opposed to a complete
system specification)... But then, the following reply
seems to suggest that C or C++ are not necessarily part
of that specification. As you can understand, my poor
brain is about to explode with so many details! :-)
So, I'll turn my doubt into a concrete question: I'm using
Linux (RedHat 7.2, or 7.3, or soon it'll be 8.0), with g++
(I normally use C++, but in some cases I might end up using
C as well). In that scenario, is my system compliant with
the POSIX and POSIX threads specification? Or might I
need to use volatile in cases like my example with the
shared variable? I guess *if* I need to use volatile, it
would be only with auto storage variables, right?
Thanks!
Carlos
--
To begin with, POSIX mutexes aren't cancelation points. See
POSIX threads rationale if you want to know/understand why.
> > and the compiler then 'knows' not to assume anything.
Yes it knows, but that has really nothing to do with thread
cancelation, AFAIK.
> My doubt was (is?): how does the compiler know not to assume
> anything?? If I need to access "shared", I would do:
>
> pthread_mutex_lock ( &the_mutex );
> int a = shared;
> pthread_mutex_unlock ( &the_mutex );
>
> Since the calls to lock and unlock do not involve taking the
> address of "shared", how would the compiler know not to assume
> that shared is 2? (well, whatever value was assigned originally
> and never changed in this thread) [.... RedHat ....]
Quoting James Kanze: < Newsgroups: comp.lang.c++.moderated,
Subject: Re: volatile -- what does it mean in relation to
member functions? >
<quote>
> Without volatile, the compiler might decide that it already knows the
> value that it will be using a few lines down anyway and keep it in a
> register instead of writing it back to memory and then reading it back
> from memory. This is a portable behaviour of volatile.
As we've been trying to explain, guaranteeing that the compiler will not
use a value which it explicitly cached in a register simply doesn't buy
you anything.
> Will a mutex force the compiler to generate memory reads/writes?
There's no such thing as a mutex in C++, so it obviously depends on the
system definition of mutex.
In Posix (IEEE Std 1003.1, Base Definitions, General Concepts, Memory
Synchronization): "The following functions synchronize memory with
respect to other threads: [...]". Both pthread_mutex_lock and
pthread_mutex_unlock are in the list.
> It's only even possible if it is aware that you *are* using a mutex.
If I call pthread_mutex_lock, I guess that the compiler can suppose that
I am using a mutex.
Posix makes certain requirements. (I suppose that Windows threads offer
similar guarantees, and make similar requirements.) If my program
conforms to those requirements, and the system claims Posix compliance,
then it is the compiler's or the system's problem to make my program
work. It's none of my business how they do it.
With regards to code motion of the compilers, there are two relatively
simple solutions:
- The compiler knows about the system calls, and knows that it cannot
move reads or writes around across them, or
- The compiler doesn't know about them, and treats them just as any
other external function call. In this case, of course, it had
better ensure that the necessary reads and writes have taken place,
since it cannot assume that the called code doesn't make use of or
modify the variables in question. (Any object accessible from
^^^^^^^^^^^^^^^^^^^^^^^^^^
another thread would also be accessible from an external function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
with unknown semantics.)
^^^^^^^^^^^^^^^^^^^^^^^^
Most compilers currently use the second strategy, at least partially
because they have to implement it anyway -- I can call functions written
in assembler from C++, and there is no way that the C++ compiler can
know their semantics, so all that is needed is that the C++ compiler
treat pthread_mutex_lock et al. as if they were unknown functions
written in assembler (which is often the case in fact anyway).
</quote>
regards,
alexander.
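Carlos's example can be made concrete. In the hypothetical sketch below, shared is an automatic variable whose address is handed to another thread through a struct. From that point on the compiler must treat pthread_create, pthread_join and pthread_mutex_lock as functions that might read or modify it, which is the "unknown external function" argument quoted above. No volatile appears anywhere; struct shared_box, writer and read_after_writer are invented names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct shared_box {
    pthread_mutex_t lock;
    int *shared;
};

static void *writer(void *arg)
{
    struct shared_box *box = arg;
    pthread_mutex_lock(&box->lock);
    *box->shared = 2; /* another thread changes "shared" */
    pthread_mutex_unlock(&box->lock);
    return NULL;
}

int read_after_writer(void)
{
    int shared = 1; /* automatic storage, not volatile */
    struct shared_box box;
    pthread_mutex_init(&box.lock, NULL);
    box.shared = &shared; /* the address escapes here */

    pthread_t t;
    int rc = pthread_create(&t, NULL, writer, &box);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);

    /* &shared reached opaque calls, so the compiler cannot assume that
     * shared still holds 1; it must load it from memory here. */
    pthread_mutex_lock(&box.lock);
    int a = shared;
    pthread_mutex_unlock(&box.lock);

    pthread_mutex_destroy(&box.lock);
    return a;
}
```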
Heck, here's my final copy&paste.
regards,
alexander. < ``over and out.'' >
Subject: Re: stl deques and "volatile"
Newsgroups: comp.lang.c++
Date: 2002-08-23 05:47:59 PST
Gerhard Prilmeier wrote:
[...]
> Opposed to that, Andrei Alexandrescu wrote a lengthy article about using
> volatile in multithreaded programs:
> http://www.cuj.com/experts/1902/alexandr.htm
>
> You see me rather confused, unfortunately.
> Who is wrong? Are both right?
Ha! Yeah, that was rather funny and noisy stuff, indeed. ;-)
http://groups.google.com/groups?selm=94ccng%24m2t%241%40nnrp1.deja.com
"Please, Andrei, not 'volatile'!"
http://groups.google.com/groups?selm=3A66E4A1.54EE37AD%40compaq.com
<quote>
Andrei Alexandrescu wrote:
> In my opinion the use of volatile that the article suggests is fully in
> keeping with its original intent. Volatile means "this data can be changed
> beyond compiler's ability to figure it out" and this is exactly what happens
> to data that's shared between threads. Once you lock the synchronization
> object, the code's semantics become single-threaded so it cannot be changed
> beyond compiler's ability to comprehend and so you can cast volatile away.
> What's so wicked in that? Looks very kosher to me :o).
After the number of responses in this thread, you can still say that? Amazing.
Original intent? The intent of the "volatile" attribute is to change the code
generated by the compiler on references to memory tagged with that attribute.
You are using the syntactic tag while defeating the intent of that tag, never
allowing the compiler to generate code in the way required by that tag. Abuse,
I'm afraid, is often in the eye of the beholder, but it's hard to see how
anyone could refuse to admit that this is, at best, "pretty close to the edge".
In fact, though you've said you weren't intending to advocate direct access to
any "volatile" variable without applying synchronization and casting away
"volatile", your first Gadget::Wait example does precisely that, and is wildly
incorrect and dangerously misleading. Compiler volatile semantics are not
sufficient when sharing flag_ between threads, because the hardware, as well as
the compiler, may reorder memory accesses arbitrarily, even with volatile. (Nor
would a compiler implementation that issued memory barriers at each sequence
point for volatile variables be sufficient, unless ALL data was volatile, which
is impractical and unreasonably expansive.)
Memory barriers must be applied where necessary on many architectures, and
there is no standard or portable way to generate them. There is no excuse for a
compiler to require both volatile AND memory barriers, because there's no
excuse for a compiler to reorder memory access around its own memory barrier
construct. (Usually either a compiler builtin such as Compaq C/C++ "__MB()" or
an asm("mb") "pseudocall".) The standard and portable way to ensure memory
consistency is to rely on the POSIX memory model, which is based solely on
specific POSIX API calls rather than expensive and inappropriately defined
language keywords or nonportable hardware instructions. A system or compiler
that does not provide the proper memory model (without volatile) with proper
use of the portable POSIX API calls does not conform to POSIX, and cannot be
considered for serious threading work. Volatile is irrelevant.
Entirely aside from the language issues, my point was simply that "volatile",
and especially its association with threaded programming, has been an extremely
confusing issue for many people. Simply using them together is going to cause
even more confusion. The illusory promise of volatile will lead novices into
trouble.
In contradiction to your absurd statement that "writing multithread programs
becomes impossible" without volatile, the intended C and C++ semantics
associated with volatile are neither useful nor sufficient for threaded code.
And it is WITH volatile, not without, that "the compiler wastes vast
optimization opportunities", especially as the expense of meeting the volatile
"contract" is of no benefit to threaded code.
With all that said, I wish there was a language keyword intended to be used in
the manner you're (ab)using volatile. Though I think your method falls far
short of your promises to detect all race conditions at compile time (unless
applying such a limited and unusual definition of "race" that the term becomes
essentially meaningless), it does have value. What you've done is, in some
ways, one step beyond the Java "synchronized" keyword. It provides not only
syntax to require that access be synchronized, but your type magic allows the
compiler to determine whether, in the current scope, the data is already
synchronized. (This might allow avoiding the Java dependency on expensive
recursive mutexes. Though I'm not entirely convinced your method would survive
a complicated application with multilevel lock hierarchies, I'm not entirely
convinced it wouldn't, either.)
Still, if you're willing to point out that applying volatile to tag temporaries
would be "abuse", recognize that others might reasonably draw the line a bit
differently.
</quote>
http://groups.google.com/groups?selm=3A684272.EC191FD%40compaq.com
<quote>
Andrei Alexandrescu wrote:
> "Dave Butenhof" <David.B...@compaq.com> wrote in message
>
> > In fact, though you've said you weren't intending to advocate direct access to
> > any "volatile" variable without applying synchronization and casting away
> > "volatile", your first Gadget::Wait example does precisely that, and is wildly
> > incorrect and dangerously misleading. Compiler volatile semantics are not
> > sufficient when sharing flag_ between threads, because the hardware, as well as
> > the compiler, may reorder memory accesses arbitrarily, even with volatile. (Nor
> > would a compiler implementation that issued memory barriers at each sequence
> > point for volatile variables be sufficient, unless ALL data was volatile, which
> > is impractical and unreasonably expansive.)
>
> Yeah, I learned to hate the Gadget example. Where's that chrononaut to go
> back in time and remove it.
You can, at least, write a followup article to correct and clarify. It won't
reach everyone who ought to see it, but it's better than nothing.
> > In contradiction to your absurd statement that "writing multithread programs
> > becomes impossible" without volatile, the intended C and C++ semantics
> > associated with volatile are neither useful nor sufficient for threaded
> code.
>
> I agree. Boy this is hard :o).
Yeah, well, you certainly got a lot of attention for your paper. As the saying
goes, "I don't care what they say about me as long as they get my name right."
(Or, "all advertising is good advertising.")
You're right; it is hard to play around between the cracks as you're doing.
There's not a lot of wiggle room. Sounds like you'll be more careful in the
future, and that's good. Now your job is to try to help anyone you confused the
first time around. ;-)
> > And it is WITH volatile, not without, that "the compiler wastes vast
> > optimization opportunities", especially as the expense of meeting the volatile
> > "contract" is of no benefit to threaded code.
>
> What I meant was that the compiler would waste optimization opportunities if
> it treated all variables as if they were volatile. But anyway, given that
> volatile is not really of a lot of help...
Ah. Yes, eliminating all optimization would make threaded programming
impractical, at best. After all, most (though not all) applications use threads
to improve performance. While it's true that in some cases parallelized but
unoptimized code might outperform optimized unthreaded code, I wouldn't want to
bet my job on it happening a lot.
> I am glad I'm not the only one who felt there is something cool here.
> Perhaps the most important point of the article is the importance of type
> modifiers in programming languages, and how one can define/use such
> modifiers to help with multithreaded programs.
Oh yes, it's cool. In principle. It's also fairly simple, and may prove
applicable only to relatively simple programs (e.g., that never hold more than
one mutex at a time, as we'll get into below).
Perhaps this is an interesting opportunity for the language folks; to build a
language (or maybe a new C++ version) that allows something like an
"attributedef" statement, defining properties of an attribute keyword that can
be applied to classes and typedefs. You, for example, could have used a
"locked" keyword instead of confusingly overloading "volatile". I'll bet such a
keyword, which could be added or cast away at need, would enable all sorts of
interesting extensions of the compiler's type checking... including perhaps
that thing about detecting temporaries.
> The projects to which I've applied the idiom are "classic" multithreaded
> applications. The technique is easy to explain and is field tested, and not
> only by me - programmers who are not MT saviors have caught on to it in no time
> and loved it, and this is an important reason why I believe the
> idiom is valuable. Indeed, I don't know what would happen on special
> threading models. Could you please tell what multilevel lock hierarchies
> are?
There are many cases in complicated threaded applications where a region of
code must hold more than one lock at the same time. Such code must always have
DANGER signs posted at the entrances, and you need to be really careful. Still,
there are well established ways to deal with the risks (just as, foolish though
it may be, we often drive our cars onto highways without bothering to consider
that we might die there).
The risk is deadlock, or "deadly embrace" -- the good ol' Dining Philosophers
problem. One thread owns Mutex A, and waits for Mutex B; while another thread
owns Mutex B and waits for Mutex A. The most common and "well structured"
solution to this problem is to design a strict "mutex hierarchy" defining the
"level" of each mutex. That is, if one needs both the mutex on the head of a
queue and on an element of the queue, one must always first lock the head and
only then lock the element. There is no risk of deadlock, because the element
cannot be locked unless the head is also locked.
Your technique doesn't make it impossible or even more difficult to manage
mutex hierarchies: but it doesn't make it any easier, either. Furthermore, the
"advertised power" of the technique (as currently structured) is somewhat
weakened when an object needs to be protected by multiple mutexes: locking the
element would provide a non-volatile pointer, even though correct use of that
pointer actually requires a second mutex (the header). Could you reasonably
extend the model to deal syntactically with mutex hierarchies? Would the
complacency suggested by reliance on the model prove disastrous in an
application that required hierarchies?
</quote>
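The mutex hierarchy described in the quote above (head before element) can be sketched with two plain POSIX mutexes. The struct layout and the helpers lock_both/unlock_both are hypothetical; the point is only that every thread needing both locks acquires them in the same order, so no thread can hold the element while waiting for the head.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct element { pthread_mutex_t lock; int value; };
struct queue   { pthread_mutex_t lock; struct element item; int touches; };

static void lock_both(struct queue *q)
{
    pthread_mutex_lock(&q->lock);      /* level 1: the head */
    pthread_mutex_lock(&q->item.lock); /* level 2: the element */
}

static void unlock_both(struct queue *q)
{
    pthread_mutex_unlock(&q->item.lock); /* release in reverse order */
    pthread_mutex_unlock(&q->lock);
}

static void *worker(void *arg)
{
    struct queue *q = arg;
    for (int i = 0; i < 10000; i++) {
        lock_both(q);
        q->touches++;    /* protected by the head lock */
        q->item.value++; /* protected by the element lock */
        unlock_both(q);
    }
    return NULL;
}

int hierarchy_demo(void)
{
    struct queue q;
    pthread_mutex_init(&q.lock, NULL);
    pthread_mutex_init(&q.item.lock, NULL);
    q.touches = 0;
    q.item.value = 0;

    pthread_t a, b;
    int rc = pthread_create(&a, NULL, worker, &q);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, &q);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    assert(q.touches == q.item.value);
    int total = q.item.value;
    pthread_mutex_destroy(&q.item.lock);
    pthread_mutex_destroy(&q.lock);
    return total;
}
```

If one worker instead locked the element first and the head second, the two threads could each hold one mutex and wait forever for the other.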
Well, to be fair:
http://www.informit.com/isapi/product_id~%7BE3967A89-4E20-425B-BCFF-B84B6DEED6CA%7D/element_id~%7B1872DFB1-6031-4CB0-876D-9533C4A23FC9%7D/st~3FAD3499-20A6-4782-9A96-05825F8E6E5B/content/articlex.asp
(Multithreading and the C++ Type System
FEB 08, 2002 By Andrei Alexandrescu. Article is provided courtesy of Addison Wesley.)
<quote>
An article of mine[4] describes practical compile-time race condition
detection in C++. The method exploits the volatile keyword not as a
semantic vehicle, but only as a participant to the C++ type system.
The programmer qualifies the shared user-defined objects with volatile.
Those objects can be manipulated only by applying a const_cast. Finally,
a helper object can ensure that the const_cast is performed only in
conjunction with locking a synchronization object. Effectively, an
object's type (volatile-qualified or not) depends on whether its
corresponding synchronization object is locked or not. The main
caveat of the technique is that the use of the obscure volatile
qualifier might appear confusing to the unwitting maintainer.
</quote>
4. Andrei Alexandrescu, "volatile: Multithreaded Programmer's Best
Friend," C/C++ Users Journal, February 2001.
regards,
alexander.
>> If the variable's storage is in a device, it must be volatile, but that
>> has nothing to do with threads. If the variable is in another thread
>> that does nothing but compute new values for that variable, then you need
>> to obey the memory visibility rules that your platform sets out.
>
> Define "memory visibility" ?
>
> On every platform I'm aware of, if a CPU performs an (atomic) memory
> write, the value written is _eventually_ visible to other CPUs.
Yes. For some definition of "eventually". ;-)
But that says nothing of the SEQUENCE, and often that's more important...
> Moreover, on every platform I'm aware of, the sequence of values read
> from a single memory location is a (not necessarily proper)
> subsequence of the sequence of values written, i.e. no CPU can observe
> values occurring in the opposite order.
This is simply wrong. X86 doesn't reorder anything. SPARC (normally)
reorders writes but not reads (so a barrier on the writer side is enough).
But it's not true for Alpha and it's not true for IPF. You need a barrier
(or fence) on BOTH sides of the transaction. That's the essential bug in
the double-checked initialization. All the mutexing and volatility and
everything else in the writer does absolutely no good for the poor thread
who comes along later and reads "initialization done" before it can see all
the initialized data.
The ONLY correct and portable solution short of proper POSIX synchronization
is a barrier/fence between reading the "initialized" flag and any access to
data the presence of which the flag is intended to indicate.
"DCL" initialization code is inherently nonportable. Period. If your
definition of "correct" is proper operation on the architectures on which
it operates properly, then fine... it's "correct but nonportable". If you
don't like that tautology, then you have to accept that it's "incorrect".
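For readers who want to see the "barrier on BOTH sides" requirement spelled out, here is a sketch of double-checked initialization using C11 <stdatomic.h>. That header postdates this thread and is an assumption here; it is not a POSIX facility. The release store plays the writer-side barrier and the acquire load the reader-side one; struct config and get_config() are hypothetical. Without C11, pthread_once() or simply taking the mutex on every call remains the portable POSIX answer.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct config { int a; int b; };

static struct config cfg;        /* the data the flag announces */
static atomic_int cfg_ready = 0; /* the "initialized" flag */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;

const struct config *get_config(void)
{
    /* Reader side: acquire keeps the later reads of cfg from being
     * satisfied before the flag is seen as set. */
    if (atomic_load_explicit(&cfg_ready, memory_order_acquire))
        return &cfg;

    pthread_mutex_lock(&cfg_lock);
    /* Relaxed is enough here: the mutex already orders us after
     * any earlier initializer's unlock. */
    if (!atomic_load_explicit(&cfg_ready, memory_order_relaxed)) {
        cfg.a = 42;
        cfg.b = 7;
        /* Writer side: release orders the stores to cfg before the flag. */
        atomic_store_explicit(&cfg_ready, 1, memory_order_release);
    }
    pthread_mutex_unlock(&cfg_lock);
    return &cfg;
}
```

Replacing the acquire load with a plain read reproduces the bug described above on Alpha- or IPF-style memory systems, however carefully the writer side is fenced.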
> These, along with atomicity of reads/writes, are sufficient for the
> above examples to work. Note that I don't claim (and have _never_
> claimed) they are necessary.
>
>> > c) memory ordering - CPU can reorder memory accesses, which is
>> >prevented by memory barriers, but the compiler can reorder memory
>> >accesses too, which is prevented by volatile.
>>
>> This is not an example.
>
> This is not an argument.
"Yes it is." (If you're not a fan of Monty Python, or don't know who they
are, just forget I said that...)
>> Using volatile in an attempt to achieve thread
>> safety *WILL NOT WORK* on all platforms. Period.
>
> This is not an argument either.
>
>> Further, there is
>> always a way to do the same job (thread safety of data) without using
>> volatile.
>
> Most probably. So, what?
>
> Correctness first, but after all we want performance too, don't we ?
How do you define "correct"? If you violate the POSIX memory model (as your
double-checked initialization variable does), then your code MAY be correct
on some particular processor models, and it may have the best performance
on those processors; but it is not portable. In terms of the standard, and
to many people writing here, nonportable is not correct. This dichotomy can
lead to endless and pointless arguments. ("... No it can't!"; "Yes it
can!")
> On modern SMP architectures every write to concurrently accessed memory
> is a potential bottleneck, including things like
> pthread_mutex_lock/unlock.
> Note that the same applies to pthread_rwlock_rdlock too, as it performs
> a memory write as well. Your best bet, after making every effort to
> have no concurrently accessed locations, is to perform mostly reads
> there and avoid like hell the pthread_ synchronization functions (or any
> sync functions for that matter).
People still tend to count instruction cycles to measure "performance"; but
you're right... in most modern systems all that really matters is memory
references, and particularly writes. Write conflicts can hash up everything
by filling the cache coherency channels -- and because those are oriented
towards cache lines rather than "variables", and even with multi-way
associativity you can still get widely spaced data competing for a single
cache line, the effects are difficult to predict or control.
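One common way to keep independent per-thread data from competing for a single cache line is to pad each thread's private slot out to a line of its own. A sketch under stated assumptions: 64-byte cache lines (typical but not universal) and C11 _Alignas; the counter layout, NTHREADS and padded_sum() are hypothetical. The result is the same with or without the padding; only the cache-coherency traffic differs.

```c
#include <pthread.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumption: 64-byte lines */
#define NTHREADS 4

/* The alignment makes sizeof(struct padded_counter) a full line, so
 * adjacent array elements never share a line. */
struct padded_counter {
    _Alignas(CACHE_LINE) long value;
};

static struct padded_counter counters[NTHREADS];

static void *count_up(void *arg)
{
    struct padded_counter *mine = arg;
    for (long i = 0; i < 1000000; i++)
        mine->value++; /* private slot: no lock, no shared line */
    return NULL;
}

/* Each thread writes only its own slot; the main thread reads a slot
 * only after joining its thread, which makes the writes visible. */
long padded_sum(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        counters[i].value = 0;
        pthread_create(&t[i], NULL, count_up, &counters[i]);
    }
    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += counters[i].value;
    }
    return total;
}
```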
Still, you avoid synchronization only as much as possible, and no more. One
of the inputs to your decision must be the consideration of how much you're
willing to rewrite/redesign your code when porting to a platform that's
more aggressive. If you stick with synchronization for memory visibility and
sequence, you're safe on any conforming platform. If you skip "the rules"
and roll your own architecture-specific code, you're going to miss
something when you port the code, and it's going to blow up in weird ways
that will waste inordinate amounts of time.
Write it correct and portable first. Then ANALYZE what's really happening
and optimize only where needed... and where the loss of portability is
worth the payback.
--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/
> This is simply wrong. X86 doesn't reorder anything. SPARC (normally)
> reorders writes but not reads (so a barrier on the writer side is enough).
Even with those or similar processors... Wouldn't it be possible that a
largish machine has a bus which reorders memory accesses independently of
what the CPU is capable of? Or the memory interface has to keep the same
promise as the CPU?
It's not quite as bad as it might seem, really.
Following the C/C++ rules will generally lead even a fairly aggressively
optimizing compiler to "do the right thing" automatically without any
explicit awareness of POSIX synchronization or memory barrier builtins.
It's HARD to optimize across function calls using the variable visibility
rules. I've dealt with some highly aggressive optimizing compilers, and I've
never yet heard of a compiler that could figure out how to break the POSIX
memory rules in a conforming POSIX threaded program. (Or even an "extended"
program using direct memory barriers.) That, of course, isn't meant to be a
guarantee that such compilers can't (or even don't) exist.
However, as Patrick says, any compiler that COULD do this sort of
optimization in a way that could break POSIX semantics, and was intended to
support a POSIX conforming threading environment, would simply have to do
whatever was necessary to make sure the generated code would work. That
might be just as difficult as breaking the rules in the first place, but it
doesn't matter -- it's an unyielding requirement.