Is volatile necessary [with multiple threads] anymore? I have a two-thread
piece of code I've been
testing to figure out what volatile does (fairly simple code, uses
pthreads). I have an update thread (variables passed as volatile) and a
print thread (one variable volatile, the other not). There is no difference
in the behavior of the volatile and non-volatile variables.
I'm compiling this with gcc, using the -O2 and -pthreads flags.
The pseudocode is at the end. The result I'm getting is that the correct
memory addresses and values are being printed by both the volatile and
non-volatile variables. If I understand things correctly, the non-volatile
variable should give the wrong addresses at least half the time. Is GCC
just smart enough to handle this, or am I completely misunderstanding
things? I can provide the code, but I'd rather not clutter up the list with
it without a request.
OS/hardware: FreeBSD 7.0/i386/SMP/2 Cores (AMD)
Compiler: GCC 4.2.1
Thanks,
-Jim Stapleton
* two pointers (a, b) are global (non volatile) variables. Each is of type
(int*)
* each is malloc'ed and the value in the allocated memory set to the value
of a counter variable (0).
* A global volatile turn variable is set to 0 (0 = print, 1 = update).
* the print and update functions are called.
* print function
** output that the print function was called
** call the internal print function with a non-volatile pointer to a
(int**), and a volatile pointer to b (volatile int**)
** output that the print function is closing
** exit
* internal print function
** wait until turn = 0
** loop until count = 5
*** print the memory addresses a & b point to, and the values stored therein
*** set turn to 1
*** wait until turn = 0
** exit
* update function
** output that the update function was called
** call the internal update function with a volatile pointer to a (volatile
int**), and a volatile pointer to b (volatile int**)
** output that the update function is closing
** exit
* internal update function
** wait until turn = 1
** loop until count = 5
*** increment count
*** assign integer pointers (int*) t1 and t2 to the memory values pointed to
by the arguments (equivalent to t1 = a, t2 = b)
*** malloc new memory to a and b (sizeof(int)), and set the value to the
same as count.
*** print out the equivalent of: "*t1 (t1) -> *a (a) *t2 (t2) -> *b (b)"
*** free t1 and t2
*** set turn = 0
*** wait until turn = 1
** exit
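For reference, a rough, trimmed-down C sketch of what the pseudocode above
describes (a reconstruction for illustration only, not the actual program;
the fragile turn-based spinning is deliberate):

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

static int *a, *b;                  /* the two shared pointers            */
static volatile int turn = 0;       /* 0 = print's turn, 1 = update's     */
static int count = 0;

static void *print_thread(void *arg)
{
    int * volatile *vb = &b;        /* b seen through a volatile pointer  */
    (void)arg;
    while (count < 5) {
        while (turn != 0)           /* spin; relies on 'turn' being       */
            ;                       /* volatile to force re-reads         */
        printf("a=%p *a=%d  b=%p *b=%d\n",
               (void *)a, *a, (void *)*vb, **vb);
        turn = 1;
    }
    return NULL;
}

static void *update_thread(void *arg)
{
    (void)arg;
    while (count < 5) {
        while (turn != 1)
            ;
        count++;
        int *t1 = a, *t2 = b;       /* remember the old blocks            */
        a = malloc(sizeof *a); *a = count;
        b = malloc(sizeof *b); *b = count;
        free(t1);
        free(t2);
        turn = 0;
    }
    return NULL;
}

int main(void)
{
    pthread_t pr, up;
    a = malloc(sizeof *a); *a = 0;
    b = malloc(sizeof *b); *b = 0;
    pthread_create(&pr, NULL, print_thread, NULL);
    pthread_create(&up, NULL, update_thread, NULL);
    pthread_join(pr, NULL);
    pthread_join(up, NULL);
    return 0;
}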
It is almost never necessary, and that hasn't changed. "volatile" is
used when accessing memory-mapped hardware, variables that may be
changed by async signal handlers, and under certain scenarios with
setjmp/longjmp. It tells the compiler something, but doesn't tell
anything to the cpu itself and so has no effect on cpu caches.
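The classic legitimate use with a signal handler, for instance, looks
roughly like this (a minimal sketch):

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void on_sigint(int sig)
{
    (void)sig;
    got_signal = 1;            /* the only thing the handler does         */
}

int main(void)
{
    signal(SIGINT, on_sigint);
    while (!got_signal)        /* volatile forces a fresh read each pass  */
        pause();               /* sleep until any signal is delivered     */
    return 0;
}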
When sharing data between threads you should use standard thread locking
mechanisms. If for whatever reason that is too expensive, then you can
use various non-portable lock-free mechanisms generally involving cpu
barriers behind the scenes.
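A minimal sketch of that standard approach (made-up names), swapping a
shared heap block under a pthread mutex instead of relying on volatile:

#include <stdlib.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int *shared;                      /* only touched while 'lock' is held */

void publish(int value)
{
    int *fresh = malloc(sizeof *fresh);
    *fresh = value;
    pthread_mutex_lock(&lock);
    int *old = shared;                   /* readers only touch 'shared' under */
    shared = fresh;                      /* the lock, so once the swap is     */
    pthread_mutex_unlock(&lock);         /* published the old block is free   */
    free(old);                           /* to release                        */
}

int read_current(void)
{
    int v = 0;
    pthread_mutex_lock(&lock);           /* the lock imports the writer's     */
    if (shared)                          /* changes                           */
        v = *shared;
    pthread_mutex_unlock(&lock);
    return v;
}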
Chris
Sorry, I think I replied to you and not the group earlier. Anyway, here's a
more concise description.
OK, I am using locks, but I can't get to them instantly. This sample shows
the important things (there are a few more variables in the struct to make
the rwlock 'safe'). The problem is that if the value is changed, the struct
may be reallocated (if the memory requirements change). I'm worried that the
time it takes to obtain the lock reference may be enough to cause a memory
error. Look at the third macro in particular.
Thanks,
-Jim Stapleton
sample code:
========================================
typedef struct CTYPELESS_STRUCT
{
int typeid;
pthread_rwlock_t lock;
char data;
} ctypeless_struct;
typedef ctypeless_struct *ctypeless;
/*we pass a pointer to data, and not the struct, so data can be accessed
immediately - mostly for non-threaded environments, or to allow a programmer
to access the data quickly if he/she knows it won't be written at the time
of access*/
#define GET_PASSED_VALUE( X ) \
(ctypeless) ((void*)X + offsetof(ctypeless_struct, data))
#define GET_STRUCT_VALUE( X ) \
(ctypeless_struct) ((void*)X - offsetof(ctypeless_struct, data))
#define GET_LOCK( X ) \
(pthread_rwlock_t)*((void*)X - offsetof(ctypeless_struct, data) + \
offsetof(ctypeless_struct, lock))
========================================
> Is volatile necessary [with multiple threads] anymore?
In C/C++, volatile has a role in compiler optimisations only. In
multi-threaded code it plays the same role; it is not meant for
synchronisation.
If you are using threads, you are supposed to use proper
synchronisation means (a mutex or semaphore, if you are working with the
low-level pthread library) when accessing shared data. As long as you are
using non-concurrent languages like C/C++, you can only rely on library
calls for this, but at least use those instead of playing around with
volatile, which is not designed for synchronisation.
> I have a two-thread piece of code I've been
> testing to figure out what volatile does ...
I suggest you not fine-tune your algorithm for two threads, because
your code will be fragile.
Best Regards,
Szabolcs
It's a test application to help myself understand how volatile and threading
interact. Fragile is partially the point. I have another reply detailing
/why/ I am writing the test application, and why I am curious about
volatile.
-Jim Stapleton
I cannot see any proper test in your pseudo code. What interleavings
are you going to test and where is the code that enforces such
interleaving?
What you provide is just some trial-and-error code but I would not
call it a test.
> Fragile is partially the point. I have another reply detailing
> /why/ I am writing the test application, and why I am curious about
> volatile.
Forget about volatile, it is just an optimisation-blocker for the
compiler, nothing else. In C/C++ volatile does not mean atomic,
although it has that meaning in Java for primitive types. You need
proper synchronisation instead. Well, if you move to multi-threaded
programming, you must obey the consequences too.
Best Regards,
Szabolcs
The pseudocode in the first post /was/ a test, if you assume that I am not
quite sure how threading and synchronization of data between threads work.
I should say, I understand mutexes, but there are other points that worry
me. The test was the print function, which had one volatile and one
non-volatile variable that it did identical things to (a test case and a
control case).
My reply to Chris Friesen describes what I'm worried about.
> Forget about volatile, it is just an optimisation-blocker for the
> compiler, nothing else. In C/C++ volatile does not mean atomic,
> although it has that meaning in Java for primitive types. You need
> proper synchronisation instead. Well, if you move to multi-threaded
> programming, you must obey the consequences too.
I understand about proper synchronization: each shared chunk of data has a
lock, and you use that lock before doing stuff with the data.
Unfortunately, with what I'm doing, the lock has to travel /with/ the data,
and I'm not sure if there is a potential for some memory oopses in the
process. My reply to Chris' post, again, details my worries better.
-Jim Stapleton
I do not assume anything; rather, I called your attention to the fact that
your pseudo-code does not contain any proper test cases.
What are you going to test?
How do you test it?
How do you know that your code tests what you are interested in?
Best Regards,
Szabolcs
P.S. Volatile does not mean atomic in C/C++ hence you cannot rely on
it.
Take two threads, each sharing a reference to a data value.
That reference changes in one thread; how will it affect the other?
> How do you test it?
Change the value in one thread, see the result in the other
> How do you know that your code tests what you are interested in?
It checks/updates variables in the same manner as would be seen in
the situation I am curious about. So it tests what I am interested in.
> P.S. Volatile does not mean atomic in C/C++ hence you cannot rely on
> it.
I understand that. I took volatile to mean "Always grab this value from
memory each use, don't trust stored register/cache values"
Actually, I don't know that much about Java, and didn't even know it had a
volatile keyword.
-Jim Stapleton
It is in any case an error to move a pthread_rwlock_t (or
pthread_mutex_t, pthread_cond_t and some other types) in memory. The
implementation of a lock might e.g. store a pointer to the lock
somewhere.
Expressed in Standardese - POSIX says at
<http://www.opengroup.org/onlinepubs/000095399/functions/pthread_rwlock_destroy.html>:
"Only the object referenced by rwlock may be used for performing
synchronization. The result of referring to copies of that object in
calls to pthread_rwlock_destroy(), pthread_rwlock_rdlock(),
pthread_rwlock_timedrdlock(), pthread_rwlock_timedwrlock(),
pthread_rwlock_tryrdlock(), pthread_rwlock_trywrlock(),
pthread_rwlock_unlock(), or pthread_rwlock_wrlock() is undefined."
> #define GET_LOCK( X ) \
> (pthread_rwlock_t)*((void*)X - offsetof(ctypeless_struct, data) +
> offsetof(ctypeless_struct, lock))
So this is wrong. You cannot reliably use the return value from this
macro for anything. It might work on your host, but it might not
on another - or on your host after the next library upgrade.
Also (void*) pointer arithmetic is a gcc extension. Use (char*) instead.
(Use the -pedantic option to have it warn about such issues.)
And X in the body should be (X), in case the argument is a complex
expression.
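For what it's worth, a sketch of the same macros with those two points
addressed ((char*) arithmetic, parenthesized argument) and with GET_LOCK
yielding a pointer rather than a copy - still assuming the lock object
itself is never moved:

/* Sketch only; the casts follow the original sample's conventions. */
#define GET_PASSED_VALUE( X ) \
    ((ctypeless) ((char *)(X) + offsetof(ctypeless_struct, data)))
#define GET_STRUCT_VALUE( X ) \
    ((ctypeless_struct *) ((char *)(X) - offsetof(ctypeless_struct, data)))
#define GET_LOCK( X ) \
    ((pthread_rwlock_t *) ((char *)(X) - offsetof(ctypeless_struct, data) \
                           + offsetof(ctypeless_struct, lock)))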
--
Hallvard
> Unfortunately, with what I'm doing, the lock has to travel /with/ the data,
Unfortunately that is a design bug. Hint: Try to solve the problem in
a proper way so that you do not need any re-allocation of the portion
of the shared data that contains the synchronisation.
Best Regards,
Szabolcs
P.S. Moving to multi-threaded programming has consequences that you
have to respect.
I figured out that much from your pseudo-code and I was afraid that
you meant only that much.
> > How do you test it?
>
> Change the value in one thread, see the result in the other
That you should think over with respect to the different
interleavings.
> > How do you know that your code tests what you are interested in?
>
> It checks/updates variables in the same manner as would be seen in
> the situation I am curious about. So it tests what I am interested in.
If you are lucky, yes. Nothing ensures that you get the same
result on a different architecture with different interleavings.
> > P.S. Volatile does not mean atomic in C/C++ hence you cannot rely on
> > it.
>
> I understand that. I took volatile to mean "Always grab this value from
> memory each use, don't trust stored register/cache values"
If you use volatile you tell your compiler not to do any optimisation
with that value. But you did not tell your processor not to optimise
with that value. Forget about volatile for synchronisation if you move
to multi-threaded programming.
Best Regards,
Szabolcs
Actually, I think my method is legal, as it says, "Only the object
REFERENCED by rwlock may be used for performing synchronization." Thus the
rwlock is a reference, and not the actual object. "The result of referring
to COPIES OF THE OBJECT [...]," I am copying the rwlock - the reference, not
the object.
-Jim Stapleton
that's easy to work around.
-Jim Stapleton
Yes, you've got it backwards. Volatile prevents some optimizations
which would otherwise be allowed, but the compiler is certainly not
required to perform these optimizations without volatile. If it
doesn't, then maybe it is not smart enough, or it is smart enough but
too many user programs depend on the non-optimization so it doesn't do
it, or maybe you are not smart enough to figure out why the optimization
would not always be correct.
--
Hallvard
> Actually, I think my method is legal, as it says, "Only the object
> REFERENCED by rwlock may be used for performing synchronization."
The 'rwlock' parameter is a 'pthread_rwlock_t *'.
But as you say in next article, that's no problem if your struct holds a
pointer to a pthread_rwlock_t.
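For illustration, that layout could look roughly like this (a sketch; the
field names follow the earlier sample, error handling omitted):

#include <stddef.h>
#include <stdlib.h>
#include <pthread.h>

typedef struct CTYPELESS_STRUCT
{
    int typeid;
    pthread_rwlock_t *lock;     /* allocated once; never moved or copied,   */
    char data;                  /* even if the struct itself is reallocated */
} ctypeless_struct;

ctypeless_struct *ctypeless_create(size_t payload)
{
    ctypeless_struct *s = malloc(offsetof(ctypeless_struct, data) + payload);
    s->typeid = 0;
    s->lock = malloc(sizeof *s->lock);
    pthread_rwlock_init(s->lock, NULL);
    return s;
}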
--
Hallvard
Smart and knowledgeable are two separate things. In this case the matter is
completely about knowledge, something that I am attempting to obtain. I
don't mind you being less than gentle, but please keep your harsh realities
appropriate; 'ignorant' would be the correct term, and it is something I'm
attempting to alleviate myself of.
And in this case, given the number of compilers, I don't know what all of
the potential optimizations are/could be. I just read that volatile is
useful in C to help with proper access of variables, but the document didn't
describe /why/. I've been trying to figure out why since that point.
Everything I saw seemed to indicate that it ensures you aren't using a
register value for the variable. This would strike me as important if two
processors are using that variable: even if locks are stopping them from
simultaneous use, I don't know of any way to ensure the variable's value is
actually copied from memory to register, or from register to memory.
-Jim Stapleton
Sorry, I didn't mean it that way; it was just a cut&paste of the previous
uses of "smart" (about the compiler). Though actually "not smart enough" can be
a real problem in any case. A number of details of C's semantics are
hairy enough that even the authors of the C standard get it wrong now
and then or can't figure out exactly what it implies.
> (...) I just read that volatile is useful in C to help with proper
> access of variables, but the document didn't describe /why/. I've been
> trying to figure out why since that point. Everything I saw seemed to
> indicate that it ensures you aren't using a register value for the
> variable. (...)
More or less. Looks like you were redirected away from comp.lang.c too
early. What it means, formally, is roughly this:
- A compiled C program must behave exactly as the program spells out,
step for step, with some exceptions. E.g. if you write
i = 150;
i += 50;
then the program must behave as if it did just that rather than
i = 200;
- The compiler can optimize that to i = 200; if it can prove either that
this _will_ behave exactly the same way, or that it can only behave
differently if the behavior is not well-defined (in standardese, if
the program would have "undefined behavior" or "unspecified
behavior").
- However declaring an object volatile tells the compiler that access to
it is a "side effect" (like file I/O). Side effects must take place
just like the program spells out: Store 150, read it back (which for
all the compiler knows might give a different result than what it just
stored), then add 50 and store the result again.
- There is no absolute ordering however. The C language defines
"sequence points" at which each side effect must be complete: At ';'
and ',', at function calls, and some other places. Thus volatile
doesn't help with the infamous 'i = i++;' - that has undefined
behavior because there is no sequence point between the 'i=' and 'i++'
which says which should store 'i' first.
- Also the C language spec doesn't know about threads, or multi-core
CPUs and the like. So:
> This would strike me as important if two processors are using that
> variable: even if locks are stopping them from simultaneous use, I
> don't know of any way to ensure the variable's value is actually
> copied from memory to register, or from register to memory.
Right, as far as the C language is concerned ugly things can happen anyway.
In practice, thread primitives like pthread_mutex_lock() ensure cache
coherency and the like. (E.g. ensuring that if two memory locations
are updated, two threads on different CPUs see the updates in the same order.)
Just declaring a variable volatile might not, I'm not quite sure.
http://en.wikipedia.org/wiki/Cache_coherency
Threading memory models are being actively discussed at least in the C++
community; it seems hard to nail down a definite model.
--
Hallvard
Eh. About people not being smart enough... that was an incomplete
search & replace:-)
--
Hallvard
> Smart and knowledgeable are two separate things. In this case the matter is
> completely about knowledge, something that I am attempting to obtain. I
> don't mind you being less than gentle, but please keep your harsh realities
> appropriate; 'ignorant' would be the correct term, and it is something I'm
> attempting to alleviate myself of.
He meant "smart enough" in an idiomatic way. For example, "you don't
want to set a process' CPU affinity because you are not smart enough
to know which processor it will run best on". This doesn't literally
mean you lack intelligence, it means you don't have the information
the scheduler does. In this case, "smart" is idiomatic for something
closer to "informed".
The point he's trying to make is that when you write code to a
threading standard, you aren't smart enough to know what CPUs and
compilers it's going to run on. Some of them may not even exist yet.
Yet the standard tries to guarantee that your code will work.
This means sometimes the standard has to prohibit things that will
actually work fine or require things that everything will actually
work fine without. But if you ignore these things, and the next
version of the compiler, CPU, library, or OS expects you to follow the
standard, your code will break.
So you cannot determine whether it's safe to break the rules by
breaking them and seeing what happens.
> And in this case, given the number of compilers, I don't know what all of
> the potential optimizations are/could be. I just read that volatile is
> useful in C to help with proper access of variables, but the document didn't
> describe /why/.
It's wrong, unless by "useful" you mean changing some cases that
happen not to work into cases that happen to work, on the CPUs, OSes,
and compilers the author happened to test with. That's not a sane way
to develop code.
> I've been trying to figure out why since that point.
Forget it, it's a waste of time. No threading standard in common use
specifies the use of 'volatile' is either necessary or sufficient for
synchronization or memory visibility.
> Everything I saw seemed to indicate that it ensures you aren't using a
> register value for the variable.
Correct. But that doesn't help you in any way. Posted write buffers
and prefetched reads create exactly the same issues as registers. And
'volatile' does nothing about those.
> This would strike me as important if two
> processors are using that variable: even if locks are stopping them from
> simultaneous use, I don't know of any way to ensure the variable's value
> is actually copied from memory to register, or from register to memory.
Fortunately, you don't have to worry or care. Neither the Win32
threads conceptual machine nor the POSIX threads conceptual machine
even have registers. When you write code, you write it to the
conceptual machine for the threading standard you're using, and then
it will work on every actual machine that supports that standard.
In practice, modern implementations typically solve this problem by writing
any register-cached copies of memory values back to memory before releasing
a lock and re-reading them after acquiring one. But that's not your
problem, it's the problem of the library and compiler folks.
DS
If you don't rely on locks[1], but on custom-made synchronization / atomic
operations, volatile is necessary. IMHO.
[1] POSIX mandates that accesses to shared variables must be protected by
locks, which in turn must correctly arrange the "visibility stuff".
> Volatile should force memory access, i.e. prevent any caching in registers.
The problem is that preventing caching in registers does not force
memory access. Modern CPUs have posted write buffers and may do
speculative fetches.
> If the compiler decides to hold the value in a register, there's nothing to
> tell to the CPU that the value must be made visible to other CPUs.
Same if the CPU holds the write in a posted write buffer. Same if the
CPU does the memory read early because it had some spare cycles on the
memory bus.
> If you don't rely on locks[1], but on custom-made synchronization / atomic
> operations, volatile is necessary. IMHO.
> [1] POSIX mandates that accesses to shared variables must be protected by
> locks, which in turn must correctly arrange the "visibility stuff".
Nonsense. I've made many custom synchronization functions and they do
not have a single 'volatile' among them. In fact, 'volatile' wouldn't
help in any way, since it doesn't have precisely-defined multi-
processor semantics.
For example:
static __inline int InterlockedExchange(int *ta, int val)
{
int rtv;
__asm__ __volatile__(
"xchgl %0,%1"
: "=r" (rtv), "+m" (*ta) // outputs
: "0" (val) // inputs
: "memory");
return rtv;
}
The 'volatile' here is not the C language 'volatile' but a completely
different semantic with the same name.
So how do you think 'volatile' would help me here? This is my custom-
made synchronization function. According to you, 'volatile' is
"necessary". Which variable should I made volatile, in your opinion?
DS
which gets to part of my worry:
If I use a lock to protect the variable, then I am guaranteed to have the
most accurate memory value (not just register/cache value) for that
variable? In that case, since locks are only associated to variables
contractually, not programmatically - I assume it ensures all variables are
synced the first time you use them after obtaining a lock?
Thanks,
-Jim Stapleton
> which gets to part of my worry:
> If I use a lock to protect the variable, then I am guaranteed to have the
> most accurate memory value (not just register/cache value) for that
> variable? In that case, since locks are only associated to variables
> contractually, not programmatically - I assume it ensures all variables are
> synced the first time you use them after obtaining a lock?
Almost. At least in POSIX land, when you release a lock, you are
guaranteed that any thread that subsequently acquires that same lock
will then see any changes you made before you released the lock. In
other words, releasing a lock 'exports' any changes to memory that
you've made, and acquiring a lock 'imports' any changes exported by
that same lock.
I believe every sensible threading library provides this same
guarantee at minimum. To my knowledge, none of them actually require
it to be the same lock. They just have a number of operations that are
defined to synchronize memory.
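(In code, the export/import guarantee looks roughly like this - a sketch
with made-up names:)

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int shared_value;

void producer(void)
{
    pthread_mutex_lock(&m);
    shared_value = 42;           /* change made while holding the lock      */
    pthread_mutex_unlock(&m);    /* the unlock 'exports' it                 */
}

int consumer(void)
{
    int v;
    pthread_mutex_lock(&m);      /* the lock 'imports' anything exported by */
    v = shared_value;            /* an earlier unlock of this same mutex    */
    pthread_mutex_unlock(&m);
    return v;
}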
For example, if a thread makes some changes and then creates another
thread, the new thread is guaranteed to see all changes made by the
creating thread prior to the creation. This is also provided on just
about every threading implementation.
DS
>
> Nonsense. I've made many custom synchronization functions and they do
> not have a single 'volatile' among them. In fact, 'volatile' wouldn't
> help in any way, since it doesn't have precisely-defined multi-
> processor semantics.
>
OK, synchronization was a bad word. For example:
static unsigned x, y;
void thread1(void)
{
unsigned local = x;
while(local+10 > x)
++y;
}
void thread2(void)
{
while(1)
atomic_inc_uint(&x); // also arranges global visibility
}
Let's say that the above is a complete compilation unit, the variable x is
static and the compiler can prove that its address is not exported to other
code via some pointer, and that thread1 does not modify x on its own. Why is
the compiler, even without volatile, not allowed to cache x in a register in
thread1?
> On 2008-07-11, David Schwartz <dav...@webmaster.com> wrote:
> >
> > The problem is that preventing caching in registers does not force
> > memory access. Modern CPUs have posted write buffers and may do
> > speculative fetches.
> To put it in another way: volatile is necessary but not sufficient, because
> if a variable is cached in a register, no memory barrier is going to make it
> visible to another thread; see below.
Actually, it's not necessary. In fact, 'volatile' may disable
optimizations that are perfectly safe to use on multi-threaded code.
> > Nonsense. I've made many custom synchronization functions and they do
> > not have a single 'volatile' among them. In fact, 'volatile' wouldn't
> > help in any way, since it doesn't have precisely-defined multi-
> > processor semantics.
> OK, synchronization was a bad word. For example:
>
> static unsigned x, y;
>
> void thread1(void)
> {
> unsigned local = x;
>
> while(local+10 > x)
> ++y;
> }
>
> void thread2(void)
> {
> while(1)
> atomic_inc_uint(&x); // also arranges global visibility
> }
>
> Let's say that the above is a complete compilation unit, the variable x is
> static and the compiler can prove that its address is not exported to other
> code via some pointer, and that thread1 does not modify x on its own. Why is
> the compiler, even without volatile, not allowed to cache x in a register in
> thread1?
That is one of a whole universe of possible problems with this code. That
'volatile' fixes this one specific potential problem does not make it
necessary or sufficient. It's not sufficient because of the universe
of other problems (the CPU may consolidate the reads if it's smart
enough). It's not necessary because every threading library in
existence provides correct solutions to this problem.
You might ask why the compiler doesn't see the 'volatile' keyword and
work around all these other possible problems. It could put memory
barriers before and after 'volatile' accesses, for example. It could
work around prefetches and write posting buffers. Some have argued that
compilers that don't do this are broken and that's not their problem
-- it is still right to use 'volatile' here.
But they are wrong. The 'volatile' keyword *does* have defined
semantics in the C standard. It does very specific things with signals
and longjmp, for example. Making 'volatile' also work for code like
your example would penalize every single-threaded program that uses
'volatile' *properly* for no good reason.
That is why there is *no* mainstream threading standard in existence
that specifies that 'volatile' is sufficient. And because other
sufficient means are provided, it is not necessary either.
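(For instance, the wait-until-x-advances loop from the earlier example, done
with a mutex and condition variable - a sketch:)

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static unsigned x;

void thread1(void)                   /* waits until x has advanced by 10    */
{
    pthread_mutex_lock(&m);
    unsigned local = x;
    while (x < local + 10)
        pthread_cond_wait(&c, &m);   /* releases m while sleeping; re-reads */
    pthread_mutex_unlock(&m);        /* x with the lock held on wakeup      */
}

void thread2(void)
{
    for (;;) {
        pthread_mutex_lock(&m);
        ++x;
        pthread_cond_signal(&c);
        pthread_mutex_unlock(&m);
    }
}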
DS
> That is why there is *no* mainstream threading standard in existence
> that specifies that 'volatile' is sufficient. And because other
> sufficient means are provided, it is not necessary either.
I realized that I should clarify this. There are threading standards
that do specify that 'volatile' is sufficient for this particular
case. However, they do not specify that it is sufficient for more
complicated cases. For example, the case where one thread writes two
variables and another thread must see the first write if it sees the
second.
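(That case looks like this; even with 'volatile' on both variables there is
no guarantee about the order in which another CPU observes the two stores:)

volatile int data;
volatile int ready;

void writer_thread(void)
{
    data  = 42;          /* first write                                     */
    ready = 1;           /* second write; a weakly ordered CPU may make it  */
}                        /* visible to others before the write to 'data'   */

void reader_thread(void)
{
    if (ready) {         /* seeing ready == 1 ...                           */
        int v = data;    /* ...does not guarantee v == 42                   */
        (void)v;
    }
}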
DS
Zeljko Vrba wrote:
> So, what you're saying, is that in the code as the one I presented, one
> has three options: 1) implement required memory operations in architecture-
> specific code (e.g. inline asm), 2) rely (e.g. because the compiler
> docs say so) that a particular compiler for one particular architecture
> does "the right thing" with volatile variables, 3) use locks provided by
> the threading library?
Exactly. The portable C-standard specification for 'volatile' contains
no thread-safety guarantees at all. So it's neither necessary nor
sufficient.
DS
Anymore? It never was. volatile doesn't do what you think it does.
> The pseudocode is at the end. The result I'm getting is that the correct
> memory addresses and values are being printed by both the volatile and
> non-volatile variables. If I understand things correctly, the non-volatile
> variable should give the wrong addresses at least half the time. Is GCC
> just smart enough to handle this, or am I completely misunderstanding
> things? I can provide the code, but I'd rather not clutter up the list with
> it without a request.
You completely misunderstand. Both the volatile and non-volatile code
should behave the same -- either you will get the right answer 0% of
the time, 100% of the time, or somewhere in between. What you get
is basically dependent on the vagaries of scheduling.
When accessing data across threads this way, you need a mutex to protect
the data. volatile doesn't enter into the picture at all.
-frank
But I thought he had to obey the consequences.
RF