
volatile and win32 multithreading


Maxim Yegorushkin

Jan 24, 2005, 3:06:55 AM
I'd like to get a clear statement regarding using volatile for variables
shared between threads for writing portable win32 applications.

The C++ and C standards state that volatile semantics are
implementation-defined, and they neither specify any threading model nor
address any threading issues at all. The relevant wording is (ISO/IEC
9899, footnote 114):

<quote>
A volatile declaration may be used to describe an object corresponding to
a memory-mapped input/output port or an object accessed by an
asynchronously interrupting function. Actions on objects so declared shall
not be optimized out by an implementation or reordered except as permitted
by the rules for evaluating expressions.
</quote>

Microsoft's reading of volatile is at
http://msdn.microsoft.com/library/en-us/vclang/html/_langref_volatile.asp:

<quote>
The volatile keyword is a type qualifier used to declare that an object
can be modified in the program by something such as the operating system,
the hardware, or *a concurrently executing thread*.
</quote>

Since the quoted section does not contain a "Microsoft specific" note and
the C++ and C standards do not mention threads at all, I would like to know
what the premise was for putting "a concurrently executing thread" there,
or whether it was just an oversight.

Platform SDK section "DLLs, Processes, and Threads" does not mention
volatile at all, although the Interlocked* function declarations take
pointers to volatile. A pointer to volatile can be bound to the address of
a non-volatile variable without a cast, so this can't be read as a
requirement to declare shared variables volatile before passing them to
these functions. A note in the Platform SDK docs is needed to clarify the
issue. I have also heard that before 2002 the Platform SDK headers did not
have volatile in these function declarations.

OTOH, the POSIX threads standard does not require volatile for
thread-shared variables. For accessing such variables it requires
synchronization with memory barrier functions. POSIX threads gurus say
that volatile is irrelevant to multithreading:

http://groups-beta.google.com/group/comp.programming.threads/msg/21878c22d7775997
http://groups-beta.google.com/group/comp.std.c/msg/ae49dc7a96c625f5

IA-32 Intel® Architecture Software Developer's Manual Volume 3: System
Programming Guide says in §7.2.4 that it is not portable to access
thread-shared variables without proper synchronization:

<quote>
It is recommended that software written to run on Pentium 4, Intel Xeon,
and P6 family processors assume the processor-ordering model or a weaker
memory-ordering model. The Pentium 4, Intel Xeon, and P6 family processors
do not implement a strong memory-ordering model, except when using the UC
memory type. Despite the fact that Pentium 4, Intel Xeon, and P6 family
processors support processor ordering, Intel does not guarantee that
future processors will support this model. To make software portable to
future processors, it is recommended that operating systems provide
critical region and resource control constructs and APIs (application
program interfaces) based on I/O, locking, and/or serializing instructions
be used to synchronize access to shared areas of memory in
multiple-processor systems. Also, software should not depend on processor
ordering in situations where the system hardware does not support this
memory-ordering model.
</quote>

Contrary to Intel's recommendations, there are samples in the Platform SDK
docs and in public sources that read and write volatile thread-shared
variables without any synchronization.

So, I would like to know:

a) if volatile is sufficient for accessing shared variables between
threads without any other synchronization.
b) if volatile is required for accessing shared variables between threads
when the variables are accessed only by means of Interlocked* functions
(following Intel recommendations).

--
Maxim Yegorushkin

Tom Widmer

Jan 24, 2005, 6:37:53 AM
Maxim Yegorushkin wrote:
> I'd like to get a clear statement regarding using volatile for
> variables shared between threads for writing portable win32 applications.

I think you should ask Microsoft and/or Intel directly (which is likely
to be easier if you are signed up on some kind of support or developer
contract) - you could report it as a bug in the documentation I suppose.
I doubt you'll get any more information here than you can get on
comp.programming.threads.

I note that Intel C++ has this:

/Qserialize-volatile[-]
(i64 only) Impose strict memory access ordering for volatile data
object references. When you invoke /Qserialize-volatile-, the compiler
may suppress both run-time and compile-time memory access ordering for
volatile data object references. Specifically, the .rel/.acq completers
will not be issued on referencing loads and stores.

So will volatile generate those .rel and .acq calls for volatiles on
architectures that "need" them? If so, can you switch this off?

Tom

Tom Widmer

Jan 24, 2005, 7:13:59 AM
Tom Widmer wrote:
> So will volatile generate those .rel and .acq calls for volatiles on
> architectures that "need" them? If so, can you switch this off?

(That question is targeting MSVC 2003+)

Tom

Maxim Yegorushkin

Jan 24, 2005, 7:22:03 AM
On Mon, 24 Jan 2005 11:37:53 +0000, Tom Widmer <tom_u...@hotmail.com>
wrote:

> Maxim Yegorushkin wrote:
>> I'd like to get a clear statement regarding using volatile for
>> variables shared between threads for writing portable win32
>> applications.
>
> I think you should ask Microsoft and/or Intel directly (which is likely
> to be easier if you are signed up on some kind of support or developer
> contract) - you could report it as a bug in the documentation I suppose.

I'm not signed up for any kind of support. I hope that some of the MS MVPs
in this newsgroup have such support and can post an answer here.

--
Maxim Yegorushkin

David Lowndes

Jan 24, 2005, 8:37:56 AM
>I'm not signed up for any kind of support. I hope that some of the MS MVPs
>in this newsgroup have such support and can post an answer here.

Have a search for recent messages on these newsgroups concerning
volatile using Google news - you'll find some posts on the subject by
Doug Harrison.

Dave

Carl Daniel [VC++ MVP]

Jan 24, 2005, 9:58:59 AM
Maxim Yegorushkin wrote:
> I'd like to get a clear statement regarding using volatile for
> variables shared between threads for writing portable win32 applications.

volatile is neither necessary nor sufficient for writing portable code that
shares variables between threads.

Rather, the portable solution is to surround all access to shared variables
with suitable synchronization (e.g. a Critical Section).

If you're only interested in Win32, the hardware platform guarantees
atomicity and durability of access to native types (e.g. those supported
directly by the hardware), so you can "get by" without synchronization in
some cases.

If you're interested in being 64-bit ready you should stick to the
documented mechanisms: use a critical section or another synchronization
primitive, or use the InterlockedXxxxx functions to test/set shared
variables. These mechanisms are guaranteed to include any required memory
barriers to ensure that access to shared variables work correctly on
platforms other than x86-32 (e.g. IA64).

-cd


Michael K. O'Neill

Jan 24, 2005, 1:34:43 PM
"Carl Daniel [VC++ MVP]" <cpdaniel_remove...@mvps.org.nospam>
wrote in message news:eaJlbViA...@TK2MSFTNGP10.phx.gbl...

...


> volatile is neither necessary nor sufficient for writing portable code
> that shares variables between threads.
>

I agree that "volatile" is not sufficient for multithreaded code (and that
proper synchronization as you described is critical), but are you really
sure that it's also not "necessary"?

As I understand it, "volatile" is a signal to optimizing compilers,
cautioning that they should not optimize too much (such as by re-ordering
operations or preloading into registers). Such optimizations often cause
difficulties while multi-threading. So, isn't it safer to use "volatile"
for a shared resource?

There have been quite a few (maybe too many) discussions over use of
"volatile" over at microsoft.public.vc.mfc . Here's one from last November:
http://groups-beta.google.com/group/microsoft.public.vc.mfc/browse_frm/thread/bcaa8908adbafc7e/66ca9ead549797ac ,
which mentions a white paper that includes talk about "volatile" in a driver
environment: http://www.microsoft.com/whdc/driver/kernel/MP_issues.mspx

Mike


Maxim Yegorushkin

Jan 24, 2005, 3:53:57 PM
On Mon, 24 Jan 2005 13:37:56 +0000, David Lowndes <dav...@example.invalid>
wrote:

Thank you, I've found some.

--
Maxim Yegorushkin

Arnaud Debaene

Jan 25, 2005, 1:55:42 AM
Michael K. O'Neill wrote:
> "Carl Daniel [VC++ MVP]"
> <cpdaniel_remove...@mvps.org.nospam>
> wrote in message news:eaJlbViA...@TK2MSFTNGP10.phx.gbl...
>
> ...
>> volatile is neither necessary nor sufficient for writing portable
>> code that shares variables between threads.
>>
>
> I agree that "volatile" is not sufficient for multithreaded code (and
> that proper synchronization as you described is critical), but are you
> really sure that it's also not "necessary"?

If you use proper synchronization primitives, volatile is not necessary,
since even the most optimizing compiler won't bypass those primitives...

> As I understand it, "volatile" is a signal to optimizing compilers,
> cautioning that they should not optimize too much (such as by
> re-ordering operations or preloading into registers). Such optimizations
> often cause difficulties while multi-threading. So, isn't it safer to
> use "volatile" for a shared resource?
>
> There have been quite a few (maybe too many) discussions over use of
> "volatile" over at microsoft.public.vc.mfc. Here's one from last November:
> http://groups-beta.google.com/group/microsoft.public.vc.mfc/browse_frm/thread/bcaa8908adbafc7e/66ca9ead549797ac
> , which mentions a white paper that includes talk about "volatile" in
> a driver environment:
> http://www.microsoft.com/whdc/driver/kernel/MP_issues.mspx

Drivers are another matter: volatile may be useful in some limited cases
when accessing registers or other hardware-related variables that may
change at any time, asynchronously. However, from this article:

<quote>
If you look at the sample drivers shipped with the Windows DDK, you will see
that volatile appears infrequently. In general, volatile is of limited use
in driver code for the following reasons:

· Using volatile prevents optimization only of the volatile variables
themselves. It does not prevent optimizations of nonvolatile variables
relative to volatile variables. For example, a write to a nonvolatile
variable that precedes a read from a volatile variable in the source code
might be moved to execute after the read.

· Using volatile does not prevent the reordering of instructions by the
processor hardware.

· Using volatile correctly is not enough on a multiprocessor system to
guarantee that all CPUs see memory accesses in the same order.

</quote>

Arnaud
MVP - VC


Doug Harrison [MVP]

Jan 25, 2005, 11:37:03 AM
Michael K. O'Neill wrote:

>"Carl Daniel [VC++ MVP]" <cpdaniel_remove...@mvps.org.nospam>
>wrote in message news:eaJlbViA...@TK2MSFTNGP10.phx.gbl...
>
>...
>> volatile is neither necessary nor sufficient for writing portable code
>> that shares variables between threads.
>>
>
>I agree that "volatile" is not sufficient for multithreaded code (and that
>proper synchronization as you described is critical), but are you really
>sure that it's also not "necessary"?
>
>As I understand it, "volatile" is a signal to optimizing compilers,
>cautioning that they should not optimize too much (such as by re-ordering
>operations or preloading into registers). Such optimizations often cause
>difficulties while multi-threading. So, isn't it safer to use "volatile"
>for a shared resource?

Suppose your shared resource is a std::vector. Declare it volatile. Now try
to call member functions on it. You don't get very far, do you? How do you
fix it? Cast volatile away? If you do, then you're back in the same boat as
you were when you didn't declare it volatile, except your code is now even
more undefined, because you're accessing a volatile object through a
non-volatile lvalue. No one I know writes volatile versions of member
functions, so if volatile is necessary for MT programs, practically no C++
class ever written can be used in a thread-safe way without a massive,
intrusive rewrite.

What about simple types like int? While volatile will preserve ordering at
the compiler level of operations on volatile variables, it's typically
implemented such that it has no effect at the hardware level, which makes it
insufficient for SMP systems such as Itanium. (That is, it doesn't have
memory barrier semantics.) Here's an excerpt (mildly edited) from an earlier
message I wrote about the uses of this sort of volatile.

***** begin excerpt

I can think of four valid uses for volatile, all more or less related to the
idea that volatile suppresses optimizations and forces the compiler to go to
memory for each read and write to a volatile object.

1. On amenable hardware like x86, this idiom can work, and it requires
volatile to prevent the compiler from caching x in a register in the loop:

// The variable x is accessible to multiple threads, one or more of
// which change it and indirectly stop the loop below, which is
// executing in another thread. Note that x must always be accessed
// atomically for this sort of thing to work in general.
volatile bool x = true;

while (x)
    whatever;

// Assuming everyone observes the locking protocol, the following approach
// would work on all hardware, even weird architectures like Sparc,
// Itanium, and others which require memory barriers. You could also use
// InterlockedXXX on Windows instead of a mutex, though that does
// limit the type of the variable x to LONG.

bool x = true; // As above, just non-volatile

for (;;)
{
    lock(mx);
    bool y = x;
    unlock(mx);
    if (!y)
        break;
    whatever;
}

2. You need volatile to avoid undefined behavior in certain signal handler
and setjmp/longjmp scenarios.

3. You can use volatile, say, to declare a pointer to a volatile int, which
represents a memory location updated at the interrupt level, something you
can't synchronize with, and which is outside the compiler's knowledge of the
program.

4. You can use volatile to prevent the compiler from optimizing away
operations which, as far as the compiler is concerned, seem to have no
purpose, because it can't see that the results are used or that the side
effects of computing the results are important, e.g. a timing loop.

***** end excerpt

Here's an excerpt (mildly edited) from a message I wrote about compiler
optimizations, which explains why a goodly amount of correct behavior comes
for free, at least in Windows, where the locking operations invoke functions
in opaque DLLs. It talks about global variables, but it can also apply to
local variables that have been passed by reference (or address) to other
functions.

***** begin excerpt

Here's a simplistic explanation which is probably not far off the mark
for VC. The mutex lock/unlock operations are function calls, and the
compiler knows nothing about these functions, so it can't do any
interprocedural optimization. Global variables are reachable through
functions called by the current function, including lock/unlock. The
compiler can't see into the lock/unlock functions to determine that they
don't access the globals or call other functions which ultimately do access
them. Thus, when you have the sequence below, for non-volatile, global
variables x and y:

m.lock();
y = x;
x = 2;
m.unlock();

The compiler cannot optimize the assignment to x out of existence, because
it can't tell that unlock() won't refer to x. It can't move the y and x
assignments before or after the lock/unlock calls, because that can change
the values those functions observe. It can't cache the value of x, call
lock(), and assign the cached value to y, because lock() may have modified
x. Before calling unlock(), it must flush x and y out of registers to
memory, so that unlock() will observe their current values. And so on. The
only way I know to screw this up is to write to the variables outside of the
critical section, but that's a violation of the locking protocol. So at the
compiler level, the variables don't need to be volatile.

What about code executed strictly inside the critical section? Optimizations
performed there on x and y don't matter, because other threads are observing
the locking protocol, so no one else is accessing them when a thread is
inside a critical section involving them. If another thread, or, say, an OS
interrupt handler is concurrently (and I hope atomically) accessing x or y,
you should hope EVERYone is using the InterlockedXXX functions, because that
will be well-defined, and it won't matter if the variables are declared
volatile or not. But if ANYone is accessing the variables pell-mell, you're
back to indeterminate behavior that depends on the architecture of your
computer, and volatile may or may not be sufficient. It depends on how your
compiler implements it. Compilers typically don't confer memory barrier
semantics on volatile, and this sort of volatile is not sufficient for
machines that require them.

In addition to providing mutual exclusion, the mutex lock/unlock operations
issue whatever memory barrier instructions are necessary, so that the writes
are visible to other threads observing the locking protocol. So at the
hardware level, there's no need for the variables to be volatile, assuming
volatile implies MB instructions (which it currently does not in VC++),
because they're implicit in the mutex lock/unlock operations.

(NB: A compiler which can see into the locking operations might have to mark
them somehow to suppress optimizations which can violate the expected
semantics. There's no other reasonable choice for a compiler intended to be
used for MT programming.)

***** end excerpt

>There have been quite a few (maybe too many) discussions over use of
>"volatile" over at microsoft.public.vc.mfc. Here's one from last November:

I agree the MS whitepaper is worth reading. As for the message you linked
to, it gets my early vote for "Most Accidentally Ironic Message of the 21st
Century". :)

--
Doug Harrison
Microsoft MVP - Visual C++

Michael K. O'Neill

Jan 25, 2005, 2:11:47 PM

"Doug Harrison [MVP]" <d...@mvps.org> wrote in message
news:oqscv01q8qgooo43i...@4ax.com...
...

> I agree the MS whitepaper is worth reading. As for the message you linked
> to, it gets my early vote for "Most Accidentally Ironic Message of the
> 21st Century". :)

Thanks for your thorough reply. I have seen some of those comments before
(guess where <g!>) and I admit that I still do not fully understand all of
the points you make.

But I do understand your comment on "most ironic message" <g>

Mike


Jerry Coffin

Jan 25, 2005, 10:52:53 PM
In article <esk28GxA...@TK2MSFTNGP15.phx.gbl>,
MikeAThon2000@nospam.hotmail.com says...

[ ... ]

> Thanks for your thorough reply. I have seen some of those comments before
> (guess where <g!>) and I admit that I still do not fully understand all of
> the points you make.

Perhaps it would help to step back for a moment.

Volatile makes the compiler produce code that reads from memory when
the value is read and writes to memory when the value is written. The
problems arise primarily due to caching: when the code attempts to
read or write memory, it will typically only REALLY read from/write
to the cache. What goes into the cache doesn't normally get written
out to memory immediately at all.

In fact, the processor typically attempts to keep things in the cache
as long as it can. It will flush something out to memory only when it
needs to. The primary reason is that the cache is full, and it needs
to make room to load something new. In this case, attempts to find
the least recently used line in the right part of the cache, and
flushes it out to memory. There are two problems with this: first of
all, keeping track of the time any given item is used takes too much
space, so it really only takes a guess at least-recently used.
Second, an item at any particular address always gets put into one of
a (fairly small) number of specific places in the cache.

Assume I have two volatile variables X and Y. I update them in that
order, and since they're volatile, I assume other processors will see
the updates in that order as well. Having finished that, I do
something else that reads from A, B, C and D. Just for the sake of
argument, we'll assume this is a four-way set-associative cache, and
that A, B, C and D all happen to map to the same cache lines as Y
did. Since we read A, B and C after we updated Y, when we read D, Y
is now the oldest item in those cache lines, so it gets flushed out
to memory. Meanwhile, we haven't done anything that touches the part
of the cache that X is in, so it hasn't been written out yet, and we
haven't even a good idea of when it will be.

Worse: memory is already a bottleneck in most situations. When you
add more processors, updating lots of things to memory becomes even
more of a bottleneck. Therefore, the more processors you want to
support, the harder you work at keeping things in caches, and the
more you (usually) relax how up-to-date you keep memory. Now that
processor clock speeds aren't going up constantly like they used to,
we can expect nearly all machines to start to have more and more
processors. Code that can't take advantage of them will be considered
poor, and code that works incorrectly on an MP machine will become
essentially unusable.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Maxim Yegorushkin

Jan 27, 2005, 1:59:00 PM
On Tue, 25 Jan 2005 07:55:42 +0100, Arnaud Debaene
<adeb...@club-internet.fr> wrote:

[]

> If you use proper synchronization primitives, volatile is not necessary,
> since even the most optimizing compiler won't bypass those primitives...

This is exactly the answer I wanted to see ;)

POSIX explicitly states that.

But I could not find such a statement in MSDN to persuade my fellow windoze
developers to stop using freaking volatile with shared variables that are
only ever touched by Interlocked* functions or are protected by memory
barriers. Their argumentation was like "Interlocked* takes a pointer to
volatile, so that means I should make my shared variables volatile,
blah-blah-blah...; a super optimizing compiler may cache variables in
registers and may not reread variables after a call to an opaque function
such as EnterCriticalSection() unless those variables are volatile,
yak-yak-yak..." And their super argument was using the /Oa switch, which
makes their programs "16% faster" but their programs crashed without
volatile.

So, the actual reason for my post was to find adequate argumentation. The
paper at
http://www.microsoft.com/whdc/driver/kernel/MP_issues.mspx (hopefully)
helped my fellow windoze developers.

--
Maxim Yegorushkin

Doug Harrison [MVP]

Jan 27, 2005, 2:43:31 PM
Maxim Yegorushkin wrote:

>But I could not find such a statement in MSDN to persuade my fellow windoze
>developers to stop using freaking volatile with shared variables that are
>only ever touched by Interlocked* functions or are protected by memory
>barriers. Their argumentation was like "Interlocked* takes a pointer to
>volatile, so that means I should make my shared variables volatile,

The InterlockedXXX functions are interesting. Just a few years ago, they
took plain old LPLONG parameters. Then they changed to "LPLONG volatile",
which is a useless change. Nowadays, they take "volatile LONG*", and I'm not
sure why. When I looked at this recently, I thought maybe it's to suppress
optimizations in the face of the intrinsic versions of these functions,
which on x86, replace function calls with inline assembly instructions that
use the lock prefix. (That was just a total guess, and some simple tests
I've done since then suggest that's not the reason.) Then I noticed the
following in MSDN:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang/html/vclrf_InterlockedDecrement.asp

volatile LONG data = 1;
...
while (data < 100)
{
    if (data < 100)
    {
        InterlockedIncrement(&data);
        printf("Thread %d: %d\n", threadNum, data);
    }
    Sleep(time); // wait up to half of a second
}

Now I think it's just to enable people to more conveniently write wrong MT
code, that sometimes sorta works, especially on x86. Using an older SDK, one
would have to cast volatile away to call InterlockedIncrement, which could
be considered a bother.

>blah-blah-blah...; a super optimizing compiler may cache variables in
>registers and may not reread variables after a call to an opaque function
>such as EnterCriticalSection() unless those variables are volatile,
>yak-yak-yak..."

That doesn't follow at all for variables that may be shared between threads.
I gave an example to illustrate limitations on such optimizations in another
message in this thread. If anyone has a counter-example, I'd love to see it.

>And their super argument was using the /Oa switch, which makes
>their programs "16% faster" but their programs crashed without volatile.

They should be aware that Whidbey does away with the very dangerous /Oa
option. The last time I used /Oa was with MSC 5.1, back in the dark ages of
DOS. Needless to say, I wasn't writing multithreaded programs at the time,
and applied indiscriminately, /Oa is likely to break most every program ever
written, including single-threaded programs. Someone needs to tell them,
"Fast but wrong is not a virtue." :)
