
C++, volatile member functions, and threads


Eric M. Hopper

Jun 28, 1997

After realizing that you could make member functions volatile as
well as const, it occurred to me that 'thread-safe' member functions
ought to be marked as volatile, and that objects shared between several
threads ought to be declared as volatile.

Perhaps the ability to declare member functions volatile is
limited to IBM's C++/Set for OS/2, or g++ 2.7.2.1, but I would guess not.

My reasoning on this is that if you have a member function that
can be called by multiple threads, it should treat the member variables
as volatile (because multiple threads might be accessing them) until it
obtains a mutex or lock of some sort on the member variables.

Declaring objects as volatile that are accessed by multiple
threads is a simple extension of declaring variables of basic types to
be volatile when they are accessed by multiple threads.

This doesn't seem to be current practice. Is there a good
reason for this that I'm missing, or is it simply an oversight?

Have fun (if at all possible),
--
This space for rent. Prospective tenants must be short, bright, and
witty. No pets allowed. If you wish to live here, or know someone who
does, you may contact me (Eric Hopper) via E-Mail at
hop...@omnifarious.mn.org
-- Eric Hopper, owner and caretaker.

Bryan O'Sullivan

Jun 28, 1997

e> After realizing that you could make member functions volatile as
e> well as const, it occurred to me that 'thread-safe' member functions
e> ought to be marked as volatile, and that objects shared between
e> several threads ought to be declared as volatile.

No and no.

e> My reasoning on this is that if you have a member function that can
e> be called by multiple threads, it should treat the member variables
e> as volatile (because multiple threads might be accessing them)
e> until it obtains a mutex or lock of some sort on the member
e> variables.

Your code should not be inspecting or changing the values of variables
in any way unless you have obtained mutexes on those variables. The
semantics of mutexes under POSIX threads, at least, guarantee that the
values of your variables will be sane once you obtain appropriate
mutexes (this is a very short gloss over what really goes on, but it's
the right basic idea).

Declaring your variables volatile will have no useful effect, and will
simply cause your code to run a *lot* slower when you turn on
optimisation in your compiler.

<b

--
Let us pray:
What a Great System. b...@eng.sun.com
Please Do Not Crash. b...@serpentine.com
http://www.serpentine.com/~bos

Eric M. Hopper

Jun 29, 1997

Bryan O'Sullivan wrote:

>
> Eric Hopper wrote:
>
>> After realizing that you could make member functions volatile as
>> well as const, it occurred to me that 'thread-safe' member functions
>> ought to be marked as volatile, and that objects shared between
>> several threads ought to be declared as volatile.
>
> No and no.

*chuckle* I see. I disagree, but will state my reasons below.

>> My reasoning on this is that if you have a member function that can
>> be called by multiple threads, it should treat the member variables
>> as volatile (because multiple threads might be accessing them)
>> until it obtains a mutex or lock of some sort on the member
>> variables.
>
> Your code should not be inspecting or changing the values of variables
> in any way unless you have obtained mutexes on those variables. The
> semantics of mutexes under POSIX threads, at least, guarantee that the
> values of your variables will be sane once you obtain appropriate
> mutexes (this is a very short gloss over what really goes on, but it's
> the right basic idea).

Let me provide an example. It, perhaps, isn't the best example,
but I think it illustrates my point:

-------------------------
class TS_RefCounter {
public:
    // The following must be a type that the compiler can generate a
    // single instruction fetch for.
    typedef unsigned int count_t;

    inline void AddReference() volatile;
    inline void DelReference() volatile;
    inline count_t NumReferences() volatile;

private:
    count_t _refct;
    mutex_t _mutex;
};

inline void TS_RefCounter::AddReference() volatile
{
    if (NumReferences() <= 1) {
        // Reference count <= 1, so only owned by one thread.
        _refct++;
    } else {
        mutex_lock(&_mutex);
        _refct++;
        mutex_unlock(&_mutex);
    }
}

inline void TS_RefCounter::DelReference() volatile
{
    count_t local_refct = _refct;

    if (local_refct <= 1) {
        // Reference count <= 1, so only owned by one thread.
        if (local_refct > 0) {
            _refct--;
        }
    } else {
        mutex_lock(&_mutex);
        _refct--;
        mutex_unlock(&_mutex);
    }
}

// No mutex lock needed. Only reading value with atomic instruction.
inline TS_RefCounter::count_t TS_RefCounter::NumReferences() volatile
{
    return(_refct);
}
-------------------------

This use can be attacked for one major reason. Relying on
unsigned int to be a type that is read atomically by the CPU is a shaky
assumption that happens to be true on a large number of platforms. The
code has the possibility of being brittle for this reason. Also, the
trick of not obtaining a mutex when the reference count indicates that
more than one thread 'can't' own the object is also possibly somewhat
error-prone unless the reference count is strictly maintained.

Another case in which you might want to do this is when you want
to have the same class work in both threaded and non-threaded contexts.
You could overload the public and protected member functions on whether
or not they were volatile. The volatile ones could just obtain a mutex
and call the non-volatile version after a const_cast to get rid of the
volatile.

> Declaring your variables volatile will have no useful effect, and will
> simply cause your code to run a *lot* slower when you turn on
> optimisation in your compiler.

*nod* That's why you should use const_cast, and call
non-volatile versions after you've obtained a mutex.

Dave Butenhof

Jun 30, 1997

Eric M. Hopper wrote:
>
> Bryan O'Sullivan wrote:
> >
> > Eric Hopper wrote:
> >
> >> After realizing that you could make member functions volatile as
> >> well as const, it occurred to me that 'thread-safe' member functions
> >> ought to be marked as volatile, and that objects shared between
> >> several threads ought to be declared as volatile.
> >
> > No and no.
>
> *chuckle* I see. I disagree, but will state my reasons below.

Sorry, but Bryan's right.

> Let me provide an example. It, perhaps, isn't the best example,
> but I think it illustrates my point:
>

> inline void TS_RefCounter::AddReference() volatile
> {
>     if (NumReferences() <= 1) {
>         // Reference count <= 1, so only owned by one thread.
>         _refct++;
>     } else {
>         mutex_lock(&_mutex);
>         _refct++;
>         mutex_unlock(&_mutex);
>     }
> }
>

> This use can be attacked for one major reason. Relying on
> unsigned int to be a type that is read atomically by the CPU is a shaky
> assumption that happens to be true on a large number of platforms. The
> code has the possibility of being brittle for this reason. Also, the
> trick of not obtaining a mutex when the reference count indicates that
> more than one thread 'can't' own the object is also possibly somewhat
> error-prone unless the reference count is strictly maintained.

OK, "brittle". But you've underestimated how "brittle" the code is. It's
not just a matter of assuming that the read is atomic. You're assuming
that the compiler will generate SMP-atomic fetch/increment/store
sequences. Not likely! Plus, you've actually got a fetch/test
(NumReferences) followed by a separate fetch/increment/store -- the
entire sequence would have to be atomic for this transition from
"unthreaded" to "threaded" to work correctly (and, if it was, you
wouldn't need the mutex anyway). Your final caveat is absolutely
correct, and makes the entire adventure pointless -- "somewhat
error-prone unless the reference count is strictly maintained".

Absolutely. But you can only "strictly maintain" the reference count by
using a mutex, every time.

Two threads may both call AddReference simultaneously, both see _refct
<= 1, both increment _refct, and both store a new (incremented) value.
Your refct is then 2 (for example) when it should be 3. You'll now
switch into mutexed mode for the next reference -- but your reference
count is already wrong. (And you'll incorrectly switch back out of
mutexed mode after the next dereference.) This is possible on a
uniprocessor, not just on a multiprocessor, because your thread could be
timesliced between the fetch and the store.

Making the variable "volatile" doesn't help, but it does prevent the
compiler from doing its job.

> Another case in which you might want to do this is when you want
> to have the same class work in both threaded and non-threaded contexts.
> You could overload the public and protected member functions on whether
> or not they were volatile. The volatile ones could just obtain a mutex
> and call the non-volatile version after a const_cast to get rid of the
> volatile.

If you're writing an APPLICATION, (not a library), and you want to
choose whether to use threads at runtime, then, sure, you can use
non-thread-safe versions for non-threaded runs. But the non-threaded
versions don't need volatile and volatile won't help the threaded
versions.

Don't try this with a LIBRARY, though, (or an application that supports
callbacks of some sort from a library), because in this case your caller
is in charge of whether you're threaded, and you can't necessarily even
tell.

Note that Digital UNIX and Solaris (at least) provide ways to help you
write thread-safe libraries that minimize the overhead of
synchronization when running in a non-threaded process. Solaris provides
thread entry points in libc that are "stubs" -- just returning
immediately. The entry point symbols are preempted by the real thread
functions when the thread libraries are activated. (At least if they're
activated in the right order.) Digital UNIX takes a different route,
with a "TIS" (thread-independent services) API in libc, which has
non-thread stubs that are revectored dynamically into the thread library
when it initializes. (This works even for dynamic activation, and a
mutex, for example, that's locked by the stub tis_mutex_lock will still
be locked after activating the thread library, and can then be unlocked
successfully by the initial thread.)

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation bute...@zko.dec.com |
| 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698 http://www.awl.com/cp/butenhof/posix.html |
\-----------------[ Better Living Through Concurrency ]----------------/

Patrick TJ McPhee

Jul 1, 1997

In article <33B791...@zko.dec.com>,
Dave Butenhof <bute...@zko.dec.com> wrote:
% Eric M. Hopper wrote:
% >
% > Bryan O'Sullivan wrote:
% > >
% > > Eric Hopper wrote:
% > >
% > >> After realizing that you could make member functions volatile as
% > >> well as const, it occurred to me that 'thread-safe' member functions
% > >> ought to be marked as volatile, and that objects shared between
% > >> several threads ought to be declared as volatile.
% > >
% > > No and no.
% >
% > *chuckle* I see. I disagree, but will state my reasons below.
%
% Sorry, but Bryan's right.

Just to throw in my hundreds if not thousands of dollars worth, Eric's
idea has some merit, and Bryan did not give any particular reason
for disagreeing. All he said was `no, use mutexes, and volatile just
stops the compiler from optimising', which really isn't a correct
answer to Eric's suggestion. So, Bryan is wrong.

Now, it seems to me that Eric gave an example, which I've deleted but
it's probably still on your news server if you care to look, in which
he tried to demonstrate what he was talking about, but failed because
his example was not `thread safe'. Dave, in the message I'm replying to,
got caught up in pointing out that the example wasn't very good, and
ignored the original question.

Eric's idea will work if you design the class correctly. It cannot have
any public member variables, otherwise they could be accessed in a
non-thread-safe way. Any member functions you mark as volatile
have got to really be thread-safe. This means they have to be either
static or wrapped in mutexes.

I guess the advantage of this would be that, if you have some class in
which there are non-thread-safe methods, but there are also thread-safe
methods which would be useful in a global instance of the class, you
wouldn't have to worry about putting extra mutexes around calls to the
class, since the compiler will stop you from using the unsafe methods.

The disadvantage would be that you couldn't use any of the non-thread-safe
methods at all, and if there's some place where you use the class a lot,
you'd be going down on the mutex, then up, then down, then up, then
down and down and up and down and down and up and up and up, then down,
up, down, down, up up... I'm lost now, but that has to be more expensive
than going down on the mutex once, using the class a bunch, then going up
again.

Furthermore, if your actual global variable is really a pointer, this
won't give you anything -- you'll still have to wrap mutexes around
every access to the pointer.

I think Eric's idea is not bad, but probably won't buy you much, and
could easily lull you into a false sense of security. On the other
hand, since I've now suggested that everyone else who's contributed
to this thread is wrong in one way or another and I want to avoid
suggesting that I think _I'm_ right, what the hell is the volatile
keyword for if not for this situation? Isn't it to prevent a compiler
from doing something like this:

myhappyclass x;

void myhappyfunction()
{
    int y;

    x.down();
    y = x.y;
    x.up();

    /* while this is going on, another thread changes x.y */
    blahblahblah(y);

    x.down();
    y = x.y;    /* optimiser drops this as unnecessary */
    x.up();

    blahblahblahblahblah(y);
}

if x isn't marked as volatile, then it seems to me an optimiser might
drop the second assignment to y, and blahblahblahblahblah() will be
called with the wrong argument. So maybe Eric is right after all.
--

Patrick TJ McPhee
East York Canada
pt...@ican.net

Bryan O'Sullivan

Jul 1, 1997

p> Just to throw in my hundreds if not thousands of dollars worth,
p> Eric's idea has some merit, and Bryan did not give any particular
p> reason for disagreeing.

I discussed my reasons with Eric in email.

p> All he said was `no, use mutexes, and volatile just stops the
p> compiler from optimising', which really isn't a correct answer to
p> Eric's suggestion. So, Bryan is wrong.

No, Bryan is right.

[Much blather deleted.]

Dave Butenhof

Jul 2, 1997

Bryan O'Sullivan wrote:
>
> p> All he said was `no, use mutexes, and volatile just stops the
> p> compiler from optimising', which really isn't a correct answer to
> p> Eric's suggestion. So, Bryan is wrong.
>
> No, Bryan is right.
>
> [Much blather deleted.]

As Bryan so eloquently stated, "No, Bryan is right". He did give the
correct answer, and anyone ignoring the advice does so at their own
(substantial) peril.

Look, I'm sorry to come down on someone, and it's noble of you to stand
up against us naysayers. But the dangerous misinformation was
distributed as fact and as a "cool optimization." And if that
misinformation were not corrected a lot of innocent people would get
hurt. Being "noble" doesn't make you, or Eric, right.

Patrick TJ McPhee wrote:
>
> Dave, in the message I'm replying to,
> got caught up in pointing out that the example wasn't very good, and
> ignored the original question.

No, I "got caught up" in trying to explain what was wrong with the
concept, which WAS the original "question" (though it was more of an
assertion than a question).

You cannot avoid synchronization when you think there may be only one
thread using the object, because TWO threads (or more) may do that at
the exact same time. The result is no synchronization at all, and this
is bad. You can only KNOW there is only one thread using the object if
you've made that decision using appropriate synchronization.

It's true that "synchronization" doesn't always mean a mutex. If your
situation is sufficiently constrained and well-understood, and if you're
very careful, there are some cases where you can rely on subtle
implications of the POSIX memory rules to avoid a mutex. Such cases are
few and far between, are not generally or widely applicable, and are far
more error prone and complicated than some might be led to believe by
looking at simplistic checks for "reference count greater than one".

Yes, the code as posted might work, if that class is used only in
carefully controlled circumstances. The necessary restrictions clearly
were not envisioned by the author, because they invalidate the apparent
generality of the "reference count" model. It would work only if the
reference count were incremented beyond 1 (forcing thread safe behavior)
by the main program BEFORE any threads were created. This would ensure
that all threads see a value greater than 1 and use the mutex as they
should. (It'll also work if the application architecture ensures that an
object will only be used in one thread, but that's not very
interesting.) This code will never work (except by sheer accident) if
the class is in a library, because the library can't know whether more
than one thread exists or may choose to use the object.

That's why the only real answer is "don't do that". If you "know" it'll
work in your case, fine. It's your code. But, in my professional and
expert opinion, the risk far outweighs any possible benefit. Because
even if you get it right, someone else who doesn't understand all the
details will apply the code in an inappropriate way later and break the
program.

Bryan O'Sullivan

Jul 2, 1997

p> [this is Eric now:]
p> % After realizing that you could make member functions volatile as
p> % well as const, it occurred to me that 'thread-safe' member functions
p> % ought to be marked as volatile, and that objects shared between several
p> % threads ought to be declared as volatile.

p> Why shouldn't these objects be considered volatile?

Because POSIX memory semantics make it unnecessary to declare them as
volatile, and because declaring variables as volatile will inhibit
several compiler optimisations. You gain nothing in safety and lose
in performance.

Patrick TJ McPhee

Jul 3, 1997

In article <33BA39...@zko.dec.com>,
Dave Butenhof <bute...@zko.dec.com> wrote:
[...]

% Patrick TJ McPhee wrote:
% >
% > Dave, in the message I'm replying to,
% > got caught up in pointing out that the example wasn't very good, and
% > ignored the original question.
%
% No, I "got caught up" in trying to explain what was wrong with the
% concept, which WAS the original "question" (though it was more of an
% assertion than a question).

The original question, and I struggled with that word, was

[this is Eric now:]
% After realizing that you could make member functions volatile as
% well as const, it occurred to me that 'thread-safe' member functions
% ought to be marked as volatile, and that objects shared between several
% threads ought to be declared as volatile.

Later, he introduced his non-thread-safe member function as an example
of this, but that's nothing to do with the original concept. So far
as I can see, the original posting didn't mention optimisation, and
the idea has nothing to do with optimisation.

I've deleted the rest of Dave's post, but I'll say again that it
doesn't address either the subject line of this thread or
the point of Eric's paragraph quoted above.

So sure, let's all agree that we shouldn't assume certain operations
will be done in a single instruction and are therefore safe. Let's
assume that you have to provide for thread synchronisation if you're
going to update shared objects. Now what about the original question.


Why shouldn't these objects be considered volatile?

--

Dave Butenhof

Jul 3, 1997

I really should ignore this deepening rat-hole. But, OK, just one more
try...

Patrick TJ McPhee wrote:
>
> Later, he introduced his non-thread-safe member function as an example
> of this, but that's nothing to do with the original concept. So far
> as I can see, the original posting didn't mention optimisation, and
> the idea has nothing to do with optimisation.

They exhibit the same essential flaw, the same misunderstanding of "what
it all means". This is of course no surprise, since the EXAMPLE was
indeed an example of the CONCEPT. It was substantially easier, and more
direct, to explain why the example wouldn't behave correctly than to
work with the generic concept. But they're the same thing, and I don't
see how you could consider comments on the example "irrelevant" to the
concept it demonstrates.

And as for "optimization"... if one is not interested in "optimizing",
then one has no motive whatsoever to risk the ill will of memory systems
by avoiding a mutex where it's appropriate to use a mutex. Therefore,
the only conceivable motivation for Eric's CONCEPT and EXAMPLE is a
desire to optimize. Thus, while he may or may not have used the word
"optimize", it would have been pointless to ignore that aspect of the
post.

> So sure, let's all agree that we shouldn't assume certain operations
> will be done in a single instruction and are therefore safe. Let's
> assume that you have to provide for thread synchronisation if you're
> going to update shared objects. Now what about the original question.
> Why shouldn't these objects be considered volatile?

The use of "volatile" is not sufficient to ensure proper memory
visibility or synchronization between threads. The use of a mutex is
sufficient, and, except by resorting to various non-portable machine
code alternatives, (or more subtle implications of the POSIX memory
rules that are much more difficult to apply generally, as explained in
my previous post), a mutex is NECESSARY.

Therefore, as Bryan explained, the use of volatile accomplishes nothing
but to prevent the compiler from making useful and desirable
optimizations, providing no help whatsoever in making code "thread
safe". You're welcome, of course, to declare anything you want as
"volatile" -- it's a legal ANSI C storage attribute, after all. Just
don't expect it to solve any thread synchronization problems for you.

Because of this flaw in reasoning, Eric's EXAMPLE of his CONCEPT was
neither correct nor an optimization.

I'd like to stop beating this to death. It's not fair to Eric, who
merely had the misfortune to be someone (like probably 95% of everyone
else) who didn't understand the intricacies of SMP memory systems and
thread synchronization. He proposed a shortcut, he was corrected, and I
suspect he (and certainly I) would like to move on to other matters and
stop dragging this (and him) through the dust. Please?

Tom Payne

Jul 3, 1997

In comp.programming.threads Patrick TJ McPhee <pt...@ican.net> wrote:
[...]
: So sure, let's all agree that we shouldn't assume certain operations
: will be done in a single instruction and are therefore safe. Let's
: assume that you have to provide for thread synchronisation if you're
: going to update shared objects. Now what about the original question.
: Why shouldn't these objects be considered volatile?

When an object is declared to be volatile:
* its values at sequence points become part of the program's behavior
(which necessitates a lot of storing)
* all register resident copies of its value become stale after a
sequence point (which necessitates a lot of loading).
The basic idea is that the variable might be an I/O register
controlling an external device and/or subject to asynchronous changes
by an external device.

Variables shared among uncoordinated threads suffer from exactly the
same problem, but, as I understand things, it is part of the POSIX
standard that, after acquiring a mutex, a thread will see only the
latest values of all shared variables. This requirement might pose
difficulties with intermodule global register allocation. With
standard compilation technology, all variables get flushed from
registers at a call to any function in another module.

Tom Payne

hop...@omnifarious.mn.org

Jul 7, 1997

In article <33B791...@zko.dec.com>,

Dave Butenhof <bute...@zko.dec.com> wrote:
>
> Note that Digital UNIX and Solaris (at least) provide ways to help you
> write thread-safe libraries that minimize the overhead of
> synchronization when running in a non-threaded process. Solaris provides
> thread entry points in libc that are "stubs" -- just returning
> immediately. The entry point symbols are preempted by the real thread
> functions when the thread libraries are activated. (At least if they're
> activated in the right order.) Digital UNIX takes a different route,
> with a "TIS" (thread-independent services) API in libc, which has
> non-thread stubs that are revectored dynamically into the thread library
> when it initializes. (This works even for dynamic activation, and a
> mutex, for example, that's locked by the stub tis_mutex_lock will still
> be locked after activating the thread library, and can then be unlocked
> successfully by the initial thread.)

Why should I pay the overhead of mutex locking for a local object
declared on the stack whose reference isn't passed to any other functions
just because I happen to have more than one thread in my program?

Objects should be shared between threads very sparingly since each such
object is a possible source of devious heisenbugs.


hop...@omnifarious.mn.org

Jul 7, 1997

In article <5pbf5i$4gi$1...@readme.ican.net>,

pt...@ican.net (Patrick TJ McPhee) wrote:
> I think Eric's idea is not bad, but probably won't buy you much, and
> could easily lull you into a false sense of security. On the other
> hand, since I've now suggested that everyone else who's contributed
> to this thread is wrong in one way or another and I want to avoid
> suggesting that I think _I'm_ right, what the hell is the volatile
> keyword for if not for this situation? Isn't it to prevent a compiler
> from doing something like this:
>
> myhappyclass x;
>
> void myhappyfunction()
> {
>     int y;
>
>     x.down();
>     y = x.y;
>     x.up();
>
>     /* while this is going on, another thread changes x.y */
>     blahblahblah(y);
>
>     x.down();
>     y = x.y;    /* optimiser drops this as unnecessary */
>     x.up();
>
>     blahblahblahblahblah(y);
> }
>
> if x isn't marked as volatile, then it seems to me an optimiser might
> drop the second assignment to y, and blahblahblahblahblah() will be
> called with the wrong argument. So maybe Eric is right after all.

The reason why this will probably work is that most compilers don't
optimize across function call boundaries because of possible unknown
aliases for the variables they're caching in registers.

Compilers that do the kind of global analysis necessary to optimize
code like this will probably be smart enough to recognize things that are
being mutex locked and act appropriately.

hop...@omnifarious.mn.org

Jul 7, 1997

No, I disagree with your assessment.

> Two threads may both call AddReference simultaneously, both see _refct
> <= 1, both increment _refct, and both store a new (incremented) value.
> Your refct is then 2 (for example) when it should be 3. You'll now
> switch into mutexed mode for the next reference -- but your reference
> count is already wrong. (And you'll incorrectly switch back out of
> mutexed mode after the next dereference.) This is possible on a
> uniprocessor, not just on a multiprocessor, because your thread could be
> timesliced between the fetch and the store.

How do two threads call AddReference simultaneously when there is only
one reference to the object? One way, of course, would be to have the
reference be global. Of course, then the reference should be mutex
protected at all times. I presume the library forces a full write cache
flush whenever a mutex is unlocked, or a mutex wouldn't work for the
purpose it was intended.

This takes into account things said to me in an accidentally started
private e-mail conversation with Bryan. If you wish (and with Bryan's
permission), I will post that discussion, although you've already covered
his points.

> Making the variable "volatile" doesn't help, but it does prevent the
> compiler from doing its job.

Accessing a volatile is still far more efficient than the function call
overhead typically required for obtaining a mutex lock.

I fully realize the pitfalls of declaring a global data element to be
volatile and expecting multiple readers and writers to remain
synchronized, even in a uniprocessor environment.

volatile is necessary in this context because you need to force the
compiler to flush the update to memory, even if that flush doesn't occur
immediately due to the vagaries of particular SMP environments.

>> Another case in which you might want to do this is when you want
>> to have the same class work in both threaded and non-threaded contexts.
>> You could overload the public and protected member functions on whether
>> or not they were volatile. The volatile ones could just obtain a mutex
>> and call the non-volatile version after a const_cast to get rid of the
>> volatile.
>
> If you're writing an APPLICATION, (not a library), and you want to
> choose whether to use threads at runtime, then, sure, you can use
> non-thread-safe versions for non-threaded runs. But the non-threaded
> versions don't need volatile and volatile won't help the threaded
> versions.

Perhaps the same class is being used in both a threaded and a
non-threaded manner in the same application. The volatile keyword makes
a good tag as to which way the class is being used in a particular
context, and allows you to avoid mutex overhead when it isn't needed.

> Don't try this with a LIBRARY, though, (or an application that supports
> callbacks of some sort from a library), because in this case your caller
> is in charge of whether you're threaded, and you can't necessarily even
> tell.

If you make it clear to the library user that they have to be careful
with use of the volatile keyword, I think it would be OK. As for
callbacks... Any library that makes callbacks on a thread not borrowed
from the application is asking for trouble unless they make it very clear
what's going on.

Bryan O'Sullivan

Jul 7, 1997

e> Why should I pay the overhead of mutex locking for a local object
e> declared on the stack whose reference isn't passed to any other
e> functions just because I happen to have more than one thread in my
e> program?

You don't, nor did anyone state implicitly or explicitly that you do.

Bryan O'Sullivan

Jul 7, 1997

e> No, I disagree with your assessment.

I'm afraid that your disagreement is incorrect. You are certainly
welcome to maintain your position, but Dave will still be right.

e> Accessing a volatile is still far more efficient than the function
e> call overhead typically required for obtaining a mutex lock.

That is not necessarily true; even if it were, it is still not
helpful. You should not be caring about nigglingly small points of
efficiency, but about correctness and larger issues of efficiency.

e> volatile is necessary in this context because you need to force the
e> compiler to flush the update to memory, even if that flush doesn't
e> occur immediately due to the vagaries of particular SMP
e> environments.

Using the volatile keyword is still insufficient to give you reliable
(i.e. correct) semantics, and this is the point that Dave and I have
been pushing. The sequence you should be following is like this:

1. Develop your code. Make sure it is correct.

2. Benchmark it in realistic conditions.

3. Is it fast enough? If so, you're done. If not, continue.

4. Look at the algorithms and data structures you're using, and the
overall structure of the synchronisation you are doing. See if
you can make any changes that would have a large impact. Go to
step 1.

5. Once you have everything working sensibly in the large and you
still aren't getting quite the performance you need, start
worrying about those inner loops.

Only during the last step should you start worrying about ways to
improve the performance of your code with respect to individual
mutexes or condition variables.

At this point, you may be thinking about rewriting your inner loops in
assembly language and doing other platform-dependent things, depending
on how much you need to care about speed and portability, so more or
less anything goes, perhaps including use of your own synchronisation
code.

This is the sort of thing you will only need to pay attention to if
you have a lot of time to spare and performance is of utmost
importance, though; up until near the end of step 5, you should use
whatever portable vendor-provided synchronisation constructs are
appropriate to your task, and you will find that this suffices for
99.99% of all your programming needs.

I absolutely guarantee you that trying to write your own portable
synchronisation code in C or C++ is a quick route to insanity and
humbleness. If you think you know enough to get it right without
having used your code in production work for a year or three, you just
haven't been bitten often enough by the subtle bugs in your code.

Dave Butenhof

unread,
Jul 8, 1997, 3:00:00 AM7/8/97
to

hop...@omnifarious.mn.org wrote:
>
> No, I disagree with your assessment.

That's fine. Disagree. I won't even bother to argue, because I notice
I've already started trying to simply rephrase the same information in
hopes that someone who didn't understand originally will suddenly see
the light. I'm tired of this, and I don't intend to rephrase yet again
for this particular discussion. I'm going to do some work today,
instead.

Suffice it to say that I believe someone following your advice will get
themselves into trouble. The trouble will be subtle and difficult to
diagnose. They will regret it. Perhaps you'll feel guilty, though you'll
probably never even know.

> I presume the library forces a full write cache

> flush whenever a mutex is unlocked or a mutex would[n't] work for the
> purpose it was intended.

Ah, this is a tangent that's more interesting, and about which I have
something to say that's new -- at least in the context of this
discussion. Use of mutexes does NOT imply a cache flush. Rather, proper
use of a mutex implements a memory coherency PROTOCOL that ensures one
thread, via a mutex, can pass a consistent view of memory to another
thread. That second thread (the thread that next locks the mutex after
one thread unlocks the mutex) does indeed "inherit" a consistent view of
memory -- but that is a result not merely of the UNLOCK, but of the
UNLOCK in one thread combined with the LOCK in the next thread.

A mutex unlock is generally a memory barrier followed by clearing the
lock bit. The memory barrier ensures that the current processor cannot
reorder the writes that occurred within the locked region past the
unlock itself. It does not, however, ensure that those protected writes
occur immediately, or even soon. A lock is an atomic set (test-and-set,
swap, whatever) followed by a memory barrier. The barrier ensures that
any data written within the locked region cannot be reordered past
(before) the lock itself.

In combination, this means that a thread locking a mutex can be sure
that it will see all data written prior to the previous thread's unlock.
But that is NOT the same as a "cache flush". It's merely an orderly
limitation on memory operation reordering.

(Note that many older systems do not have this concept of a memory
barrier, and that confuses a lot of people. Most of us are used to
guaranteed hardware read/write ordering, where all memory transactions
occur IN ORDER -- and often atomically, as well. This is NOT true of
modern high-performance multiprocessor memory systems. You don't need to
worry about this, as long as you "follow the rules" -- but to break the
rules successfully you'd better understand every detail of your
hardware, and don't expect the behavior to be remotely portable!)

The implications of this distinction are subtle, but the important
consideration is that BOTH sides must use a mutex, or there is no
synchronization (or visibility guarantees). WRITING a variable under a
mutex and READING it in another thread without a mutex provides no
visibility guarantees.

This is one of the ways in which volatile falls down as a "substitute"
for synchronization. The volatile attribute forces the COMPILER to
generate write (and read) instructions in a few places where it might
not otherwise, but it has no effect on the hardware caching. In
particular, it does not "flush cache", or enforce any ordering on memory
operations. It prevents the compiler from keeping values in registers
that would benefit from being kept in registers. The volatile attribute
is useful for situations where you want to read or write a variable from
a signal handler, or after a longjmp, or where the variable's address is
bound to a hardware register (e.g., direct mapped I/O) such that each
change in value may be critical to the operation of the hardware. The
volatile attribute does not make operations on the variable atomic, nor
does it create a protocol that provides synchronization or visibility
across threads/processes operating in parallel.

Eric Hopper

unread,
Jul 8, 1997, 3:00:00 AM7/8/97
to

In article <33C219...@zko.dec.com>,
Dave Butenhof <bute...@zko.dec.com> wrote:

>
> hop...@omnifarious.mn.org wrote:
>
> > I presume the library forces a full write cache
> > flush whenever a mutex is unlocked or a mutex would[n't] work for the
> > purpose it was intended.
>
> A mutex unlock is generally a memory barrier followed by clearing the
> lock bit. The memory barrier ensures that the current processor cannot
> reorder the writes that occurred within the locked region past the
> unlock itself. It does not, however, ensure that those protected
> writes occur immediately, or even soon. A lock is an atomic set
> (test-and-set, swap, whatever) followed by a memory barrier. The
> barrier ensures that any data written within the locked region cannot
> be reordered past (before) the lock itself.
>
> In combination, this means that a thread locking a mutex can be sure
> that it will see all data written prior to the previous thread's unlock.
> But that is NOT the same as a "cache flush". It's merely an orderly
> limitation on memory operation reordering.

OK, I understand this, and agree that you are correct. This is
somewhat different than I had imagined it, but I think I'm still right.
*grin*

My point is, that in order to pass a reference from one thread
to another, you have to pass it through a shared data structure such as
a queue, or a global pointer object. That shared data structure would
require a mutex. In essence, your reference counted object is
'borrowing' the mutex of the shared data structure you're using to pass
it with. This would no longer work after two threads had references to
the shared object since those two threads could touch the reference
counted object without actually modifying the shared data structure in
any way.

> The implications of this distinction are subtle, but the important
> consideration is that BOTH sides must use a mutex, or there is no
> synchronization (or visibility guarantees). WRITING a variable under a
> mutex and READING it in another thread without a mutex provides no
> visibility guarantees.

I don't think the distinction is subtle. In fact, it's
glaringly obvious. It's just that nobody bothered to explain it in
detail.

> This is one of the ways in which volatile falls down as a

[stuff deleted]

This means that volatile is NOT OK in cases where you have a
reader who accesses a variable, and doesn't really care how up-to-date
it is. I can now think of systems in which the variable's value may be
hopelessly out of date, and that would render volatile essentially
useless for such a purpose.

I also still maintain that volatile might still be useful simply
as a tag stating whether or not you intended to have multiple threads
access an object or not. Having the volatile versions of functions simply
acquire the mutex and call the non-volatile versions would be potentially
useful, and wouldn't incur a performance hit.

(Sadly, my ISP has HORRIBLE news service, and so I'm forced to
post through DejaNews. *sigh*)

Todd Murray

unread,
Jul 8, 1997, 3:00:00 AM7/8/97
to

Bryan O'Sullivan wrote:
>
> e> No, I disagree with your assessment.
>
> I'm afraid that your disagreement is incorrect. You are certainly
> welcome to maintain your position, but Dave will still be right.
>
> e> Accessing a volatile is still far more efficient than the function
> e> call overhead typically required for obtaining a mutex lock.
>
> That is not necessarily true; even if it were, it is still not
> helpful. You should not be caring about nigglingly small points of
> efficiency, but about correctness and larger issues of efficiency.

Don't forget MAINTAINABILITY and PORTABILITY. I would hope that any
professional software engineer wouldn't write code that depends on
processor tricks or alternative synchronization methods. Someone else
will invariably end up maintaining the code or trying to port it to
another platform. Suddenly, code that worked on one system (relying on
a read operation to be atomic, for instance) won't work on another
system. The code might be 3000 lines deep in a file, under a comment
saying, "// Note: This next line depends on a read of an unsigned int to
be atomic across threads."

> e> volatile is necessary in this context because you need to force the
> e> compiler to flush the update to memory, even if that flush doesn't
> e> occur immediately due to the vagaries of particular SMP
> e> environments.
>
> Using the volatile keyword is still insufficient to give you reliable
> (i.e. correct) semantics, and this is the point that Dave and I have
> been pushing. The sequence you should be following is like this:
>
> 1. Develop your code. Make sure it is correct.
>
> 2. Benchmark it in realistic conditions.
>
> 3. Is it fast enough? If so, you're done. If not, continue.
>
> 4. Look at the algorithms and data structures you're using, and the
> overall structure of the synchronisation you are doing. See if
> you can make any changes that would have a large impact. Go to
> step 1.
>
> 5. Once you have everything working sensibly in the large and you
> still aren't getting quite the performance you need, start
> worrying about those inner loops.
>
> Only during the last step should you start worrying about ways to
> improve the performance of your code with respect to individual
> mutexes or condition variables.

Very true. I couldn't agree more. And I'd have to ask if the speed of
the code is more important than the correctness and maintainability of
the code. The customer isn't going to see a 10% speed increase in an
inner loop in most cases. But, if that customer has a bug caused by
incorrect thread synchronization (by someone relying on a trick that
looks like it works, but fails a very small percentage of the time), the
customer will get very upset and call frequently until the problem is
fixed.



> At this point, you may be thinking about rewriting your inner loops in
> assembly language and doing other platform-dependent things, depending
> on how much you need to care about speed and portability, so more or
> less anything goes, perhaps including use of your own synchronisation
> code.
>
> This is the sort of thing you will only need to pay attention to if
> you have a lot of time to spare and performance is of utmost
> importance, though; up until near the end of step 5, you should use
> whatever portable vendor-provided synchronisation constructs are
> appropriate to your task, and you will find that this suffices for
> 99.99% of all your programming needs.

Not only that, but even if YOU understand what you're doing by writing
your own synchronization, chances are that some poor shmuck trying to
maintain your code won't. (And more often than not, I've been that poor
shmuck, trying to understand someone's code when they've implemented
something that looked like an optimization when it was really a
poorly-thought out hack. If my career ever progresses to the point
where I actually spend some time doing design and new implementation, as
opposed to fixing old hacked-up fire hazard code, you can bet I'll
design things cleanly and make it obvious what I'm doing. *)

> I absolutely guarantee you that trying to write your own portable
> synchronisation code in C or C++ is a quick route to insanity and
> humbleness. If you think you know enough to get it right without
> having used your code in production work for a year or three, you just
> haven't been bitten often enough by the subtle bugs in your code.

(* Two gripes about my workplace deleted. E-mail me for the deleted
bits.)
--
Todd Murray - tam @ nospam.visi.com http://www.visi.com/~tam/
Slightly deranged mountain biker, snow skater, and keeper of the
'97 Wrangler FAQ: http://www.visi.com/~tam/tjfaq.html
Don't remove "nospam" from my E-mail address.

Eric Hopper

unread,
Jul 8, 1997, 3:00:00 AM7/8/97
to

In article <87iuymq...@serpentine.com>,
Bryan O'Sullivan <b...@serpentine.com> wrote:

>
> I wrote:
>
>> Why should I pay the overhead of mutex locking for a local object
>> declared on the stack whose reference isn't passed to any other
>> functions just because I happen to have more than one thread in my
>> program?
>
> You don't, nor did anyone state implicitly or explicitly that you do.

But you did by telling me that volatile was stupid, and
shouldn't be used in any circumstances, not even as a qualifier on
member functions.

Yet another contrived example:

class AtomicCounter {
public:
    inline AtomicCounter(int initial = 0, int synctype = USYNC_THREAD);
    inline AtomicCounter(const AtomicCounter &b);
    inline AtomicCounter(const volatile AtomicCounter &b);
    ~AtomicCounter();

    inline const AtomicCounter operator ++();
    inline const AtomicCounter operator ++(int);
    inline const AtomicCounter operator --();
    inline const AtomicCounter operator --(int);

    inline const AtomicCounter &operator =(const AtomicCounter &b);
    inline const AtomicCounter &operator =(const volatile AtomicCounter &b);
    inline const AtomicCounter &operator =(int newval);

    inline int GetValue() const;

    const AtomicCounter operator ++() volatile;
    const AtomicCounter operator ++(int) volatile;
    const AtomicCounter operator --() volatile;
    const AtomicCounter operator --(int) volatile;

    const volatile AtomicCounter &operator =(const AtomicCounter &b) volatile;
    const volatile AtomicCounter &operator =(const volatile AtomicCounter &b) volatile;
    const volatile AtomicCounter &operator =(int newval) volatile;

    int GetValue() const volatile;

private:
    int val_;
    mutable mutex_t mut_;
};

inline AtomicCounter::AtomicCounter(int initial, int synctype)
    : val_(initial)
{
    mutex_init(&mut_, synctype, NULL);
}

inline AtomicCounter::AtomicCounter(const AtomicCounter &b)
    : val_(b.GetValue())
{
    mutex_init(&mut_, USYNC_THREAD, NULL);
}

inline AtomicCounter::AtomicCounter(const volatile AtomicCounter &b)
    : val_(b.GetValue())
{
    mutex_init(&mut_, USYNC_THREAD, NULL);
}

AtomicCounter::~AtomicCounter()
{
    mutex_destroy(&mut_);
}

inline const AtomicCounter AtomicCounter::operator ++()
{
    return(AtomicCounter(++val_));
}

inline const AtomicCounter AtomicCounter::operator ++(int)
{
    return(AtomicCounter(val_++));
}

inline const AtomicCounter AtomicCounter::operator --()
{
    return(AtomicCounter(--val_));
}

inline const AtomicCounter AtomicCounter::operator --(int)
{
    return(AtomicCounter(val_--));
}

inline const AtomicCounter &AtomicCounter::operator =(const AtomicCounter &b)
{
    val_ = b.GetValue();
    return(*this);
}

inline const AtomicCounter &
AtomicCounter::operator =(const volatile AtomicCounter &b)
{
    val_ = b.GetValue();
    return(*this);
}

inline const AtomicCounter &AtomicCounter::operator =(int newval)
{
    val_ = newval;
    return(*this);
}

inline int AtomicCounter::GetValue() const
{
    return(val_);
}

inline const AtomicCounter AtomicCounter::operator ++() volatile
{
    mutex_lock(const_cast<mutex_t *>(&mut_));
    AtomicCounter temp = (const_cast<AtomicCounter *>(this))->operator ++();
    mutex_unlock(const_cast<mutex_t *>(&mut_));
    return(temp);
}

inline const AtomicCounter AtomicCounter::operator ++(int) volatile
{
    mutex_lock(const_cast<mutex_t *>(&mut_));
    AtomicCounter temp = (const_cast<AtomicCounter *>(this))->operator ++(0);
    mutex_unlock(const_cast<mutex_t *>(&mut_));
    return(temp);
}

inline const AtomicCounter AtomicCounter::operator --() volatile
{
    mutex_lock(const_cast<mutex_t *>(&mut_));
    AtomicCounter temp = (const_cast<AtomicCounter *>(this))->operator --();
    mutex_unlock(const_cast<mutex_t *>(&mut_));
    return(temp);
}

inline const AtomicCounter AtomicCounter::operator --(int) volatile
{
    mutex_lock(const_cast<mutex_t *>(&mut_));
    AtomicCounter temp = (const_cast<AtomicCounter *>(this))->operator --(0);
    mutex_unlock(const_cast<mutex_t *>(&mut_));
    return(temp);
}

inline const volatile AtomicCounter &
AtomicCounter::operator =(const AtomicCounter &b) volatile
{
    mutex_lock(const_cast<mutex_t *>(&mut_));
    (const_cast<AtomicCounter *>(this))->operator =(b);
    mutex_unlock(const_cast<mutex_t *>(&mut_));
    return(*this);
}

inline const volatile AtomicCounter &
AtomicCounter::operator =(const volatile AtomicCounter &b) volatile
{
    // Deadlock avoidance.
    int newval = b.GetValue();
    mutex_lock(const_cast<mutex_t *>(&mut_));
    val_ = newval;
    mutex_unlock(const_cast<mutex_t *>(&mut_));
    return(*this);
}

inline const volatile AtomicCounter &
AtomicCounter::operator =(int newval) volatile
{
    mutex_lock(const_cast<mutex_t *>(&mut_));
    (const_cast<AtomicCounter *>(this))->operator =(newval);
    mutex_unlock(const_cast<mutex_t *>(&mut_));
    return(*this);
}

inline int AtomicCounter::GetValue() const volatile
{
    int val;
    mutex_lock(const_cast<mutex_t *>(&mut_));
    val = val_;
    mutex_unlock(const_cast<mutex_t *>(&mut_));
    return(val);
}


Yeah, OK, it's rather wordy. *grin*

This class has the neat property of working in both a shared and
non-shared way in an efficient manner. Of course, you had better be
using it through a reference (direct or indirect) tagged with the
volatile storage qualifier if it's shared.

It also has the neat property of acting non-shared when you
don't access it through a reference tagged with volatile, and acting
shared when you do. This means you can do stuff like this:

extern volatile AtomicCounter *glbl_cntr;
extern mutex_t glbl_cntr_mut;

void f()
{
    AtomicCounter x, y, z;

    ++x;
    // It's that mutex 'borrowing' thing again.
    mutex_lock(&glbl_cntr_mut);
    glbl_cntr = &x;
    // And now x is up-to-date
    mutex_unlock(&glbl_cntr_mut);

    // Careful in here because we don't 'own' x anymore, and need to make
    // all references through glbl_cntr, or other volatile versions of x.

    // This routine promises to release all references and locks on
    // glbl_cntr before it returns.
    CountStuffInSeperateThread();

    ++y;
    // More mutex 'borrowing'. x is up-to-date again.
    mutex_lock(&glbl_cntr_mut);
    glbl_cntr = &y;
    // And now y is up-to-date
    mutex_unlock(&glbl_cntr_mut);
    ++x;

    // Careful in here because we don't 'own' y anymore, and need to make
    // all references through glbl_cntr, or other volatile versions of y.

    // This routine promises to release all references and locks on
    // glbl_cntr before it returns.
    CountStuffInSeperateThread();

    // Last bit of mutex 'borrowing'. y is now up-to-date again.
    mutex_lock(&glbl_cntr_mut);
    glbl_cntr = 0;
    mutex_unlock(&glbl_cntr_mut);
    ++y;
}

The updates to x and y inside of f are very fast and efficient,
like they should be. All the references everywhere else use mutexes,
like they ought to. As you can see in the comment before the
'CountStuffInSeperateThread()' call, there are ways you can get bitten
this way, but if used with care, this should work.

This, again, is a rather contrived example, but if I were more
awake, and thought about it a bit, I think I could think of a better
one.

Eric Hopper

unread,
Jul 8, 1997, 3:00:00 AM7/8/97
to

In article <87hge6q...@serpentine.com>,

Bryan O'Sullivan <b...@serpentine.com> wrote:
>
> I'm afraid that your disagreement is incorrect. You are certainly
> welcome to maintain your position, but Dave will still be right.

Perhaps in some sense, but I won't be wrong. Not that I can't
be, mind you. I simply am not in this instance.

> Using the volatile keyword is still insufficient to give you reliable
> (i.e. correct) semantics, and this is the point that Dave and I have
> been pushing. The sequence you should be following is like this:

It is in the way I've been using it, given certain constraints
that I think are very reasonable.

> 1. Develop your code. Make sure it is correct.
>
> 2. Benchmark it in realistic conditions.
>
> 3. Is it fast enough? If so, you're done. If not, continue.
>
> 4. Look at the algorithms and data structures you're using, and the
> overall structure of the synchronisation you are doing. See if
> you can make any changes that would have a large impact. Go to
> step 1.
>
> 5. Once you have everything working sensibly in the large and you
> still aren't getting quite the performance you need, start
> worrying about those inner loops.
>
> Only during the last step should you start worrying about ways to
> improve the performance of your code with respect to individual
> mutexes or condition variables.

I'm quite familiar with these concepts and I have two things to
say:

One, you should attempt to do things right the first time.
Otherwise you end up with bloated, slow garbage like Windoze and Windoze
NT. Too little attention to efficiency at the beginning of your project
can lead to programs that are impossible to optimize very well later.

Two, you need somewhere to turn when you do want to increase
efficiency, after you know your program works. I think you assume that
I'm a much less experienced programmer than I am.

> At this point, you may be thinking about rewriting your inner loops in
> assembly language and doing other platform-dependent things, depending
> on how much you need to care about speed and portability, so more or
> less anything goes, perhaps including use of your own synchronisation
> code.

*sigh* No, actually synchronization issues can be far, far more
of a performance degradation than an HLL like C or C++. I wouldn't put
sync tuning at the top of my list, but it would certainly not be at the
bottom either. This doesn't mean I'd ever try to hand-write my own
synchronization code from ground up in assembly. RISC processors aren't
designed for humans.

> This is the sort of thing you will only need to pay attention to if
> you have a lot of time to spare and performance is of utmost
> importance, though; up until near the end of step 5, you should use
> whatever portable vendor-provided synchronisation constructs are
> appropriate to your task, and you will find that this suffices for
> 99.99% of all your programming needs.

I plan to use portable, vendor-supplied synchronization
constructs exclusively. Rolling your own unportable ones is a
slap-it-together, DOS-programmer style thing to do.

> I absolutely guarantee you that trying to write your own portable
> synchronisation code in C or C++ is a quick route to insanity and
> humbleness. If you think you know enough to get it right without
> having used your code in production work for a year or three, you just
> haven't been bitten often enough by the subtle bugs in your code.

I would never use my volatile idea as a replacement for proper
synchronization code. It's merely an adjunct that allows you to do
things that might be difficult otherwise. I also think that volatile
also makes a good cue and tag to identify thread-safe member functions
and shared object declarations. I think my idea can be used reasonably
safely, especially in its capacity as a tag.

Bryan O'Sullivan

unread,
Jul 9, 1997, 3:00:00 AM7/9/97
to

e> I think my idea can be used reasonably safely, especially in its
e> capacity as a tag.

It's even stupid to use it as a tag:

- You are using the volatile keyword in a way that it was not meant to
be used. Since you don't have control over the compiler, the
compiler is going to work under the assumption that you are not some
kind of eccentric with funny ideas, and will inhibit optimisations
that it could otherwise perform safely, resulting in unnecessary
memory traffic that may degrade your code's performance
substantially on cached multiprocessor systems. You are forcing
semantics onto a language construct that are not there and have
never been implied to be there, which is dumb.

- You assume that some other human will see the volatile keyword and
know that what *you* meant was not the usual semantics of volatile,
but something else.

I've had enough of debating this stuff, since you seem intent on
perseverating over unimportant issues in ways that will make them
important for all the wrong reasons if you ever put your ideas into
practice.
