
volatile, was: memory visibility between threads


Martin Berger
Jan 12, 2001, 9:27:29 PM

Dave Butenhof wrote:

> > One question is, whether those memory visibility rules are applicable for
> > other thread systems such as Solaris UI threads, or win32 threads, or JAVA
> > threads ...? If yes, we can follow the same spirit. Otherwise, it will be
> > a big difference. (For example, all shared variables might have to be
> > defined as volatile even with mutex protection.)
>
> Don't ever use the C/C++ language volatile in threaded code. It'll kill your
> performance, and the language definition has nothing to do with what you want
> when writing threaded code that shares data. If some OS implementation tells
> you that you need to use it anyway on their system, (in the words of a child
> safety group), "run, yell, and tell". That's just stupid, and they shouldn't
> be allowed to get away with it.
>

well, in the c/c++ users journal, Andrei Alexandrescu recommends using
"volatile" to help avoid race conditions. can the experts please slug it out?
(note the cross posting)

martin

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

David Schwartz
Jan 13, 2001, 5:56:51 AM

Martin Berger wrote:

> well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> "volatile" to help avoid race conditions. can the experts please slug it out?
> (note the cross posting)
>
> martin

What race conditions? If you have a race condition, you need to _FIX_
it. Something that might help "avoid" it isn't good enough. If I told my
customers I added some code that helped avoid race conditions, they'd
shoot me. Code shouldn't _have_ race conditions.

DS

Martin Berger
Jan 13, 2001, 3:01:23 PM

David Schwartz wrote:

> > well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> > "volatile" to help avoid race conditions. can the experts please slug it out?
> > (note the cross posting)
> >
> > martin
>
> What race conditions? If you have a race condition, you need to _FIX_
> it. Something that might help "avoid" it isn't good enough. If I told my
> customers I added some code that helped avoid race conditions, they'd
> shoot me. Code shouldn't _have_ race conditions.


maybe i should have given more details: the idea is to use certain properties
of the c++ type system with respect to "volatile", "const_cast", overloading
and method invocation, to the effect that race conditions become type errors
or at least generate compiler warnings (see

http://www.cuj.com/experts/1902/alexandr.html

for details). this is quite nifty, provided we ignore the problems dave has
pointed out.

interestingly, andrei's suggestions do not at all depend on the intended
semantics of "volatile", only on how the type system checks it and handles
const_cast and method invocation in this case. it would be possible to
introduce a new c++ qualifier, say "blob", to the same effect but without
the shortcomings (it would even, in contradistinction to the "volatile"-based
proposal, handle built-in types correctly).

martin

James Moe
Jan 13, 2001, 10:59:46 PM

Martin Berger wrote:
>
>
> well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> "volatile" to help avoid race conditions. can the experts please slug it
> out? (note the cross posting)
>
Use mutexes or semaphores to control access to common data areas; that
is what they are for.
"volatile" is meant for things like hardware access, where a device
register can change at any time.

--
sma at sohnen-moe dot com

Joerg Faschingbauer
Jan 13, 2001, 11:01:04 PM

>>>>> "Martin" == Martin Berger <martinb@--remove--me--dcs.qmw.ac.uk>
>>>>> writes:

>> Don't ever use the C/C++ language volatile in threaded code. It'll
>> kill your performance, and the language definition has nothing to
>> do with what you want when writing threaded code that shares
>> data. If some OS implementation tells you that you need to use it
>> anyway on their system, (in the words of a child safety group),
>> "run, yell, and tell". That's just stupid, and they shouldn't be
>> allowed to get away with it.
>>

Martin> well, in the c/c++ users journal, Andrei Alexandrescu
Martin> recommends using "volatile" to help avoiding race
Martin> conditions. can the experts please slug it out? (note the
Martin> cross posting)

(I am not an expert, but there are a few things I understood :-)

You use a mutex to protect data against concurrent access.

int i;

void f(void) {
    lock(mutex);     /* e.g. pthread_mutex_lock()   */
    i++;             /* or something                */
    unlock(mutex);   /* e.g. pthread_mutex_unlock() */
    /* some lengthy epilogue goes here */
}

Looking at it more paranoidly, one might argue that an optimizing
compiler will probably want to keep i in a register for some reason,
and that it might want to keep it in that register until the function
returns.

If that was the case the usage of the mutex would be completely
pointless. At the time the function returns (a long time after the
mutex was unlocked) the value of i is written back to memory. It then
overwrites the changes that another thread may have made to the value
in the meantime.

This is where volatile comes in. The common understanding is that volatile
disables any optimization on a variable, so if you declare i volatile, the
compiler won't keep it in a register, and all is well (if that really were
the definition of volatile - my understanding is that volatile is just a
hint to the compiler, so nothing is guaranteed even if you use it legally).
Except that this performs poorly - imagine the critical section contained
not just a simple increment but many more uses of i.

Now what does a compiler do? It compiles modules independently. In
doing so it performs optimizations (fortunately). Inside the module
that is being compiled the compiler is free to assign variables to
registers as it likes, and every function has a certain register set
that it uses.

The compiler also generates code for calls to functions in other
modules that it has no idea of. Having no idea of a module means, among
other things, that the compiler does not know the register set that a
particular function uses. In particular, the compiler cannot know
for sure that the callee's register set doesn't overlap with the
caller's - in which case the caller would see useless garbage in the
registers when the callee returns.

Now that's the whole point: the compiler has to take care that the
code it generates spills registers before calling a function.

So, provided that unlock() is a function and not a macro, there is no
need to declare i volatile.


Hope this helps,
Joerg

Martin Berger
Jan 14, 2001, 5:09:29 AM

Kaz Kylheku <k...@ashi.footprints.net>

> > well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> > "volatile" to help avoid race conditions. can the experts please slug it
> > out?
>

> The C/C++ Users Journal is a comedy of imbeciles.
>
> The article you are talking about completely ignores issues of memory
> coherency on multiprocessor systems. It is very geared toward Windows;
> the author seems to have little experience with multithreaded
> programming, and especially cross-platform multithreading.

would you care to elaborate how "volatile" causes problems with memory
consistency on multiprocessors?

> Luckily, he
> provides a disclaimer by openly admitting that some code that he wrote
> suffers from occasional deadlocks.

are you suggesting that these deadlocks are a consequence of using
"volatile"? if so, how? i cannot find indications of deadlock-inducing
behavior of "volatile" in kernighan & ritchie

James Dennett
Jan 14, 2001, 5:54:52 AM

David Schwartz wrote:
>
> Martin Berger wrote:
>
> > well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> > "volatile" to help avoid race conditions. can the experts please slug it out?
> > (note the cross posting)
> >
> > martin
>
> What race conditions? If you have a race condition, you need to _FIX_
> it. Something that might help "avoid" it isn't good enough. If I told my
> customers I added some code that helped avoid race conditions, they'd
> shoot me. Code shouldn't _have_ race conditions.

True, and Andrei's claim (which seems reasonable to me, though I've not
verified it in depth) is that his techniques, if used consistently,
will detect all race conditions *at compile time*. If you want to
ensure, absolutely, that no race conditions remain, then you could
try looking into Andrei's technique as a second line of defense.

-- James Dennett <jden...@acm.org>

David Schwartz
Jan 14, 2001, 5:58:21 AM

Joerg Faschingbauer wrote:

> Now that's the whole point: the compiler has to take care that the
> code it generates spills registers before calling a function.

This is really not so. It's entirely possible that the compiler might
have some way of assuring that the particular value cached in the
register isn't used by the called function, and hence it can keep it in
a register.

For example:

extern void bar(void);

void foo(void)
{
int i;
i=3;
bar();
i--;
}

The compiler in this case might optimize 'i' away to nothing.
Fortunately, any possible way another thread could get its hands on a
variable is a way that a function in another compilation unit could get
its hands on the variable. Not only is there no legal C way 'bar' could
access 'i', there is no legal C way another thread could.

Consider:

extern void bar(void);
extern void qux(int *);

void foo(void)
{
    int i;
    i=3;
    while(i<10)
    {
        i++;
        bar();
        i++;
        qux(&i);
    }
}

For all the compiler knows, 'qux' stores the pointer passed to it and
'bar' uses it. Think about:

#include <stdio.h>

int *ptr=NULL;

void bar(void)
{
    if(ptr!=NULL) printf("i=%d\n", *ptr);
}

void qux(int *j)
{
    ptr=j;
}

So the compiler would have to treat 'i' as if it was volatile in 'foo'
anyway.

So most compilers don't need any special help to compile multithreaded
code. Non-multithreaded code can do the same things.

DS

Kaz Kylheku
Jan 14, 2001, 5:59:02 AM

On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer
<jfa...@jfasch.faschingbauer.com> wrote:
>So, provided that unlock() is a function and not a macro, there is no
>need to declare i volatile.

Even if a compiler implements sophisticated global optimizations that
cross module boundaries, the compiler can still be aware of
synchronization functions and do the right thing around calls to those
functions.

Joerg Faschingbauer
Jan 14, 2001, 2:18:53 PM

>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> Joerg Faschingbauer wrote:

>> Now that's the whole point: the compiler has to take care that the
>> code it generates spills registers before calling a function.

David> This is really not so. It's entirely possible that the
David> compiler might have some way of assuring that the particular
David> value cached in the register isn't used by the called function,
David> and hence it can keep it in a register.

Of course you may one day target a system where everything is tightly
coupled, and where the compiler you use to compile your module knows
about the very internals of the runtime - register allocations, for
example. Then it could keep variables in registers even across runtime
function calls.

Even though such a thing is possible, it is quite unlikely - consider
the management effort for the people making (and upgrading!) such a
system. And even if people dared to build such a beast, it wouldn't be
POSIX - at least not with the functions that involve locking and
such. (There was a discussion here recently where Dave Butenhof made
this plausible - and I believe him :-}.)

(Of course there are compilers that do interprocedural and
intermodular (what a word!) optimization, involving such things as not
spilling registers before calling an external function. But usually
you then have to compile the calling module and the callee module in
one swoop - you pass more than one C file on the command line or some
such. And it is not common to compile your module together with the
mutex-locking modules of the C runtime.)

David> For example:

David> extern void bar(void);

David> void foo(void)
David> {
David> int i;
David> i=3;
David> bar();
David> i--;
David> }

David> The compiler in this case might optimize 'i' away to nothing.
David> Fortunately, any possible way another thread could get its
David> hands on a variable is a way that a function in another
David> compilation unit could get its hands on the variable. Not only
David> is there no legal C way 'bar' could access 'i', there is no
David> legal C way another thread could.

I don't understand the connection of this example to your statement
above.

Joerg

Kenneth Chiu
Jan 14, 2001, 2:19:18 PM

In article <93qjt6$mm2$1...@lure.pipex.net>,
Martin Berger <martin...@orange.net> wrote:
>
>Kaz Kylheku <k...@ashi.footprints.net>
>
>> > well, in the c/c++ users journal, Andrei Alexandrescu recommends using
>> > "volatile" to help avoid race conditions. can the experts please slug it
>> > out?
>>
>> The C/C++ Users Journal is a comedy of imbeciles.
>>
>> The article you are talking about completely ignores issues of memory
>> coherency on multiprocessor systems. It is very geared toward Windows;
>> the author seems to have little experience with multithreaded
>> programming, and especially cross-platform multithreading.
>
> would you care to elaborate how "volatile" causes problems with memory
> consistency on multiprocessors?

It's not that volatile itself causes memory problems. It's that it's
not sufficient (and under POSIX should not even be used).

He gives an example which will work in practice, but which would fail
with two shared variables. Code like this, for example, would be
incorrect on an MP with a relaxed memory model. The write to flag_
could occur before the write to data_, despite the order in which the
assignments are written.

class Gadget {
public:
    void Wait() {
        while (!flag_) {
            Sleep(1000); // sleeps for 1000 milliseconds
        }
        do_some_work(data_);
    }
    void Wakeup() {
        data_ = ...;
        flag_ = true;
    }
    ...
private:
    volatile bool flag_;
    volatile int data_;
};

Joerg Faschingbauer
Jan 14, 2001, 2:23:39 PM

>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> Joerg Faschingbauer wrote:

>> Now that's the whole point: the compiler has to take care that the
>> code it generates spills registers before calling a function.

David> This is really not so. It's entirely possible that the compiler might
David> have some way of assuring that the particular value cached in the
David> register isn't used by the called function, and hence it can keep it in
David> a register.

David> For example:

David> extern void bar(void);

David> void foo(void)
David> {
David> int i;
David> i=3;
David> bar();
David> i--;
David> }

David> The compiler in this case might optimize 'i' away to nothing.
David> Fortunately, any possible way another thread could get its hands on a
David> variable is a way that a function in another compilation unit could get
David> its hands on the variable. Not only is there no legal C way 'bar' could
David> access 'i', there is no legal C way another thread could.

David> Consider:

David> extern void bar(void);
David> extern void qux(int *);

David> void foo(void)
David> {
David> int i;
David> i=3;

David> while(i<10)
David> {
David> i++;
David> bar();
David> i++;
David> qux(&i);
David> }
David> }

David> For all the compiler knows, 'qux' stores the pointer passed to it and
David> 'bar' uses it. Think about:

David> int *ptr=NULL;
David> void bar(void)
David> {
David> if(ptr!=NULL) printf("i=%d\n", *ptr);
David> }

David> void qux(int *j)
David> {
David> ptr=j;
David> }

David> So the compiler would have to treat 'i' as if it was volatile in 'foo'
David> anyway.

Yes, I believe this (exporting the address of a variable) is called
taking an alias in compilerology. The consequence is that it inhibits
keeping the variable in a register.

David> So most compilers don't need any special help to compile multithreaded
David> code. Non-multithreaded code can do the same things.

Joerg

Dylan Nicholson
Jan 14, 2001, 6:31:59 PM

In article <slrn963vb...@ashi.FootPrints.net>,
k...@ashi.footprints.net wrote:
> On 14 Jan 2001 05:09:29 -0500, Martin Berger <martin...@orange.net> wrote:
> >
> >Kaz Kylheku <k...@ashi.footprints.net>
> >
>
> The point is that are you going to take multithreading advice from
> someone who admittedly cannot eradicate known deadlocks from his
> code? But good points for the honesty, clearly.
>
Well, I consider myself pretty well experienced in at least Win32
threads, and I'm working on a project now using POSIX threads (and a
POSIX wrapper for Win32 threads). I thought I had a perfectly sound
design that used only ONE mutex object, only ever used a stack-based
locker/unlocker to ensure it was never left locked, and yet I still got
deadlocks! The reason was simple: a) calling LeaveCriticalSection on
an unowned critical section causes a deadlock in Win32 (this I consider
a bug, considering how trivial it is to test one member of the critical
section to avoid it), and b) I didn't realise that by default POSIX
mutexes only allowed one lock per thread (i.e. they were non-recursive).
To me these are quirks of the thread library, not design faults in my
code, so they don't necessarily indicate a lack of multi-threaded
knowledge. I don't pretend to know what deadlocks Andrei had, but I
wouldn't be surprised if it was a problem of that nature. Although I
haven't used the technique he described in his library, if I had read
it before I started coding my latest multi-threaded project, I almost
certainly would have given it a go.

Dylan



Martin Berger
Jan 14, 2001, 6:34:01 PM

Kaz Kylheku wrote:

> > would you care to elaborate how "volatile" causes problems with memory
> > consistency on multiprocessors?
>

> The point is that it doesn't *solve* these problems, not that it causes
> them. It's not enough to ensure that load and store instructions are
> *issued* in some order by the processor, but also that they complete in
> some order (or at least partial order) that is seen by all other
> processors. At best, volatile defeats access optimizations at the
> compiler level; in order to synchronize memory you need to do it at the
> hardware level as well, which is often done with a special ``memory
> barrier'' instruction.
>
> In other words, volatile is not enough to eliminate race conditions,
> at least not on all platforms.

either you or i don't quite understand the point of the article. the
semantics of "volatile" is irrelevant for his stuff to work. all that
matters is how c++ typechecks classes and methods annotated with
"volatile", together with the usual rules for overloading and casting
away volatile.

if we'd change c++ to include a modifier "blob", add the ability
to cast away blobness, and make "blob" behave like "volatile" w.r.t.
typechecking, overloading ... then his scheme would work just the
same way with "volatile" replaced by "blob". that's at least how
i understand it.

> The point is that are you going to take multithreading advice from
> someone who admittedly cannot eradicate known deadlocks from his
> code? But good points for the honesty, clearly.


concurrency is an area where i trust no one, including myself.



> > if so, how? i cannot find indications of deadlock-inducing behavior
> > of "volatile" in kernighan & ritchie
>

> This book says very little about volatile and contains no discussion of
> threads; this is squarely beyond the scope of K&R.

well, it should be part of the semantics of the language and hence covered.

dale
Jan 14, 2001, 10:59:01 PM

David Schwartz wrote:

> Code shouldn't _have_ race conditions.

Well, that's not entirely correct. If you have a number of
threads writing logging data to a file, which is protected
by a mutex, then the order in which they write -is- subject
to race conditions. This may or may not matter, however.


Dale

Michiel Salters
Jan 15, 2001, 8:45:20 AM

Joerg Faschingbauer wrote:

> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

> David> Joerg Faschingbauer wrote:
>
> >> Now that's the whole point: the compiler has to take care that the
> >> code it generates spills registers before calling a function.

> David> This is really not so. It's entirely possible that the
> David> compiler might have some way of assuring that the particular
> David> value cached in the register isn't used by the called function,
> David> and hence it can keep it in a register.

> Of course you may once target a system where everything is tightly
> coupled, and where the compiler you use to compile your module knows
> about the very internals of the runtime - register allocations for
> example. Then it could keep variables in registers even across runtime
> function calls.

> Even though such a thing is possible, it is quite unlikely - consider
> the management effort of the people making (and upgrading!) such a
> system.

I don't think things are that hard - especially in C++, which already
has name mangling. For each translation unit, it is possible to determine
which functions use which registers and which functions are called in
other translation units.
Now encode in the mangled name of each function which registers it uses,
except of course for those functions imported from another translation unit.
Just add to the name something like Reg=EAX_EBX_ECX. The full set of
registers used by a function is the set of registers it uses itself, plus
the registers used by the functions it calls. This will even work for
mutually recursive functions across translation units.

With this information the linker can, for each function call, determine
which registers need to be saved. And that is closely related to the
linker's main task: creating correct function calls across translation
units.

--
Michiel Salters
Michiel...@cmg.nl
sal...@lucent.com

James Kanze
Jan 15, 2001, 8:45:01 AM

Martin Berger wrote:

> Dave Butenhof wrote:

> > Don't ever use the C/C++ language volatile in threaded code. It'll
> > kill your performance, and the language definition has nothing to
> > do with what you want when writing threaded code that shares
> > data. If some OS implementation tells you that you need to use it
> > anyway on their system, (in the words of a child safety group),
> > "run, yell, and tell". That's just stupid, and they shouldn't be
> > allowed to get away with it.

This is correct up to a point. The problem is that the C++ language
has no other way of signaling that a variable may be accessed by
several threads (and thus ensuring e.g. that it is really written
before the lock is released). The problem *isn't* with the OS; it is
with code movement within the optimizer of the compiler. And while I
agree with the sentiment - volatile isn't the solution - I don't know
how many compilers offer another one. (Of course, some compilers
don't optimize enough for there to be a problem :-).)

> well, in the c/c++ users journal, Andrei Alexandrescu recommends
> using "volatile" to help avoiding race conditions. can the experts
> please slug it out? (note the cross posting)

The crux of Andrei's suggestions really just exploits the compiler
type-checking with regards to volatile, and not the actual semantics
of volatile. If I've understood the suggestion correctly, it would
even be possible to implement it without ever accessing the individual
class members as if they were volatile (although in his examples, I
think he is also counting on volatile to inhibit code movement).

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

Andrei Alexandrescu
Jan 15, 2001, 8:52:18 AM

"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
news:93sptr$o5s$1...@flotsam.uits.indiana.edu...

> He gives an example, which will work in practice, but if he had two
> shared variables, would fail. Code like this, for example, would
> be incorrect on an MP with a relaxed memory model. The write to flag_
> could occur before the write to data_, despite the order in which the
> assignments are written.
>
> class Gadget {
> public:
>     void Wait() {
>         while (!flag_) {
>             Sleep(1000); // sleeps for 1000 milliseconds
>         }
>         do_some_work(data_);
>     }
>     void Wakeup() {
>         data_ = ...;
>         flag_ = true;
>     }
>     ...
> private:
>     volatile bool flag_;
>     volatile int data_;
> };

Your statement is true. However, that is only an _introduction_ to the
meaning of volatile in multithreaded code. I guess I should have thought of
a more elaborate example that would work on any machine. Anyway, the focus
of the article is different. It's about using the type system to detect race
conditions.


Andrei

Andrei Alexandrescu
Jan 15, 2001, 8:52:56 AM

"James Dennett" <jden...@acm.org> wrote in message
news:3A611BF5...@acm.org...

> True, and Andrei's claim (which seems reasonable to me, though I've not
> verified it in depth) is that his techniques, if used consistently,
> will detect all race conditions *at compile time*.

By the way, I maintain that.

Andrei

Andrei Alexandrescu
Jan 15, 2001, 8:51:41 AM

"Martin Berger" <martinb@--remove--me--dcs.qmw.ac.uk> wrote in message
news:3A620AB8.835DD808@--remove--me--dcs.qmw.ac.uk...

> Kaz Kylheku wrote:
> > The point is that it doesn't *solve* these problems, not that it causes
> > them. It's not enough to ensure that load and store instructions are
> > *issued* in some order by the processor, but also that they complete in
> > some order (or at least partial order) that is seen by all other
> > processors. At best, volatile defeats access optimizations at the
> > compiler level; in order to synchronize memory you need to do it at the
> > hardware level as well, which is often done with a special ``memory
> > barrier'' instruction.
> >
> > In other words, volatile is not enough to eliminate race conditions,
> > at least not on all platforms.
>
> either you or me don't quite understand the point of the article.

Or maybe me :o).

> the
> semantics of "volatile" is irrelevant for his stuff to work. all that
> matters is how c++ typechecks classes and methods annotated with
> "volatile", together with the usual rules for overloading and casting
> away volatile.
>
> if we'd change c++ to include a modifier "blob" and add the ability
> to cast away blobness and make "blob" behave like "volatile" w.r.t
> typechecking, overloading ... than his scheme would work just the
> same way when "volatile" is replace by blob. that's at least how
> i understand it.

This is exactly what the point of the article was.

> > The point is that are you going to take multithreading advice from
> > someone who admittedly cannot eradicate known deadlocks from his
> > code? But good points for the honesty, clearly.

To Mr. Kylheku: There is a misunderstanding here, and a rather gross one. I
wonder what text made you believe I *couldn't* eradicate known
deadlocks. I simply said that all threading-related runtime errors of our
program were deadlocks and not race conditions, which precisely proves
the point that the article tried to make. Of course we fixed the deadlocks.
The point is that the compiler fixed the race conditions.

It's clear that you have a great deal of experience in multithreaded code
on many platforms, and I would be glad to expand my knowledge in the area. If
you would be willing to discuss this in more civil terms, I would be glad to.
Also, if you would like to expand the discussion *beyond* the Gadget example
in the opening section of the article, and point out possible reasoning errors
that I might have made, that would help the C++ community define the
"volatile correctness" term with precision. For now, I maintain the
conjectures I made.


Andrei

Andrei Alexandrescu
Jan 15, 2001, 8:52:00 AM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:3A5FCDC3...@webmaster.com...

> What race conditions? If you have a race condition, you need to _FIX_
> it. Something that might help "avoid" it isn't good enough. If I told my
> customers I added some code that helped avoid race conditions, they'd
> shoot me. Code shouldn't _have_ race conditions.

There is a misunderstanding here - actually, quite a few in this and the
following posts.

Of course code must not _have_ race conditions. So that's why you must
eliminate them, which is what I meant by "avoid". Maybe I didn't use a word
that's strong enough.


Andrei

Andrei Alexandrescu
Jan 15, 2001, 8:52:38 AM

"Martin Berger" <martin...@orange.net> wrote in message
news:93qjt6$mm2$1...@lure.pipex.net...
>
> Kaz Kylheku <k...@ashi.footprints.net>
[snip]

> > The C/C++ Users Journal is a comedy of imbeciles.

Yay, the original message was moderated out.

> > The article you are talking about completely ignores issues of memory
> > coherency on multiprocessor systems. It is very geared toward Windows;
> > the author seems to have little experience with multithreaded
> > programming, and especially cross-platform multithreading.

I'm afraid Mr. Kylheku completely ignores the gist of the article. I know
only Windows, Posix and ACE threads, but that's beside the point - what the
article tries to say is different. The article uses the volatile modifier as
a device for helping the type system detect race conditions at compile time.

> > Luckily, he
> > provides a disclaimer by openly admitting that some code that he wrote
> > suffers from occasional deadlocks.

This is a misunderstanding. What I said is that the technique described
can't help with deadlocks. At the end of the article I mentioned some
concrete experience with the technique. Indeed there were deadlocks - _only_
deadlocks - in the multithreaded code, simply because all race conditions
were weeded out by the compiler.

> are you suggesting that these deadlocks are a consequence of using
> "volatile"? if so, how? i cannot find indications of deadlock-inducing
> behavior of "volatile" in kernighan & ritchie

I guess this is yet another misunderstanding :o).


Andrei

James Kanze
Jan 15, 2001, 9:35:31 AM

James Moe wrote:

> Martin Berger wrote:

> > well, in the c/c++ users journal, Andrei Alexandrescu recommends
> > using "volatile" to help avoid race conditions. can the experts
> > please slug it out? (note the cross posting)

> Use mutexes or semaphores to control access to common data
> areas. It is what they are for. "volatile" is meant for things like
> hardware access where a device register can change at any time.

You still need some way of preventing the optimizer from deferring
writes until after the lock has been released. Ideally, the compiler
will understand the locking system (mutex, or whatever), and generate
the necessary write guards itself. Off hand, I don't know of any
compiler which meets this ideal.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze
Jan 15, 2001, 9:36:16 AM

Joerg Faschingbauer wrote:

> Looking at it more paranoidly, one might argue that an optimizing
> compiler will probably want to keep i in a register for some reason,
> and that it might want to keep it in that register until the function
> returns.

I've used more than one compiler that does this. In fact, most do, at
least with optimization turned on.

> If that was the case the usage of the mutex would be completely
> pointless. At the time the function returns (a long time after the
> mutex was unlocked) the value of i is written back to memory. It
> then overwrites the changes that another thread may have made to the
> value in the meantime.

You guessed it.

> This is where volatile comes in. Common understanding is that
> volatile disables any optimization on a variable, so if you declare
> i volatile, the compiler won't keep it in a register, and all is
> well (if that was the definition of volatile - my understanding is
> that volatile is just a hint to the compiler, so nothing is well if
> you put it legally). Except that this misperforms - imagine you
> wouldn't have a simple increment in the critical section, but
> instead lots more of usage of i.

Volatile is more than just a hint, but it does have a lot of
implementation-defined aspects. The *intent* (according to the C
standard, to which the C++ standard refers) is roughly what you
describe.

> Now what does a compiler do? It compiles modules independently. In
> doing so it performs optimizations (fortunately). Inside the module
> that is being compiled the compiler is free to assign variables to
> registers as it likes, and every function has a certain register set
> that it uses.

There is no requirement that a compiler compile modules independently;
at least one major compiler has a final, post-link optimization phase
in which the optimizer looks beyond module boundaries.

> The compiler also generates code for calls to functions in other
> modules that it has no idea of. Having no idea of a module, among
> other things means that the compiler does not know the register set
> that particular function uses.

What set a function can use without restoring is usually defined by
the calling conventions. The compiler not only can know it, it must
know it.

> Especially, the compiler cannot know for sure that the callee's
> register set doesn't overlap with the caller's - in which case the
> caller would see useless garbage in the registers on return of the
> callee.

This depends entirely on the compiler. And the hardware -- on a
Sparc, there are four banks of registers, two of which are
systematically saved and restored by the hardware. So each function
basically has 16 registers in which it can do anything it wishes.

> Now that's the whole point: the compiler has to take care that the
> code it generates spills registers before calling a function.

Not necessarily.

> So, provided that unlock() is a function and not a macro, there is
> no need to declare i volatile.

Not necessarily. It all depends on the compiler.

If the variable is global, and the compiler cannot analyse the unlock
function, it will have to assume that unlock may access the variable,
and so must ensure that the value is up to date. In practice, this IS
generally sufficient -- at some level, unlock resolves to a system
call, and the compiler certainly has no access to the source code of
the system call. So either 1) the compiler makes no assumption about
the system call, must assume that it might access the variable, and so
ensures the correct value, or 2) the compiler knows about system
requests, and which ones can access global variables. In the latter
case, of course, the compiler *should* also know that it needs a write
barrier after unlock. But unless this is actually documented in the
compiler documentation, I'd be leery of counting on it.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze
Jan 15, 2001, 9:34:46 AM

Martin Berger wrote:

> interestingly, andrei's suggestions do not at all depend on the intended
> semantics of "volatile", only on how the type system checks it and handles
> const_cast and method invocation in this case. (see
>
> http://www.cuj.com/experts/1902/alexandr.html
>
> for details)

That's not totally true, at least not in his examples. I think he
also counts on volatile to some degree to inhibit code movement;
i.e. to prevent the compiler from moving some of the writes to after
the lock has been freed.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze
Jan 15, 2001, 9:36:58 AM

Joerg Faschingbauer wrote:

> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

> David> Joerg Faschingbauer wrote:

> >> Now that's the whole point: the compiler has to take care that
> >> the code it generates spills registers before calling a function.

> David> This is really not so. It's entirely possible that the
> David> compiler might have some way of assuring that the particular
> David> value cached in the register isn't used by the called function,
> David> and hence it can keep it in a register.

> Of course you may once target a system where everything is tightly
> coupled, and where the compiler you use to compile your module knows
> about the very internals of the runtime - register allocations for
> example. Then it could keep variables in registers even across
> runtime function calls.

The compiler always knows about the internals of the runtime register
allocations, since it is the compiler which defines them (at least
partially).

> Even though such a thing is possible, it is quite unlikely -
> consider the management effort of the people making (and upgrading!)
> such a system. And even if people dared doing such a beast, this
> wouldn't be POSIX - at least not with the functions that involve
> locking and such. (There was a discussion here recently where Dave
> Butenhof made this plausible - and I believe him :-}.)

I'm not sure I understand your point. It sounds like you are saying
that it is possible for the compiler not to know which registers it
can use, which is manifestly ridiculous.

> (Of course there are compilers that do interprocedural and
> intermodular (what a word!) optimization, involving such things as
> not spilling registers before calling an external function. But
> usually you have to compile the calling module and the callee module
> in one swoop then - you pass more than one C file on the command
> line or some such. But it is not common for you to compile your
> module together with the mutex locking function modules of the C
> runtime.)

Usually (well, in the one case I actually know of :-)), the compiler
generates extra information in the object file, which is used by the
linker.

About all you can hope for is that a compiler this intelligent also
knows about threads, and can recognize a mutex request when it sees
one.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze
Jan 15, 2001, 9:40:18 AM

Joerg Faschingbauer wrote:

[...]


> Yes, I believe this (exporting the address of a variable) is called
> taking an alias in compilerology. The consequence of this is that it
> inhibits holding it in a register.

Correct. The entire issue is called the aliasing problem, and it
makes good optimization extremely difficult. Note well: extremely
difficult, not impossible. In recent years, a few compilers have
gotten good enough to track uses of a variable through aliases and
across module boundaries, and eventually to keep aliased variables in
a register when that will improve performance.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze
Jan 15, 2001, 9:39:16 AM

Kaz Kylheku wrote:

> On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer
> <jfa...@jfasch.faschingbauer.com> wrote:
> >So, provided that unlock() is a function and not a macro, there is no
> >need to declare i volatile.

> Even if a compiler implements sophisticated global optimizations
> that cross module boundaries, the compiler can still be aware of
> synchronization functions and do the right thing around calls to
> those functions.

It can be. It should be. Is it? Do todays compilers actually do the
right thing, or are we just lucking out because most of them don't
optimize very aggressively anyway?

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze
Jan 15, 2001, 9:46:20 AM

Martin Berger wrote:

> Kaz Kylheku <k...@ashi.footprints.net>

> > > well, in the c/c++ users journal, Andrei Alexandrescu recommends
> > > using "volatile" to help avoid race conditions. can the experts
> > > please slug it out?

> > The C/C++ Users Journal is a comedy of imbeciles.

And the moderators let this through?

> > The article you are talking about completely ignores issues of
> > memory coherency on multiprocessor systems. It is very geared
> > toward Windows; the author seems to have little experience with
> > multithreaded programming, and especially cross-platform
> > multithreading.

> would you care to elaborate how "volatile" causes problems with
> memory consistency on multiprocessors?

Volatile doesn't cause problems of memory consistency. It's not
guaranteed to solve them, either.

Andrei's article didn't address the problem. Not because Andrei
didn't know the solution. (He may, or he may not. I don't know.)
But because that wasn't the subject of the article.

It might be worth pointing out the exact subject, since the poster you
are responding to obviously missed the point. Andrei basically
"overloads" the keyword volatile in a way that allows the compiler to
verify whether we will use locked access or not when accessing an
object. It offers an additional tool to simplify the writing (and the
verification) of multi-threaded code.

The article does NOT address the question of when locks are needed and
when they aren't. The article doesn't address the question of what is
actually needed when locks are needed, e.g. to ensure memory
coherency. These are other issues, and would require another article
(or maybe even an entire book). About the only real criticism I would
make of the article is that it isn't clear enough that he is
glossing over major issues, because they aren't relevant to that
particular article.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

Martin Berger
Jan 15, 2001, 11:06:17 AM

Andrei Alexandrescu wrote:

> > if we'd change c++ to include a modifier "blob" and add the ability
> > to cast away blobness and make "blob" behave like "volatile" w.r.t
> > typechecking, overloading ... than his scheme would work just the
> > same way when "volatile" is replace by blob. that's at least how
> > i understand it.
>
> This is exactly what the point of the article was.

this makes me think that c++ *should* be expanded to include something
like "blob" as a modifier, or maybe even user-defined modifiers.

the problem with modifiers like "shared" and the like is that compilers
cannot effectively guarantee the absence of race conditions, as would
be suggested by using a name like "shared". with "blob" on the other
hand, all the compiler guarantees is that "blobness" is preserved,
which is basically an easy typechecking problem, and it is up to the
programmer to use this feature in whatever way she thinks appropriate
(eg for the prevention of race conditions). i also think that user-defined
modifier semantics has uses beyond preventing race conditions. how about it?
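
in today's c++ one can approximate such a user-defined modifier with a
class template instead of a qualifier. a rough sketch (all names
invented, pthreads assumed for the lock):

#include <pthread.h>

template<class T>
class Blob {                    // plays the role of the "blob" qualifier
    T obj_;
    pthread_mutex_t mtx_;
public:
    Blob() { pthread_mutex_init(&mtx_, 0); }
    ~Blob() { pthread_mutex_destroy(&mtx_); }

    class Access {              // the only way to reach the wrapped object
        Blob& b_;
    public:
        Access(Blob& b) : b_(b) { pthread_mutex_lock(&b_.mtx_); }
        ~Access() { pthread_mutex_unlock(&b_.mtx_); }
        T* operator->() { return &b_.obj_; }
        T& operator*() { return b_.obj_; }
    };
};

// usage:
//     Blob<Counter> c;
//     Blob<Counter>::Access a(c);    // lock held for a's lifetime
//     a->inc();

unlike the volatile-based version, this also handles built-in types
(Blob<int> works just as well, via operator*).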

martin

Martin Berger
Jan 15, 2001, 11:05:59 AM

Andrei Alexandrescu wrote:

> To Mr. Kylheku: [...]


> Also, if you would like to expand the discussion *beyond* the Gadget example
> in the opening section of the article, and point possible reasoning errors
> that I might have done, that would help the C++ community define the
> "volatile correctness" term with precision. For now, I maintain the
> conjectures I made.


that would be a worthwhile contribution.

martin

Charles Bryant
Jan 15, 2001, 11:10:21 AM

In article <93t924$2vi$1...@nnrp1.deja.com>,
Dylan Nicholson <dn...@my-deja.com> wrote:
>In article <slrn963vb...@ashi.FootPrints.net>,
> k...@ashi.footprints.net wrote:
>>
>> The point is that are you going to take multithreading advice from
>> someone who admittedly cannot eradicate known deadlocks from his
>> code? But good points for the honesty, clearly.
>>
>Well I consider myself pretty well experienced in at least Win32
>threads, and I'm working on a project now using POSIX threads (and a
>POSIX wrapper for Win32 threads). I thought I had a perfectly sound
>design that used only ONE mutex object, only ever used a stack-based
>locker/unlocker to ensure it was never left locked, and yet I still got
>deadlocks! The reason was simple, a) Calling LeaveCriticalSection on
>an unowned critical section causes a deadlock in Win32 (this I consider
>a bug, considering how trivial it is to test one member of the critical
>section to avoid it)

You have a fundamental misunderstanding of the nature of programming.
Programming does not require speculation about how something might be
implemented, and certainly does not involve writing one's own code
such that it depends on that speculation. Programming involves
determining the guaranteed and documented behaviour of the components
that will be needed, and then relying solely on that documented
behaviour.

Calling LeaveCriticalSection() without entering the critical section
would be just as much a bug even if causing it to fail required a huge
amount of very slow code in the library which implements
LeaveCriticalSection. The *only* thing relevant to whether it's a bug
or not is whether the documentation permits it or not.

--
Eppur si muove

Balog Pal
Jan 15, 2001, 11:14:56 AM

"Dylan Nicholson" <dn...@my-deja.com> wrote

> deadlocks! The reason was simple, a) Calling LeaveCriticalSection on
> an unowned critical section causes a deadlock in Win32 (this I consider
> a bug,

I consider doing illegal stuff a programming error. For critical sections
I'd go somewhat further: you should wrap those in classes anyway, and
use lock guards like CSingleLock. Then doing something odd is pretty hard.
But if you manage it, it must be a logic error in the program, and a sign
to look out for other errors too.

> considering how trivial it is to test one member of the critical
> section to avoid it),

That is IMHO irrelevant.

> and b) I didn't realise that by default POSIX mutexes only allowed
> one lock per thread (i.e. they were non-recursive).

Yep, they are. But you can implement your own recursive mutex (working
like the Win32 critical section).
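
(For what it's worth, implementations that support the Unix98 mutex
attributes let you ask for this directly instead of rolling your own -
a sketch, error checking omitted:)

#include <pthread.h>

pthread_mutex_t mtx;

void init_recursive_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&mtx, &attr);    /* relocking in the same thread */
    pthread_mutexattr_destroy(&attr);   /* now nests, Win32-style       */
}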

> To me these are quirks of the thread library, not design
> faults in my code,

Your way of looking at it is somewhat nonstandard. ;-)

> so they don't necessarily indicate a lack of multi-threaded knowledge.

Maybe they indicate ignorance. A system API works as it is described,
not along your thoughts or your expectations. The user code is the thing
you must write to use the API as it is provided, not the other way around.
(You could claim innocence if the docs were in error, e.g. if they
specified a POSIX mutex as recursive and it turned out to be fast. But
that is not the case.)

Paul

John Mullins
Jan 15, 2001, 11:15:49 AM

"James Kanze" <James...@dresdner-bank.com> wrote in message
news:3A62CF96...@dresdner-bank.com...

> That's not totally true, at least not in his examples. I think he
> also counts on volatile to some degree to inhibit code movement;
> i.e. to prevent the compiler from moving some of the writes to after
> the lock has been freed.

But his examples also rely on undefined behaviour so he can't really
count on anything.

JM

Kenneth Chiu
Jan 15, 2001, 11:16:27 AM

In article <3A62CF03...@dresdner-bank.com>,
James Kanze <James...@dresdner-bank.com> wrote:
>Martin Berger wrote:
>
>> Dave Butenhof wrote:
>
>> > Don't ever use the C/C++ language volatile in threaded code. It'll
>> > kill your performance, and the language definition has nothing to
>> > do with what you want when writing threaded code that shares
>> > data. If some OS implementation tells you that you need to use it
>> > anyway on their system, (in the words of a child safety group),
>> > "run, yell, and tell". That's just stupid, and they shouldn't be
>> > allowed to get away with it.
>
>This is correct up to a point. The problem is that the C++ language
>has no other way of signaling that a variable may be accessed by
>several threads (and thus ensuring e.g. that it is really written
>before the lock is released). The problem *isn't* with the OS; it is
>with code movement within the optimizer of the compiler.

At this point it isn't really a C++ language issue anymore. However,
if a vendor claims that their compiler is compatible with POSIX
threads, then it is up to them to ensure that memory is written before
the unlock.

Kenneth Chiu
Jan 15, 2001, 12:21:02 PM

In article <93u77g$bqakt$3...@ID-14036.news.dfncis.de>,
Andrei Alexandrescu <andre...@hotmail.com> wrote:
>"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
>news:93sptr$o5s$1...@flotsam.uits.indiana.edu...
>> He gives an example, which will work in practice, but if he had two
>> shared variables, would fail. Code like this, for example, would
>> be incorrect on an MP with a relaxed memory model. The write to flag_
>> could occur before the write to data_, despite the order in which the
>> assignments are written.
>>
>> ...

>
>Your statement is true. However, that is only an _introduction_ to the
>meaning of volatile in multithreaded code. I guess I should have thought of
>a more elaborate example that would work on any machine. Anyway, the focus
>of the article is different. It's about using the type system to detect race
>conditions.

Yes, I have no quibble with the rest of the article, and in fact thought
it was an interesting idea.

However, I find some of the statements in the introduction to be overly
general.

For example (quoting the introduction): "Basically, without volatile,
either writing multithreaded programs becomes impossible, or the
compiler wastes vast optimization opportunities."

This may be true for some thread standards, but if the vendor claims
that they support POSIX threads with their C++ compiler, then shared
variables should not be declared volatile when using POSIX threads.

Andrei Alexandrescu
Jan 15, 2001, 1:20:38 PM

"John Mullins" <John.M...@crossprod.co.uk> wrote in message
news:93v2gj$4fs$1...@newsreaderg1.core.theplanet.net...

>
> "James Kanze" <James...@dresdner-bank.com> wrote in message
> news:3A62CF96...@dresdner-bank.com...
>
> > That's not totally true, at least not in his examples. I think he
> > also counts on volatile to some degree to inhibit code movement;
> > i.e. to prevent the compiler from moving some of the writes to after
> > the lock has been freed.
>
> But his examples also rely on undefined behaviour so he can't really
> count on anything.

Are you referring to const_cast? Strictly speaking, indeed. But then, all MT
programming in C/C++ has undefined behavior.

Andrei

Andrei Alexandrescu
Jan 15, 2001, 2:31:08 PM

"James Kanze" <James...@dresdner-bank.com> wrote in message
news:3A62CF03...@dresdner-bank.com...

> The crux of Andrei's suggestions really just exploits the compiler
> type-checking with regards to volatile, and not the actual semantics
> of volatile. If I've understood the suggestion correctly, it would
> even be possible to implement it without ever accessing the individual
> class members as if they were volatile (although in his examples, I
> think he is also counting on volatile to inhibit code movement).

Thanks James for all your considerations before and after the article
appeared.

There is a point to make about the use of volatile proposed by the article.
If you write volatile-correct code as prescribed by the article, you _never_
*never* NEVER use volatile variables. You _always_ *always* ALWAYS lock a
synchronization object, cast the volatile away, operate on the so-obtained
non-volatile alias, let the alias go, and unlock the synchronization object,
in this order.

Maybe I should have made it clearer that in volatile-correct code, you never
operate on volatile data - you always cast volatile away, and more
specifically, you cast it away when it is *semantically correct* to do so,
because you have locked the associated synchronization object.
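
A sketch of that sequence as an RAII class (the class name is mine, not
necessarily the article's; pthreads assumed for the synchronization
object):

#include <pthread.h>

template<class T>
class LockedAlias {
    T* p_;
    pthread_mutex_t* m_;
public:
    LockedAlias(volatile T& obj, pthread_mutex_t& m) : m_(&m) {
        pthread_mutex_lock(m_);        // 1. lock the synchronization object
        p_ = const_cast<T*>(&obj);     // 2. cast the volatile away
    }
    ~LockedAlias() {
        pthread_mutex_unlock(m_);      // 4. let the alias go and unlock
    }
    T& operator*()  { return *p_; }    // 3. operate on the non-volatile alias
    T* operator->() { return p_; }
};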

I would be glad if someone explained to me in what situations a compiler can
rearrange instructions to the extent that it would invalidate the idiom that
the article proposes. OTOH, such compilers invalidate a number of idioms
anyway, such as the Double-Checked Locking Pattern, used by Doug Schmidt in
ACE.
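
(For readers who haven't seen it, the Double-Checked Locking Pattern
looks roughly like this - a sketch with hypothetical Mutex and RAII
Lock types; the comment marks what aggressive reordering breaks:)

class Singleton {
public:
    static Singleton* instance() {
        if (pInstance_ == 0) {          // 1st check, deliberately unlocked
            Lock guard(mutex_);         // hypothetical RAII lock guard
            if (pInstance_ == 0)        // 2nd check, under the lock
                pInstance_ = new Singleton;
            // hazard: the store to pInstance_ may become visible before
            // the constructor's writes do, so another thread can pass
            // the 1st check and use a half-constructed object
        }
        return pInstance_;
    }
private:
    static Singleton* pInstance_;
    static Mutex mutex_;                // hypothetical mutex type
};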


Andrei

Andrei Alexandrescu
Jan 15, 2001, 2:37:05 PM

"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
news:93v8d2$3aa$1...@flotsam.uits.indiana.edu...

> However, I find some of the statements in the introduction to be overly
> general.
>
> Basically, without volatile, either writing multithreaded programs
> becomes impossible, or the compiler wastes vast optimization
> opportunities.
>
> This may be true for some thread standards, but if the vendor claims
> that they support POSIX threads with their C++ compiler, then shared
> variables should not be declared volatile when using POSIX threads.

I understand your point and agree with it. That statement of mine was a
mistake.

Andrei

Kaz Kylheku
Jan 15, 2001, 2:39:50 PM

On 14 Jan 2001 22:59:01 -0500, dale <da...@cs.rmit.edu.au> wrote:
>David Schwartz wrote:
>
>> Code shouldn't _have_ race conditions.
>
>Well, that's not entirely correct. If you have a number of
>threads writing logging data to a file, which is protected
>by a mutex, then the order in which they write -is- subject
>to race conditions. This may or may not matter however.

If it doesn't matter, it's hardly a race condition! A race condition
occurs when the program fails to compute one of the possible correct
results due to a fluctuation in the execution order.

Konrad Schwarz
Jan 15, 2001, 4:15:11 PM

James Kanze wrote:
>
> Martin Berger wrote:
>
> > Dave Butenhof wrote:
>
> > > Don't ever use the C/C++ language volatile in threaded code. It'll
> > > kill your performance, and the language definition has nothing to
> > > do with what you want when writing threaded code that shares
> > > data. If some OS implementation tells you that you need to use it
> > > anyway on their system, (in the words of a child safety group),
> > > "run, yell, and tell". That's just stupid, and they shouldn't be
> > > allowed to get away with it.
>
> This is correct up to a point. The problem is that the C++ language
> has no other way of signaling that a variable may be accessed by
> several threads (and thus ensuring e.g. that it is really written
> before the lock is released). The problem *isn't* with the OS; it is
> with code movement within the optimizer of the compiler. And while I
> agree with the sentiment: volatile isn't the solution, I don't know
> how many compilers offer another one. (Of course, some compilers
> don't optimize enough for there to be a problem:-).)

So the optimization of keeping variables in registers across
function calls is illegal in general (and thus must not be performed)
if the compiler cannot prove that the code will not be linked into a
multi-threaded program, or cannot prove that those variables will
never be shared.

However, the C language has a way of signaling that local variables
cannot be accessed by other threads, namely by placing them in
the register storage class. I don't know about C++; if I remember
correctly, C++ degrades register to a mere "efficiency hint".

Konrad Schwarz
Jan 15, 2001, 4:16:41 PM

James Kanze wrote:
>
> Kaz Kylheku wrote:
>
> > On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer
> > <jfa...@jfasch.faschingbauer.com> wrote:
> > >So, provided that unlock() is a function and not a macro, there is no
> > >need to declare i volatile.
>
> > Even if a compiler implements sophisticated global optimizations
> > that cross module boundaries, the compiler can still be aware of
> > synchronization functions and do the right thing around calls to
> > those functions.
>
> It can be. It should be. Is it? Do todays compilers actually do the
> right thing, or are we just lucking out because most of them don't
> optimize very aggressively anyway?
>

If the compiler supports multi-threading (at least POSIX
multi-threading), then it *must*, since POSIX does not require shared
variables to be volatile qualified. If the compiler decides to keep
values in registers across function calls, it must be able to prove
that
* either these variables are never shared by another thread,
* or the functions in question never perform inter-thread operations.
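
A sketch of why (a minimal example; the mutex functions are assumed to
live in a separate translation unit the compiler cannot see into):

    #include <pthread.h>
    #include <sched.h>

    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int done;                       /* shared flag, protected by m */

    void consumer(void)
    {
        pthread_mutex_lock(&m);
        while (!done) {             /* must be reloaded every iteration */
            pthread_mutex_unlock(&m);
            sched_yield();          /* give the producer a chance to run */
            pthread_mutex_lock(&m);
        }
        pthread_mutex_unlock(&m);
    }

If the compiler kept `done' in a register across the lock and unlock
calls, this loop could spin forever; so the value must be reloaded
after each memory-synchronizing call.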

Kaz Kylheku

unread,
Jan 15, 2001, 4:17:44 PM1/15/01
to
On 15 Jan 2001 08:52:18 -0500, Andrei Alexandrescu

<andre...@hotmail.com> wrote:
>"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
>news:93sptr$o5s$1...@flotsam.uits.indiana.edu...
>> He gives an example, which will work in practice, but if he had two
>> shared variables, would fail. Code like this, for example, would
>> be incorrect on an MP with a relaxed memory model. The write to flag_
>> could occur before the write to data_, despite the order in which the
>> assignments are written.
>>
>> class Gadget {
>> public:
>> void Wait() {
>> while (!flag_) {
>> Sleep(1000); // sleeps for 1000 milliseconds
>> }
>> do_some_work(data_);
>> }
>> void Wakeup() {
>> data_ = ...;
>> flag_ = true;
>> }
>> ...
>> private:
>> volatile bool flag_;
>> volatile int data_;
>> };
>
>Your statement is true. However, that is only an _introduction_ to the
>meaning of volatile in multithreaded code. I guess I should have thought of
>a more elaborate example that would work on any machine.

You cannot come up with such an example without resorting to
platform- and compiler-specific techniques, such as inline assembly
language to insert memory barrier instructions.

In the above example, if one thread writes to data_ and then sets flag_
there is absolutely no assurance that another thread running on another
processor will see these updates in the same order. It is possible for
flag_ to appear to flip true, but data_ to not have been updated yet!

Moreover, there is no assurance that data_ is updated atomically, so
that a processor can either see its old value or its new value, never
any half-baked value in between.
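
To repair the example above you would need something like the following
sketch, where MEMORY_BARRIER() is a made-up macro standing in for the
platform's store barrier (typically inline assembly):

    void Wakeup() {
        data_ = 42;          // publish the data first (42 is just an example)
        MEMORY_BARRIER();    // hypothetical: keep the store to flag_ from
                             // being reordered before the store to data_
        flag_ = true;
    }

plus a matching barrier in Wait() between reading flag_ and reading
data_.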

Resolving these issues can't be done in standard C++, so there is no
one example that fits all C++ platforms. This makes sense, since
threads are not currently part of the C++ language. (What I don't
understand is why the moderator of comp.lang.c++.moderated is even
allowing this discussion, which clearly belongs in
comp.programming.threads only.)

>Anyway, the focus
>of the article is different. It's about using the type system to detect race
>conditions.

This thread was started in comp.programming.threads by Lie-Quan Lee
<ll...@lsc.nd.edu> who was specifically interested in knowing whether
rules similar to the POSIX memory visibility rules apply to other
multithreading platforms.

-> One question is, whether those memory visibility rules are applicable
-> for other thread system such as Solaris UI threads, or win32 threads,
-> or JAVA threads ...? If yes, we can follow the same spirit. Otherwise,
-> it will be a big difference. (For example, all shared variables might
-> have to be defined as volatile even with mutex protection.)

Dave Butenhof then replied:

-> Don't ever use the C/C++ language volatile in threaded code. It'll
-> kill your performance, and the language definition has nothing to do
-> with what you want when writing threaded code that shares data. If
-> some OS implementation tells you that you need to use it anyway on
-> their system, (in the words of a child safety group), "run, yell, and
-> tell". That's just stupid, and they shouldn't be allowed to get away
-> with it.

To which Martin Berger replied (and added, for some strange reason,
comp.lang.c++.moderated to the Newsgroups: header). This is the first
time the CUJ article was mentioned, clearly in the context of a
comp.programming.threads debate about memory visibility rules,
not in the context of a debate about C++ or qualifier-correctness:

-> well, in the c/c++ users journal, Andrei Alexandrescu recommends using
-> "volatile" to help avoiding race conditions. can the experts please
-> slug it out?(note the cross posting)

So it appears that the article does create some confusion, at least in
the minds of some readers, between volatile used as a request for
special access semantics and volatile used as a constraint-checking
access control for class member function calls.

Incidentally, I believe that the second property can be exploited
without dragging in the semantics of volatile. Simply do something like
this:

#ifdef RACE_CHECK
#define VOLATILE volatile
#else
#define VOLATILE
#endif

When producing production object code, do not define RACE_CHECK; define
it only when you want to create extra semantic checks for the compiler
to diagnose. Making up a name other than ``VOLATILE'' might be useful
to clarify that what is being done has nothing to do with defeating
optimization.
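
For example (a sketch; Counter and counter_mutex are made-up names):

    #include <pthread.h>

    class Counter {
        int value_;
    public:
        Counter() : value_(0) {}
        void increment() { ++value_; }  // non-volatile: lock must be held
    };

    VOLATILE Counter shared_counter;
    pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

    void bump()
    {
        pthread_mutex_lock(&counter_mutex);
        const_cast<Counter&>(shared_counter).increment(); // OK: lock held
        pthread_mutex_unlock(&counter_mutex);
    }

With RACE_CHECK defined, calling shared_counter.increment() without the
cast draws a compile-time error, since a non-volatile member function
cannot be called on a volatile object; without RACE_CHECK, the same
source compiles to ordinary unqualified accesses.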

Joerg Faschingbauer

unread,
Jan 15, 2001, 4:19:30 PM1/15/01
to
Duh! What compiler and what language are you talking about?

>>>>> "James" == James Kanze <James...@dresdner-bank.com> writes:

James> Joerg Faschingbauer wrote:
>> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> Joerg Faschingbauer wrote:

>> >> Now that's the whole point: the compiler has to take care that
>> >> the code it generates spills registers before calling a function.

David> This is really not so. It's entirely possible that the
David> compiler might have some way of assuring that the particular
David> value cached in the register isn't used by the called function,
David> and hence it can keep it in a register.

>> Of course you may once target a system where everything is tightly
>> coupled, and where the compiler you use to compile your module knows
>> about the very internals of the runtime - register allocations for
>> example. Then it could keep variables in registers even across
>> runtime function calls.

James> The compiler always knows about the internals of the runtime register
James> allocations, since it is the compiler which defines them (at least
James> partially).

>> Even though such a thing is possible, it is quite unlikely -
>> consider the management effort of the people making (and upgrading!)
>> such a system. And even if people dared doing such a beast, this
>> wouldn't be POSIX - at least not with the functions that involve
>> locking and such. (There was a discussion here recently where Dave
>> Butenhof made this plausible - and I believe him :-}.)

James> I'm not sure I understand your point. It sounds like you are saying
James> that it is possible for the compiler not to know which registers it
James> can use, which is manifestly ridiculous.

>> (Of course there are compilers that do interprocedural and
>> intermodular (what a word!) optimization, involving such things as
>> not spilling registers before calling an external function. But
>> usually you have to compile the calling module and the callee module
>> in one swoop then - you pass more than one C file on the command
>> line or some such. But it is not common for you to compile your
>> module together with the mutex locking function modules of the C
>> runtime.)

James> Usually (well, in the one case I actually know of:-)), the compiler
James> generates extra information in the object file, which is used by the
James> linker.

James> About all you can hope for is that a compiler this intelligent also
James> knows about threads, and can recognize a mutex request when it sees
James> one.

David Schwartz

unread,
Jan 15, 2001, 4:21:17 PM1/15/01
to

dale wrote:
>
> David Schwartz wrote:
>
> > Code shouldn't _have_ race conditions.
>
> Well, that's not entirely correct. If you have a number of
> threads writing logging data to a file, which is protected
> by a mutex, then the order in which they write -is- subject
> to race conditions. This may or may not matter however.

If all of the possible outputs are valid, it's not a race condition.
The definition of a "race condition" is a programming construct where
the resultant output can be valid or invalid based upon the vagaries of
system timing.

DS

Tom Payne

unread,
Jan 15, 2001, 4:21:36 PM1/15/01
to
In comp.lang.c++.moderated James Dennett <jden...@acm.org> wrote:
[...]
: Andrei's claim (which seems reasonable to me, though I've not
: verified it in depth) is that his techniques, if used consistently,
: will detect all race conditions *at compile time*.

His technique seems a good way to guarantee atomicity of certain
operations, but AFAIK it doesn't detect or prevent all situations where
the outcome of multiple operations on a thread-shared object depends
on how those threads are scheduled.

class Int {
    int i;
public:
    Int() : i(0) {}
    void dbl()  { i = 2*i; }   // "double" is a keyword, so renamed
    void incr() { i = i+1; }
};

Apply Andrei's technique to Int and then create a static Int k and two
threads:
- thread1 increments k and then exits
- thread2 doubles k and then exits.
Clearly, the final value of k.i is going to depend on the scheduling of
these two threads: increment-then-double leaves k.i == 2, while
double-then-increment leaves k.i == 1.

AFAIK, detecting race conditions is equivalent to the halting problem.

Tom Payne

David Schwartz

unread,
Jan 15, 2001, 4:22:12 PM1/15/01
to

James Kanze wrote:

> You still need some way of preventing the optimizer from deferring
> writes until after the lock has been released. Ideally, the compiler
> will understand the locking system (mutex, or whatever), and generate
> the necessary write guards itself. Off hand, I don't know of any
> compiler which meets this ideal.

Give an example of what you think the problem is. The typical solution
is to give the compiler no information at all about the locking system.
Since the compiler then must assume the locking system could do
anything, it can't optimize anything across it.

It's not clear to me what you mean by "deferring writes". This could
either refer to variables being cached in registers and not written back,
or it could refer to a hardware write cache not being flushed.
Fortunately, neither is a problem. Variables can't be cached in
registers because the compiler doesn't know what the lock/unlock
functions do, and so must assume they might access those variables from
their memory locations. Hardware write caches aren't a problem, because
the lock/unlock functions contain the appropriate memory barrier. The
compiler doesn't know this, but the compiler has nothing to do with such
hardware write reordering and so doesn't need to.

DS

Kaz Kylheku

unread,
Jan 15, 2001, 4:27:31 PM1/15/01
to
On 15 Jan 2001 13:20:38 -0500, Andrei Alexandrescu

<andre...@hotmail.com> wrote:
>"John Mullins" <John.M...@crossprod.co.uk> wrote in message
>> > the lock has been freed.
>>
>> But his examples also rely on undefined behaviour so he can't really
>> count on anything.
>
>Are you referring to const_cast? Strictly speaking, indeed. But then, all MT
>programming in C/C++ has undefined behavior.

However, some MT programming has another standard to serve as a safety
net. For example, correct POSIX MT programming is well-defined within
the realm of POSIX threads, even though it's not well-defined C++.
From a C++ language point of view, the behavior is undefined; however,
the program correctly uses a documented extension.

When you say undefined behavior, there is some implicit interface
standard that is intended, be it ANSI/ISO C++, POSIX or what have you.
In comp.programming.threads, undefined has a necessarily weaker
meaning; obviously some multithreaded programs are deemed to be well
defined with respect to some interface.

It's not clear what class of undefined behavior John was referring to
here.

Ron Natalie

unread,
Jan 15, 2001, 5:21:35 PM1/15/01
to

> However, the C language has a way of signaling that local variables
> cannot be accessed by other threads, namely by placing them in
> the register storage class.

Huh? How is that? The C language doesn't contain the word thread
anywhere. Register is auto + a hint to keep in a register.

> I don't know about C++; if I remember
> correctly, C++ degrades register to a mere "efficiency hint".

The ONLY difference between C and C++ is that C++ allows you to
take the address of something with a register storage class (noting
that doing so may force it out of a register), while C prohibits
the & operator on register-declared objects even if they weren't
actually put in a register by the compiler.

Kaz Kylheku

unread,
Jan 15, 2001, 6:33:49 PM1/15/01
to
On 15 Jan 2001 09:36:16 -0500, James Kanze
<James...@dresdner-bank.com> wrote:
>> You use a mutex to protect data against concurrent access.
>
>> int i;
>
>> void f(void) {
>> lock(mutex);
>> i++; // or something
>> unlock(mutex);
>> // some lengthy epilogue goes here
>> }
>
>> Looking at it more paranoidly, on might argue that an optimizing
>> compiler will probably want to keep i in a register for some reason,
>> and that it might want to keep it in that register until the
>> function returns.
>
>I've used more than one compiler that does this. In fact, most do, at
>least with optimization turned on.

This optimization is only permitted if the compiler ``knows'' that i is
not modified by these functions, and, in the case of POSIX, if the
compiler also knows that these functions don't call library functions
that have memory synchronizing properties.

Unless the compiler has very sophisticated global optimizations that
can look into the retained images of other translation units of the
program, this means that if lock() and unlock() are, or contain,
calls to other units, then i cannot be cached.

You will probably find with most compilers that the most aggressive
caching optimizations are applied to auto variables whose address is
never taken. These cannot possibly be accessed or modified by another
thread or signal handler or what have you, so it is generally safe to
cache them in registers, or even optimize them to registers entirely.
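
A sketch of the distinction, assuming lock() and unlock() are external
functions the compiler cannot see into:

    extern void lock(void);
    extern void unlock(void);

    int shared;                 /* global: lock()/unlock() might touch it */

    void f(void)
    {
        int local = 0;          /* auto, address never taken */
        lock();
        shared++;               /* load and store must bracket the calls */
        unlock();
        local++;                /* free to live in a register, or to be
                                   optimized away entirely */
    }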

Kaz Kylheku

unread,
Jan 15, 2001, 6:33:30 PM1/15/01
to
On 15 Jan 2001 08:45:01 -0500, James Kanze
<James...@dresdner-bank.com> wrote:

>> Dave Butenhof wrote:
>
>> > Don't ever use the C/C++ language volatile in threaded code. It'll
>> > kill your performance, and the language definition has nothing to
>> > do with what you want when writing threaded code that shares
>> > data. If some OS implementation tells you that you need to use it
>> > anyway on their system, (in the words of a child safety group),
>> > "run, yell, and tell". That's just stupid, and they shouldn't be
>> > allowed to get away with it.
>
>This is correct up to a point. The problem is that the C++ language
>has no other way of signaling that a variable may be accessed by
>several threads (and thus ensuring e.g. that it is really written
>before the lock is released). The problem *isn't* with the OS; it is
>with code movement within the optimizer of the compiler.

The problem is with the specification which governs the implementation
of the compiler and the operating system.

> And while I
>agree with the sentiment: volatile isn't the solution, I don't know
>how many compilers offer another one.

All POSIX implementations must honor the rule that synchronization
functions like pthread_mutex_lock and so forth have memory
synchronizing properties.
library and operating system must ensure that data is made consistent
across multiple processors when these functions are used. It is simply
a requirement. So a POSIX threaded application never needs to do
anything special such as using volatile; the implementation must do
whatever is needed, including having the compiler specially recognize
some functions, if that's what it takes!

Without such a statement of requirement, you cannot infer anything
about the behavior. At best you can look at what the compiler does now
and hope that it will do similar things in the future. This is the
case, e.g., with Visual C++ for Microsoft Windows. It so happens that
if you call an external function like EnterCriticalSection, then the
Microsoft compiler emits code that does not cache any non-local data.

Tom Payne

unread,
Jan 15, 2001, 6:34:15 PM1/15/01
to
In comp.lang.c++.moderated Kenneth Chiu <ch...@cs.indiana.edu> wrote:
: In article <3A62CF03...@dresdner-bank.com>,

That's a very good and important point. Volatility is neither
necessary nor sufficient for the synchronization that is needed in
multi-threading. Volatile objects get synchronized at every sequence
point, which is unnecessarily often, but only register-resident
objects get synchronized; there is no requirement to synchronize
processor-local caches.

Tom Payne

Brian McNamara!

unread,
Jan 15, 2001, 6:35:50 PM1/15/01
to
Martin Berger <martinb@--remove--me--dcs.qmw.ac.uk> once said:
>this makes me think that c++ *should* be expanded to include something
>like "blob" as a modifier. or maybe even user defined modifiers.
...

>(eg for the prevention of race conditions). i also think that user defined
>modifier semantics has uses beyond preventing race conditions. how about it?

The way I see it, Andrei's approach uses the compiler as a model
checker. Volatile/non-volatile comprises a two-state model which
distunguishes whether objects are locked or unlocked (or whatever), and
the typechecker/type system captures the model and thus forces the
compiler to verify it.

I agree that user-defined modifiers could have uses beyond preventing
race conditions, but I think that's just an abuse of the type system.
(Which is not to say I'm not a proponent of abusing the type system; I
do it all the time. But if we are speaking of extensions, we might as
well do them right, rather than continue using less-than-ideal language
constructs as a means to achieve our ends.)

I think there could be exciting things done if there were a kind of
"property system" in addition to a type system, and the compiler served
as a property verifier. Then users could define a lattice of properties
which apply to certain types of objects, as well as property-transitions
that happen when mutating operations are applied to those objects, and
the compiler could verify that user-specified properties hold for
particular objects at particular points in the code. In other words, I
mean a kind of symbolic-model-checker used for proofs-of-correctness
whose equations are solved by the dataflow analysis engine that's
already in the compiler.

As I see it, you can't do this well now with templates in C++, because
an object's type is fixed during its (static) lifetime (that is, its
extent in the code); object properties should be able to change as the
object is mutated, in general. (Andrei uses const_cast to effect the
property-transition, and LockingPtr as a new object (and data type) to
access the property-transformed object; this is sufficient for the
two-state model needed to catch race conditions.)

I don't know that I explained that well. :)

--
Brian M. McNamara lor...@acm.org : I am a parsing fool!
** Reduce - Reuse - Recycle ** : (Where's my medication? ;) )

Kaz Kylheku

unread,
Jan 15, 2001, 9:49:10 PM1/15/01
to
On 15 Jan 2001 16:15:11 -0500, Konrad Schwarz

<konradDO...@mchpDOTsiemens.de> wrote:
>So the optimization of keeping variables in registers across
>function calls is illegal in general (and thus must not be performed),
>if the compiler cannot prove that the code will not be linked into a
>multi-threaded program or it cannot prove that those variables will
>never be shared.

Basically that is what it boils down to. Proving otherwise in the
general case involves knowing what happens in all the translation units
that are called, and such optimizations therefore must be delayed
somehow until the program is linked.

>However, the C language has a way of signaling that local variables
>cannot be accessed by other threads, namely by placing them in
>the register storage class. I don't know about C++; if I remember
>correctly, C++ degrades register to a mere "efficiency hint".

The auto storage class suffices. The register specifier simply means that
the object (which has automatic storage class---there is no register
storage class, just like there is no typedef storage class!) cannot
have its address taken; it becomes a constraint violation to try to do
so. It's not difficult to verify that an object's address is never
taken, whether or not it is declared register.

Kaz Kylheku

unread,
Jan 15, 2001, 10:35:05 PM1/15/01
to
On 15 Jan 2001 09:39:16 -0500, James Kanze

<James...@dresdner-bank.com> wrote:
>Kaz Kylheku wrote:
>
>> On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer
>> <jfa...@jfasch.faschingbauer.com> wrote:
>> >So, provided that unlock() is a function and not a macro, there is no
>> >need to declare i volatile.
>
>> Even if a compiler implements sophisticated global optimizations
>> that cross module boundaries, the compiler can still be aware of
>> synchronization functions and do the right thing around calls to
>> those functions.
>
>It can be. It should be. Is it? Do today's compilers actually do the
>right thing, or are we just lucking out because most of them don't
>optimize very aggressively anyway?

It seems that we are lucking out, but not really. The way compilers
typically work is enough to ensure that volatile is not needed in order
for the right load and store instructions to be issued in the right
order. The rest of the job is done by the synchronization library
implementors, who must insert the appropriate memory barrier
instructions or what have you, into the implementation of these
functions, so that the hardware doesn't make a dog's breakfast out of
the memory access requests issued by each processor.

These library implementors tend to have a clue, and tend to have
influence with the compiler writers. For example, if the GNU compiler
people implemented some sophisticated optimizations not knowing that
these break MT programs, the GNU libc people would take note and work
out some solution---perhaps a special function attribute would be
developed, so that in the library header file, one could write:

int pthread_mutex_lock(pthread_mutex_t *) __attribute__ ((sync));

or some such thing meaning, spill and reload when calling this
function.

The point is that clueful implementors are aware of the issues and are
looking out for you; it's not just some accident that things work. :)

Ron Hunsinger

unread,
Jan 16, 2001, 5:28:45 AM1/16/01
to
In article <3A62D01F...@dresdner-bank.com>, James Kanze
<James...@dresdner-bank.com> wrote:

> You still need some way of preventing the optimizer from deferring
> writes until after the lock has been released.

Not knowing what's inside unlock() should take care of this. A compiler
that doesn't bring memory up to date before calling a function it doesn't
have the source for is going to generate code that breaks even in the
absence of multithreading.

> Ideally, the compiler
> will understand the locking system (mutex, or whatever), and generate
> the necessary write guards itself. Off hand, I don't know of any
> compiler which meets this ideal.

I'd expect the write guard to be within unlock() itself. No matter what the
compiler knows or doesn't know, the writer of unlock() surely knows that a
memory barrier (maybe two) is required, and can code it in.
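
A sketch of what that might look like, with MEMBAR() as a made-up macro
for whatever barrier instruction the platform requires:

    struct lock_t { int held; };

    void unlock(lock_t *lk)
    {
        MEMBAR();       /* hypothetical: make the writes done under the
                           lock visible before the lock appears free */
        lk->held = 0;   /* release the lock word */
        MEMBAR();       /* the possible second barrier, depending on the
                           hardware's ordering rules */
    }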

-Ron Hunsinger

James Kanze

unread,
Jan 16, 2001, 10:49:00 AM1/16/01
to
Kaz Kylheku wrote:

[...]


> You will probably find with most compilers that the most aggressive
> caching optimizations are applied to auto variables whose address is
> never taken. These cannot possibly be accessed or modified by
> another thread or signal handler or what have you, so it is
> generally safe to cache them in registers, or even optimize them to
> registers entirely.

I think we basically agree. The only difference is that I'm not
content with "probably find with most compilers"; I want a writen
guarantee for the compiler I will actually use. (There are normally
contractual penalties for errors in software I write. And a threading
problem is considered an error in the software.)

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 16, 2001, 10:49:52 AM1/16/01
to
Kaz Kylheku wrote:

> On 15 Jan 2001 09:39:16 -0500, James Kanze
> <James...@dresdner-bank.com> wrote:
> >Kaz Kylheku wrote:

> >> On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer
> >> <jfa...@jfasch.faschingbauer.com> wrote:
> >> >So, provided that unlock() is a function and not a macro, there
> >> >is no need to declare i volatile.

> >> Even if a compiler implements sophisticated global optimizations
> >> that cross module boundaries, the compiler can still be aware of
> >> synchronization functions and do the right thing around calls to
> >> those functions.

> >It can be. It should be. Is it? Do today's compilers actually do
> >the right thing, or are we just lucking out because most of them
> >don't optimize very aggressively anyway?

> It seems that we are lucking out, but not really. The way compilers
> typically work is enough to ensure that volatile is not needed in order
> for the right load and store instructions to be issues in the right
> order.

The operative word here is "typically", I think. I know that it will
work for most compilers, not necessarily because the compiler writers
have done anything special, but because they haven't done anything
really special along the lines of optimization. I've also seen
experimental compilers which did an amazing amount of intermodule
analysis, and which "knew" that system calls don't access user
variables unless they've been passed the address of the variable.

(BTW: most compilers will rearrange the order of writes. It shouldn't
matter, as long as all writes take place before the lock is released.)

> The rest of the job is done by the synchronization library
> implementors, who must insert the appropriate memory barrier
> instructions or what have you, into the implementation of these
> functions, so that the hardware doesn't make a dog's breakfast out
> of the memory access requests issued by each processor.

Agreed. No problem here.

> These library implementors tend to have a clue, and tend to have
> influence with the compiler writers. For example, if the GNU
> compiler people implemented some sophisticated optimizations not
> knowing that these break MT programs, the GNU libc people would take
> note and work out some solution---perhaps a special function
> attribute would be developed, so that in the library header file,
> one could write:

> int pthread_mutex_lock(pthread_mutex_t *) __attribute___ ((sync));
>
> or some such thing meaning, spill and reload when calling this
> function.

> The point is that clueful implementors are aware of the issues and
> are looking out for you; it's not just some accident that things
> work. :)

You've hit upon exactly the point which worries me. Thread safety is
a completely foreign domain for most compiler writers, or at least it
was when I worked on compilers. Which means that the implementor who
is clueful about optimization may not be so clueful about locking
issues and thread safety. I'd feel a lot better with an explicit
acknowledgement of the issues in the compiler documentation.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 16, 2001, 10:51:09 AM1/16/01
to
Konrad Schwarz wrote:

> If the compiler supports multi-threading (at least POSIX
> multi-threading), then it *must*, since POSIX does not require
> shared variables to be volatile qualified. If the compiler decides
> to keep values in registers across function calls, it must be able
> to prove that
> * either these variables are never shared by another thread
> * or the functions in question never perform inter-thread operations

Is this explicitly stated in the Posix standard? If so, it is the
sort of guarantee I'm looking for. Or is this just your
interpretation based on the lack of a requirement that shared
variables be volatile qualified?

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 16, 2001, 10:52:12 AM1/16/01
to
Joerg Faschingbauer wrote:

> Duh! What compiler and what language are you talking about?

The language was C++. The compiler was an experimental one which ran
on HP/UX. I don't know what percentage of the optimizations involved
have actually made it into a commercial compiler at present, but the
potential is there.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 16, 2001, 10:52:31 AM1/16/01
to
David Schwartz wrote:

> James Kanze wrote:

> > You still need some way of preventing the optimizer from deferring
> > writes until after the lock has been released. Ideally, the
> > compiler will understand the locking system (mutex, or whatever),
> > and generate the necessary write guards itself. Off hand, I don't
> > know of any compiler which meets this ideal.

> Give an example of what you think the problem is. The
> typical solution is to give the compiler no information at all about
> the locking system. Since the compiler then must assume the locking
> system could do anything, it can't optimize anything across it.

In sum, you're counting on the weaknesses of the compiler.

I've already said that in practice, it is probably not a problem,
since the compiler normally won't have accesses to the sources to the
locking system, and any compiler smart enough to know that a system
call won't modify a global variable can also know that specific system
calls involve the locking system, and so some sort of barrier is
necessary.

What I'm complaining about is the lack of explicit guarantees
regarding this. In the end, my previous paragraph is really just
speculation. I think that this will be the case. But I'd feel much
better about it if the compiler implementors specified it, so that I
could be sure that they'd considered it. Particularly because today,
it typically isn't a problem; as you say, the compiler has no
information about the system, and so supposes it can do anything.

> It's not clear to me what you mean by "deferring
> writes". This could either refer to variables being cached in
> registers and not written back or it could refer to a hardware
> write cache not being flushed. Fortunately, neither is a
> problem. Variables can't be cached in registers because the compiler
> doesn't know what the lock/unlock functions do, and so must assume
> they might access those variables from their memory
> locations. Hardware write caches aren't a problem, because the
> lock/unlock functions contain the appropriate memory barrier. The
> compiler doesn't know this, but the compiler has nothing to do with
> such hardware write reordering and so doesn't need to.

I was mainly thinking of the compiler optimizations. The only way to
ensure that the compiler has no knowledge of what the lock/unlock
functions do is to not make the sources, or even the binary,
accessible to it. Typically, this IS the case, because the
lock/unlock functions are implemented in the system. But a good
compiler can recognize system calls, and know that they don't change
global variables unless the address of the global variable has been
passed as a parameter. (I've actually used a compiler which did this.
Twelve years ago, no less.) Of course, one would hope that a compiler
this smart would also know which system functions involve the locking
system, and take this into account. I just happen to prefer
documented guarantees to just hoping.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

John Mullins

unread,
Jan 16, 2001, 10:53:25 AM1/16/01
to

"Andrei Alexandrescu" <andre...@hotmail.com> wrote in message
news:93v92q$bpfij$1...@ID-14036.news.dfncis.de...

> "John Mullins" <John.M...@crossprod.co.uk> wrote in message
> news:93v2gj$4fs$1...@newsreaderg1.core.theplanet.net...
> >
> > But his examples also rely on undefined behaviour so he can't really
> > count on anything.
>
> Are you referring to const_cast? Strictly speaking, indeed. But then, all
MT
> programming in C/C++ has undefined behavior.
>
> Andrei
>

Of course, and I agree when dealing with multiple threads we're somewhat
in the 'Twilight Zone'. I do find it worrying that we find this in an
'Experts' column and there is no mention that this may fail to work with a
highly aggressive optimizing compiler. Novices tend to learn from experts
and if you tell them it's okay to cast away volatile on a volatile variable,
they'll believe you. FWIW I enjoyed the article and gained lots of useful
insights but would have liked to have seen the problems acknowledged.

JM

Charles Bryant

unread,
Jan 16, 2001, 12:11:46 PM1/16/01
to
In article <3A62CE9A...@lucent.com>,

Michiel Salters <sal...@lucent.com> wrote:
>Joerg Faschingbauer wrote:
>> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:
>> David> Joerg Faschingbauer wrote:
>>
>> >> Now that's the whole point: the compiler has to take care that the
>> >> code it generates spills registers before calling a function.
>
>> David> This is really not so. It's entirely possible that the
>> David> compiler might have some way of assuring that the particular
>> David> value cached in the register isn't used by the called function,
>> David> and hence it can keep it in a register.
>
>> Even though such a thing is possible, it is quite unlikely - consider
>> the management effort of the people making (and upgrading!) such a
>> system.
>
>I don't think things are that hard - especially in C++, which already
>has name mangling. For each translation unit, it is possible to determine
>which functions use which registers and what functions are called outside
>that translation unit.
>Now encode in the name mangling of a function which registers are used,
>except of course those functions imported from another translation unit.
>Just add to the name something like Reg=EAX_EBX_ECX. The full set of
>registers used by a function is the set of registers it uses, plus
>the registers used by function it calls. This will even work for
>mutually recursive functions across translation units.

I don't see how name mangling is of the slightest use. At the point
where the compiled code refers to the function, it cannot know which
registers it uses, so it cannot know what symbol to use for the
reference. If it can look up the unmangled name in order to determine
which 'Reg=...' mangling to use, it might as well leave the name
alone.

--
Eppur si muove

Charles Bryant

unread,
Jan 16, 2001, 12:13:18 PM1/16/01
to
In article <93vb9p$bpi8g$1...@ID-14036.news.dfncis.de>,
Andrei Alexandrescu <andre...@hotmail.com> wrote:
>I would be glad if someone explained to me in what situations a compiler can
>rearrange instructions to the extent that it would invalidate the idiom that
>the article proposes. OTOH, such compilers invalidate a number of idioms
>anyway, such as the Double-Check Locking Pattern, used by Doug Schmidt in
>ACE.

Not having read the article, I cannot comment on it (is there a URL
for it or for a summary?).

However, the problem is not that the compiler may rearrange
instructions. It is that the CPU may re-order the memory accesses.
For example,

x = 6; // x == 6
y = 7; // x == 6, y == 7
x = y; // x == 7, y == 7
y = 5; // x == 7, y == 5

The CPU may:
put '6' in its cache, scheduled to be written to 'x',
write '6' from cache to 'x'
put '7' in its cache, scheduled to be written to 'y',
write '7' from cache to 'y'
put '7' in its cache, scheduled to be written to 'x',
put '5' in its cache, scheduled to be written to 'y',
write '5' from cache to 'y'
write '7' from cache to 'x'

Note that this permits another CPU to see x == 6 and y == 5 in
memory, even though the CPU executing the code could never see this
combination.

When a programmer wishes to enforce the relative ordering of memory
accesses on such a CPU, they make it execute a 'memory barrier'
instruction. There may be several such instructions, and in the above
example a 'store/store' memory barrier after 'x = y' would ensure
that the writing of '5' to 'y' could not be moved before any store
before the barrier, so the combination x == 6 && y == 5 would not
occur in memory.
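
In the example above that would look like this, with MEMBAR_STORE_STORE()
standing in for the platform's instruction (a made-up macro):

    x = 6;
    y = 7;
    x = y;
    MEMBAR_STORE_STORE();   /* stores before this line become visible
                               before any store after it */
    y = 5;                  /* no other CPU can now observe
                               x == 6 && y == 5 */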

The SPARC processor manual has a good description of this in an
appendix.

This is why the double checked locking paradigm is irrecoverably
broken. See
<URL: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html >
for a detailed explanation of why.

Charles Bryant

unread,
Jan 16, 2001, 12:13:47 PM1/16/01
to
In article <3A6369C3...@sensor.com>, Ron Natalie <r...@sensor.com>
wrote:

>
>> However, the C language has a way of signaling that local variables
>> cannot be accessed by other threads, namely by placing them in
>> the register storage class.
>
>Huh? How is that? The C language doesn't contain the word thread
>anywhere. Register is auto + a hint to keep in a register.
>
>> I don't know about C++; if I remember
>> correctly, C++ degrades register to a mere "efficiency hint".
>
>The ONLY difference between C and C++ is that C++ allows you to
>take the address of something with a register storage class (noting
>that doing so may force it out of a register), while C prohibits
>the & operator on register-declared objects even if they weren't
>actually put in a register by the compiler.

That difference is what prevents register variables from being shared
across threads. Each thread has its own, private, automatic
variables, so the only way a thread could refer to an automatic
variable in another thread is if it is passed the address of the
variable. If you can't take the address of an automatic variable,
then that variable cannot be shared across threads.
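
A sketch of the difference (start_thread() is a made-up wrapper that
hands its second argument to a new thread):

    void worker(void *arg);
    void start_thread(void (*fn)(void *), void *arg);

    void f(void)
    {
        int shared_local = 0;
        register int private_local = 0;

        start_thread(worker, &shared_local);   /* address escapes: the
                                                  variable can be shared */
        start_thread(worker, &private_local);  /* constraint violation in
                                                  C; C++ would merely drop
                                                  the register hint */
    }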

--
Eppur si muove

Dennis Yelle

unread,
Jan 16, 2001, 12:14:16 PM1/16/01
to
"Brian McNamara!" wrote:
[...]

> In other words, I
> mean a kind of symbolic-model-checker used for proofs-of-correctness
> whose equations are solved by the dataflow analysis engine that's
> already in the compiler.

If C++ was easy to parse, I don't think you would be asking for that.
The real problem here is that C++ is too hard to parse.
Only a compiler can do it.
Those lisp pushers keep telling me I should change to lisp because
lisp is trivial for both compilers and people and simple programs to
parse. It sounds good, until I see all of those parens.)))))))

Dennis Yelle
--
I am a computer programmer and I am looking for a job.
There is a link to my resume here:
http://table.jps.net/~vert/

James Kanze

unread,
Jan 16, 2001, 12:16:19 PM1/16/01
to
Andrei Alexandrescu wrote:

> I would be glad if someone explained to me in what situations a
> compiler can rearrange instructions to the extent that it would
> invalidate the idiom that the article proposes. OTOH, such compilers
> invalidate a number of idioms anyway, such as the Double-Check
> Locking Pattern, used by Doug Schmidt in ACE.

As far as C++ is concerned, in just about every case. The C++
standard doesn't recognize multi-threading, and a conforming compiler
can suppose that there are no accesses outside of the current thread,
and optimize in consequence.

In practice, it isn't generally a problem, because:
- there are apparently other standards (POSIX) which address the
issue, and
- in practice, compiler optimizers aren't smart enough to move code
concerning global variables across a call to a function in
another module.
For example, although I know the failings of the double-check locking
pattern, it is widely used, and I've yet to hear of a case where it
failed to work correctly. I suspect, however, that this is mainly
because the constructor of a singleton isn't the most heavily executed
code in an application, and the few compilers where the optimizer is
intelligent enough to cause problems use profiling output to guide
optimization, and only really optimize the most heavily executed
branches.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

Dave Butenhof

unread,
Jan 16, 2001, 1:57:02 PM1/16/01
to
Andrei Alexandrescu wrote:

> "James Kanze" <James...@dresdner-bank.com> wrote in message
> news:3A62CF03...@dresdner-bank.com...
> > The crux of Andrei's suggestions really just exploits the compiler
> > type-checking with regards to volatile, and not the actual semantics
> > of volatile. If I've understood the suggestion correctly, it would
> > even be possible to implement it without ever accessing the individual
> > class members as if they were volatile (although in his examples, I
> > think he is also counting on volatile to inhibit code movement).
>
> Thanks James for all your considerations before and after the article
> appeared.
>
> There is a point about the use of volatile proposed by the article. If you
> write volatile correct code as prescribed by the article, you _never_
> *never* NEVER use volatile variables. You _always_ *always* ALWAYS lock a
> synchronization object, cast the volatile away, operate on the so-obtained
> non-volatile alias, let the alias go, and unlock the synchronization object,
> in this order.
>
> Maybe I should have made it clearer that in volatile-correct code, you never
> operate on volatile data - you always cast volatile away and more
> specifically, you cast it away when it is *semantically correct* to do so
> because you locked the afferent synchronization object.

Yes, you should have done so. The introduction, as has been pointed out many
times in this overly complicated thread (my head hurts after trying to catch up
with a couple of days' worth of posts!), oversells the technique substantially.

First off, it does nothing about "detecting races". What it DOES is try to use
(one might reasonably say "abuse") a language feature to try to detect
unsynchronized access to shared variables. That's a noble cause, but I don't
like the implementation much. By using volatile (which is already a subject of
much confusion, as you may have noticed by now), you're (perhaps unintentionally)
implying a lot that the article can't do. (Note that, as pointed out elsewhere,
races can happen even when you use synchronization. Races due to improper
synchronization aren't very interesting, because that's basically what they
asked for, and you really can't stop them from getting it.)

The intent appears to be simply that the compiler will diagnose attempts to use
the volatile members without the intended type casting to remove the volatile
attribute. You supply a guard object to lock an associated mutex and provide
the type cast. You also speak of doing the type cast directly. Yes, I see that
you suggested this be done only in "non-threaded" environments. You don't give
examples, but one might be to initialize data in main() before creating
threads. I don't see much value to this (the mutex would be uncontended). In
general, there's no way to know that the process isn't "threaded". (And, unless
your system prohibits dlopen() of the thread library, which some do, you can't
know that it won't suddenly BECOME threaded at the most inconvenient time.)
Thread-safe code should always BEHAVE as if there were multiple threads.

The article also implies that one could use the data with volatile attributes
intact to manipulate shared data. You don't appear to intend to recommend this,
and you don't give any examples, but the tantalizing (and dangerous) suggestion
remains. You'd get less flak if you removed the suggestion, and replaced it
with a flat statement that any attempt at manipulating shared data without
using standard synchronization operations is NON-portable, and that no language
feature is sufficient. It CAN be done on any platform, but you need to
understand the software and hardware architecture fairly well to even try, and
it's rarely worth anyone's time.

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

Konrad Schwarz

unread,
Jan 16, 2001, 2:04:04 PM1/16/01
to

Ron Natalie:


>
> > However, the C language has a way of signaling that local variables
> > cannot be accessed by other threads, namely by placing them in
> > the register storage class.
>
> Huh? How is that? The C language doesn't contain the word thread
> anywhere. Register is auto + a hint to keep in a register.
>
> > I don't know about C++; if I remember
> > correctly, C++ degrades register to a mere "efficiency hint".
>
> The ONLY difference between C and C++ is that C++ allows you to
> take the address of something with a register storage class (noting
> that doing so may force it out of a register), while C prohibits
> the & operator on register-declared objects even if they weren't
> actually put in a register by the compiler.

That's the point. C register variables can't have their addresses
taken,
so they can't be aliased.

Martin Berger

unread,
Jan 16, 2001, 2:03:46 PM1/16/01
to
"Brian McNamara!" wrote:

> I agree that user-defined modifiers could have uses beyond preventing
> race conditions, but I think that's just an abuse of the type system.
> (Which is not to say I'm not a proponent of abusing the type system; I
> do it all the time. But if we are speaking of extensions, we might as
> well do them right, rather than continue using less-than-ideal language
> constructs as a means to achieve our ends.)

i don't think it is an abuse of the typing system, it is a natural extension
and can easily extend to modifier polymorphism.


> I think there could be exciting things done if there were a kind of
> "property system" in addition to a type system, and the compiler served
> as a property verifier. Then users could define a lattice of properties
> which apply to certain types of objects, as well as property-transitions
> that happen when mutating operations are applied to those objects, and
> the compiler could verify that user-specified properties hold for
> particular objects at particular points in the code.

that's roughly what i had in mind.

> In other words, I
> mean a kind of symbolic-model-checker used for proofs-of-correctness
> whose equations are solved by the dataflow analysis engine that's
> already in the compiler.

what kind of algorithms the compiler uses for checking is orthogonal
to the semantics of type qualifiers.

the following two papers might be of interest:

http://http.cs.berkeley.edu/~jfoster/papers/pldi99.ps.gz
http://www.stanford.edu/~engler/mc-osdi.ps

martin

Andrei Alexandrescu

unread,
Jan 16, 2001, 4:33:35 PM1/16/01
to
"John Mullins" <John.M...@crossprod.co.uk> wrote in message
news:9412sa$mdf$1...@newsreaderm1.core.theplanet.net...

> Of course, and I agree when dealing with multiple threads we're
somewhat
> in the 'Twilight Zone'. I do find it worrying that we find this in an
> 'Experts' column and there is no mention that this may fail to work with a
> highly aggressive optimizing compiler. Novices tend to learn from experts
> and if you tell them it's okay to cast away volatile on a volatile
variable,
> they'll believe you. FWIW I enjoyed the article and gained lots of useful
> insights but would have liked to have seen the problems acknowledged.

Point taken, thanks very much.

Andrei

John Mullins

unread,
Jan 16, 2001, 4:34:28 PM1/16/01
to

"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
news:93sptr$o5s$1...@flotsam.uits.indiana.edu...

> It's not that volatile itself causes memory problems. It's that
> it's not sufficient (and if under POSIX should not even be used).
>
> He gives an example, which will work in practice, but if he had two
> shared variables, would fail. Code like this, for example, would
> be incorrect on an MP with a relaxed memory model. The write to flag_
> could occur before the write to data_, despite the order in which the
> assignments are written.
>
> class Gadget {
> public:
> void Wait() {
> while (!flag_) {
> Sleep(1000); // sleeps for 1000 milliseconds
> }
> do_some_work(data_);
> }
> void Wakeup() {
> data_ = ...;
> flag_ = true;
> }
> ...
> private:
> volatile bool flag_;
> volatile int data_;
> };
>
I'm not sure I agree with this; the compiler should guarantee that the
writes occur as written, since this is 'observable behaviour'.

JM

Ron Natalie

unread,
Jan 16, 2001, 4:35:24 PM1/16/01
to

Charles Bryant wrote:

> >> However, the C language has a way of signaling that local variables
> >> cannot be accessed by other threads, namely by placing them in
> >> the register storage class.
> >
>

> That difference is what prevents register variables being shared
> across threads. Each thread has its own, private, automatic
> variables, so the only way a thread could refer to an automatic
> variable in another thread is if it is passed the address of the
> variable. If you can't take the address of an automatic variable,
> then that variable cannot be shared across threads.
>

Ah, I get it: signalling to the programmer, not to the compiler.

Andrei Alexandrescu

unread,
Jan 16, 2001, 8:25:08 PM1/16/01
to
"Charles Bryant" <n142036...@chch.demon.co.uk> wrote in message
news:2001-01-1...@chch.demon.co.uk...

> In article <93vb9p$bpi8g$1...@ID-14036.news.dfncis.de>,
> Andrei Alexandrescu <andre...@hotmail.com> wrote:
> >I would be glad if someone explained me in what situations a compiler can
> >rearrange instructions to the extent that it would invalidate the idiom
that
> >the article proposes. OTOH, such compilers invalidate a number of idioms
> >anyway, such as the Double-Check Locking Pattern, used by Doug Schmidt in
> >ACE.
>
> Not having read the article, I cannot comment on it (is there a URL
> for it or for a summary?).

http://cuj.com/experts/1902/alexandr.html

I'll also provide a summary for two reasons. One is that many people don't
have the time to read the whole banana. The second reason is that a subset
of those people do have time to post an opinion about the article.

Summary:

The article describes how to apply the volatile modifier to class types
and member functions, in conjunction with a number of rules, to the end
of having the compiler detect race conditions as type errors.

In essence, the article prescribes qualifying with volatile all
user-defined data that is shared between threads, and removing that
qualification (via a const_cast) only in conjunction with locking a
synchronization object (the article uses a mutex) that is associated
with that data. This way multithreading semantics and volatile
semantics are always in sync. The device that helps with that is a
simple class template LockingPtr.

Also, the article prescribes qualifying with volatile those member
functions that do their own internal synchronization. This way,
volatile (shared) objects will be able to invoke those member functions
directly. Caller code will be able to invoke non-synchronized
(non-volatile) member functions only after locking the object with a
LockingPtr.

The article makes NO CLAIM on multithreaded programming in the ABSENCE of a
mutex. The workings of LockingPtr are based on mutex semantics. I thought
this is clear enough, but too many people gloss over the fact that the
article uses mutexes or semantically equivalent devices.
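
A minimal sketch of the LockingPtr device described above (the Mutex
interface and the member names here are assumptions, not the article's
exact code):

    class Mutex {
    public:
        void Acquire();   // assumed interface
        void Release();
    };

    template <class T>
    class LockingPtr {
    public:
        LockingPtr(volatile T& obj, Mutex& mtx)
            : pObj_(const_cast<T*>(&obj)), pMtx_(&mtx)
        { mtx.Acquire(); }                   // lock first...
        ~LockingPtr() { pMtx_->Release(); }  // ...unlock on destruction
        T& operator*()  { return *pObj_; }   // non-volatile access, valid
        T* operator->() { return pObj_; }    // only while the lock is held
    private:
        T* pObj_;
        Mutex* pMtx_;
        LockingPtr(const LockingPtr&);             // not copyable
        LockingPtr& operator=(const LockingPtr&);
    };

So a shared object is declared volatile Gadget g; and is manipulated
only through LockingPtr<Gadget> p(g, mtx); p->Foo();.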

> When a programmer wishes to enforce the relative ordering of memory
> accesses on such a CPU, they make it execute a 'memory barrier'
> instruction. There may be several such instructions, and in the above
> example an 'store/store' memory barrier after 'x = y' would ensure
> that the writing of '5' to 'y' could not be moved before any store
> before the barrier, so the combination x == 6 && y ==5 would not
> occur in memory.

I wonder, can memory barriers be encapsulated (like in macros) so that
you get mutex-like semantics?

> This is why the double checked locking paradigm is irrecoverably
> broken. See
> <URL:
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html >
> for a detailed explanation of why.

You know, I was thinking. In the presence of 'volatile', doesn't (or at
least shouldn't) the compiler disable reordering? For example, if the
pointer to the Singleton is volatile-qualified, then a rearranging compiler
should disable reordering of instructions that involve the pointee.


Andrei

Kaz Kylheku

unread,
Jan 16, 2001, 8:26:41 PM1/16/01
to
On 16 Jan 2001 14:04:04 -0500, Konrad Schwarz

<konradDO...@mchpDOTsiemens.de> wrote:
>
>
>Ron Natalie:
>>
>> > However, the C language has a way of signaling that local variables
>> > cannot be accessed by other threads, namely by placing them in
>> > the register storage class.
>>
>> Huh? How is that? The C language doesn't contain the word thread
>> anywhere. Register is auto + a hint to keep in a register.
>>
>> > I don't know about C++; if I remember
>> > correctly, C++ degrades register to a mere "efficiency hint".
>>
>> The ONLY difference between C and C++ is that C++ allows you to
>> take the address of something with a register storage class (noting
>> that doing so may force it out of a register), while C prohibits
>> the & operator on register-declared objects even if they weren't
>> actually put in a register by the compiler.
>
>That's the point. C register variables can't have their addresses
>taken,
>so they can't be aliased.

There is no difference between an auto variable that cannot have its
address taken and one that does not have its address taken. The
register keyword only adds a constraint rule to the C language, but it
is otherwise vacuous of any special semantics.

A conforming C compiler must verify that a register variable does not
in fact have its address taken, and emit a diagnostic otherwise. It
can similarly analyze *any* auto variable to determine whether or not
its address is taken. It would be a poor quality compiler that only
made optimistic aliasing assumptions about register variables.

Kaz Kylheku

unread,
Jan 16, 2001, 8:27:44 PM1/16/01
to
On 16 Jan 2001 13:57:02 -0500, Dave Butenhof <David.B...@compaq.com> wrote:
>First off, it does nothing about "detecting races". What it DOES is try to use
>(one might reasonably say "abuse") a language feature to try to detect
>unsynchronized access to shared variables. That's a noble cause, but I don't
>like the implementation much.

I find that using simple assertions is adequate for finding
unsynchronized accesses to shared variables. A function or statement
block that expects a lock to already be held simply does something like
this:

assert (lock.current_thread_is_owner());

That's it; no messing around with contortions of the C++ type system.
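
A sketch, assuming a home-grown lock class that records its owner
(current_thread_is_owner() is not a standard POSIX call):

    #include <cassert>
    #include <pthread.h>

    class OwnedLock {
        pthread_mutex_t mtx_;
        pthread_t owner_;
        bool held_;
    public:
        OwnedLock() : held_(false) { pthread_mutex_init(&mtx_, 0); }
        void lock()
        {
            pthread_mutex_lock(&mtx_);
            owner_ = pthread_self();
            held_ = true;
        }
        void unlock()
        {
            held_ = false;
            pthread_mutex_unlock(&mtx_);
        }
        bool current_thread_is_owner() const
        {
            /* meaningful only when asked by the thread doing the update */
            return held_ && pthread_equal(owner_, pthread_self());
        }
    };

    OwnedLock balance_lock;
    long balance;

    void deposit(long amount)
    {
        assert(balance_lock.current_thread_is_owner()); // caller locks first
        balance += amount;
    }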

Andrei Alexandrescu

unread,
Jan 16, 2001, 8:27:20 PM1/16/01
to
"Dave Butenhof" <David.B...@compaq.com> wrote in message
news:3A645F38...@compaq.com...

> Andrei Alexandrescu wrote:
> > Maybe I should have made it clearer that in volatile-correct code, you
never
> > operate on volatile data - you always cast volatile away and more
> > specifically, you cast it away when it is *semantically correct* to do
so
> > because you locked the afferent synchronization object.
>
> Yes, you should have done so. The introduction, as has been pointed out
many
> times in this overly complicated thread (my head hurts after trying to
catch up
> with a couple of days' worth of posts!), oversells the technique
substantially.

I disagree, but then I'm a very narrow audience :o).

> First off, it does nothing about "detecting races". What it DOES is try to
use
> (one might reasonably say "abuse") a language feature to try to detect
> unsynchronized access to shared variables.

In my opinion the use of volatile as done by the article is in keeping with
the built-in meaning of the keyword and dovetails nicely with volatile's
workings. I wouldn't call that abuse, but of course I'd be glad to stand
corrected.

Volatile data is data that can be modified outside the compiler's knowledge.
This is *exactly* the meaning I keep for volatile. I have the data that's
used by multiple threads volatile-qualified. You can't touch the data that's
volatile and so you have to lock an afferent synchronization object before
casting away its volatileness. Once you locked the synchronization object,
the semantics of the code become single-threaded and so you can cast the
volatile away.

I find the point above quite interesting and important for understanding
the article: that you can, and should, synchronize the semantics of
volatile with the semantics of locking.
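
For readers without the article at hand, the shape of the article's
LockingPtr is roughly the following; this is a from-memory reconstruction,
not the article's exact text, and it assumes a Mutex class with
Lock()/Unlock() members:

template <typename T>
class LockingPtr {
public:
    LockingPtr(volatile T& obj, Mutex& mtx)
        : pObj_(const_cast<T*>(&obj)), pMtx_(&mtx)
    { pMtx_->Lock(); }                   // lock first...
    ~LockingPtr() { pMtx_->Unlock(); }
    T& operator*() { return *pObj_; }    // ...then hand out plain,
    T* operator->() { return pObj_; }    // non-volatile access
private:
    T* pObj_;
    Mutex* pMtx_;
    LockingPtr(const LockingPtr&);            // not copyable
    LockingPtr& operator=(const LockingPtr&);
};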

> That's a noble cause, but I don't like the implementation much. By using
> volatile (which is already a subject of much confusion, as you may have
> noticed by now), you're (perhaps unintentionally) implying a lot that the
> article can't do. (Note that, as pointed out elsewhere, races can happen
> even when you use synchronization. Races due to improper synchronization
> aren't very interesting, because that's basically what they asked for,
> and you really can't stop them from getting it.)

This is trite. I'd say, you can apply any technique improperly or
mistakenly. You can say, for example, that const-correctness doesn't
necessarily lead to const-correct programs.

Volatile correctness allows you to apply a SMALL number of SIMPLE,
MECHANICAL rules that allow the compiler to detect ALL race conditions. If
you design your primitive classes wrongly, then you did not apply the rules
or you applied them incorrectly.

> The intent appears to be simply that the compiler will diagnose attempts
> to use the volatile members without the intended type casting to remove
> the volatile attribute. You supply a guard object to lock an associated
> mutex and provide the type cast.

This is essential: const_casting volatile away is always done in
conjunction with locking the synchronization object.

> The article also implies that one could use the data with volatile
> attributes intact to manipulate shared data. You don't appear to intend
> to recommend this, and you don't give any examples, but the tantalizing
> (and dangerous) suggestion remains.

I am not sure what you are referring to here. Could you quote? I am sure
there is a misunderstanding. For now, I say that there is no dangerous
suggestion that the article makes. I stand behind what I wrote and I
consider it fundamentally correct.

I wish (I mean, I don't *really* wish) that someone would come up with a
code sample that applies the rules of volatile-correct code, yet has
trouble related to race conditions. This would lead to completing the
rules. Again, I affirm that volatile-correctness is a fundamentally sound
concept, but of course there might be loopholes that I left in defining
it.

> You'd get less flak if you removed the suggestion, and replaced it with a
> flat statement that any attempt at manipulating shared data without using
> standard synchronization operations is NON-portable, and that no language
> feature is sufficient.

But doesn't the article say - and is quite noisy about that - that you
should cast volatileness away ONLY after locking the synchronization object?
Maybe your statement could have made things a tad clearer, but then the
article does not target programmers who don't know how to write
multithreaded programs. It targets those who do write multithreaded programs
and would like to get help from their compiler with that.


Andrei

David Schwartz

Jan 16, 2001, 8:37:40 PM

James Kanze wrote:

> > Give an example of what you think the problem is. The
> > typical solution is to give the compiler no information at all about
> > the locking system. Since the compiler then must assume the locking
> > system could do anything, it can't optimize anything across it.
>
> In sum, you're counting on the weaknesses of the compiler.

No, I'm simply demonstrating that typical compilers have no difficulty
meeting the pthreads standard.



> I've already said that in practice, it is probably not a problem,
> since the compiler normally won't have accesses to the sources to the
> locking system, and any compiler smart enough to know that a system
> call won't modify a global variable can also know that specific system
> calls involve the locking system, and so some sort of barrier is
> necessary.
>
> What I'm complaining about is the lack of explicit guarantees
> regarding this. In the end, my previous paragraph is really just
> speculation. I think that this will be the case. But I'd feel much
> better about it if the compiler implementors specified it, so that I
> could be sure that they'd considered it. Particularly because today,
> it typically isn't a problem; as you say, the compiler has no
> information about the system, and so supposes it can do anything.

The pthreads standard provides an explicit guarantee. However, a simple
bit of logic will show you that this guarantee is trivial to implement
regardless of how aggressively the compiler optimizes -- anything another
thread could do to get access to a variable, the lock/unlock code could
do to get access to that same variable.
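
The argument fits in a few lines. The sketch below assumes only that
pthread_mutex_lock/unlock are opaque calls whose definitions the compiler
cannot see:

#include <pthread.h>

extern pthread_mutex_t m;
extern int shared;   // deliberately NOT volatile

void f() {
    pthread_mutex_lock(&m);    // opaque call: the compiler must assume it
                               // may read or write 'shared', so 'shared'
                               // is reloaded from memory afterwards
    ++shared;
    pthread_mutex_unlock(&m);  // opaque call: the store to 'shared' must
                               // be emitted before it, for the same reason
}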



> > It's not clear to me what you mean by "deferring
> > writes". This could either refer to variables being cached in
> > registers and not written back or it could refer to a hardware
> > write cache not being flushed. Fortunately, neither is a
> > problem. Variables can't be cached in registers because the compiler
> > doesn't know what the lock/unlock functions do, and so must assume
> > they might access those variables from their memory
> > locations. Hardware write caches aren't a problem, because the
> > lock/unlock functions contain the appropriate memory barrier. The
> > compiler doesn't know this, but the compiler has nothing to do with
> > such hardware write reordering and so doesn't need to.

> I was mainly thinking of the compiler optimizations. The only way to
> ensure that the compiler has no knowledge of what the lock/unlock
> functions do is to not make the sources, or even the binary,
> accessible to it. Typically, this IS the case, because the
> lock/unlock functions are implemented in the system. But a good
> compiler can recognize system calls, and know that they don't change
> global variables unless the address of the global variable has been
> passed as a parameter.

For threaded code, this would be a bad, even broken, compiler.

> (I've actually used a compiler which did this.
> Twelve years ago, no less.) Of course, one would hope that a compiler
> this smart would also know which system functions involve the locking
> system, and take this into account. I just happen to prefer
> documented guarantees to just hoping.

Well, the pthreads standard gives you one. It has very specific memory
visibility rules. They're documented clearly in Dr. Butenhof's book.

DS

Kaz Kylheku

Jan 17, 2001, 2:43:20 AM
On 16 Jan 2001 16:34:28 -0500, John Mullins
<John.M...@crossprod.co.uk> wrote:

Any optimization is valid if a correct program cannot tell the
difference; what constitutes a correct program depends on the language
standard, with whatever fine tuning the implementation adds to allow
additional non-standard programs to be correct---such as programs that
use threads, access hardware directly, and so on.

The reordering of memory updates can only be observed by programs which
are far from standard C or C++. The rules which govern the correctness
of these programs, and therefore which govern what optimizations may or
may not be applied, are entirely up to the language implementors.

The system architecture of advanced multiprocessor systems typically
distinguishes different kinds of memory regions. The reordering
optimizations are not applied to all of the regions. For example,
memory mapped hardware registers would be placed in a region to which
caching and reordering optimizations are not applied, thus programs
which use volatile lvalues to access such registers will work properly.

These multiprocessors typically run POSIX environments, which dictate
that the use of volatile is not necessary in MT programming; rather,
what is mandatory is the use of the proper synchronization functions.
Thus a program which accesses shared data without using these functions
is incorrect, and its behavior may therefore vary with the system
configuration, the optimization settings of the compiler, the phase of
the moon, etc.
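
For concreteness, the POSIX-sanctioned pattern looks like this; note that
the shared data is not volatile, and every access is bracketed by the
mutex:

#include <pthread.h>

pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
long g_counter = 0;   // shared, and deliberately not volatile

void* worker(void*) {
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&g_lock);
        ++g_counter;   // visibility across threads is guaranteed by the
                       // POSIX memory rules, not by any qualifier
        pthread_mutex_unlock(&g_lock);
    }
    return 0;
}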

Martin Berger

Jan 17, 2001, 7:14:50 AM
Kaz Kylheku <k...@ashi.footprints.net> wrote

> I find that using simple assertions is adequate for finding
> unsynchronized accesses to shared variables. A function or statement
> block that expects a lock to already be held simply does something like
> this:
>
> assert (lock.current_thread_is_owner());

i for one prefer to be told about the possibility of a race condition
deterministically and at compile time, rather than non-deterministically
at run-time. not to mention the overhead ...

martin

Andrei Alexandrescu

Jan 17, 2001, 7:18:09 AM
"Kaz Kylheku" <k...@ashi.footprints.net> wrote in message
news:slrn9699o...@ashi.FootPrints.net...

> On 16 Jan 2001 13:57:02 -0500, Dave Butenhof <David.B...@compaq.com>
> wrote:
> >First off, it does nothing about "detecting races". What it DOES is try
> >to use (one might reasonably say "abuse") a language feature to try to
> >detect unsynchronized access to shared variables. That's a noble cause,
> >but I don't like the implementation much.
>
> I find that using simple assertions is adequate for finding
> unsynchronized accesses to shared variables. A function or statement
> block that expects a lock to already be held simply does something like
> this:
>
> assert (lock.current_thread_is_owner());
>
> That's it; no messing around with contortions of the C++ type system.

I guess we've reached an irreducible position here. For me, transforming
an assertion into a compile-time error is a cool thing. For you, it's not
cool at all. I perfectly understand your position, but I think otherwise.
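
To illustrate what turning the assertion into a compile-time error looks
like in practice (an invented two-line example, not from the article):

struct Gadget {
    void Operation();   // unsynchronized: non-volatile member only
};

volatile Gadget g;      // shared instance
// g.Operation();       // compile-time error: a non-volatile member
                        // function cannot be called on a volatile object;
                        // access must go through something like LockingPtr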


Andrei

James Kanze

Jan 17, 2001, 10:29:07 AM
Charles Bryant wrote:

I don't see what mangling has to do with it either, but I certainly
don't see any problem for the compiler to generate the necessary
information when it emits the function definition and the function
calls, and for the linker to patch the code up to do the necessary
saves.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

Jan 17, 2001, 10:30:33 AM
Ron Hunsinger wrote:

> In article <3A62D01F...@dresdner-bank.com>, James Kanze
> <James...@dresdner-bank.com> wrote:

> > You still need some way of preventing the optimizer from deferring
> > writes until after the lock has been released.

> Not knowing what's inside unlock() should take care of this. A
> compiler that doesn't bring memory up to date before calling a
> function it doesn't have the source for is going to generate code
> that breaks even in the absence of multithreading.

It depends. I've actually used a compiler which didn't bring memory
up to date before system calls (unless the system call was passed an
address of the memory), since it knew that system calls didn't know
about user variables. And lock/unlock are typically system calls, at
least at the lowest level.

As I said: I don't think it's a practical problem, but I prefer
written guarantees. (Other posters have pointed out to me that Posix
DOES give written guarantees, so I can at least write reliable
multi-threaded code under Unix, if not on any other systems.)

> > Ideally, the compiler
> > will understand the locking system (mutex, or whatever), and generate
> > the necessary write guards itself. Off hand, I don't know of any
> > compiler which meets this ideal.

> I'd expect the write guard to be within unlock() itself. No matter
> what the compiler knows or doesn't know, the writer of unlock()
> surely knows that a memory barrier (maybe two) is required, and can
> code it in.

We're talking about two different things. The hardware write guard
should be in unlock; there's no doubt about it. What I'm talking
about is whatever the compiler uses to limit code movement.

Anyway, as far as I'm concerned, the question has been answered: Posix
does make written guarantees, and any compiler conformant to Posix
must respect them. It still means that I'm up in the air under
Windows, though.

John Mullins

Jan 17, 2001, 10:31:22 AM

"Kaz Kylheku" <k...@ashi.footprints.net> wrote in message
news:slrn969jl...@ashi.FootPrints.net...

> On 16 Jan 2001 16:34:28 -0500, John Mullins
> <John.M...@crossprod.co.uk> wrote:

> > I'm not sure I agree with this, the compiler should guarantee that
> > the writes occur as written since this is 'observable behaviour'
>
> Any optimization is valid if a correct program cannot tell the
> difference; what constitutes a correct program depends on the language
> standard, with whatever fine tuning added by the implementation to
> allow additional non-standard programs to be correct---such as programs
> that use threads, access hardware directly and so on.

The fact that the variables are volatile means that the writes cannot be
reordered because you CAN tell the difference. What you have to do to ensure
coherency is another matter.

JM

Michiel Salters

Jan 17, 2001, 10:38:09 AM
Charles Bryant wrote:

> In article <3A62CE9A...@lucent.com>,
> Michiel Salters <sal...@lucent.com> wrote:
> >Joerg Faschingbauer wrote:
> >> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:
> >> David> Joerg Faschingbauer wrote:
> >>
> >> >> Now that's the whole point: the compiler has to take care that the
> >> >> code it generates spills registers before calling a function.

[ SNIP ]

> >I don't think things are that hard - especially in C++, which already
> >has name mangling. For each translation unit, it is possible to determine
> >which functions use which registers and what functions are called outside
> >that translation unit.
> >Now encode in the name mangling of a function which registers are used,
> >except of course those functions imported from another translation unit.
> >Just add to the name something like Reg=EAX_EBX_ECX. The full set of
> >registers used by a function is the set of registers it uses, plus
> >the registers used by function it calls. This will even work for
> >mutually recursive functions across translation units.

> I don't see how name mangling is of the slightest use. At the point
> where the compiled code refers to the function, it cannot know which
> registers it uses, so it cannot know what symbol to use for the
> reference. If it can look up the unmangled name in order to determine
> which 'Reg=...' mangling to use, it might as well leave the name
> alone.

Well, the name is a place in which information can be stored. That is the
only reason I came up with name-mangling. An extra segment in the object
file would work too.

The important point, however, is that no register-saving code should be
emitted during compilation. What should be recorded at the point where a
function in another translation unit is called is the name of that
function and the registers in use. The task of the linker is to replace
that function call request with an actual function call - including the
code to save some registers, and perhaps set up extra stack. Which
registers should be saved is determined by extracting the register usage
from the mangled name in the scheme above. It does require that the
linker compares mangled names after stripping off the register
information, though, as you correctly noted.

--
Michiel Salters
Michiel...@cmg.nl
sal...@lucent.com

Torsten.Robitzki

Jan 17, 2001, 11:07:10 AM
Hi,
John Mullins wrote:

>
> "Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
> news:93sptr$o5s$1...@flotsam.uits.indiana.edu...
>
> > It's not that volatile itself causes memory problems. It's that
> > it's not sufficient (and if under POSIX should not even be used).
> >
> > He gives an example, which will work in practice, but if he had two
> > shared variables, would fail. Code like this, for example, would
> > be incorrect on an MP with a relaxed memory model. The write to flag_
> > could occur before the write to data_, despite the order in which the
> > assignments are written.
> >
[example of a class with a trigger flag and some data to read after
triggered]

> I'm not sure I agree with this, the compiler should guarantee that the
> writes occur as written since this is 'observable behaviour'

I'm reading this 'thread' with great interest and I've got some
questions, not only about your post.
a)
As far as I know, the behavior of a well-formed C++ program is described
by the steps that an abstract machine would take (or the other way
round). Is concurrency taken into account for that machine? Is memory a
part of this machine?
b)
What is a memory barrier? (I know it has nothing to do with C++.)
Is it an instruction that synchronizes memory with the cache?

regards, Torsten
(P.S. feel free to correct my english)

Joerg Faschingbauer

Jan 17, 2001, 11:13:08 AM
>>>>> "Andrei" == Andrei Alexandrescu <andre...@hotmail.com> writes:

>> This is why the double checked locking paradigm is irrecoverably
>> broken. See
>> <URL:

Andrei> http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html


>
>> for a detailed explanation of why.

Andrei> You know, I was thinking. In the presence of 'volatile', doesn't (or
at
Andrei> least shouldn't) the compiler disable reordering? For example, if the
Andrei> pointer to the Singleton is volatile-qualified, then a rearranging
compiler
Andrei> should disable reordering instructions that involve the pointed.

That's not why it's broken. You forget about processors reordering
memory accesses. Compilers can't do anything about that. Barriers (and
thus mutexes) are used to prevent reordering over a barrier.

Joerg

Andrei Alexandrescu

Jan 17, 2001, 11:46:16 AM
"Tom Payne" <t...@hill.cs.ucr.edu> wrote in message
news:93vmb6$1d8h$1...@mortar.ucr.edu...
> His technique seems a good way to guarantee atomicity of certain
> operations, but AFAIK it doesn't detect or prevent all situations where
> the outcome of multiple operations on a thread-shared object depends
> on how those threads are scheduled.

I disagree.

Here's your code example with some obvious errors corrected:

class Int {
    int i;
public:
    Int() : i(0) {}
    void Double() { i = 2*i; }
    void Incr() { i = i+1; }
};

> Apply Andrei's technique to Int and then create a static Int k and two
> threads:
> - thread1 increments k and then exits
> - thread2 doubles k and then exits.
> Visibly, the final value of k.i is going to depend on the scheduling of
> these two threads.

If you apply the technique the article prescribes, the code becomes like
this:

class Int {
    int i;
public:
    Int() : i(0) {}
    void Double() volatile {
        // acquire sync object / memory barrier
        i = 2 * i;
        // release sync object / memory barrier
    }
    void Double() {
        i = 2 * i;
    }
    void Incr() volatile {
        // acquire sync object / memory barrier
        i = i + 1;
        // release sync object / memory barrier
    }
    void Incr() {
        i = i + 1;
    }
};

Now when you call Double() or Incr() on an object, the correct version is
called, provided you qualified shared objects with 'volatile'
appropriately. So what you say now is that two threads call Double and
Incr respectively. So what's the problem? A problem would have occurred if
one thread fetched an incorrect temporary value from the other thread's
computation.

If the granularity of Incr() or Double() is too low, the design of Int is
wrong. Remove their volatile variants and provide higher-level functions,
as sketched below.
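
A sketch of such a higher-level function, assuming the class carries a
Mutex member mtx_ and uses the article's LockingPtr (IncrThenDouble is an
invented name):

void IncrThenDouble() volatile {
    LockingPtr<Int> self(*this, mtx_); // one lock covers the whole
    self->Incr();                      // read-modify-write sequence,
    self->Double();                    // calling the cheap non-volatile
}                                      // overloads inside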

> AFIK, detecting race conditions is equivalent to the halting problem.

In my opinion it's not.

Andrei

Joe Seigh

Jan 17, 2001, 12:16:42 PM
Andrei Alexandrescu wrote:
>
> I wish (I mean, I don't *really* wish) that someone would come up with a code
> sample that applies the rules of volatile-correct code, yet has trouble
> related to race conditions. This would lead to completing the rules. Again,
> I affirm that volatile-correctness is a fundamentally sound concept, but of
> course there might be loopholes that I left in defining it.
>
(in Java syntax for the purposes of illustration)

int temp;   // thread local
int x;      // shared

synchronized(a) {
    temp = x;
}
temp++;
synchronized(a) {
    x = temp;
}

which is semantically equivalent to (fetch and store of x is atomic)

x++;

Joe Seigh

Aleksey Gurtovoy

Jan 17, 2001, 12:21:41 PM

Dave Butenhof wrote:
> First off, it does nothing about "detecting races". What it DOES is
> try to use (one might reasonably say "abuse") a language feature to try
> to detect unsynchronized access to shared variables. That's a noble
> cause, but I don't like the implementation much. By using volatile
> (which is already a subject of much confusion, as you may have noticed
> by now), you're (perhaps unintentionally) implying a lot that the
> article can't do.

I tend to agree with the "abuse" comment above (even after I've read
Andrei's follow-up to the article quoted above ;). Although it's true
that the semantics of 'volatile' is not completely outside the domain
discussed in the article and this thread, still (IMO) using the
qualifier as the main tool for forcing the compiler to diagnose
unsynchronized access to (possibly) shared data is close to what some
people might call a hack. There are a few areas (primitive types
and 'const_cast'-ing) which are not very clean under this
implementation, and if we had yet another type qualifier in the
language, the technique could just as well use it instead of
'volatile'. Also, as has already been said, 'volatile' is confusing
enough without any additional help ;), and assigning yet another
semantics to it is basically overloading of the keyword, and we all
know that this is not as good as function overloading :).

Having said all that, I still think that the idea behind the original
implementation is an interesting one, and I believe that it is possible
to implement most of its key features (even the missing ones) in a more
straightforward (less controversial ;) way. Below is my version of how
one such implementation might look. However, I do not claim that it is
in any way better than Andrei's version, mostly because AFAIU Andrei
does use the technique presented in the article in real life, and my
code is just a bunch of characters off the top of my head :).

Let me start with an example of how article's SyncBuf class definition
will look in my case:

#include "need_locking.hpp"

struct SyncBuf {
    void thread_1();
    void thread_2();
private:
    typedef std::vector<char> buffer_type;
    need_locking<buffer_type> buffer_;
    mutex mutex_;
};

void SyncBuf::thread_1() {
    locking_ptr<buffer_type> buf_ptr(buffer_, mutex_);
    // buffer_type::iterator i = buffer_.begin(); // compile-time error
    buffer_type::iterator i = buf_ptr->begin();
    // for (; i != buffer_.end(); ++i ) { // compile-time error
    for (; i != buf_ptr->end(); ++i ) {
        *i = 0;
    }
}

Basically there are only two things here that are different from the
original class - the additional #include directive at the top and the
declaration of the 'buffer_' member as
having 'need_locking<buffer_type>' type instead of 'volatile
buffer_type'. For those who is not dissapointed yet, the last one is
(almost) the whole point of this post ;). But we've also got a few
additional bonus points here: (a) there is no undefined behavior
involved (well, this is a very moot one, but see the implementation
of 'need_locking' and 'locking_ptr' below :), and (b) now the 'Counter'
class example is as "safe" as the one you've just seen (quotes here
aimed to prevent some long discussion threads that, I can feel, have
already appeared on the horizon ;).

struct Counter {
    void increment() {
        ++*locking_ptr<int>(counter_, mutex_);
        // ++counter_; // compile-time error
    }
    void decrement() { --*locking_ptr<int>(counter_, mutex_); }
private:
    need_locking<int> counter_;
    mutex& mutex_;
};

Now to the content of the 'need_locking.hpp', that should be something
close to this:

template<class T> struct need_locking;

template<class T> struct locking_ptr {
    locking_ptr(need_locking<T>& n, mutex& m)
        : obj_ptr_(&n.object)
        , mutex_(m)
    {
        mutex_.lock();
    }
    ~locking_ptr() { mutex_.unlock(); }
    T* operator->() const { return obj_ptr_; }
    T& operator*() const { return *obj_ptr_; }
private:
    T* obj_ptr_;
    mutex& mutex_;
};

template<class T> struct need_locking {
    need_locking() : object() {} // default ctor (needed by e.g. Counter)
    template<class U> need_locking(U const& u) : object(u) {}
    template<class U1, class U2>
    need_locking(U1 const& u1, U2 const& u2) : object(u1, u2) {}
    // more template constructors with 3, 4 etc. params :)
private:
    friend struct locking_ptr<T>;
    T object;
};

Nothing complicated so far, but what about 'volatile'-qualified thread-
safe member functions and overloading (the article's 'Widget' example)?
Well, first of all, the idea of writing classes like 'Widget':

// Andrei's example from the article
class Widget
{
public:
    void Operation() volatile;
    void Operation();
    ...
private:
    Mutex mtx_;
};

void Widget::Operation() volatile
{
    LockingPtr<Widget> lpThis(*this, mtx_);
    lpThis->Operation(); // invokes the non-volatile function
}

does not appeal to me very much, in particular because of the (IMO)
unnecessary proliferation of member functions, and because it introduces
a constant per-object overhead (size, and probably construction time)
even when the "thread-safe" part of the class' interface is never used.
I think that taking care of thread-safety issues at the class level
should be done more along the following lines:

template<typename Mutex = null_mutex>
struct Widget : private Mutex { // subject to possible empty
                                // base class optimization :)
    void operation() {
        typename Mutex::locker lock(*this);
        // do something
    }
};

Widget<> widget; // "unprotected" Widget
Widget<SomeMutex> widget; // "protected" Widget

The difference between the two approaches is obvious, but I still want to
point it out: the latter version of the 'Widget' class allows you to have
(near to) zero overhead (compared to a hypothetical implementation
without any thread synchronization support at all) in a single-threaded
environment, or in case you don't want/need to synchronize operations at
the 'Widget' level. And as soon as you apply this parameterization
technique to a class, there will be no need for 'volatile' counterparts
to plain old member functions.

Well, it's almost the end. There are a few simplifications here and
there, but it's already quite a long post, so I'll leave them
undisclosed :). I am sure, though, that it won't last long ;). Hope
that some of the above things were worth spending more than 1 hour of
writing :)

--Aleksey



James Kanze

Jan 17, 2001, 12:31:16 PM
Charles Bryant wrote:

> In article <93vb9p$bpi8g$1...@ID-14036.news.dfncis.de>,
> Andrei Alexandrescu <andre...@hotmail.com> wrote:

> >I would be glad if someone explained me in what situations a
> >compiler can rearrange instructions to the extent that it would
> >invalidate the idiom that the article proposes. OTOH, such
> >compilers invalidate a number of idioms anyway, such as the
> >Double-Check Locking Pattern, used by Doug Schmidt in ACE.

[...]
> However, the problem is not that the compiler may rearrange
> instructions. It is that the CPU may re-order the memory accesses.

Both are in fact a problem. It doesn't matter who does the
re-ordering; if the re-ordering occurs, the pattern is broken. (It's
worth pointing out the hardware re-ordering, however, because
depending on how the compiler interprets the standard, even volatile
may not help.) In the double-check locking pattern, at least in its
classical presentation, the compiler *can* reorder, at least as far as
the C/C++ standards go. (I've not verified the Posix standards, but I
doubt that they forbid the reordering either. In the general case,
the ability to reorder is essential for good optimization.)

As a reminder, the double-check locking pattern is:

Singleton*
Singleton::instance()
{
static Singleton* uniqueInstance = NULL ;
if ( uniqueInstance == NULL ) {
lock() ;
if ( uniqueInstance == NULL ) {
uniqueInstance = new Singleton ; // problem
}
unlock() ;
}
return uniqueInstance ;
}

The problem is in the marked line. Although simple in appearance,
this line actually does quite a lot. It allocates memory, calls the
constructor of the Singleton, and then writes the pointer
uniqueInstance. We will suppose that the memory allocation is thread
safe -- it is part of the system, after all, and can be called from
different threads. For the rest, however, there is no guarantee that
the writes in the constructor, which initialize this memory, occur
before the write to the pointer. If the write to the pointer occurs
before the object itself is actually initialized, and a second thread
then interrupts, it will see a non-null pointer, not even request the
lock, and access the uninitialized object.
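
For comparison, the straightforward alternative the thread keeps circling
back to is to give up the unlocked fast path entirely; a sketch, assuming
lock()/unlock() wrap a properly implemented mutex:

Singleton*
Singleton::instance()
{
    static Singleton* uniqueInstance = NULL ;
    lock() ;
    if ( uniqueInstance == NULL ) {
        uniqueInstance = new Singleton ;
    }
    unlock() ;
    return uniqueInstance ; // every reader pays for the lock, but the
                            // lock orders the constructor's writes before
                            // any other thread can see the pointer
}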

> For example,

> x = 6; // x == 6
> y = 7; // x == 6, y == 7
> x = y; // x == 7, y == 7
> y = 5; // x == 7, y == 5

> The CPU may:
> put '6' in its cache, scheduled to be written to 'x',
> write '6' from cache to 'x'
> put '7' in its cache, scheduled to be written to 'y',
> write '7' from cache to 'y'
> put '7' in its cache, scheduled to be written to 'x',
> put '5' in its cache, scheduled to be written to 'y',
> write '5' from cache to 'y'
> write '7' from cache to 'x'

> Note that this permits another CPU to see x == 6 and y == 5 in
> memory, even though CPU executing the code could never see this
> combination.

Note that a good compiler can do this too. Typically, most compilers
should be able to suppress the first two writes completely. And in
many cases, which variable actually gets written first will depend on
the register pressure and the register allocation scheme of the
compiler.

This isn't just true for experimental compilers. The old Lattice C
compilers for the 8086 (and thus the Microsoft C version 1.0)
regularly wrote data back in an order different from the assignments
in the source code.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

Jan 17, 2001, 12:31:41 PM

The C/C++ standards say that anything goes, as far as optimization is
concerned, as long as the observable behavior is unchanged. They also
say that accessing a volatile object is "observable behavior".
Reordering of writes to volatile objects is not allowed.

> The reordering of memory updates can only be observed by programs
> which are far from standard C or C++. The rules which govern the
> correctness of these programs, and therefore which govern what
> optimizations may or may not be applied, are entirely up to the
> language implementors.

Your logic would also apply to writes of data to the screen, so the
lines might appear in a different order than written. The screen
cannot be observed by standard C or C++ programs either.

In fact, the rule is the reverse: the order must only be maintained
when the objects can be observed from outside the C or C++ program.
And according to the standards, accessing a volatile object (whether
read or write) is observable from outside the program, by definition.
Let's not forget that the very first motivation of volatile is to
support memory mapped IO; if the specifications of my controller chip
says that I must write the control port before writing the data port,
then the compiler had better not reorder my writes.
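
A minimal sketch of that memory-mapped I/O case; the addresses and the
register layout are invented for illustration:

// Hypothetical device: a control port followed by a data port.
volatile unsigned char* const ctrl =
    reinterpret_cast<volatile unsigned char*>(0xFFFF1000);
volatile unsigned char* const data =
    reinterpret_cast<volatile unsigned char*>(0xFFFF1001);

void send_byte(unsigned char b)
{
    *ctrl = 0x01;   // the chip requires the control write first;
    *data = b;      // volatile forbids the compiler to reorder these
}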

> The system architecture of advanced multiprocessor systems typically
> distinguishes differend kinds of memory regions. The reordering
> optimizations are not applied to all of the regions. For example,
> memory mapped hardware registers would be placed in a region to
> which caching and reordering optimizations are not applied, thus
> programs which use volatile lvalues to access such registers will
> work properly.

Are you talking about hardware reordering, or software? The C
language provides no means of informing the compiler which region of
memory is involved, so I don't see how the regions can inhibit
software reordering.

> These multiprocessors typically run POSIX environments, which
> dictate the that the use of volatile is not necessary in MT
> programming; rather, what is mandatory is the use of the proper
> synchronization functions. Thus a program which accesses shared data
> without using these functions is incorrect, and its behavior may
> therefore vary with the system configuration, the optimization
> settings of the compiler, phase of the moon, etc.

Posix makes certain guarantees. The C/C++ language standard makes
others. The Posix guarantees may be a lot more useful for MT code
than those in the language standard, but this doesn't mean that the
compiler can ignore the language standard, either.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

Anthony Williams

Jan 17, 2001, 12:32:32 PM
"Andrei Alexandrescu" <andre...@hotmail.com> wrote in message
news:93v92q$bpfij$1...@ID-14036.news.dfncis.de...

> "John Mullins" <John.M...@crossprod.co.uk> wrote in message
> news:93v2gj$4fs$1...@newsreaderg1.core.theplanet.net...

> >
> > "James Kanze" <James...@dresdner-bank.com> wrote in message
> > news:3A62CF96...@dresdner-bank.com...
> >
> > > That's not totally true, at least not in his examples. I think he
> > > also counts on volatile to some degree to inhibit code movement;
> > > i.e. to prevent the compiler from moving some of the writes to after
> > > the lock has been freed.
> >
> > But his examples also rely on undefined behaviour so he can't really
> > count on anything.
>
> Are you referring to const_cast? Strictly speaking, indeed. But then,
> all MT programming in C/C++ has undefined behavior.
>
> Andrei

MT programming is beyond the scope of the C++ standard. If your compiler
vendor explicitly states that the compiler and runtime library support MT
code in C++, then I would hope that it would work.

However, the use of const_cast to remove a volatile qualifier IS covered
by the C++ standard: the cast itself is well-defined, but accessing an
object that really is volatile through the resulting non-volatile lvalue
produces undefined behaviour.

Anthony
--
Anthony Williams
Software Engineer, Nortel Networks
The opinions expressed in this message are mine and do not represent those
of my employer

Konrad Schwarz

Jan 17, 2001, 2:01:24 PM

James Kanze wrote:
>
> Konrad Schwarz wrote:
>
> > If the compiler supports multi-threading (at least POSIX
> > multi-threading), then it *must*, since POSIX does not require
> > shared variables to be volatile qualified. If the compiler decides
> > to keep values in registers across function calls, it must be able
> > to prove that
> > * either these variables are never shared by another thread
> > * or the functions in question never perform inter-thread operations
>
> Is this explicitly stated in the Posix standard? If so, it is the
> sort of guarantee I'm looking for. Or is this just your
> interpretation based on the lack of a requirement that shared
> variables be volatile qualified?

This is my interpretation; however, I believe it is an
inference, not an interpretation.

Konrad Schwarz

Jan 17, 2001, 2:07:32 PM
Kaz Kylheku wrote:

> There is no difference between an auto variable that cannot have its
> address taken an one that does not have its address taken. The
> register keyword only adds a constraint rule to the C language, but it
> is otherwise vacuous of any special semantics.
>
> A conforming C compiler must verify that a register variable does not
> in fact have its address taken, and emit a diagnostic otherwise. It
> can similarly analyze *any* auto variable to determine whether or not
> its address is taken. It would be a poor quality compiler that only
> made optimistic aliasing assumptions about register variables.

Or one that does not fully implement ANSI aliasing rules because
its target code base is not ANSI compliant.

Kaz Kylheku

Jan 17, 2001, 3:26:14 PM
On 17 Jan 2001 10:31:22 -0500, John Mullins
<John.M...@crossprod.co.uk> wrote:
>
>"Kaz Kylheku" <k...@ashi.footprints.net> wrote in message
>news:slrn969jl...@ashi.FootPrints.net...
>> On 16 Jan 2001 16:34:28 -0500, John Mullins
>> <John.M...@crossprod.co.uk> wrote:
>
>> > I'm not sure I agree with this, the compiler should guarantee that
>> > the writes occur as written since this is 'observable behaviour'
>>
>> Any optimization is valid if a correct program cannot tell the
>> difference; what constitutes a correct program depends on the language
>> standard, with whatever fine tuning added by the implementation to
>> allow additional non-standard programs to be correct---such as programs
>> that use threads, access hardware directly and so on.
>
> The fact that the variables are volatile means that the writes cannot
> be reordered because you CAN tell the difference.

That is incorrect reasoning, because there are valid ways to tell the
difference and there are invalid ways. If you monitor your computer
with electronic probes, then everything is observable. Does it then
follow that all optimizations are disallowed?

Certain observations, on the part of the program, are invalid, and do
not have to be accounted for. Incorrect aliasing between objects is
such an example. A C compiler does not have to worry that a store
through a double * pointer has any effect on an lvalue of type int. So
a subsequent access to the int lvalue may pull out a cached copy.

The effect of reordered writes can only be observed with an additional
processor, whose use requires a non-standard programming interface that
renders the program undefined in the face of the language standard. A
given processor always sees a coherent view of its own updates; each
memory location appears to retain its last stored value. Therefore a
strictly conforming program cannot see the effect of the reordering.

>What you have to do to ensure coherency is another matter.

Reordered writes are just an aspect of coherency on some systems. As
the cache system on one processor tries to maintain a coherent view, it
does not get the updates from other processors in the same order in
which they were issued in the instruction streams of those processors.

Andrei Alexandrescu

Jan 17, 2001, 3:27:38 PM
"James Kanze" <James...@dresdner-bank.com> wrote in message
news:3A656354...@dresdner-bank.com...

> As a reminder, the double-check locking pattern is:
>
> Singleton*
> Singleton::instance()
> {
> static Singleton* uniqueInstance = NULL ;
> if ( uniqueInstance == NULL ) {
> lock() ;
> if ( uniqueInstance == NULL ) {
> uniqueInstance = new Singleton ; // problem
> }
> unlock() ;
> }
> return uniqueInstance ;
> }
>
> The problem is in the marked line. Although simple in appearance,
> this line actually does quite a lot. It allocates memory, calls the
> constructor of the Singleton, and then writes the pointer
> uniqueInstance. We will suppose that the memory allocation is thread
> safe -- it is part of the system, after all, and can be called from
> different threads. For the rest, however, there is no guarantee that
> the writes in the constructor, which initialize this memory, occur
> before the write to the pointer. If the write to the pointer occurs
> before the object itself is actually initialized, and a second thread
> then interrupts, it will see a non-null pointer, not even request the
> lock, and access the uninitialized object.

Here's what I can't understand - and it's good, because we are zeroing in
on my problem. Doesn't the standard say something like: all side effects
of a function call must have occurred at a sequence point? Would it be
standard-conforming to have unlock() called before the constructor and
the assignment to uniqueInstance have been performed?


Andrei

Andrei Alexandrescu

Jan 17, 2001, 3:28:03 PM
"Aleksey Gurtovoy" <al...@meta-comm.com> wrote in message
news:943qmg$hb0$1...@nnrp1.deja.com...

> Let me start with an example of how article's SyncBuf class definition
> will look in my case:
>
> #include "need_locking.hpp"
[snip]

Got the idea. So basically you make a template need_locking that simulates a
modifier by changing the type of a variable. Cool.

> Nothing complicated so far, but what about 'volatile'-qualified thread-
> safe member functions and overloading (the article's 'Widget' example)?
> Well, first of all, the idea of writing classes like 'Widget':

Here things become not so clean and the volatile member function-based
solution is better. See below.

> // Andrei's example from the article
> class Widget
> {
> public:
> void Operation() volatile;
> void Operation();
> ...
> private:
> Mutex mtx_;
> };
>
> void Widget::Operation() volatile
> {
> LockingPtr<Widget> lpThis(*this, mtx_);
> lpThis->Operation(); // invokes the non-volatile function
> }

I left the sample above in the code for comparison's sake.

> does not appeal to me very much, in particular because of (IMO)
> unnecessary proliferation of member functions and introducing a
> constant per-object (size and probably construction time) overhead even
> in case if "thread-safe" part of the class' interface is never used.

When writing an article, I think you would agree it's best to keep focus
on one thing only. Scott Meyers taught me this, and quite convincingly.
Imagine reading an article that begins with a thorough description of
what volatile does, followed by an introduction to multithreading, and
then, in the middle of the discussion, up pops an explanation of the
empty base optimization!
> I
> think that taking care of thread-safety issues at the class level
> should be done more along the following lines:
>
> template<typename Mutex = null_mutex>
> struct Widget : private Mutex { // subject to possible empty
> // base class optimization :)
> void operation() {
> Mutex::locker lock(*this);
> // do something
> }
> };

Ouch. So now you don't have ANY chance to call the unsynchronized
operation(), even when you ALREADY HAVE A LOCKED OBJECT. Sorry for
shouting, but my eyes are popping out in exhilaration. It's an important
point. You often lock a shared object once and call operation() on it
until blood starts dripping from the corner of its mouth. So you want
BOTH operations: the protected one, and the unprotected one. MORE, you
also need an ordered call mechanism so you don't call the unprotected
version on unprotected objects, and so you don't incur the overhead on
objects that are already protected. My solution provides ALL these
amenities. Yours provides NONE - nothing more than Threading-101-style
rigid locking in member functions.

The issue of locking objects unnecessarily becomes very apparent when you
derive from a lockable class, or when you want to do operations that don't
lend themselves to function-level locking, such as iteration. How can you
lock an object, do some stuff, and call the base function without locking
the object twice? And there we are - back to two versions of the same
function, only that now it's all much more clumsy and dangerous!
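
To make the lock-once point concrete, here is a sketch using the
article's Widget and its two Operation() overloads (ThreeOps is an
invented helper):

void ThreeOps(volatile Widget& w, Mutex& mtx)
{
    LockingPtr<Widget> lw(w, mtx);  // lock exactly once
    lw->Operation();                // non-volatile overload: no
    lw->Operation();                // relocking on each call
    lw->Operation();
}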

> Difference between two approaches is obvious, but I still want to point
> it out: the later version of the 'Widget' class allows you to have a
> (near to) zero overhead (comparing to a hypothetical implementation
> without any thread synchronization support at all) in single-threaded
> environment, or in case if you don't want/need to synchronize the
> operations of 'Widget' level. And as soon as you apply this
> parametarization technique to a class there will be no need for
> having 'volatile' counterparts for old plain member functions.

I disagree with this, based on what I said above. Some old code of mine does
such stuff, but I am convinced the volatile member functions are very
effective, and have no caveats.


Andrei

Andrei Alexandrescu

Jan 17, 2001, 3:28:50 PM
"Joe Seigh" <jse...@genuity.com> wrote in message
news:3A65777A...@genuity.com...

> Andrei Alexandrescu wrote:
> >
> > I wish (I mean, I don't *really* wish) that someone would come up with
> > a code sample that applies the rules of volatile-correct code, yet has
> > trouble related to race conditions. This would lead to completing the
> > rules. Again, I affirm that volatile-correctness is a fundamentally
> > sound concept, but of course there might be loopholes that I left in
> > defining it.
> >
> (in Java syntax for the purposes of illustration)
>
> int temp; // thread local
> int x; // shared
>
> synchronized(a) {
>     temp = x;
> }
> temp++;
> synchronized(a) {
>     x = temp;
> }
>
> which is semantically equivalent to (fetch and store of x is atomic)
>
> x++;

I am sorry, but I don't really understand how your code replies to my
post. For starters, the example is broken. It simulates an RMW operation
with temp taking the role of the register.


Andrei

Andrei Alexandrescu

Jan 17, 2001, 3:30:08 PM
"James Kanze" <James...@dresdner-bank.com> wrote in message
news:3A655A2F...@dresdner-bank.com...

> Anyway, as far as I'm concerned, the question has been answered: Posix
> does make written guarantees, and any compiler conformant to Posix
> must respect them. It still means that I'm up in the air under
> Windows?

Windows and written guarantees? You gotta be kiddin' :o). Now seriously,
EnterCriticalSection does take the address of a user variable (the
CRITICAL_SECTION object) and so...

Anyway, in my opinion a rearranging compiler should disable ALL
REARRANGING that could involve a volatile variable. That's because
volatile means "this variable changes beyond your ability to figure out,
so stop rearranging those statements like crazy and let me take care of
my stuff".


Andrei
