
memory visibility between threads


Lie-Quan Lee

unread,
Jan 11, 2001, 12:34:42 PM1/11/01
to

I have been reading the book written by David Butenhof, Programming with
POSIX Threads. Thanks David, it is a wonderful book and I think it is a
must-read for everyone doing multithreaded programming. The book
taught me many valuable points I never knew before.

The memory visibility rules are insightful. They dig out the in-depth
workings behind mutexes and condition variables and show the real meaning
of "synchronization". For example, the rules mean that "any data one puts
in registers or auto variables can be read at a later time with no more
complication than in a completely synchronous program." Namely, for a
shared variable, if I use a mutex to protect access to it, I do not need to
worry about whether it is cached in a processor's register or not.

One question is whether those memory visibility rules are applicable to
other thread systems such as Solaris UI threads, Win32 threads, or Java
threads ...? If yes, we can follow the same spirit. Otherwise, it would make
a big difference. (For example, all shared variables might have to be
defined as volatile even with mutex protection.)

Any comments will be appreciated.

***************************************************
Rich Lee (Lie-Quan)
Lab for Scientific Computing
University of Notre Dame
Email : ll...@lsc.nd.edu
Tel : 219-631-3906 (Office)
HomePage: http://www.lsc.nd.edu/~llee1/
***************************************************


Kaz Kylheku

unread,
Jan 11, 2001, 12:58:14 PM1/11/01
to
On Thu, 11 Jan 2001 12:34:42 -0500, Lie-Quan Lee <ll...@lsc.nd.edu> wrote:
>One question is whether those memory visibility rules are applicable to
>other thread systems such as Solaris UI threads, Win32 threads, or Java
>threads ...?

They do not apply to Win32 threads. Win32 simply isn't as
intellectually thorough as something like POSIX. It's a moving target.

The synchronization rules are basically whatever the latest Visual C++
compiler happens to do, and whatever MSDN example source code does.
Because Microsoft's compiler synchronizes registers with memory around calls to
external functions like EnterCriticalSection, there's no need to use
volatile. And no flavor of Windows runs on a multiprocessor system that
reorders memory accesses, or poses other kinds of difficulties to
threaded programs. Many Win32 programmers blatantly ignore such
possibilities anyway, so most Win32 applications would be in serious
trouble on such hardware.

Win32 is a one-vendor job, and it shows! Already one has to do a ton
of #ifdef's to get a codebase going on NT, 2000, CE, 98 and 95 due to
differences in the interface availability or the semantics of
interfaces that are supposed to be common.

Dave Butenhof

unread,
Jan 12, 2001, 8:32:40 AM1/12/01
to
Lie-Quan Lee wrote:

> I have been reading the book written by David Butenhof, Programming with
> POSIX Threads. Thanks David, it is a wonderful book and I think it is a
> must-read for everyone doing multithreaded programming. The book
> taught me many valuable points I never knew before.

Thank you!

> The memory visibility rules are insightful. They dig out the in-depth
> workings behind mutexes and condition variables and show the real meaning
> of "synchronization". For example, the rules mean that "any data one puts
> in registers or auto variables can be read at a later time with no more
> complication than in a completely synchronous program." Namely, for a
> shared variable, if I use a mutex to protect access to it, I do not need to
> worry about whether it is cached in a processor's register or not.

As long as you're writing in C (or, with care, in C++) and you're using a
system and compiler that conform to POSIX 1003.1-1996, of course...

> One question is whether those memory visibility rules are applicable to
> other thread systems such as Solaris UI threads, Win32 threads, or Java
> threads ...? If yes, we can follow the same spirit. Otherwise, it would make
> a big difference. (For example, all shared variables might have to be
> defined as volatile even with mutex protection.)

Don't ever use the C/C++ language volatile in threaded code. It'll kill your
performance, and the language definition has nothing to do with what you want
when writing threaded code that shares data. If some OS implementation tells
you that you need to use it anyway on their system, (in the words of a child
safety group), "run, yell, and tell". That's just stupid, and they shouldn't
be allowed to get away with it.

As Kaz already said, Win32 has no memory model. It's a proprietary closed
system rooted strongly and deeply in the archaic X86 hardware model. You do
whatever they expect you to do, and whenever they expect something different,
you change your code. (You may or may not be told when the expectations
change, but that's irrelevant.) This all should have been highly and
expensively embarrassing to Microsoft during the period when they were
pretending to support Windows NT on other architectures such as MIPS,
PowerPC, and Alpha, but instead the failings of the Win32 architecture were
blamed on the hardware vendors (presumably because the machines were not
X86). Memory model ought to be a major issue on IA-64, and I don't know if
(or how thoroughly) Win64 addresses it. If they ignore it, fail to make the
issues visible, or fail to thoroughly and quickly educate the vast throngs of
Win32 programmers, I suspect that IA-64 will also fail (at least as a Windows
machine).

UI threads was based loosely on POSIX (or rather, while it began somewhat
independently, it was "co-developed" with POSIX threads and began to be
targeted essentially as a "bridge" standard on a faster track). The memory
model, last I saw, was essentially the same. Both are based entirely on API
operations that imply (or are) some form of explicit synchronization between
cooperating threads.

Java has its own memory model. While it includes the POSIX aspect of explicit
synchronization, it goes beyond that to specify semantics for access that
isn't explicitly synchronized. There are a few problems and ambiguities in
the current version, and it's being substantially reworked. In Java,
"volatile" is supposed to be sufficient, because it doesn't mean at all what
the C volatile means. (I've never been happy that they chose the same word...
but then, it may well be that Java volatile means very much what most C
programmers have erroneously believed the C volatile to mean, so maybe it's
excusable.)

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

John Hickin

unread,
Jan 12, 2001, 9:09:53 AM1/12/01
to
Dave Butenhof wrote:
>

>
> Java has its own memory model. While it includes the POSIX aspect of explicit
> synchronization, it goes beyond that to specify semantics for access that
> isn't explicitly synchronized. There are a few problems and ambiguities in
> the current version, and it's being substantially reworked. In Java,
> "volatile" is supposed to be sufficient, because it doesn't mean at all what

There is a wonderful discussion of this at
http://www.cs.umd.edu/~pugh/java/memoryModel/

I remember a protracted discussion of Double Checked Locking (in a C/C++
setting) in this newsgroup and came to appreciate how dangerous it is to
try to optimize synchronization and maintain portability. The Java folks
are about to learn the same lesson (see the discussion in The
"Double-Checked Locking is Broken" Declaration part of the document).


Regards, John.

sa...@bear.com

unread,
Jan 12, 2001, 10:10:37 AM1/12/01
to
In article <3A5F0778...@compaq.com>,
Dave Butenhof <David.B...@compaq.com> wrote:

> Lie-Quan Lee wrote:
>
> Java has its own memory model. While it includes the POSIX aspect of explicit
> synchronization, it goes beyond that to specify semantics for access that
> isn't explicitly synchronized. There are a few problems and ambiguities in
> the current version, and it's being substantially reworked. In Java,
> "volatile" is supposed to be sufficient, because it doesn't mean at all what
> the C volatile means. (I've never been happy that they chose the same word...
> but then, it may well be that Java volatile means very much what most C
> programmers have erroneously believed the C volatile to mean, so maybe it's
> excusable.)
>

In Java, volatile means:

1) every access to a `volatile' variable is a memory access,
2) program-ordered accesses to `volatile' variables can not be
reordered with respect to each other,
but there is no restriction on the ordering between a `volatile'
variable access, and a non-volatile variable access.

You can not use `volatile', and still get compiler optimizations
in Java (you have to use `synchronized' for that purpose), because
a `volatile' "flag" variable can be reordered with respect to a
non-volatile 'shared' data.

In C or C++, `volatile' inhibits compiler optimizations, and the
standard does not say much about the precise meaning of `volatile'.
For this reason, I think the Java designers' decision to use the
same term (`volatile') is completely defensible. It is C or C++
programmers' mistake if they do not understand `volatile' completely
(which the standard more or less leaves completely to the implementors
to define).

Thank you,
Saroj Mahapatra

Sent via Deja.com
http://www.deja.com/

Martin Berger

unread,
Jan 12, 2001, 9:27:29 PM1/12/01
to
Dave Butenhof wrote:

> > One question is whether those memory visibility rules are applicable to
> > other thread systems such as Solaris UI threads, Win32 threads, or Java
> > threads ...? If yes, we can follow the same spirit. Otherwise, it would make
> > a big difference. (For example, all shared variables might have to be
> > defined as volatile even with mutex protection.)
>
> Don't ever use the C/C++ language volatile in threaded code. It'll kill your
> performance, and the language definition has nothing to do with what you want
> when writing threaded code that shares data. If some OS implementation tells
> you that you need to use it anyway on their system, (in the words of a child
> safety group), "run, yell, and tell". That's just stupid, and they shouldn't
> be allowed to get away with it.
>

well, in the c/c++ users journal, Andrei Alexandrescu recommends using
"volatile" to help avoiding race conditions. can the experts please slug it out?
(note the cross posting)

martin

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

David Schwartz

unread,
Jan 13, 2001, 5:56:51 AM1/13/01
to

Martin Berger wrote:

> well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> "volatile" to help avoiding race conditions. can the experts please slug it out?
> (note the cross posting)
>
> martin

What race conditions? If you have a race condition, you need to _FIX_
it. Something that might help "avoid" it isn't good enough. If I told my
customers I added some code that helped avoid race conditions, they'd
shoot me. Code shouldn't _have_ race conditions.

DS

Kaz Kylheku

unread,
Jan 13, 2001, 1:14:30 PM1/13/01
to
On 12 Jan 2001 21:27:29 -0500, Martin Berger

<martinb@--remove--me--dcs.qmw.ac.uk> wrote:
>well, in the c/c++ users journal, Andrei Alexandrescu recommends using
>"volatile" to help avoiding race conditions. can the experts please slug it
out?

The C/C++ Users Journal is a comedy of imbeciles.

The article you are talking about completely ignores issues of memory
coherency on multiprocessor systems. It is very geared toward Windows;
the author seems to have little experience with multithreaded
programming, and especially cross-platform multithreading. Luckily, he
provides a disclaimer by openly admitting that some code that he wrote
suffers from occasional deadlocks.

Martin Berger

unread,
Jan 13, 2001, 3:01:23 PM1/13/01
to
David Schwartz wrote:

> > well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> > "volatile" to help avoiding race conditions. can the experts please slug it out?
> > (note the cross posting)
> >
> > martin
>
> What race conditions? If you have a race condition, you need to _FIX_
> it. Something that might help "avoid" it isn't good enough. If I told my
> customers I added some code that helped avoid race conditions, they'd
> shoot me. Code shouldn't _have_ race conditions.


maybe i should have given more details: this idea is to use certain properties
of the typing system of c++ with respect to "volatile", "const_cast", overloading
and method invocation to the effect that race conditions will be type errors or
at least generate compiler warnings (see

http://www.cuj.com/experts/1902/alexandr.html

for details). this is quite nifty, provided we ignore the problems dave has
pointed out.

interestingly, andrei's suggestions do not at all depend on the intended semantics
of "volatile", only on how the typing systems checks it and handels const_cast
and method invocation in this case. it it is possible to introduce a new c++
qualifier, say "blob" to the same effect but without the shortcomings (it would
even and in controdistinction to the "volatile" base proposal, handle built in
types correctly).

martin

Rajanish Calisa

unread,
Jan 13, 2001, 5:26:26 PM1/13/01
to

Kaz Kylheku <k...@ashi.footprints.net> wrote in message
news:slrn96173...@ashi.FootPrints.net...

The author is not suggesting that using volatile somehow means that
synchronisation is not necessary. He has come up with a convention,
where one would qualify shared objects as "volatile", and then go on
to define member functions as "volatile" or non-volatile. The former is
safe to be called from multiple threads, not as a consequence of the
magic keyword "volatile", but because an internal mutex protects
the data; for the latter type, the caller must provide explicit locking
through a shared mutex.
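
For illustration, a rough sketch of that convention, assuming POSIX threads
(the class and member names below are made up, not taken from the article,
which builds a more general helper around the same idea):

#include <pthread.h>

class Counter {
public:
    Counter() : value_(0) { pthread_mutex_init(&mutex_, 0); }

    void Increment() volatile {                  // callable on a shared (volatile) object
        Counter& self = const_cast<Counter&>(*this);
        pthread_mutex_lock(&self.mutex_);        // the internal mutex does the real work
        ++self.value_;
        pthread_mutex_unlock(&self.mutex_);
    }

    int Get() const { return value_; }           // non-volatile: caller must hold an external lock

private:
    pthread_mutex_t mutex_;
    int value_;
};

volatile Counter sharedCounter;                  // shared objects are declared volatile

void worker() {
    sharedCounter.Increment();                   // OK: volatile member function
    // sharedCounter.Get();                      // compile-time error: Get() is not volatile
}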

I remember reading in Dave Butenhof's book that "volatile" usage is not
recommended, because it would kill performance. I
am not sure if that applies to constructed types like classes.
If so, the technique described by the author will probably degrade
the performance. Another problem could be the casting away of
the volatile-ness and invoking the non-volatile functions; this
may be undefined by the C++ standard. I am not an expert in
C++; others can probably shed more light on this.

I am not sure if there was any obvious error relating to violation
of memory visibility rules, etc. It will be helpful if you can point
out specific errors. CUJ is a widely read magazine and readers
might adopt the techniques/suggestions in the published articles
without knowing the associated problems.

rajanish...


Martin Berger

unread,
Jan 13, 2001, 5:30:45 PM1/13/01
to

Rajanish Calisa <rca...@ozemail.com.au> wrote in message
news:Wv486.734$FC1....@ozemail.com.au...

> I am not sure if there was any obvious error relating to violation
> of memory visibility rules, etc. It will be helpful if you can point
> out specific errors. CUJ is a widely read magazine and readers
> might adopt the techniques/suggestions in the published articles
> without knowing the problems associated.

i second rajanish's suggestion but add the request to cross post
replies to comp.lang.c++.moderated

martin


James Moe

unread,
Jan 13, 2001, 10:59:46 PM1/13/01
to
Martin Berger wrote:
>
>
> well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> "volatile" to help avoiding race conditions. can the experts please slug it
out?
> (note the cross posting)
>
Use mutexes or semaphores to control access to common data areas. It
is what they are for.
"volatile" is meant for things like hardware access where a device
register can change at any time.
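
As a hedged sketch of that distinction (the device address and the names
below are made up purely for illustration): volatile belongs on something
like a memory-mapped status register, while ordinary shared data gets a
mutex instead.

#include <pthread.h>

#define STATUS_REG_ADDR 0x40001000u                   // hypothetical device address
volatile unsigned *status_reg = (volatile unsigned *)STATUS_REG_ADDR;

int shared_count;                                     // ordinary shared data: no volatile
pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

void poll_device(void)
{
    while ((*status_reg & 0x1u) == 0)                 // must be re-read every iteration;
        ;                                             // the hardware can change it at any time
}

void bump_count(void)
{
    pthread_mutex_lock(&count_lock);                  // visibility comes from the mutex calls
    shared_count++;
    pthread_mutex_unlock(&count_lock);
}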

--
sma at sohnen-moe dot com

Joerg Faschingbauer

unread,
Jan 13, 2001, 11:01:04 PM1/13/01
to
>>>>> "Martin" == Martin Berger <martinb@--remove--me--dcs.qmw.ac.uk>
>>>>> writes:

>> Don't ever use the C/C++ language volatile in threaded code. It'll
>> kill your performance, and the language definition has nothing to
>> do with what you want when writing threaded code that shares
>> data. If some OS implementation tells you that you need to use it
>> anyway on their system, (in the words of a child safety group),
>> "run, yell, and tell". That's just stupid, and they shouldn't be
>> allowed to get away with it.
>>

Martin> well, in the c/c++ users journal, Andrei Alexandrescu
Martin> recommends using "volatile" to help avoiding race
Martin> conditions. can the experts please slug it out? (note the
Martin> cross posting)

(I am not an expert, but there's a few things I understood :-)

You use a mutex to protect data against concurrent access.

int i;

void f(void) {
    lock(mutex);
    i++;            // or something
    unlock(mutex);
    // some lengthy epilogue goes here
}

Looking at it more paranoidly, one might argue that an optimizing
compiler will probably want to keep i in a register for some reason,
and that it might want to keep it in that register until the function
returns.

If that was the case the usage of the mutex would be completely
pointless. At the time the function returns (a long time after the
mutex was unlocked) the value of i is written back to memory. It then
overwrites the changes that another thread may have made to the value
in the meantime.

This is where volatile comes in. Common understanding is that volatile
disables any optimization on a variable, so if you declare i volatile,
the compiler won't keep it in a register, and all is well (if that was
the definition of volatile - my understanding is that volatile is just
a hint to the compiler, so strictly speaking nothing is
guaranteed). Except that this misperforms - imagine you didn't have a
simple increment in the critical section, but instead lots more
usage of i.

Now what does a compiler do? It compiles modules independently. In
doing so it performs optimizations (fortunately). Inside the module
that is being compiled the compiler is free to assign variables to
registers as it likes, and every function has a certain register set
that it uses.

The compiler also generates code for calls to functions in other
modules that it has no idea of. Having no idea of a module, among
other things means that the compiler does not know the register set
that particular function uses. Especially, the compiler cannot know
for sure that the callee's register set doesn't overlap with the
caller's - in which case the caller would see useless garbage in the
registers on return of the callee.

Now that's the whole point: the compiler has to take care that the
code it generates spills registers before calling a function.

So, provided that unlock() is a function and not a macro, there is no
need to declare i volatile.
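
A compilable version of the sketch above, assuming POSIX threads:
pthread_mutex_lock/unlock are external library calls, so the compiler must
assume they may touch the global and cannot keep it cached in a register
across them.

#include <pthread.h>

int i;                                        // shared, but NOT volatile
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void f(void)
{
    pthread_mutex_lock(&mutex);
    i++;                                      // or something
    pthread_mutex_unlock(&mutex);
    // some lengthy epilogue goes here
}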


Hope this helps,
Joerg

Martin Berger

unread,
Jan 14, 2001, 5:09:29 AM1/14/01
to

Kaz Kylheku <k...@ashi.footprints.net>

> >well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> >"volatile" to help avoiding race conditions. can the experts please slug
it
> out?
>
> The C/C++ Users Journal is a comedy of imbeciles.
>
> The article you are talking about completely ignores issues of memory
> coherency on multiprocessor systems. It is very geared toward Windows;
> the author seems to have little experience with multithreaded
> programming, and especially cross-platform multithreading.

would you care to elaborate how "volatile" causes problems with memory
consistency on multiprocessors?

> Luckily, he
> provides a disclaimer by openly admitting that some code that he wrote
> suffers from occasional deadlocks.

are you suggesting that these deadlocks are a consequence of using
"volatile"? if so, how? i cannot find indications of deadlock-inducing
behavior of "volatile" in kernighan & ritchie

James Dennett

unread,
Jan 14, 2001, 5:54:52 AM1/14/01
to
David Schwartz wrote:
>
> Martin Berger wrote:
>
> > well, in the c/c++ users journal, Andrei Alexandrescu recommends using
> > "volatile" to help avoiding race conditions. can the experts please slug it out?
> > (note the cross posting)
> >
> > martin
>
> What race conditions? If you have a race condition, you need to _FIX_
> it. Something that might help "avoid" it isn't good enough. If I told my
> customers I added some code that helped avoid race conditions, they'd
> shoot me. Code shouldn't _have_ race conditions.

True, and Andrei's claim (which seems reasonable to me, though I've not
verified it in depth) is that his techniques, if used consistently,
will detect all race conditions *at compile time*. If you want to
ensure, absolutely, that no race conditions remain, then you could
try looking into Andrei's technique as a second line of defense.

-- James Dennett <jden...@acm.org>

David Schwartz

unread,
Jan 14, 2001, 5:58:21 AM1/14/01
to

Joerg Faschingbauer wrote:

> Now that's the whole point: the compiler has to take care that the
> code it generates spills registers before calling a function.

This is really not so. It's entirely possible that the compiler might
have some way of assuring that the particular value cached in the
register isn't used by the called function, and hence it can keep it in
a register.

For example:

extern void bar(void);

void foo(void)
{
    int i;
    i = 3;
    bar();
    i--;
}

The compiler in this case might optimize 'i' away to nothing.
Fortunately, any possible way another thread could get its hands on a
variable is a way that a function in another compilation unit could get
its hands on the variable. Not only is there no legal C way 'bar' could
access 'i', there is no legal C way another thread could.

Consider:

extern void bar(void);
extern void qux(int *);

void foo(void)
{
    int i;
    i = 3;
    while (i < 10)
    {
        i++;
        bar();
        i++;
        qux(&i);
    }
}

For all the compiler knows, 'qux' stores the pointer passed to it and
'bar' uses it. Think about:

int *ptr = NULL;

void bar(void)
{
    if (ptr != NULL) printf("i=%d\n", *ptr);
}

void qux(int *j)
{
    ptr = j;
}

So the compiler would have to treat 'i' as if it was volatile in 'foo'
anyway.

So most compilers don't need any special help to compile multithreaded
code. Non-multithreaded code can do the same things.

DS

Kaz Kylheku

unread,
Jan 14, 2001, 5:59:02 AM1/14/01
to
On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer

<jfa...@jfasch.faschingbauer.com> wrote:
>So, provided that unlock() is a function and not a macro, there is no
>need to declare i volatile.

Even if a compiler implements sophisticated global optimizations that
cross module boundaries, the compiler can still be aware of
synchronization functions and do the right thing around calls to those
functions.

Joerg Faschingbauer

unread,
Jan 14, 2001, 2:18:53 PM1/14/01
to
>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> Joerg Faschingbauer wrote:

>> Now that's the whole point: the compiler has to take care that the
>> code it generates spills registers before calling a function.

David> This is really not so. It's entirely possible that the
David> compiler might have some way of assuring that the particular
David> value cached in the register isn't used by the called function,
David> and hence it can keep it in a register.

Of course you may one day target a system where everything is tightly
coupled, and where the compiler you use to compile your module knows
about the very internals of the runtime - register allocations for
example. Then it could keep variables in registers even across runtime
function calls.

Even though such a thing is possible, it is quite unlikely - consider
the management effort of the people making (and upgrading!) such a
system. And even if people dared to build such a beast, this wouldn't be
POSIX - at least not with the functions that involve locking and
such. (There was a discussion here recently where Dave Butenhof made
this plausible - and I believe him :-}.)

(Of course there are compilers that do interprocedural and
intermodular (what a word!) optimization, involving such things as not
spilling registers before calling an external function. But usually
you have to compile the calling module and the callee module in one
swoop then - you pass more than one C file on the command line or some
such. But it is not common for you to compile your module together
with the mutex locking function modules of the C runtime.)

David> For example:

David> extern void bar(void);

David> void foo(void)
David> {
David> int i;
David> i=3;
David> bar();
David> i--;
David> }

David> The compiler in this case might optimize 'i' away to nothing.
David> Fortunately, any possible way another thread could get its
David> hands on a variable is a way that a function in another
David> compilation unit could get its hands on the variable. Not only
David> is there no legal C way 'bar' could access 'i', there is no
David> legal C way another thread could.

I don't understand the connection of this example to your statement
above.

Joerg

Kenneth Chiu

unread,
Jan 14, 2001, 2:19:18 PM1/14/01
to
In article <93qjt6$mm2$1...@lure.pipex.net>,

Martin Berger <martin...@orange.net> wrote:
>
>Kaz Kylheku <k...@ashi.footprints.net>
>
>> >well, in the c/c++ users journal, Andrei Alexandrescu recommends using
>> >"volatile" to help avoiding race conditions. can the experts please slug
>it
>> out?
>>
>> The C/C++ Users Journal is a comedy of imbeciles.
>>
>> The article you are talking about completely ignores issues of memory
>> coherency on multiprocessor systems. It is very geared toward Windows;
>> the author seems to have little experience with multithreaded
>> programming, and especially cross-platform multithreading.
>
>would you care to elaborate how "volatile" causes problems with memory
>consistency
>on multiprocessors?

It's not that volatile itself causes memory problems. It's that
it's not sufficient (and, under POSIX, should not even be used).

He gives an example which will work in practice, but which would fail
if he had two shared variables. Code like this, for example, would
be incorrect on an MP with a relaxed memory model. The write to flag_
could occur before the write to data_, despite the order in which the
assignments are written.

class Gadget {
public:
    void Wait() {
        while (!flag_) {
            Sleep(1000); // sleeps for 1000 milliseconds
        }
        do_some_work(data_);
    }
    void Wakeup() {
        data_ = ...;
        flag_ = true;
    }
    ...
private:
    volatile bool flag_;
    volatile int data_;
};
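
For contrast, a hedged sketch of the same flag/data handoff written with
explicit synchronization (assuming POSIX threads; this is only an
illustrative rewrite of the example above, not code from the article):

#include <pthread.h>

void do_some_work(int data);                     // assumed to exist, as above

class Gadget {
public:
    Gadget() : flag_(false), data_(0) {
        pthread_mutex_init(&mutex_, 0);
        pthread_cond_init(&cond_, 0);
    }
    void Wait() {
        pthread_mutex_lock(&mutex_);
        while (!flag_)
            pthread_cond_wait(&cond_, &mutex_);  // releases the mutex while waiting
        int data = data_;
        pthread_mutex_unlock(&mutex_);
        do_some_work(data);
    }
    void Wakeup(int value) {
        pthread_mutex_lock(&mutex_);
        data_ = value;
        flag_ = true;                            // ordered and made visible by the mutex
        pthread_mutex_unlock(&mutex_);
        pthread_cond_signal(&cond_);
    }
private:
    pthread_mutex_t mutex_;
    pthread_cond_t  cond_;
    bool flag_;                                  // no volatile needed
    int  data_;
};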

Joerg Faschingbauer

unread,
Jan 14, 2001, 2:23:39 PM1/14/01
to
>>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> Joerg Faschingbauer wrote:

>> Now that's the whole point: the compiler has to take care that the
>> code it generates spills registers before calling a function.

David> This is really not so. It's entirely possible that the compiler might
David> have some way of assuring that the particular value cached in the
David> register isn't used by the called function, and hence it can keep it in
David> a register.

David> For example:

David> extern void bar(void);

David> void foo(void)
David> {
David> int i;
David> i=3;
David> bar();
David> i--;
David> }

David> The compiler in this case might optimize 'i' away to nothing.
David> Fortunately, any possible way another thread could get its hands on a
David> variable is a way that a function in another compilation unit could get
David> its hands on the variable. Not only is there no legal C way 'bar' could
David> access 'i', there is no legal C way another thread could.

David> ConsideR:

David> extern void bar(void);
David> extern void qux(int *);

David> void foo(void)
David> {
David> int i;
David> i=3;

David> while(i<10)
David> {
David> i++;
David> bar();
David> i++;
David> qux(&i);
David> }
David> }

David> For all the compiler knows, 'qux' stores the pointer passed to it and
David> 'bar' uses it. Think about:

David> int *ptr=NULL;
David> void bar(void)
David> {
David> if(ptr!=NULL) printf("i=%d\n", *ptr);
David> }

David> void qux(int *j)
David> {
David> ptr=j;
David> }

David> So the compiler would have to treat 'i' as if it was volatile in 'foo'
David> anyway.

Yes, I believe this (exporting the address of a variable) is called
taking an alias in compilerology. The consequence of this is that it
inhibits holding it in a register.

David> So most compilers don't need any special help to compile multithreaded
David> code. Non-multithreaded code can do the same things.

Joerg

Kaz Kylheku

unread,
Jan 14, 2001, 2:20:42 PM1/14/01
to
On 14 Jan 2001 05:09:29 -0500, Martin Berger <martin...@orange.net> wrote:
>
>Kaz Kylheku <k...@ashi.footprints.net>
>
>> >well, in the c/c++ users journal, Andrei Alexandrescu recommends using
>> >"volatile" to help avoiding race conditions. can the experts please slug
>it
>> out?
>>
>> The C/C++ Users Journal is a comedy of imbeciles.
>>
>> The article you are talking about completely ignores issues of memory
>> coherency on multiprocessor systems. It is very geared toward Windows;
>> the author seems to have little experience with multithreaded
>> programming, and especially cross-platform multithreading.
>
>would you care to elaborate how "volatile" causes problems with memory
>consistency
>on multiprocessors?

The point is that it doesn't *solve* these problems, not that it causes
them. It's not enough to ensure that load and store instructions are
*issued* in some order by the processor, but also that they complete in
some order (or at least partial order) that is seen by all other
processors. At best, volatile defeats access optimizations at the
compiler level; in order to synchronize memory you need to do it at the
hardware level as well, which is often done with a special ``memory
barrier'' instruction.

In other words, volatile is not enough to eliminate race conditions,
at least not on all platforms.
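
Purely as an illustration of that last point (using GCC's
__sync_synchronize() full-barrier builtin as a stand-in for whatever barrier
the platform provides; it is a compiler extension, not part of C or POSIX,
and by itself this is still not a substitute for a mutex):

int data;                       // published value
volatile int flag;              // "ready" flag

void publish(int value)
{
    data = value;
    __sync_synchronize();       // hardware barrier: the store to data must
                                // be visible before the store to flag is seen
    flag = 1;
}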

>> Luckily, he
>> provides a disclaimer by openly admitting that some code that he wrote
>> suffers from occasional deadlocks.
>
>are you suggesting that these deadlock are a consequence of using
>"volatile"?

The point is: are you going to take multithreading advice from
someone who admittedly cannot eradicate known deadlocks from his
code? But good points for the honesty, clearly.

>if so, how? i cannot find indications to deadlock inducing behavior of
>"volatile"
>in kernighan & ritchie

This book says very little about volatile and contains no discussion of
threads; this is squarely beyond the scope of K&R.

Dylan Nicholson

unread,
Jan 14, 2001, 6:31:59 PM1/14/01
to
In article <slrn963vb...@ashi.FootPrints.net>,

k...@ashi.footprints.net wrote:
> On 14 Jan 2001 05:09:29 -0500, Martin Berger
<martin...@orange.net> wrote:
> >
> >Kaz Kylheku <k...@ashi.footprints.net>
> >
>
> The point is that are you going to take multithreading advice from
> someone who admittedly cannot eradicate known deadlocks from his
> code? But good points for the honesty, clearly.
>
Well I consider myself pretty well experienced in at least Win32
threads, and I'm working on a project now using POSIX threads (and a
POSIX wrapper for Win32 threads). I thought I had a perfectly sound
design that used only ONE mutex object, only ever used a stack-based
locker/unlocker to ensure it was never left locked, and yet I still got
deadlocks! The reason was simple: a) Calling LeaveCriticalSection on
an unowned critical section causes a deadlock in Win32 (this I consider
a bug, considering how trivial it is to test one member of the critical
section to avoid it), and b) I didn't realise that by default POSIX
mutexes only allowed one lock per thread (i.e. they were non-
recursive). To me these are quirks of the thread library, not design
faults in my code, so they don't necessarily indicate a lack of multi-
threaded knowledge. I don't pretend to know what deadlocks Andrei had,
but I wouldn't be surprised if it was a problem of that nature.
Although I haven't used the technique he described in his library, if I
had read it before I started coding my latest multi-threaded project, I
almost definitely would have given it a go.

Dylan


Sent via Deja.com
http://www.deja.com/

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Martin Berger

unread,
Jan 14, 2001, 6:34:01 PM1/14/01
to
Kaz Kylheku wrote:

> >would you care to elaborate how "volatile" causes problems with memory
> >consistency
> >on multiprocessors?
>
> The point is that it doesn't *solve* these problems, not that it causes
> them. It's not enough to ensure that load and store instructions are
> *issued* in some order by the processor, but also that they complete in
> some order (or at least partial order) that is seen by all other
> processors. At best, volatile defeats access optimizations at the
> compiler level; in order to synchronize memory you need to do it at the
> hardware level as well, which is often done with a special ``memory
> barrier'' instruction.
>
> In other words, volatile is not enough to eliminate race conditions,
> at least not on all platforms.

either you or i don't quite understand the point of the article. the
semantics of "volatile" is irrelevant for his stuff to work. all that
matters is how c++ typechecks classes and methods annotated with
"volatile", together with the usual rules for overloading and casting
away volatile.

if we'd change c++ to include a modifier "blob" and add the ability
to cast away blobness and make "blob" behave like "volatile" w.r.t
typechecking, overloading ... then his scheme would work just the
same way when "volatile" is replaced by "blob". that's at least how
i understand it.

> The point is that are you going to take multithreading advice from
> someone who admittedly cannot eradicate known deadlocks from his
> code? But good points for the honesty, clearly.


concurrency is an area where i trust no one, including myself.



> >if so, how? i cannot find indications to deadlock inducing behavior of
> >"volatile"
> >in kernighan & ritchie
>
> This book says very little about volatile and contains no discussion of
> threads; this is squarely beyond the scope of K&R.

well, it should be part of the semantics of the language and hence covered.

David Schwartz

unread,
Jan 14, 2001, 8:32:54 PM1/14/01
to

Dylan Nicholson wrote:

> deadlocks! The reason was simple, a) Calling LeaveCriticalSection on
> an unowned critical section causes a deadlock in Win32 (this I consider
> a bug, considering how trivial it is to test one member of the critical
> section to avoid it),

How would you "avoid it"? IMO, the best way to avoid it is for the
library to fail or crash when you do it. The more spectacularly it fails,
the better. If you break an interface contract, there is nothing sane
the library can do about it.

> and b) I didn't realise that by default POSIX
> mutexes only allowed one lock per thread (i.e. they were non-
> recursive).

Since recursive locks are more expensive and almost never needed, this
makes perfect sense. To assume a lock would be recursive by default is
to assume inferior design.

DS

Dylan Nicholson

unread,
Jan 14, 2001, 10:45:10 PM1/14/01
to

>
> How would you "avoid it"? IMO, the best way to avoid it is for
the
> library to fail or crash when you do it. The more spetacularly it
fails,
> the better. If you break an interface contract, there is nothing sane
> the library can do about it.
>
I agree, I just think it's a poor interface contract. The critical
section has a member indicating the owning thread; all you need to do
is compare it against the current thread id and ignore the call if
it's not owned. I rely on this because the destructor for my Mutex
class always unlocks it. I would have to add and maintain another
member variable to avoid unlocking in the case that the mutex was not
locked on destruction, but I don't see why this should be necessary.

> > and b) I didn't realise that by default POSIX
> > mutexes only allowed one lock per thread (i.e. they were non-
> > recursive).
>
> Since recursive locks are more expensive and almost never needed, this
> makes perfect sense. To assume a lock would be recursive by default is
> to assume inferior design.
>

Well I agree they can be a little more expensive, but from my
experience they are worth the effort - I don't like to put
preconditions on widely used functions that mutexes must have been
acquired before entry - instead I have the function attempt to acquire
the mutex, and if it is already owned by the calling thread, you simply
increase the lock count. That way I can write code like the following:

Mutex TheMutex;
int SharedValue;
string SharedString;

int GetSharedValue()
{
    AutoLock lock(TheMutex);
    return SharedValue;
}

void Foo()
{
    int i = GetSharedValue();
    DoSomething(i);
}

void Bar()
{
    AutoLock lock(TheMutex);
    DoWhatever(SharedString);
    int i = GetSharedValue();
    DoSomethingElse(i);
}

Without having to worry about the state of the Mutex before calling
GetSharedValue. Without this my current project would probably require
4-5 times the number of explicit locks (i.e. GetSharedValue() is
called far more from unprotected code than from protected code).
Maybe there is a better design, but it's what I've been using for years
without any serious performance problems.
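
For reference, a minimal sketch of how the recursive behaviour relied on
above can be requested explicitly from POSIX. PTHREAD_MUTEX_RECURSIVE is
the XSI/Unix98 mutex-type attribute (availability varied at the time); the
Mutex/AutoLock names simply mirror the classes used above and are not the
poster's actual code.

#include <pthread.h>

class Mutex {
public:
    Mutex() {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init(&mutex_, &attr);      // same thread may now relock it
        pthread_mutexattr_destroy(&attr);
    }
    ~Mutex() { pthread_mutex_destroy(&mutex_); }
    void Lock()   { pthread_mutex_lock(&mutex_); }
    void Unlock() { pthread_mutex_unlock(&mutex_); }
private:
    pthread_mutex_t mutex_;
};

// Scope-based locker in the spirit of the AutoLock used above.
class AutoLock {
public:
    explicit AutoLock(Mutex& m) : m_(m) { m_.Lock(); }
    ~AutoLock() { m_.Unlock(); }
private:
    Mutex& m_;
};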

dale

unread,
Jan 14, 2001, 10:59:01 PM1/14/01
to
David Schwartz wrote:

> Code shouldn't _have_ race conditions.

Well, that's not entirely correct. If you have a number of
threads writing logging data to a file, which is protected
by a mutex, then the order in which they write -is- subject
to race conditions. This may or may not matter however.


Dale

Michiel Salters

unread,
Jan 15, 2001, 8:45:20 AM1/15/01
to
Joerg Faschingbauer wrote:

> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

> David> Joerg Faschingbauer wrote:
>
> >> Now that's the whole point: the compiler has to take care that the
> >> code it generates spills registers before calling a function.

> David> This is really not so. It's entirely possible that the
> David> compiler might have some way of assuring that the particular
> David> value cached in the register isn't used by the called function,
> David> and hence it can keep it in a register.

> Of course you may once target a system where everything is tightly
> coupled, and where the compiler you use to compile your module knows
> about the very internals of the runtime - register allocations for
> example. Then it could keep variables in registers even across runtime
> function calls.

> Even though such a thing is possible, it is quite unlikely - consider
> the management effort of the people making (and upgrading!) such a
> system.

I don't think things are that hard - especially in C++, which already
has name mangling. For each translation unit, it is possible to determine
which functions use which registers and what functions are called outside
that translation unit.
Now encode in the name mangling of a function which registers are used,
except of course those functions imported from another translation unit.
Just add to the name something like Reg=EAX_EBX_ECX. The full set of
registers used by a function is the set of registers it uses, plus
the registers used by the functions it calls. This will even work for
mutually recursive functions across translation units.

With this information the linker can, for each function call, determine
which registers need to be saved. And that is closely related to the
linker's main task: creating correct function calls across translation
units.

--
Michiel Salters
Michiel...@cmg.nl
sal...@lucent.com

James Kanze

unread,
Jan 15, 2001, 8:45:01 AM1/15/01
to
Martin Berger wrote:

> Dave Butenhof wrote:

> > Don't ever use the C/C++ language volatile in threaded code. It'll
> > kill your performance, and the language definition has nothing to
> > do with what you want when writing threaded code that shares
> > data. If some OS implementation tells you that you need to use it
> > anyway on their system, (in the words of a child safety group),
> > "run, yell, and tell". That's just stupid, and they shouldn't be
> > allowed to get away with it.

This is correct up to a point. The problem is that the C++ language
has no other way of signaling that a variable may be accessed by
several threads (and thus ensuring e.g. that it is really written
before the lock is released). The problem *isn't* with the OS; it is
with code movement within the optimizer of the compiler. And while I
agree with the sentiment: volatile isn't the solution, I don't know
how many compilers offer another one. (Of course, some compilers
don't optimize enough for there to be a problem:-).)

> well, in the c/c++ users journal, Andrei Alexandrescu recommends
> using "volatile" to help avoiding race conditions. can the experts
> please slug it out? (note the cross posting)

The crux of Andrei's suggestions really just exploits the compiler
type-checking with regards to volatile, and not the actual semantics
of volatile. If I've understood the suggestion correctly, it would
even be possible to implement it without ever accessing the individual
class members as if they were volatile (although in his examples, I
think he is also counting on volatile to inhibit code movement).

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

Andrei Alexandrescu

unread,
Jan 15, 2001, 8:52:18 AM1/15/01
to
"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
news:93sptr$o5s$1...@flotsam.uits.indiana.edu...

> He gives an example, which will work in practice, but if he had two
> shared variables, would fail. Code like this, for example, would
> be incorrect on an MP with a relaxed memory model. The write to flag_
> could occur before the write to data_, despite the order in which the
> assignments are written.
>
> class Gadget {
> public:
>     void Wait() {
>         while (!flag_) {
>             Sleep(1000); // sleeps for 1000 milliseconds
>         }
>         do_some_work(data_);
>     }
>     void Wakeup() {
>         data_ = ...;
>         flag_ = true;
>     }
>     ...
> private:
>     volatile bool flag_;
>     volatile int data_;
> };

Your statement is true. However, that is only an _introduction_ to the
meaning of volatile in multithreaded code. I guess I should have thought of
a more elaborate example that would work on any machine. Anyway, the focus
of the article is different. It's about using the type system to detect race
conditions.


Andrei

Andrei Alexandrescu

unread,
Jan 15, 2001, 8:52:56 AM1/15/01
to
"James Dennett" <jden...@acm.org> wrote in message
news:3A611BF5...@acm.org...

> True, and Andrei's claim (which seems reasonable to me, though I've not
> verified it in depth) is that his techniques, if used consistently,
> will detect all race conditions *at compile time*.

By the way, I maintain that.

Andrei

Andrei Alexandrescu

unread,
Jan 15, 2001, 8:51:41 AM1/15/01
to
"Martin Berger" <martinb@--remove--me--dcs.qmw.ac.uk> wrote in message
news:3A620AB8.835DD808@--remove--me--dcs.qmw.ac.uk...

> Kaz Kylheku wrote:
> > The point is that it doesn't *solve* these problems, not that it causes
> > them. It's not enough to ensure that load and store instructions are
> > *issued* in some order by the processor, but also that they complete in
> > some order (or at least partial order) that is seen by all other
> > processors. At best, volatile defeats access optimizations at the
> > compiler level; in order to synchronize memory you need to do it at the
> > hardware level as well, which is often done with a special ``memory
> > barrier'' instruction.
> >
> > In other words, volatile is not enough to eliminate race conditions,
> > at least not on all platforms.
>
> either you or me don't quite understand the point of the article.

Or maybe me :o).

> the
> semantics of "volatile" is irrelevant for his stuff to work. all that
> matters is how c++ typechecks classes and methods annotated with
> "volatile", together with the usual rules for overloading and casting
> away volatile.
>
> if we'd change c++ to include a modifier "blob" and add the ability
> to cast away blobness and make "blob" behave like "volatile" w.r.t
> typechecking, overloading ... than his scheme would work just the
> same way when "volatile" is replace by blob. that's at least how
> i understand it.

This is exactly what the point of the article was.

> > The point is that are you going to take multithreading advice from
> > someone who admittedly cannot eradicate known deadlocks from his
> > code? But good points for the honesty, clearly.

To Mr. Kylheku: There is a misunderstanding here, and a rather gross one. I
wonder what text made you believe I *couldn't* eradicate known
deadlocks. I simply said that all threading-related runtime errors of our
program were only deadlocks and not race conditions, which precisely proves
the point that the article tried to make. Of course we fixed the deadlocks.
The point is that the compiler fixed the race conditions.

It's clear that you have a great deal of experience in multithreading code
on many platforms, and I would be glad to expand my knowledge in the area. If
you would be willing to discuss in more civil terms, I would be glad to.
Also, if you would like to expand the discussion *beyond* the Gadget example
in the opening section of the article, and point out possible reasoning errors
that I might have made, that would help the C++ community define the
"volatile correctness" term with precision. For now, I maintain the
conjectures I made.


Andrei

Andrei Alexandrescu

unread,
Jan 15, 2001, 8:52:00 AM1/15/01
to
"David Schwartz" <dav...@webmaster.com> wrote in message
news:3A5FCDC3...@webmaster.com...

> What race conditions? If you have a race condition, you need to _FIX_
> it. Something that might help "avoid" it isn't good enough. If I told my
> customers I added some code that helped avoid race conditions, they'd
> shoot me. Code shouldn't _have_ race conditions.

There is a misunderstanding here - actually, quite a few in this and the
following posts.

Of course code must not _have_ race conditions. So that's why you must
eliminate them, which is what I meant by "avoid". Maybe I didn't use a word
that's strong enough.


Andrei

Andrei Alexandrescu

unread,
Jan 15, 2001, 8:52:38 AM1/15/01
to
"Martin Berger" <martin...@orange.net> wrote in message
news:93qjt6$mm2$1...@lure.pipex.net...
>
> Kaz Kylheku <k...@ashi.footprints.net>
[snip]

> > The C/C++ Users Journal is a comedy of imbeciles.

Yay, the original message was moderated out.

> > The article you are talking about completely ignores issues of memory
> > coherency on multiprocessor systems. It is very geared toward Windows;
> > the author seems to have little experience with multithreaded
> > programming, and especially cross-platform multithreading.

I'm afraid Mr. Kylheku completely ignores the gist of the article. I know
only Windows, Posix and ACE threads, but that's beside the point - what the
article tries to say is different. The article uses the volatile modifier as
a device for helping the type system detect race conditions at compile time.

> > Luckily, he
> > provides a disclaimer by openly admitting that some code that he wrote
> > suffers from occasional deadlocks.

This is a misunderstanding. What I said is that the technique described
can't help with deadlocks. At the end of the article I mentioned some
concrete experience with the technique. Indeed there were deadlocks - _only_
deadlocks - in the multithreaded code, simply because all race conditions
were weeded out by the compiler.

> are you suggesting that these deadlock are a consequence of using
> "volatile"?
> if so, how? i cannot find indications to deadlock inducing behavior of
> "volatile"
> in kernighan & ritchie

I guess this is yet another misunderstanding :o).


Andrei

James Kanze

unread,
Jan 15, 2001, 9:35:31 AM1/15/01
to
James Moe wrote:

> Martin Berger wrote:

> > well, in the c/c++ users journal, Andrei Alexandrescu recommends
> > using "volatile" to help avoiding race conditions. can the experts
> > please slug it out? (note the cross posting)

> Use mutexes or semaphores to control access to common data
> areas. It is what they are for. "volatile" is meant for things like
> hardware access where a device register can change at any time.

You still need some way of preventing the optimizer from deferring
writes until after the lock has been released. Ideally, the compiler
will understand the locking system (mutex, or whatever), and generate
the necessary write guards itself. Off hand, I don't know of any
compiler which meets this ideal.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 15, 2001, 9:36:16 AM1/15/01
to
Joerg Faschingbauer wrote:

> int i;

I've used more than one compiler that does this. In fact, most do, at
least with optimization turned on.

> If that was the case the usage of the mutex would be completely
> pointless. At the time the function returns (a long time after the
> mutex was unlocked) the value of i is written back to memory. It
> then overwrites the changes that another thread may have made to the
> value in the meantime.

You guessed it.

> This is where volatile comes in. Common understanding is that
> volatile disables any optimization on a variable, so if you declare
> i volatile, the compiler won't keep it in a register, and all is
> well (if that was the definition of volatile - my understanding is
> that volatile is just a hint to the compiler, so nothing is well if
> you put it legally). Except that this misperforms - imagine you
> wouldn't have a simple increment in the critical section, but
> instead lots more of usage of i.

Volatile is more than just a hint, but it does have a lot of
implementation defined aspects. The *intent* (according to the C
standard, to which the C++ standard refers) is roughly what you
describe.

> Now what does a compiler do? It compiles modules independently. In
> doing so it performs optimizations (fortunately). Inside the module
> that is being compiled the compiler is free to assign variables to
> registers as it likes, and every function has a certain register set
> that it uses.

There is no requirement that a compiler compile modules independently;
at least one major compiler has a final, post-link optimization phase
in which the optimizer looks beyond the module limits.

> The compiler also generates code for calls to functions in other
> modules that it has no idea of. Having no idea of a module, among
> other things means that the compiler does not know the register set
> that particular function uses.

What set a function can use without restoring is usually defined by
the calling conventions. The compiler not only can know it, it must
know it.

> Especially, the compiler cannot know for sure that the callee's
> register set doesn't overlap with the caller's - in which case the
> caller would see useless garbage in the registers on return of the
> callee.

This depends entirely on the compiler. And the hardware -- on a
Sparc, there are four banks of registers, two of which are
systematically saved and restored by the hardware. So each function
basically has 16 registers in which it can do anything it wishes.

> Now that's the whole point: the compiler has to take care that the
> code it generates spills registers before calling a function.

Not necessarily.

> So, provided that unlock() is a function and not a macro, there is
> no need to declare i volatile.

Not necessarily. It all depends on the compiler.

If the variable is global, and the compiler cannot analyse the unlock
function, it will have to assume that unlock may access the variable,
and so must ensure that the value is up to date. In practice, this IS
generally sufficient -- at some level, unlock resolves to a system
call, and the compiler certainly has no access to the source code of
the system call. So either 1) the compiler makes no assumption about
the system call, must assume that it might access the variable, and so
ensures the correct value, or 2) the compiler knows about system
requests, and which ones can access global variables. In the latter
case, of course, the compiler *should* also know that it needs a write
barrier after unlock. But unless this is actually documented in the
compiler documentation, I'd be leery about counting on it.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 15, 2001, 9:34:46 AM1/15/01
to
Martin Berger wrote:

> David Schwartz wrote:

> > > well, in the c/c++ users journal, Andrei Alexandrescu recommends
> > > using "volatile" to help avoiding race conditions. can the
> > > experts please slug it out? (note the cross posting)

> > > martin

> > What race conditions? If you have a race condition, you
> > need to _FIX_ it. Something that might help "avoid" it isn't good
> > enough. If I told my customers I added some code that helped avoid
> > race conditions, they'd shoot me. Code shouldn't _have_ race
> > conditions.

> maybe i should have given more details: this idea is to use certain
> properties of the typing system of c++ with respect to "volatile",
> "const_cast", overloading and method invocation to the effect that
> race conditions will be type errors or generate at least compiler
> warnings (see

> http://www.cuj.com/experts/1902/alexandr.html

> for details). this is quite nifty, provided we ignore the problems
> dave has pointed out.

> interestingly, andrei's suggestions do not at all depend on the
> intended semantics of "volatile", only on how the typing systems
> checks it and handels const_cast and method invocation in this
> case. it it is possible to introduce a new c++ qualifier, say "blob"
> to the same effect but without the shortcomings (it would even and
> in controdistinction to the "volatile" base proposal, handle built
> in types correctly).

That's not totally true, at least not in his examples. I think he
also counts on volatile to some degree to inhibit code movement;
i.e. to prevent the compiler from moving some of the writes to after
the lock has been freed.

James Kanze

unread,
Jan 15, 2001, 9:36:58 AM1/15/01
to
Joerg Faschingbauer wrote:

> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

> David> Joerg Faschingbauer wrote:

> >> Now that's the whole point: the compiler has to take care that
> >> the code it generates spills registers before calling a function.

> David> This is really not so. It's entirely possible that the
> David> compiler might have some way of assuring that the particular
> David> value cached in the register isn't used by the called function,
> David> and hence it can keep it in a register.

> Of course you may once target a system where everything is tightly
> coupled, and where the compiler you use to compile your module knows
> about the very internals of the runtime - register allocations for
> example. Then it could keep variables in registers even across
> runtime function calls.

The compiler always knows about the internals of the runtime register
allocations, since it is the compiler which defines them (at least
partially).

> Even though such a thing is possible, it is quite unlikely -
> consider the management effort of the people making (and upgrading!)
> such a system. And even if people dared doing such a beast, this
> wouldn't be POSIX - at least not with the functions that involve
> locking and such. (There was a discussion here recently where Dave
> Butenhof made this plausible - and I believe him :-}.)

I'm not sure I understand your point. It sounds like you are saying
that it is possible for the compiler not to know which registers it
can use, which is manifestly ridiculous.

> (Of course there are compilers that do interprocedural and
> intermodular (what a word!) optimization, involving such things as
> not spilling registers before calling an external function. But
> usually you have to compile the calling module and the callee module
> in one swoop then - you pass more than one C file on the command
> line or some such. But it is not common for you to compile your
> module together with the mutex locking function modules of the C
> runtime.)

Usually (well, in the one case I actually know of:-)), the compiler
generates extra information in the object file, which is used by the
linker.

About all you can hope for is that a compiler this intelligent also
knows about threads, and can recognize a mutex request when it sees
one.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 15, 2001, 9:40:18 AM1/15/01
to
Joerg Faschingbauer wrote:

[...]


> Yes, I believe this (exporting the address of a variable) is called
> taking an alias in compilerology. The consequence of this is that it
> inhibits holding it in a register.

Correct. The entire issue is called the aliasing problem, and it
makes good optimization extremely difficult. Note well: extremely
difficult, not impossible. In recent years, a few compilers have
gotten good enough to track uses of the variable through aliases and
across module boundaries, and eventually to keep aliased variables in a
register when doing so will improve performance.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 15, 2001, 9:39:16 AM1/15/01
to
Kaz Kylheku wrote:

> On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer
> <jfa...@jfasch.faschingbauer.com> wrote:
> >So, provided that unlock() is a function and not a macro, there is no
> >need to declare i volatile.

> Even if a compiler implements sophisticated global optimizations
> that cross module boundaries, the compiler can still be aware of
> synchronization functions and do the right thing around calls to
> those functions.

It can be. It should be. Is it? Do today's compilers actually do the
right thing, or are we just lucking out because most of them don't
optimize very aggressively anyway?

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 15, 2001, 9:46:20 AM1/15/01
to
Martin Berger wrote:

> Kaz Kylheku <k...@ashi.footprints.net>

> > >well, in the c/c++ users journal, Andrei Alexandrescu recommends
> > >using "volatile" to help avoiding race conditions. can the
> > >experts please slug it out?

> > The C/C++ Users Journal is a comedy of imbeciles.

And the moderators let this through?

> > The article you are talking about completely ignores issues of
> > memory coherency on multiprocessor systems. It is very geared
> > toward Windows; the author seems to have little experience with
> > multithreaded programming, and especially cross-platform
> > multithreading.

> would you care to elaborate how "volatile" causes problems with
> memory consistency on multiprocessors?

Volatile doesn't cause problems of memory consistency. It's not
guaranteed to solve them, either.

Andrei's article didn't address the problem. Not because Andrei
didn't know the solution. (He may, or he may not. I don't know.)
But because that wasn't the subject of the article.

It might be worth pointing out the exact subject, since the poster you
are responding to obviously missed the point. Andrei basically
"overloads" the keyword volatile in a way that allows the compiler to
verify whether we will use locked access or not when accessing an
object. It offers an additional tool to simplify the writing (and the
verification) of multi-threaded code.

The article does NOT address the question of when locks are needed and
when they aren't. The article doesn't address the question of what is
actually needed when locks are needed, e.g. to ensure memory
coherency. These are other issues, and would require another article
(or maybe even an entire book). About the only real criticism I would
make about the article is that it isn't clear enough that he is
glossing over major issues, because they aren't relevant to that
particular article.
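
To make that concrete, here is a minimal sketch of the mechanism as I
understand it (the names are invented for illustration, not taken from
the article): the shared instance is declared volatile, the member
functions are not volatile-qualified, and so any access that has not
gone through the lock-and-cast step simply fails to compile.

#include <pthread.h>

class Counter {
public:
    void incr() { ++n_; }            // deliberately not volatile-qualified
private:
    int n_;
};

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
volatile Counter shared;             // shared between threads, so declared volatile

void wrong() {
    shared.incr();                   // compile error: a non-volatile member
                                     // function cannot be called on a volatile object
}

void right() {
    pthread_mutex_lock(&mtx);
    const_cast<Counter&>(shared).incr();   // the cast is legitimate only while
                                           // the lock is held
    pthread_mutex_unlock(&mtx);
}

The whole point is that wrong() is rejected at compile time; that is the
checking the article is after.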

Martin Berger

unread,
Jan 15, 2001, 11:06:17 AM1/15/01
to
Andrei Alexandrescu wrote:

> > if we'd change c++ to include a modifier "blob" and add the ability
> > to cast away blobness and make "blob" behave like "volatile" w.r.t
> > typechecking, overloading ... than his scheme would work just the
> > same way when "volatile" is replace by blob. that's at least how
> > i understand it.
>
> This is exactly what the point of the article was.

this makes me think that c++ *should* be expanded to include something
like "blob" as a modifier. or maybe even user defined modifiers.

the problem with modifiers like "shared" and the like is that compilers
cannot effectively guarantee the absence of race conditions, as would be
suggested by using a name like "shared". with "blob" on the other
hand, all the compiler guarantees is that "blobness" is preserved
which is basically an easy typechecking problem and it is up to the
programmer to use this feature in whatever way she thinks appropriate
(eg for the prevention of race conditions). i also think that user defined
modifier semantics has uses beyond preventing race conditions. how about it?

martin

Martin Berger

unread,
Jan 15, 2001, 11:05:59 AM1/15/01
to
Andrei Alexandrescu wrote:

> To Mr. Kylheku: [...]


> Also, if you would like to expand the discussion *beyond* the Gadget example
> in the opening section of the article, and point possible reasoning errors
> that I might have done, that would help the C++ community define the
> "volatile correctness" term with precision. For now, I maintain the
> conjectures I made.


that would be a worthwhile contribution.

martin

Charles Bryant

unread,
Jan 15, 2001, 11:10:21 AM1/15/01
to
In article <93t924$2vi$1...@nnrp1.deja.com>,

Dylan Nicholson <dn...@my-deja.com> wrote:
>In article <slrn963vb...@ashi.FootPrints.net>,
> k...@ashi.footprints.net wrote:
>>
>> The point is that are you going to take multithreading advice from
>> someone who admittedly cannot eradicate known deadlocks from his
>> code? But good points for the honesty, clearly.
>>
>Well I consider myself pretty well experienced in at least Win32
>threads, and I'm working on a project now using POSIX threads (and a
>POSIX wrapper for Win32 threads). I thought I had a perfectly sound
>design that used only ONE mutex object, only ever used a stack-based
>locker/unlocker to ensure it was never left locked, and yet I still got
>deadlocks! The reason was simple, a) Calling LeaveCriticalSection on
>an unowned critical section causes a deadlock in Win32 (this I consider
>a bug, considering how trivial it is to test one member of the critical
>section to avoid it)

You have a fundamental misunderstanding of the nature of programming.
Programming does not require speculation about how something might be
implemented and certainly does not involve writing one's own code
such that it depends on that speculation. Programming involves
determining the guaranteed and documented behaviour of the components
that will be needed and then relying solely on that documented
behaviour.

Calling LeaveCriticalSection() without entering the critical section would
be just as much a bug even if causing it to fail required a huge
amount of very slow code in the library which implements
LeaveCriticalSection. The *only* thing relevant to whether it's a bug
or not is whether the documentation permits it or not.

--
Eppur si muove

Balog Pal

unread,
Jan 15, 2001, 11:14:56 AM1/15/01
to
"Dylan Nicholson" <dn...@my-deja.com> wrote

> deadlocks! The reason was simple, a) Calling LeaveCriticalSection on
> an unowned critical section causes a deadlock in Win32 (this I consider
> a bug,

I consider doing illegal stuff a programming error. For critical sections
I'd go somewhat further: you should wrap those in classes anyway, and
use lock guards like CSingleLock. Then doing something odd is pretty hard.
But if you manage it, that is a logic error in the program, and a sign to
look out for other errors too.

> considering how trivial it is to test one member of the critical
> section to avoid it),

That is IMHO irrelevant.

> and b) I didn't realise that by default POSIX
> mutexes only allowed one lock per thread (i.e. they were non-
> recursive).

Yep, they are. But you can implement your own recursive mutex (working like
the Win32 critical section).

> To me these are quirks of the thread library, not design
> faults in my code,

Your way of looking at it is somewhat nonstandard. ;-)

> so they don't necessarily indicate in lack of multi-
> threaded knowledge.

Maybe they indicate ignorance. A system API works as it is described, not
along your thoughts or your expectations. The user code is the thing
you must write to use the API as it is provided, not the other way around. (You
can claim yourself innocent if the docs are in error, like if they specified a
POSIX mutex as recursive and it turned out to be fast. But that is not the case.)

Paul

John Mullins

unread,
Jan 15, 2001, 11:15:49 AM1/15/01
to

"James Kanze" <James...@dresdner-bank.com> wrote in message
news:3A62CF96...@dresdner-bank.com...

> That's not totally true, at least not in his examples. I think he
> also counts on volatile to some degree to inhibit code movement;
> i.e. to prevent the compiler from moving some of the writes to after
> the lock has been freed.

But his examples also rely on undefined behaviour so he can't really
count on anything.

JM

Kenneth Chiu

unread,
Jan 15, 2001, 11:16:27 AM1/15/01
to
In article <3A62CF03...@dresdner-bank.com>,

James Kanze <James...@dresdner-bank.com> wrote:
>Martin Berger wrote:
>
>> Dave Butenhof wrote:
>
>> > Don't ever use the C/C++ language volatile in threaded code. It'll
>> > kill your performance, and the language definition has nothing to
>> > do with what you want when writing threaded code that shares
>> > data. If some OS implementation tells you that you need to use it
>> > anyway on their system, (in the words of a child safety group),
>> > "run, yell, and tell". That's just stupid, and they shouldn't be
>> > allowed to get away with it.
>
>This is correct up to a point. The problem is that the C++ language
>has no other way of signaling that a variable may be accessed by
>several threads (and thus ensuring e.g. that it is really written
>before the lock is released). The problem *isn't* with the OS; it is
>with code movement within the optimizer of the compiler.

At this point it isn't really a C++ language issue anymore. However,
if a vendor claims that their compiler is compatible with POSIX
threads, then it is up to them to ensure that memory is written before
the unlock.

Kenneth Chiu

unread,
Jan 15, 2001, 12:21:02 PM1/15/01
to
In article <93u77g$bqakt$3...@ID-14036.news.dfncis.de>,

Andrei Alexandrescu <andre...@hotmail.com> wrote:
>"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
>news:93sptr$o5s$1...@flotsam.uits.indiana.edu...
>> He gives an example, which will work in practice, but if he had two
>> shared variables, would fail. Code like this, for example, would
>> be incorrect on an MP with a relaxed memory model. The write to flag_
>> could occur before the write to data_, despite the order in which the
>> assignments are written.
>>
>> ...

>
>Your statement is true. However, that is only an _introduction_ to the
>meaning of volatile in multithreaded code. I guess I should have thought of
>a more elaborate example that would work on any machine. Anyway, the focus
>of the article is different. It's about using the type system to detect race
>conditions.

Yes, I have no quibble with the rest of the article, and in fact thought
it was an interesting idea.

However, I find some of the statements in the introduction to be overly
general.

"Basically, without volatile, either writing multithreaded programs
becomes impossible, or the compiler wastes vast optimization
opportunities."

This may be true for some thread standards, but if the vendor claims
that they support POSIX threads with their C++ compiler, then shared
variables should not be declared volatile when using POSIX threads.

Andrei Alexandrescu

unread,
Jan 15, 2001, 1:20:38 PM1/15/01
to
"John Mullins" <John.M...@crossprod.co.uk> wrote in message
news:93v2gj$4fs$1...@newsreaderg1.core.theplanet.net...

>
> "James Kanze" <James...@dresdner-bank.com> wrote in message
> news:3A62CF96...@dresdner-bank.com...
>
> > That's not totally true, at least not in his examples. I think he
> > also counts on volatile to some degree to inhibit code movement;
> > i.e. to prevent the compiler from moving some of the writes to after
> > the lock has been freed.
>
> But his examples also rely on undefined behaviour so he can't really
> count on anything.

Are you referring to const_cast? Strictly speaking, indeed. But then, all MT
programming in C/C++ has undefined behavior.

Andrei

Andrei Alexandrescu

unread,
Jan 15, 2001, 2:31:08 PM1/15/01
to
"James Kanze" <James...@dresdner-bank.com> wrote in message
news:3A62CF03...@dresdner-bank.com...

> The crux of Andrei's suggestions really just exploits the compiler
> type-checking with regards to volatile, and not the actual semantics
> of volatile. If I've understood the suggestion correctly, it would
> even be possible to implement it without ever accessing the individual
> class members as if they were volatile (although in his examples, I
> think he is also counting on volatile to inhibit code movement).

Thanks James for all your considerations before and after the article
appeared.

There is a point about the use of volatile proposed by the article. If you
write volatile correct code as prescribed by the article, you _never_
*never* NEVER use volatile variables. You _always_ *always* ALWAYS lock a
synchronization object, cast the volatile away, operate on the so-obtained
non-volatile alias, let the alias go, and unlock the synchronization object,
in this order.

Maybe I should have made it clearer that in volatile-correct code, you never
operate on volatile data - you always cast volatile away and more
specifically, you cast it away when it is *semantically correct* to do so
because you locked the afferent synchronization object.
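
In code, the sequence looks roughly like this (a simplified sketch with
placeholder names, not the exact code from the article):

#include <pthread.h>

template <class T>
class LockedRef {                        // placeholder name for the helper
public:
    LockedRef(volatile T& obj, pthread_mutex_t& m)
        : pm_(&m), ref_(const_cast<T&>(obj))
    { pthread_mutex_lock(pm_); }         // lock the synchronization object
    ~LockedRef() { pthread_mutex_unlock(pm_); }
    T* operator->() const { return &ref_; }
    T& operator*()  const { return ref_; }
private:
    pthread_mutex_t* pm_;
    T& ref_;                             // the non-volatile alias
};

struct Widget {
    void update() { ++calls; }           // ordinary, non-volatile member function
    int calls;
};

volatile Widget w = { 0 };                           // the shared instance
pthread_mutex_t wMutex = PTHREAD_MUTEX_INITIALIZER;

void use() {
    LockedRef<Widget> p(w, wMutex);      // lock, cast the volatile away
    p->update();                         // operate on the non-volatile alias
}                                        // p goes away: no further access through
                                         // the alias, and the mutex is unlocked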

I would be glad if someone explained to me in what situations a compiler can
rearrange instructions to the extent that it would invalidate the idiom that
the article proposes. OTOH, such compilers invalidate a number of idioms
anyway, such as the Double-Checked Locking pattern, used by Doug Schmidt in
ACE.
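
For reference, this is the kind of code I mean (a hypothetical singleton,
not Doug Schmidt's actual code): the unlocked first check and the
unordered store of the pointer are exactly what aggressive reordering by
the compiler or the hardware can break.

#include <pthread.h>

struct Widget { /* ... */ };

Widget* instance = 0;
pthread_mutex_t instMutex = PTHREAD_MUTEX_INITIALIZER;

Widget* get_instance() {
    if (instance == 0) {                 // first check, made without the lock
        pthread_mutex_lock(&instMutex);
        if (instance == 0) {             // second check, under the lock
            instance = new Widget;       // hazard: the store to instance may become
                                         // visible before the Widget is fully built
        }
        pthread_mutex_unlock(&instMutex);
    }
    return instance;                     // hazard: unsynchronized read of instance
}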


Andrei

Andrei Alexandrescu

unread,
Jan 15, 2001, 2:37:05 PM1/15/01
to
"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
news:93v8d2$3aa$1...@flotsam.uits.indiana.edu...

> However, I find some of the statements in the introduction to be overly
> general.
>
> Basically, without volatile, either writing multithreaded programs
> becomes impossible, or the compiler wastes vast optimization
> opportunities.
>
> This may be true for some thread standards, but if the vendor claims
> that they support POSIX threads with their C++ compiler, then shared
> variables should not be declared volatile when using POSIX threads.

I understand your point and agree with it. That statement of mine was a
mistake.

Andrei

Kaz Kylheku

unread,
Jan 15, 2001, 2:39:50 PM1/15/01
to
On 14 Jan 2001 22:59:01 -0500, dale <da...@cs.rmit.edu.au> wrote:
>David Schwartz wrote:
>
>> Code shouldn't _have_ race conditions.
>
>Well, that's not entirely correct. If you have a number of
>threads writing logging data to a file, which is protected
>by a mutex, then the order in which they write -is- subject
>to race conditions. This may or may not matter however.

If it doesn't matter, it's hardly a race condition! A race condition
occurs when the program fails to compute one of the possible correct
results due to a fluctuation in the execution order.

David Schwartz

unread,
Jan 15, 2001, 3:39:11 PM1/15/01
to

Dylan Nicholson wrote:

> I agree, I just think it's a poor interface contract. The critical
> section has a member indicating the owning thread, all you need to do
> is you compare it against the current thread id and ignore the call if
> it's not owned. I use this because in the destructor for my Mutex
> class, it always unlocks it. I would have to add and maintain another
> member variable to avoid unlocking in the case that the mutex was not
> locked on destruction, but I don't see why this should be necessary.

In the debug library, sure. In the release library, no way. Every cycle
LeaveCriticalSection takes is performance-critical. The critical section is
the highest-performance synchronization object Win32 offers (not counting
Interlocked*, which isn't usually useful).

> > Since recursive locks are more expensive and almost never
> > needed, this
> > makes perfect sense. To assume a lock would be recursive by default is
> > to assume inferior design.

> Well I agree they can be a little more expensive, but from my
> experience they are worth the effort - I don't like to put
> preconditions on widely used functions that mutexes must have been
> acquired before entry - instead I have the function attempt to acquire
> the mutex, and if it is already owned by the calling thread, you simply
> increase the lock count. That way I can write code like the following:
[snip]
> Without having to worry about the state of the Mutex before calling
> GetSharedValue. Without this my current project would probably require
> 4-5 times the numbers of explicit locks (i.e. GetSharedValue() is
> called far more from unprotected code than from protected code).
> Maybe there is a better design, but it's what I've been using for years
> without any serious performance problems.

Performance isn't so much the issue. It's failure-free and logical
operation.

Consider the following:

SomeFunction()
{
LockMutex();
DoSomeWork();
UnlockMutex();
WaitForSomething();
}

This function will deadlock if the mutex used is recursive. And
associating a condition variable with a recursive mutex can cause some
very surprising results.

Here's another one:

SomeFunction()
{
LockMutexA();
DoStuff();
LockMutexB();
DoMoreStuff();
UnlockMutexB();
UnlockMutexA();
}

If all other code locks A before B, this will never deadlock. But what
if B is recursive and this function is entered while holding mutex B?
Now you have code that seems to work, but will sometimes deadlock
unpredictably.

In order to avoid these types of pitfalls, all the code that uses a
recursive mutex _must_ be engineered together. And in that case, it's
trivial to add your own code to track the ownership of the mutex. And
unless you do this horribly wrong, this code will be more efficient than
the equivalent code in the library.

DS

Konrad Schwarz

unread,
Jan 15, 2001, 4:15:11 PM1/15/01
to
James Kanze wrote:
>
> Martin Berger wrote:
>
> > Dave Butenhof wrote:
>
> > > Don't ever use the C/C++ language volatile in threaded code. It'll
> > > kill your performance, and the language definition has nothing to
> > > do with what you want when writing threaded code that shares
> > > data. If some OS implementation tells you that you need to use it
> > > anyway on their system, (in the words of a child safety group),
> > > "run, yell, and tell". That's just stupid, and they shouldn't be
> > > allowed to get away with it.
>
> This is correct up to a point. The problem is that the C++ language
> has no other way of signaling that a variable may be accessed by
> several threads (and thus ensuring e.g. that it is really written
> before the lock is released). The problem *isn't* with the OS; it is
> with code movement within the optimizer of the compiler. And while I
> agree with the sentiment: volatile isn't the solution, I don't know
> how many compilers offer another one. (Of course, some compilers
> don't optimize enough for there to be a problem:-).)

So the optimization of keeping variables in registers across
function calls is illegal in general (and thus must not be performed),
if the compiler cannot prove that the code will not be linked into a
multi-threaded program or it cannot prove that those variables will
never be shared.

However, the C language has a way of signaling that local variables
cannot be accessed by other threads, namely by placing them in
the register storage class. I don't know about C++; if I remember
correctly, C++ degrades register to a mere "efficiency hint".

Konrad Schwarz

unread,
Jan 15, 2001, 4:16:41 PM1/15/01
to

James Kanze wrote:
>
> Kaz Kylheku wrote:
>
> > On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer
> > <jfa...@jfasch.faschingbauer.com> wrote:
> > >So, provided that unlock() is a function and not a macro, there is no
> > >need to declare i volatile.
>
> > Even if a compiler implements sophisticated global optimizations
> > that cross module boundaries, the compiler can still be aware of
> > synchronization functions and do the right thing around calls to
> > those functions.
>
> It can be. It should be. Is it? Do todays compilers actually do the
> right thing, or are we just lucking out because most of them don't
> optimize very aggressively anyway?
>

If the compiler supports multi-threading (at least POSIX
multi-threading),
then it *must*, since POSIX does not require shared variables to
be volatile qualified. If the compiler decides to
keep values in registers across function calls, it must be able to prove
that
* either these variables are never shared by another thread
* or the functions in question never perform inter-thread operations

Kaz Kylheku

unread,
Jan 15, 2001, 4:17:44 PM1/15/01
to
On 15 Jan 2001 08:52:18 -0500, Andrei Alexandrescu

<andre...@hotmail.com> wrote:
>"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
>news:93sptr$o5s$1...@flotsam.uits.indiana.edu...
>> He gives an example, which will work in practice, but if he had two
>> shared variables, would fail. Code like this, for example, would
>> be incorrect on an MP with a relaxed memory model. The write to flag_
>> could occur before the write to data_, despite the order in which the
>> assignments are written.
>>
>> class Gadget {
>> public:
>> void Wait() {
>> while (!flag_) {
>> Sleep(1000); // sleeps for 1000 milliseconds
>> }
>> do_some_work(data_);
>> }
>> void Wakeup() {
>> data_ = ...;
>> flag_ = true;
>> }
>> ...
>> private:
>> volatile bool flag_;
>> volatile int data_;
>> };

>
>Your statement is true. However, that is only an _introduction_ to the
>meaning of volatile in multithreaded code. I guess I should have thought of
>a more elaborate example that would work on any machine.

You cannot come up with such an example without resorting to
platform- and compiler-specific techniques, such as inline assembly language
to insert memory barrier instructions.

In the above example, if one thread writes to data_ and then sets flag_
there is absolutely no assurance that another thread running on another
processor will see these updates in the same order. It is possible for
flag_ to appear to flip true, but data_ to not have been updated yet!

Moreover, there is no assurance that data_ is updated atomically, so
that a processor can either see its old value or its new value, never
any half-baked value in between.
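
To illustrate, this is roughly where barriers would have to go in the
Gadget code quoted above, using a hypothetical memory_barrier()
placeholder (a real implementation would spell it with platform-specific
inline assembly or a compiler intrinsic):

void Gadget::Wakeup() {
    data_ = 42;          // stands in for the "data_ = ...;" above
    memory_barrier();    // hypothetical: make the write to data_ visible
                         // before the write to flag_
    flag_ = true;
}

void Gadget::Wait() {
    while (!flag_) {
        Sleep(1000);
    }
    memory_barrier();    // hypothetical: don't let the read of data_ be
                         // satisfied from before flag_ was seen true
    do_some_work(data_);
}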

Resolving these issues can't be done in standard C++, so there is no
one example that can fit all C++ platforms. This makes sense, since
threads are not currently part of the C++ language. (What I don't
understand is why the moderator of comp.lang.c++.moderated is even
allowing this discussion, which clearly belongs in
comp.programming.threads only).

>Anyway, the focus
>of the article is different. It's about using the type system to detect race
>conditions.

This thread was started in comp.programming.threads by Lie-Quan Lee
<ll...@lsc.nd.edu> who was specifically interested in knowing whether
rules similar to the POSIX memory visibility rules apply to other
multithreading platforms.

-> One question is, whether those memory visibility rules are applicable
-> for other thread system such as Solaris UI threads, or win32 threads,
-> or JAVA threads ...? If yes, we can follow the same spirit. Otherwise,
-> it will be a big difference. (For example, all shared variables might
-> have to be difined as volatile even with mutex protection.)

Dave Butenhof then replied:

-> Don't ever use the C/C++ language volatile in threaded code. It'll
-> kill your performance, and the language definition has nothing to do
-> with what you want when writing threaded code that shares data. If
-> some OS implementation tells you that you need to use it anyway on
-> their system, (in the words of a child safety group), "run, yell, and
-> tell". That's just stupid, and they shouldn't be allowed to get away
-> with it.

To which Martin Berger replied (and added, for some strange reason,
comp.lang.c++.moderated to the Newsgroups: header). This is the first
time the CUJ article was mentioned, clearly in the context of a
comp.programming.threads debate about memory visibility rules,
not in the context of a debate about C++ or qualifier-correctness:

-> well, in the c/c++ users journal, Andrei Alexandrescu recommends using
-> "volatile" to help avoiding race conditions. can the experts please
-> slug it out?(note the cross posting)

So it appears that the article does create some confusion at least in the
minds of some readers between volatile used as a request for special
access semantics, and volatile used as a constraint-checking access
control for class member function calls.

Incidentally, I believe that the second property can be exploited
without dragging in the semantics of volatile. Simply do something like
this:

#ifdef RACE_CHECK
#define VOLATILE volatile
#else
#define VOLATILE
#endif

When producing production object code, do not define RACE_CHECK; define
it only when you want to create extra semantic checks for the compiler
to diagnose. Making up a name other than ``VOLATILE'' might be useful
to clarify that what is being done has nothing to do with defeating
optimization.

Joerg Faschingbauer

unread,
Jan 15, 2001, 4:19:30 PM1/15/01
to
Duh! What compiler and what language are you talking about?

>>>>> "James" == James Kanze <James...@dresdner-bank.com> writes:

James> Joerg Faschingbauer wrote:
>> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:

David> Joerg Faschingbauer wrote:

>> >> Now that's the whole point: the compiler has to take care that
>> >> the code it generates spills registers before calling a function.

David> This is really not so. It's entirely possible that the
David> compiler might have some way of assuring that the particular
David> value cached in the register isn't used by the called function,
David> and hence it can keep it in a register.

>> Of course you may once target a system where everything is tightly
>> coupled, and where the compiler you use to compile your module knows
>> about the very internals of the runtime - register allocations for
>> example. Then it could keep variables in registers even across
>> runtime function calls.

James> The compiler always knows about the internals of the runtime register
James> allocations, since it is the compiler which defines them (at least
James> partially).

>> Even though such a thing is possible, it is quite unlikely -
>> consider the management effort of the people making (and upgrading!)
>> such a system. And even if people dared doing such a beast, this
>> wouldn't be POSIX - at least not with the functions that involve
>> locking and such. (There was a discussion here recently where Dave
>> Butenhof made this plausible - and I believe him :-}.)

James> I'm not sure I understand your point. It sounds like you are saying
James> that it is possible for the compiler not to know which registers it
James> can use, which is manifestly ridiculous.

>> (Of course there are compilers that do interprocedural and
>> intermodular (what a word!) optimization, involving such things as
>> not spilling registers before calling an external function. But
>> usually you have to compile the calling module and the callee module
>> in one swoop then - you pass more than one C file on the command
>> line or some such. But it is not common for you to compile your
>> module together with the mutex locking function modules of the C
>> runtime.)

James> Usually (well, in the once case I actually know of:-)), the compiler
James> generates extra information in the object file, which is used by the
James> linker.

James> About all you can hope for is that a compiler this intelligent also
James> knows about threads, and can recognize a mutex request when it sees
James> one.

David Schwartz

unread,
Jan 15, 2001, 4:21:17 PM1/15/01
to

dale wrote:
>
> David Schwartz wrote:
>
> > Code shouldn't _have_ race conditions.
>
> Well, that's not entirely correct. If you have a number of
> threads writing logging data to a file, which is protected
> by a mutex, then the order in which they write -is- subject
> to race conditions. This may or may not matter however.

If all of the possible outputs are valid, it's not a race condition.
The definition of a "race condition" is a programming construct where
the resultant output can be valid or invalid based upon the vagaries of
system timing.

DS

Tom Payne

unread,
Jan 15, 2001, 4:21:36 PM1/15/01
to
In comp.lang.c++.moderated James Dennett <jden...@acm.org> wrote:
[...]
: Andrei's claim (which seems reasonable to me, though I've not
: verified it in depth) is that his techniques, if used consistently,
: will detect all race conditions *at compile time*.

His technique seems a good way to guarantee atomicity of certain
operations, but AFIK it doesn't detect or prevent all situation where
the outcome of multiple operations on a thread-shared object depends
on how those threads are scheduled.

class Int {
    int i;
public:
    Int() : i(0) {}
    void dbl()  { i = 2*i; }   // "double" is a keyword, so call it dbl
    void incr() { i = i+1; }
};

Apply Andrei's technique to Int and then create a static Int k and two
threads:
- thread1 increments k and then exits
- thread2 doubles k and then exits.
Visibly, the final value of k.i is going to depend on the scheduling of
these two threads.

AFAIK, detecting race conditions is equivalent to the halting problem.

Tom Payne

David Schwartz

unread,
Jan 15, 2001, 4:22:12 PM1/15/01
to

James Kanze wrote:

> You still need some way of preventing the optimizer from deferring
> writes until after the lock has been released. Ideally, the compiler
> will understand the locking system (mutex, or whatever), and generate
> the necessary write guards itself. Off hand, I don't know of any
> compiler which meets this ideal.

Give an example of what you think the problem is. The typical solution
is to give the compiler no information at all about the locking system.
Since the compiler then must assume the locking system could do
anything, it can't optimize anything across it.

It's not clear to me what you mean by "deferring writes". This could
either refer to variables being cached in registers and not written back
or it could refer to a hardware write cache not being flushed.
Fortunately, neither is a problem. Variables can't be cached in
registers because the compiler doesn't know what the lock/unlock
functions do, and so must assume they might access those variables from
their memory locations. Hardware write caches aren't a problem, because
the lock/unlock functions contain the appropriate memory barrier. The
compiler doesn't know this, but the compiler has nothing to do with such
hardware write reordering and so doesn't need to.

DS

Kaz Kylheku

unread,
Jan 15, 2001, 4:27:31 PM1/15/01
to
On 15 Jan 2001 13:20:38 -0500, Andrei Alexandrescu

<andre...@hotmail.com> wrote:
>"John Mullins" <John.M...@crossprod.co.uk> wrote in message
>> > the lock has been freed.
>>
>> But his examples also rely on undefined behaviour so he can't really
>> count on anything.
>
>Are you referring to const_cast? Strictly speaking, indeed. But then, all MT
>programming in C/C++ has undefined behavior.

However, some MT programming has another standard to serve as a safety
net. For example, correct POSIX MT programming is well-defined within
the realm of POSIX threads, even though it's not well-defined C++.
From a C++ language point of view, the behavior is undefined; however,
the program correctly uses a documented extension.

When you say undefined behavior, there is some implicit interface
standard that is intended, be it ANSI/ISO C++, POSIX or what have you.
In comp.programming.threads, undefined has a necessarily weaker
meaning; obviously some multithreaded programs are deemed to be well
defined with respect to some interface.

It's not clear what class of undefined behavior John was referring to
here.

Dylan Nicholson

unread,
Jan 15, 2001, 5:12:40 PM1/15/01
to
In article <3A635FEF...@webmaster.com>,

David Schwartz <dav...@webmaster.com> wrote:
>
> Dylan Nicholson wrote:
>
> > I agree, I just think it's a poor interface contract. The critical
> > section has a member indicating the owning thread, all you need to do
> > is you compare it against the current thread id and ignore the call if
> > it's not owned. I use this because in the destructor for my Mutex
> > class, it always unlocks it. I would have to add and maintain another
> > member variable to avoid unlocking in the case that the mutex was not
> > locked on destruction, but I don't see why this should be necessary.
>
> In the debug library, sure. In the release library, no way. Every cycle
> LeaveCriticalSection takes is performance-critical. The critical section
> is the highest-performance synchronization object Win32 offers (not
> counting Interlocked*, which isn't usually useful).
>
I'm not sure what you're getting at here...I can see that if
performance was the only thing that counted then checking my
own "islocked" variable before calling LeaveCriticalSection in the
destructor would be preferable, but in my case it isn't.

> Performance isn't so much the issue. It's failure-free and logical
> operation.
>
> Consider the following:
>
> SomeFunction()
> {
> LockMutex();
> DoSomeWork();
> UnlockMutex();
> WaitForSomething();
> }
>
> This function will deadlock if the mutex used is recursive. And
> associating a condition variable with a recursive mutex can cause some
> very surprising results.
>

I don't use condition variables anyway, but you will have to explain to
me how the function can deadlock if the mutex is recursive. As far as
I understand it, recursive mutexes cannot deadlock with themselves, you
need two mutexes to get a deadlock.

> Here's another one:
>
> SomeFunction()
> {
> LockMutexA();
> DoStuff();
> LockMutexB();
> DoMoreStuff();
> UnlockMutexB();
> UnlockMutexA();
> }
>
> If all other code locks A before B, this will never deadlock. But what
> if B is recursive and this function is entered while holding mutex B?
> Now you have code that seems to work, but will sometimes deadlock
> unpredictably.
>

Ok, I can see how this could deadlock, if one thread locked B before
entering SomeFunction, while another thread was already inside
SomeFunction and had locked Mutex A. But this would still occur if the
mutexes were non-recursive - you just might happen to catch it earlier
because it would happen within a single thread. To avoid deadlocks
between two mutexes, it probably IS necessary to put preconditions on
functions (e.g. Mutex X must not be locked), but to be honest I very
rarely need to lock more than one mutex at once.

> In order to avoid these types of pitfalls, all the code that uses a
> recursive mutex _must_ be engineered together. And in that case, it's
> trivial to add your own code to track the ownership of the mutex. And
> unless you do this horribly wrong, this code will be more efficient than
> the equivalent code in the library.

Actually I had to implement my own recursive mutexes, because they
weren't supported on all the platforms we target. And it's not _THAT_
trivial, it required two mutexes, an owner thread id, and a count. I'm
absolutely sure I _haven't_ implemented it more efficiently than a
library could (I know Win32 uses spinlocks to do the counting instead
of a second mutex, which if I get time I will switch over to under
POSIX).

Dylan


Sent via Deja.com
http://www.deja.com/

Ron Natalie

unread,
Jan 15, 2001, 5:21:35 PM1/15/01
to

> However, the C language has a way of signaling that local variables
> cannot be accessed by other threads, namely by placing them in
> the register storage class.

Huh? How is that? The C language doesn't contain the word thread
anywhere. Register is auto + a hint to keep in a register.

> I don't know about C++; if I remember
> correctly, C++ degrades register to a mere "efficiency hint".

The ONLY difference between C and C++ is that C++ allows you to
take the address of something with a register storage class (noting
that doing so may force it out of a register), while C prohibits
the & operator on register-declared objects even if they weren't
actually put in a register by the compiler.

David Schwartz

unread,
Jan 15, 2001, 5:29:02 PM1/15/01
to

Dylan Nicholson wrote:

> I'm not sure what you're getting at here...I can see that if
> performance was the only thing that counted then checking my
> own "islocked" variable before calling LeaveCriticalSection in the
> destructor would be preferable, but in my case it isn't.

When you are talking about the highest-performance mutex operations
available on a platform, performance in a release build is crucial.



> > SomeFunction()
> > {
> > LockMutex();
> > DoSomeWork();
> > UnlockMutex();
> > WaitForSomething();
> > }
> >
> > This function will deadlock if the mutex used is recursive. And
> > associating a condition variable with a recursive mutex can cause some
> > very surprising results.

> I don't use condition variables anyway, but you will have to explain to
> me how the function can deadlock if the mutex is recursive. As far as
> I understand it, recursive mutexes cannot deadlock with themselves, you
> need two mutexes to get a deadlock.

Sure it can deadlock. Suppose 'WaitForSomething' requires that another
thread acquire the mutex. SomeFunction thinks it unlocked the mutex,
since it called 'UnlockMutex'. However, it didn't really unlock the
mutex, so it could be waiting forever.

Whether or not you use condition variables doesn't change the fact that
their semantics are intimately entangled with the semantics of mutexes.
Further, the use of a condition variable associated with a mutex can be
of local scope and not obvious in other places.



> > Here's another one:
> >
> > SomeFunction()
> > {
> > LockMutexA();
> > DoStuff();
> > LockMutexB();
> > DoMoreStuff();
> > UnlockMutexB();
> > UnlockMutexA();
> > }
> >
> > If all other code locks A before B, this will never deadlock.
> > But what
> > if B is recursive and this function is entered while holding mutex B?
> > Now you have code that seems to work, but will sometimes deadlock
> > unpredictably.

> Ok, I can see how this could deadlock, if one thread locked B before
> entering SomeFunction, while another thread was already inside
> SomeFunction and had locked Mutex A. But this would still occur if the
> mutexes where non-recursive - you just might happen to catch it earlier
> because it would happen within a single thread.

It can't happen with non-recursive mutexes because in that case, it
would be illegal to call SomeFunction while holding either A or B. With
recursive mutexes, it is legal to call SomeFunction holding A or B.

> To avoid deadlocks
> between two mutexes, it probably IS necessary to put preconditions on
> functions (e.g. Mutex X must not be locked), but to be honest I very
> rarely need to lock more than one mutex at once.

If you have to put preconditions on functions anyway, why bother with
recursive mutexes? The only advantage of recursive mutexes is that they
relax these conditions. The problem is, it's extremely difficult to know
in general what level of requirement relaxation is safe. And recursive
mutexes relax all the requirements all the way and provide you no way to
detect when you break your rules.



> > In order to avoid these types of pitfalls, all the code that
> > uses a
> > recursive mutex _must_ be engineered together. And in that case, it's
> > trivial to add your own code to track the ownership of the mutex. And
> > unless you do this horribly wrong, this code will be more efficient
> > than
> > the equivalent code in the library.

> Actually I had to implement my own recursive mutexes, because they
> weren't supported on all the platforms we target. And it's not _THAT_
> trivial, it required two mutexes, an owner thread id, and a count. I'm
> absolutely sure I _haven't_ implemented it more efficiently that a
> library could (I know Win32 uses spinlocks to do the counting instead
> of a second mutex, which if I get time I will switch over to under
> POSIX).

You have implemented it more efficiently than a library that provided
recursive mutexes by default possibly could. That's because more than
90% of the time people use mutexes, they neither want nor need
recursion. It can be done with one mutex if you're careful, because a
thread cannot get into a race condition with itself. You may need to
have one thread that never uses any recursive mutexes, so you can use
its pthread_t to mean 'unowned'.
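
A sketch of the kind of thing I mean (the "unowned" field holds the
pthread_t of that dedicated thread; the delicate part is the unlocked
read of owner in rec_lock, which works only because no other thread can
ever store the caller's own id there):

#include <pthread.h>

struct RecMutex {
    pthread_mutex_t m;
    pthread_t owner;     // equals unowned when nobody holds the lock
    pthread_t unowned;   // id of a thread that never uses these locks
    int count;           // recursion depth, modified only while holding m
};

void rec_lock(RecMutex *rm)
{
    if (pthread_equal(rm->owner, pthread_self())) {
        rm->count++;              // we already hold m, so this is private to us
        return;
    }
    pthread_mutex_lock(&rm->m);
    rm->owner = pthread_self();
    rm->count = 1;
}

void rec_unlock(RecMutex *rm)
{
    if (--rm->count == 0) {
        rm->owner = rm->unowned;  // reset ownership before releasing m
        pthread_mutex_unlock(&rm->m);
    }
}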

DS

Kaz Kylheku

unread,
Jan 15, 2001, 6:33:49 PM1/15/01
to
On 15 Jan 2001 09:36:16 -0500, James Kanze
<James...@dresdner-bank.com> wrote:
>> You use a mutex to protect data against concurrent access.
>
>> int i;
>
>> void f(void) {
>> lock(mutex);
>> i++; // or something
>> unlock(mutex);
>> // some lengthy epilogue goes here
>> }
>
>> Looking at it more paranoidly, on might argue that an optimizing
>> compiler will probably want to keep i in a register for some reason,
>> and that it might want to keep it in that register until the
>> function returns.
>
>I've used more than one compiler that does this. In fact, most do, at
>least with optimization turned on.

This optimization is only permitted if the compiler ``knows'' that i is
not modified by these functions, and, in the case of POSIX, if the
compiler also knows that these functions don't call library functions
that have memory synchronizing properties.

Unless the compiler has very sophisticated global optimizations that
can look into the retained images of other translation units of the
program, this means that if lock() and unlock() are, or contain,
calls to other units, then i cannot be cached.

You will probably find with most compilers that the most aggressive
caching optimizations are applied to auto variables whose address is
never taken. These cannot possibly be accessed or modified by another
thread or signal handler or what have you, so it is generally safe to
cache them in registers, or even optimize them to registers entirely.
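
A small illustration of both points (lock() and unlock() here stand for
opaque synchronization functions defined in some other translation unit):

extern void lock(void);     // defined elsewhere: the compiler can't see inside
extern void unlock(void);

int shared;                 // external linkage: lock()/unlock() might touch it

int f(void)
{
    int local = 0;          // auto, address never taken: free to live in a register

    lock();
    shared++;               // must be loaded after lock() returns and stored
                            // back before unlock() is called
    unlock();

    for (int k = 0; k < 100; ++k)
        local += k;         // can be computed entirely in registers

    return local;
}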

Kaz Kylheku

unread,
Jan 15, 2001, 6:33:30 PM1/15/01
to
On 15 Jan 2001 08:45:01 -0500, James Kanze
<James...@dresdner-bank.com> wrote:

>> Dave Butenhof wrote:
>
>> > Don't ever use the C/C++ language volatile in threaded code. It'll
>> > kill your performance, and the language definition has nothing to
>> > do with what you want when writing threaded code that shares
>> > data. If some OS implementation tells you that you need to use it
>> > anyway on their system, (in the words of a child safety group),
>> > "run, yell, and tell". That's just stupid, and they shouldn't be
>> > allowed to get away with it.
>
>This is correct up to a point. The problem is that the C++ language
>has no other way of signaling that a variable may be accessed by
>several threads (and thus ensuring e.g. that it is really written
>before the lock is released). The problem *isn't* with the OS; it is
>with code movement within the optimizer of the compiler.

The problem is with the specification which governs the implementation
of the compiler and the operating system.

> And while I
>agree with the sentiment: volatile isn't the solution, I don't know
>how many compilers offer another one.

All POSIX implementations must honor the rules that the synchronization
functions like pthread_mutex_lock and so forth have memory
synchronizing properties. The combined implementation of language,
library and operating system must ensure that data is made consistent
across multiple processors when these functions are used. It is simply
a requirement. So a POSIX threaded application never needs to do
anything special such as using volatile; the implementation must do
whatever is needed, including having the compiler specially recognize
some functions, if that's what it takes!

Without such a statement of requirement, you cannot infer anything
about the behavior. At best you can look at what the compiler does now
and hope that it will do similar things in the future. This is the
case, e.g., with Visual C++ for Microsoft Windows. It so happens that
if you call an external function like EnterCriticalSection, then the
Microsoft compiler emits code that does not cache any non-local data.

Tom Payne

unread,
Jan 15, 2001, 6:34:15 PM1/15/01
to
In comp.lang.c++.moderated Kenneth Chiu <ch...@cs.indiana.edu> wrote:
: In article <3A62CF03...@dresdner-bank.com>,

That's a very good and important point. Volatility is neither
necessary nor sufficient for the synchronization that is needed in
multi-threading. Volatile objects get synchronized at every sequence
point, which is unnecessarily often, but only register-resident copies
get synchronized; there is no requirement to synchronize
processor-local caches.

Tom Payne

David Schwartz

unread,
Jan 15, 2001, 6:40:23 PM1/15/01
to

Kaz Kylheku wrote:

> You will probably find with most compilers that the most aggressive
> caching optimizations are applied to auto variables whose address is
> never taken. These cannot possibly be accessed or modified by another
> thread or signal handler or what have you, so it is generally safe to
> cache them in registers, or even optimize them to registers entirely.

You might be surprised at the strange ways people might find to break
such schemes, including signals and longjmp.

DS

Charles Bryant

unread,
Jan 15, 2001, 7:01:05 PM1/15/01
to
In article <93vskg$7d5$1...@nnrp1.deja.com>,

Dylan Nicholson <dn...@my-deja.com> wrote:
>In article <3A635FEF...@webmaster.com>,
> David Schwartz <dav...@webmaster.com> wrote:
...

>> In the debug library, sure. In the release library, no way. Every cycle
>> LeaveCriticalSection takes is performance-critical. The critical section
>> is the highest-performance synchronization object Win32 offers (not
>> counting Interlocked*, which isn't usually useful).
>>
>I'm not sure what you're getting at here...I can see that if
>performance was the only thing that counted then checking my
>own "islocked" variable before calling LeaveCriticalSection in the
>destructor would be preferable, but in my case it isn't.

The *CriticalSection() functions are very basic functions and,
strangely enough, not written especially for you. What is appropriate
in your case is not of the slightest relevance to how they are or
should be implemented.

>Actually I had to implement my own recursive mutexes, because they
>weren't supported on all the platforms we target. And it's not _THAT_
>trivial, it required two mutexes, an owner thread id, and a count. I'm
>absolutely sure I _haven't_ implemented it more efficiently than a
>library could (I know Win32 uses spinlocks to do the counting instead
>of a second mutex, which if I get time I will switch over to under
>POSIX).

If you do this on POSIX, it is perfectly simple using a mutex and a
condition variable:

#include <pthread.h>

typedef struct {
    pthread_mutex_t m;   /* protects count and owner */
    pthread_cond_t cv;   /* signalled when the lock becomes free */
    int count;           /* recursion depth; 0 means unowned */
    pthread_t owner;     /* meaningful only while count > 0 */
} RM;

void recursive_lock(RM *rm)
{
    pthread_mutex_lock(&rm->m);
    while (rm->count && !pthread_equal(rm->owner, pthread_self())) {
        pthread_cond_wait(&rm->cv, &rm->m);
    }
    rm->owner = pthread_self();
    rm->count++;
    pthread_mutex_unlock(&rm->m);
}

void recursive_unlock(RM *rm)
{
    pthread_mutex_lock(&rm->m);
    if (!--rm->count) pthread_cond_signal(&rm->cv);
    pthread_mutex_unlock(&rm->m);
}

Or something like that (with proper error checking, of course).

If you cared about performance you wouldn't use recursive mutexes in
the first place, so optimisation of their implementation is absurd.

--
Eppur si muove

Kenneth Chiu

unread,
Jan 15, 2001, 7:45:22 PM1/15/01
to
In article <93vskg$7d5$1...@nnrp1.deja.com>,
Dylan Nicholson <dn...@my-deja.com> wrote:
>Actually I had to implement my own recursive mutexes, because they
>weren't supported on all the platforms we target. And it's not _THAT_
>trivial, it required two mutexes, an owner thread id, and a count.

Is this a general purpose recursive mutex? If so and it's not
proprietary, I'd be interested in seeing it.

Kaz Kylheku

unread,
Jan 15, 2001, 9:49:10 PM1/15/01
to
On 15 Jan 2001 16:15:11 -0500, Konrad Schwarz

<konradDO...@mchpDOTsiemens.de> wrote:
>So the optimization of keeping variables in registers across
>function calls is illegal in general (and thus must not be performed),
>if the compiler cannot prove that the code will not be linked into a
>multi-threaded program or it cannot prove that those variables will
>never be shared.

Basically that is what it boils down to. Proving otherwise in the
general case involves knowing what happens in all the translation units
that are called, and such optimizations therefore must be delayed
somehow until the program is linked.

>However, the C language has a way of signaling that local variables
>cannot be accessed by other threads, namely by placing them in
>the register storage class. I don't know about C++; if I remember
>correctly, C++ degrades register to a mere "efficiency hint".

The auto storage class suffices. The register specifier simply means that
the object (which has automatic storage class---there is no register
storage class, just like there is no typedef storage class!) cannot
have its address taken; it becomes a constraint violation to try to do
so. It's not difficult to verify that an object's address is never
taken, whether or not it is declared register.

Kaz Kylheku

unread,
Jan 15, 2001, 10:35:05 PM1/15/01
to
On 15 Jan 2001 09:39:16 -0500, James Kanze

<James...@dresdner-bank.com> wrote:
>Kaz Kylheku wrote:
>
>> On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer
>> <jfa...@jfasch.faschingbauer.com> wrote:
>> >So, provided that unlock() is a function and not a macro, there is no
>> >need to declare i volatile.
>
>> Even if a compiler implements sophisticated global optimizations
>> that cross module boundaries, the compiler can still be aware of
>> synchronization functions and do the right thing around calls to
>> those functions.
>
>It can be. It should be. Is it? Do todays compilers actually do the
>right thing, or are we just lucking out because most of them don't
>optimize very aggressively anyway?

It seems that we are lucking out, but not really. The way compilers
typically work is enough to ensure that volatile is not needed in order
for the right load and store instructions to be issued in the right
order. The rest of the job is done by the synchronization library
implementors, who must insert the appropriate memory barrier
instructions or what have you, into the implementation of these
functions, so that the hardware doesn't make a dog's breakfast out of
the memory access requests issued by each processor.

These library implementors tend to have a clue, and tend to have
influence with the compiler writers. For example, if the GNU compiler
people implemented some sophisticated optimizations not knowing that
these break MT programs, the GNU libc people would take note and work
out some solution---perhaps a special function attribute would be
developed, so that in the library header file, one could write:

int pthread_mutex_lock(pthread_mutex_t *) __attribute__ ((sync));

or some such thing meaning, spill and reload when calling this
function.

The point is that clueful implementors are aware of the issues and are
looking out for you; it's not just some accident that things work. :)

Dylan Nicholson

unread,
Jan 15, 2001, 10:47:26 PM1/15/01
to
In article <3A6379AE...@webmaster.com>,

David Schwartz <dav...@webmaster.com> wrote:
>
> Dylan Nicholson wrote:
>
>
> > I don't use condition variables anyway, but you will have to explain to
> > me how the function can deadlock if the mutex is recursive. As far as
> > I understand it, recursive mutexes cannot deadlock with themselves, you
> > need two mutexes to get a deadlock.
>
> Sure it can deadlock. Suppose 'WaitForSomething' requires that another
> thread acquire the mutex. SomeFunction thinks it unlocked the mutex,
> since it called 'UnlockMutex'. However, it didn't really unlock the
> mutex, so it could be waiting forever.
>

But that implies use of more than one synchronisation object at once
(some means is needed to wait for the other thread).
I agree that this introduces a whole new ball game, and maybe it's just
my experience, but I don't readily recall having ever written code that
requires this. The cases where it does happen you have to be extra
careful with regardless of whether your mutexes are recursive or not.

>
> It can't happen with non-recursive mutexes because in that case, it
> would be illegal to call SomeFunction while holding either A or B. With
> recursive mutexes, it is legal to call SomeFunction holding A or B.

"Illegal"? It just causes a deadlock. Admittedly a definite and easy
to predict deadlock, but still a deadlock none-the-less.

> If you have to put preconditions on functions anyway, why bother with
> recursive mutexes? The only advantage of recursive mutexes is that they
> relax these conditions. The problem is, it's extremely difficult to know
> in general what level of requirement relaxation is safe. And recursive
> mutexes relax all the requirements all the way and provide you no way to
> detect when you break your rules.

So, I add something like the following:

assert(!Mutex.TryLock());

Which does just as good a job with either type of mutex. If I wanted to
be extra careful, I'd throw an exception or return an error.

>
> > Actually I had to implement my own recursive mutexes, because they
> > weren't supported on all the platforms we target. And it's not _THAT_
> > trivial, it required two mutexes, an owner thread id, and a count. I'm
> > absolutely sure I _haven't_ implemented it more efficiently than a
> > library could (I know Win32 uses spinlocks to do the counting instead
> > of a second mutex, which if I get time I will switch over to under
> > POSIX).
>
> You have implemented it more efficiently than a library that provided
> recursive mutexes by default possibly could. That's because more than
> 90% of the time people use mutexes, they neither want nor need
> recursion. It can be done with one mutex if you're careful, because a
> thread cannot get into a race condition with itself. You may need to
> have one thread that never uses any recursive mutexes, so you can use
> its pthread_t to mean 'unowned'.
>

Well I'd be interested in knowing how - certainly Win32 Critical
Sections protect access to the lockcount...how do you avoid one thread
incrementing the count while another is reading it?
As for your 90% comment, if you are correct then I agree mutexes need
not be recursive by default. But there must be other programmers who
have started with Win32 threads and on moving to POSIX, suddenly
realised mutexes aren't quite the same as critical sections. I
certainly believe there needs to be standard and REQUIRED support for
them, which going by all the platforms we support, is a long way off
the truth.

Dylan Nicholson

unread,
Jan 15, 2001, 10:57:40 PM1/15/01
to
In article <9405j2$7qb$1...@flotsam.uits.indiana.edu>,
 Kenneth Chiu <ch...@cs.indiana.edu> wrote:
>Is this a general purpose recursive mutex? If so and it's not
>proprietary, I'd be interested in seeing it.

It's certainly general purpose, but it is proprietary. It took about 2-
3 hours to implement and test on all our platforms (4 unix + windows),
so I'm sure you can work one out fairly easily. Someone else posted
code that was similar and probably more efficient, as I wasn't familiar
with condition variables (hence the 2nd mutex).

David Schwartz

unread,
Jan 15, 2001, 11:29:13 PM1/15/01
to

Dylan Nicholson wrote:

> But that implies use of more than one synchronisation object at once
> (some means is needed to wait for the other thread).
> I agree that this introduces a whole new ball game, and maybe it's just
> my experience, but I don't readily recall having ever written code that
> requires this. The cases where it does happen you have to be extra
> careful with regardless of whether your mutexes are recursive or not.

It doesn't require anything of the kind. This can happen with one mutex
protecting one object. Suppose we have an object that we very rarely
need to manipulate in a complex and expensive way. We don't want to hold
the lock the whole time because some threads can't delay that long. But
some operations would prefer to wait until the object is stable.

Accessors that can't wait do this:

LockObject();
if (ObjectIsBeingWorkedOn())
{
    UnlockObject();
    return false;
}
DoWork();
UnlockObject();
return true;

But accessors that can afford to wait do this:

LockObject();
while (ObjectIsBeingWorkedOn())
{
    UnlockObject();
    Sleep(10); /* this could also block on an event or c.v. */
}
DoStuff();
UnlockObject();

The problem is, the 'Sleep' sleeps with the mutex locked, because the
'UnlockObject' didn't unlock the object! That's the problem with
recursive mutexes: you can never be sure you've actually unlocked them.

And there is no fix for this! Consider:

LockObject();
while (ObjectIsBeingWorkedOn())
{
    CallUnlockAsManyTimesAsNeeded();
    Sleep(10);
}
ReLockObjectAsManyTimesAsNeeded();
DoStuff();
UnlockObjectOnce();
return;

This causes surprises for functions that call this with the mutex
locked. They expect the object not to change while the mutex is locked
unless they lock it. Functions that unlock mutexes more times than they
lock them create unexpected behavior for their callers.



> > It can't happen with non-recursive mutexes because in that case, it
> > would be illegal to call SomeFunction while holding either A or B. With
> > recursive mutexes, it is legal to call SomeFunction holding A or B.

> "Illegal"? It just causes a deadlock. Admittedly a definite and easy
> to predict deadlock, but still a deadlock nonetheless.

Actually, you can do whatever you want in that case. What I do is, in a
debug build, detect it and dump core. In a release build, I don't check
for it, for performance reasons. That's way preferable to 'seeming to
work fine' in a debug build and then deadlocking in a release build.

So, in sum:

1) Recursive mutexes are more expensive than non-recursive mutexes.

2) Recursive mutexes can create serious problems unless you are super
careful when you use them.

3) Recursive mutexes can hide problems leaving your customers to find
them in release code.

4) Recursive mutexes can create ambiguous semantics that depend upon
prior conditions.

5) Recursive mutexes reduce the amount of error-checking you can do in
a debug build.

6) There are very, very few cases where recursive mutexes provide any
benefits at all.

DS

Kaz Kylheku

unread,
Jan 16, 2001, 1:13:58 AM1/16/01
to
On Tue, 16 Jan 2001 03:47:26 GMT, Dylan Nicholson <dn...@my-deja.com> wrote:
>Well I'd be interested in knowing how - certainly Win32 Critical
>Sections protect access to the lockcount...how do you avoid one thread
>incrementing the count while another is reading it?

On 80x86, the machine language uses an atomic increment instruction.
(The lock count starts at -1 so that a transition to zero can be
detected using the flags.) The lock count is just bookkeeping for how
many threads are vying for the lock; together with the auto-reset
event, it forms a semaphore. (Yes, the underlying synchronization
object is not a semaphore but an auto-reset event, despite the name of
the struct member.) A thread which successfully increments the lock
count from -1 to 0 gets the lock; any other value means you wait on the
event. Unlocking means decrementing the lock count; a decrement to a
value other than -1 requires an additional signaling of the event to
wake up a waiting thread.
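
(A minimal non-recursive sketch of that counting scheme, using C++20 atomics
and a binary semaphore in place of the Win32 auto-reset event; the class and
member names are invented, and the real CRITICAL_SECTION adds recursion and
spinning on top of this.)

#include <atomic>
#include <semaphore>

class CriticalSectionLike {
    std::atomic<long> lock_count{-1};   // -1 means unlocked, as described above
    std::binary_semaphore wake{0};      // stands in for the auto-reset event
public:
    void lock() {
        // the thread that atomically moves the count from -1 to 0 owns the lock
        if (lock_count.fetch_add(1) != -1)
            wake.acquire();             // someone else holds it: wait to be signalled
    }
    void unlock() {
        // a decrement to anything other than -1 means a waiter needs to be woken
        if (lock_count.fetch_sub(1) != 0)
            wake.release();
    }
};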

>As for your 90% comment, if you are correct then I agree mutexes need
>not be recursive by default. But there must be other programmers who
>have started with Win32 threads and on moving to POSIX, suddenly
>realised mutexes aren't quite the same as critical sections.

The Single UNIX Spec defines support for recursive mutexes, and this
has been picked up by the POSIX 200x draft.

Brain damaged or not, it's true that people porting code from Win32
need them.

Ron Hunsinger

unread,
Jan 16, 2001, 5:28:45 AM1/16/01
to
In article <3A62D01F...@dresdner-bank.com>, James Kanze
<James...@dresdner-bank.com> wrote:

> You still need some way of preventing the optimizer from deferring
> writes until after the lock has been released.

Not knowing what's inside unlock() should take care of this. A compiler
that doesn't bring memory up to date before calling a function it doesn't
have the source for is going to generate code that breaks even in the
absence of multithreading.
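
(This is why the usual POSIX pattern needs no volatile: the lock and unlock
calls are opaque to the compiler, so any register-cached copy of the shared
data must be written back before the call and reloaded after it. A minimal
sketch, assuming a POSIX platform, with invented names:)

#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;     /* note: not volatile */

void increment(void)
{
    /* opaque call: the compiler must assume it may read or write shared_counter */
    pthread_mutex_lock(&counter_lock);
    ++shared_counter;
    /* any cached value must be stored back before this opaque call */
    pthread_mutex_unlock(&counter_lock);
}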

> Ideally, the compiler
> will understand the locking system (mutex, or whatever), and generate
> the necessary write guards itself. Off hand, I don't know of any
> compiler which meets this ideal.

I'd expect the write guard to be within unlock() itself. No matter what the
compiler knows or doesn't know, the writer of unlock() surely knows that a
memory barrier (maybe two) is required, and can code it in.

-Ron Hunsinger

James Kanze

unread,
Jan 16, 2001, 10:49:00 AM1/16/01
to
Kaz Kylheku wrote:

[...]


> You will probaly find with most compilers that the most aggressive
> caching optimizations are applied to auto variables whose address is
> never taken. These cannot possibly be accessed or modified by
> another thread or signal handler or what have you, so it is
> generally safe to cache them in registers, or even optimize them to
> registers entirely.

I think we basically agree. The only difference is that I'm not
content with "probably find with most compilers"; I want a written
guarantee for the compiler I will actually use. (There are normally
contractual penalties for errors in software I write. And a threading
problem is considered an error in the software.)

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 16, 2001, 10:49:52 AM1/16/01
to
Kaz Kylheku wrote:

> On 15 Jan 2001 09:39:16 -0500, James Kanze
> <James...@dresdner-bank.com> wrote:
> >Kaz Kylheku wrote:

> >> On 13 Jan 2001 23:01:04 -0500, Joerg Faschingbauer
> >> <jfa...@jfasch.faschingbauer.com> wrote:
> >> >So, provided that unlock() is a function and not a macro, there
> >> >is no need to declare i volatile.

> >> Even if a compiler implements sophisticated global optimizations
> >> that cross module boundaries, the compiler can still be aware of
> >> synchronization functions and do the right thing around calls to
> >> those functions.

> >It can be. It should be. Is it? Do todays compilers actually do
> >the right thing, or are we just lucking out because most of them
> >don't optimize very aggressively anyway?

> It seems that we are lucking out, but not really. The way compilers
> typically work is enough to ensure that volatile is not needed in order
> for the right load and store instructions to be issued in the right
> order.

The operative word here is "typically", I think. I know that it will
work for most compilers, not necessarily because the compiler writers
have done anything special, but because they haven't done anything
really special in the way of optimizing. I've also seen
experimental compilers which did an amazing amount of intermodule
analysis, and which "knew" that system calls don't access user
variables unless they've been passed the address of the variable.

(BTW: most compilers will rearrange the order of writes. It shouldn't
matter, as long as all writes take place before the lock is released.)

> The rest of the job is done by the synchronization library
> implementors, who must insert the appropriate memory barrier
> instructions or what have you, into the implementation of these
> functions, so that the hardware doesn't make a dog's breakfast out
> of the memory access requests issued by each processor.

Agreed. No problem here.

> These library implementors tend to have a clue, and tend to have
> influence with the compiler writers. For example, if the GNU
> compiler people implemented some sophisticated optimizations not
> knowing that these break MT programs, the GNU libc people would take
> note and work out some solution---perhaps a special function
> attribute would be developed, so that in the library header file,
> one could write:

> int pthread_mutex_lock(pthread_mutex_t *) __attribute__ ((sync));
>
> or some such thing meaning, spill and reload when calling this
> function.

> The point is that clueful implementors are aware of the issues and
> are looking out for you; it's not just some accident that things
> work. :)

You've hit upon exactly the point which worries me. Thread safety is
a completely foreign domain for most compiler writers, or at least it
was when I worked on compilers. Which means that the implementor who
is clueful about optimization may not be so clueful about locking
issues and thread safety. I'd feel a lot better with an explicit
acknowledgement of the issues in the compiler documentation.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 16, 2001, 10:51:09 AM1/16/01
to
Konrad Schwarz wrote:

> If the compiler supports multi-threading (at least POSIX
> multi-threading), then it *must*, since POSIX does not require
> shared variables to be volatile qualified. If the compiler decides
> to keep values in registers across function calls, it must be able
> to prove that
> * either these variables are never shared by another thread
> * or the functions in question never perform inter-thread operations

Is this explicitly stated in the Posix standard? If so, it is the
sort of guarantee I'm looking for. Or is this just your
interpretation based on the lack of a requirement that shared
variables be volatile qualified?

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 16, 2001, 10:52:12 AM1/16/01
to
Joerg Faschingbauer wrote:

> Duh! What compiler and what language are you talking about?

The language was C++. The compiler was an experimental one which ran
on HP/UX. I don't know what percentage of the optimizations involved
have actually made it into a commercial compiler at present, but the
potential is there.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

James Kanze

unread,
Jan 16, 2001, 10:52:31 AM1/16/01
to
David Schwartz wrote:

> James Kanze wrote:

> > You still need some way of preventing the optimizer from deferring
> > writes until after the lock has been released. Ideally, the
> > compiler will understand the locking system (mutex, or whatever),
> > and generate the necessary write guards itself. Off hand, I don't
> > know of any compiler which meets this ideal.

> Give an example of what you think the problem is. The
> typical solution is to give the compiler no information at all about
> the locking system. Since the compiler then must assume the locking
> system could do anything, it can't optimize anything across it.

In sum, you're counting on the weaknesses of the compiler.

I've already said that in practice, it is probably not a problem,
since the compiler normally won't have accesses to the sources to the
locking system, and any compiler smart enough to know that a system
call won't modify a global variable can also know that specific system
calls involve the locking system, and so some sort of barrier is
necessary.

What I'm complaining about is the lack of explicit guarantees
regarding this. In the end, my previous paragraph is really just
speculation. I think that this will be the case. But I'd feel much
better about it if the compiler implementors specified it, so that I
could be sure that they'd considered it. Particularly because today,
it typically isn't a problem; as you say, the compiler has no
information about the system, and so supposes it can do anything.

> It's not clear to me what you mean by "deferring
> writes". This could either refer to variables being cached in
> > registers and not written back or it could refer to a hardware
> write cache not being flushed. Fortunately, neither is a
> problem. Variables can't be cached in registers because the compiler
> doesn't know what the lock/unlock functions do, and so must assume
> they might access those variables from their memory
> locations. Hardware write caches aren't a problem, because the
> lock/unlock functions contain the appropriate memory barrier. The
> compiler doesn't know this, but the compiler has nothing to do with
> such hardware write reordering and so doesn't need to.

I was mainly thinking of the compiler optimizations. The only way to
ensure that the compiler has no knowledge of what the lock/unlock
functions do is to not make the sources, or even the binary,
accessible to it. Typically, this IS the case, because the
lock/unlock functions are implemented in the system. But a good
compiler can recognize system calls, and know that they don't change
global variables unless the address of the global variable has been
passed as a parameter. (I've actually used a compiler which did this.
Twelve years ago, no less.) Of course, one would hope that a compiler
this smart would also know which system functions involve the locking
system, and take this into account. I just happen to prefer
documented guarantees to just hoping.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

John Mullins

unread,
Jan 16, 2001, 10:53:25 AM1/16/01
to

"Andrei Alexandrescu" <andre...@hotmail.com> wrote in message
news:93v92q$bpfij$1...@ID-14036.news.dfncis.de...

> "John Mullins" <John.M...@crossprod.co.uk> wrote in message
> news:93v2gj$4fs$1...@newsreaderg1.core.theplanet.net...

> >
> > But his examples also rely on undefined behaviour so he can't really
> > count on anything.
>
> Are you referring to const_cast? Strictly speaking, indeed. But then, all MT
> programming in C/C++ has undefined behavior.
>
> Andrei
>

Of course, and I agree when dealing with multiple threads we're somewhat
in the 'Twilight Zone'. I do find it worrying that we find this in an
'Experts' column and there is no mention that this may fail to work with a
highly aggressive optimizing compiler. Novices tend to learn from experts
and if you tell them it's okay to cast away volatile on a volatile variable,
they'll believe you. FWIW I enjoyed the article and gained lots of useful
insights but would have liked to have seen the problems acknowledged.

JM

Charles Bryant

unread,
Jan 16, 2001, 12:11:46 PM1/16/01
to
In article <3A62CE9A...@lucent.com>,
Michiel Salters <sal...@lucent.com> wrote:
>Joerg Faschingbauer wrote:
>> >>>>> "David" == David Schwartz <dav...@webmaster.com> writes:
>> David> Joerg Faschingbauer wrote:
>>
>> >> Now that's the whole point: the compiler has to take care that the
>> >> code it generates spills registers before calling a function.
>
>> David> This is really not so. It's entirely possible that the
>> David> compiler might have some way of assuring that the particular
>> David> value cached in the register isn't used by the called function,
>> David> and hence it can keep it in a register.
>
>> Even though such a thing is possible, it is quite unlikely - consider
>> the management effort of the people making (and upgrading!) such a
>> system.
>
>I don't think things are that hard - especially in C++, which already
>has name mangling. For each translation unit, it is possible to determine
>which functions use which registers and what functions are called outside
>that translation unit.
>Now encode in the name mangling of a function which registers are used,
>except of course those functions imported from another translation unit.
>Just add to the name something like Reg=EAX_EBX_ECX. The full set of
>registers used by a function is the set of registers it uses, plus
>the registers used by function it calls. This will even work for
>mutually recursive functions across translation units.

I don't see how name mangling is of the slightest use. At the point
where the compiled code refers to the function, it cannot know which
registers it uses, so it cannot know what symbol to use for the
reference. If it can look up the unmangled name in order to determine
which 'Reg=...' mangling to use, it might as well leave the name
alone.

--
Eppur si muove

Charles Bryant

unread,
Jan 16, 2001, 12:13:18 PM1/16/01
to
In article <93vb9p$bpi8g$1...@ID-14036.news.dfncis.de>,
Andrei Alexandrescu <andre...@hotmail.com> wrote:
>I would be glad if someone explained to me in what situations a compiler can
>rearrange instructions to the extent that it would invalidate the idiom that
>the article proposes. OTOH, such compilers invalidate a number of idioms
>anyway, such as the Double-Check Locking Pattern, used by Doug Schmidt in
>ACE.

Not having read the article, I cannot comment on it (is there a URL
for it or for a summary?).

However, the problem is not that the compiler may rearrange
instructions. It is that the CPU may re-order the memory accesses.
For example,

x = 6; // x == 6
y = 7; // x == 6, y == 7
x = y; // x == 7, y == 7
y = 5; // x == 7, y == 5

The CPU may:
put '6' in its cache, scheduled to be written to 'x',
write '6' from cache to 'x'
put '7' in its cache, scheduled to be written to 'y',
write '7' from cache to 'y'
put '7' in its cache, scheduled to be written to 'x',
put '5' in its cache, scheduled to be written to 'y',
write '5' from cache to 'y'
write '7' from cache to 'x'

Note that this permits another CPU to see x == 6 and y == 5 in
memory, even though CPU executing the code could never see this
combination.

When a programmer wishes to enforce the relative ordering of memory
accesses on such a CPU, they make it execute a 'memory barrier'
instruction. There may be several such instructions, and in the above
example a 'store/store' memory barrier after 'x = y' would ensure
that the writing of '5' to 'y' could not be moved before any store
before the barrier, so the combination x == 6 && y == 5 would not
occur in memory.
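
(The same store/store constraint written as a sketch with C++11 atomics;
the variable names are just those of the example above, and the fence only
helps an observer that performs a corresponding acquire operation.)

#include <atomic>

std::atomic<int> x{0}, y{0};

void writer()
{
    x.store(6, std::memory_order_relaxed);
    y.store(7, std::memory_order_relaxed);
    x.store(7, std::memory_order_relaxed);
    // store/store barrier: the stores above are ordered before the store below
    std::atomic_thread_fence(std::memory_order_release);
    y.store(5, std::memory_order_relaxed);
}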

The SPARC processor manual has a good description of this in an
appendix.

This is why the double checked locking paradigm is irrecoverably
broken. See
<URL: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html >
for a detailed explanation of why.

Charles Bryant

unread,
Jan 16, 2001, 12:13:47 PM1/16/01
to
In article <3A6369C3...@sensor.com>, Ron Natalie <r...@sensor.com>
wrote:

>
>> However, the C language has a way of signaling that local variables
>> cannot be accessed by other threads, namely by placing them in
>> the register storage class.
>
>Huh? How is that? The C language doesn't contain the word thread
>anywhere. Register is auto + a hint to keep in a register.
>
>> I don't know about C++; if I remember
>> correctly, C++ degrades register to a mere "efficiency hint".
>
>The ONLY difference between C and C++ is that C++ allows you to
>take the address of something with a register storage class (noting
>that doing so may force it out of a register), while C prohibits
>the & operator on register-declared objects even if they weren't
>actually put in a register by the compiler.

That difference is what prevents register variables from being shared
across threads. Each thread has its own, private, automatic
variables, so the only way a thread could refer to an automatic
variable in another thread is if it is passed the address of the
variable. If you can't take the address of an automatic variable,
then that variable cannot be shared across threads.

--
Eppur si muove

James Kanze

unread,
Jan 16, 2001, 12:16:19 PM1/16/01
to
Andrei Alexandrescu wrote:

> I would be glad if someone explained to me in what situations a
> compiler can rearrange instructions to the extent that it would
> invalidate the idiom that the article proposes. OTOH, such compilers
> invalidate a number of idioms anyway, such as the Double-Check
> Locking Pattern, used by Doug Schmidt in ACE.

As far as C++ is concerned, in just about every case. The C++
standard doesn't recognize multi-threading, and a conforming compiler
can suppose that there are no accesses outside of the current thread,
and optimize in consequence.

In practice, it isn't generally a problem, because:
- there are apparently other standards (POSIX) which address the
  issue, and
- in practice, compiler optimizers aren't smart enough to move code
  concerning global variables across a call to a function in
  another module.
For example, although I know the failings of the double-check locking
pattern, it is widely used, and I've yet to hear of a case where it
failed to work correctly. I suspect, however, that this is mainly
because the constructor of a singleton isn't the most heavily executed
code in an application, and the few compilers where the optimizer is
intelligent enough to cause problems use profiling output to guide
optimization, and only really optimize the most heavily executed
branches.
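
(For reference, the idiom under discussion usually has the shape sketched
below -- a faithful rendering of the double-checked locking pattern with
invented names, not an endorsement: as noted above and in the paper linked
earlier, it is not reliably correct without memory barriers.)

#include <mutex>

class Singleton {
public:
    static Singleton* instance()
    {
        if (pInstance == nullptr) {                  // first check, no lock taken
            std::lock_guard<std::mutex> guard(lock); // lock only on the slow path
            if (pInstance == nullptr)                // second check, under the lock
                pInstance = new Singleton;           // the pointer may become visible
                                                     // before construction completes
        }
        return pInstance;
    }
private:
    Singleton() {}
    static Singleton* pInstance;
    static std::mutex lock;
};

Singleton* Singleton::pInstance = nullptr;
std::mutex Singleton::lock;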

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

Dave Butenhof

unread,
Jan 16, 2001, 1:57:02 PM1/16/01
to
Andrei Alexandrescu wrote:

> "James Kanze" <James...@dresdner-bank.com> wrote in message
> news:3A62CF03...@dresdner-bank.com...
> > The crux of Andrei's suggestions really just exploits the compiler
> > type-checking with regards to volatile, and not the actual semantics
> > of volatile. If I've understood the suggestion correctly, it would
> > even be possible to implement it without ever accessing the individual
> > class members as if they were volatile (although in his examples, I
> > think he is also counting on volatile to inhibit code movement).
>
> Thanks James for all your considerations before and after the article
> appeared.
>
> There is a point about the use of volatile proposed by the article. If you
> write volatile correct code as prescribed by the article, you _never_
> *never* NEVER use volatile variables. You _always_ *always* ALWAYS lock a
> synchronization object, cast the volatile away, operate on the so-obtained
> non-volatile alias, let the alias go, and unlock the synchronization object,
> in this order.
>
> Maybe I should have made it clearer that in volatile-correct code, you never
> operate on volatile data - you always cast volatile away and more
> specifically, you cast it away when it is *semantically correct* to do so
> because you locked the afferent synchronization object.

Yes, you should have done so. The introduction, as has been pointed out many
times in this overly complicated thread (my head hurts after trying to catch up
with a couple of days' worth of posts!), oversells the technique substantially.

First off, it does nothing about "detecting races". What it DOES is try to use
(one might reasonably say "abuse") a language feature to try to detect
unsynchronized access to shared variables. That's a noble cause, but I don't
like the implementation much. By using volatile (which is already a subject of
much confusion, as you may have noticed by now), you're (perhaps unintentionally)
implying a lot that the article can't do. (Note that, as pointed out elsewhere,
races can happen even when you use synchronization. Races due to improper
synchronization aren't very interesting, because that's basically what they
asked for, and you really can't stop them from getting it.)

The intent appears to be simply that the compiler will diagnose attempts to use
the volatile members without the intended type casting to remove the volatile
attribute. You supply a guard object to lock an associated mutex and provide
the type cast. You also speak of doing the type cast directly. Yes, I see that
you suggested this be done only in "non-threaded" environments. You don't give
examples, but one might be to initialize data in main() before creating
threads. I don't see much value to this (the mutex would be uncontended). In
general, there's no way to know that the process isn't "threaded". (And, unless
your system prohibits dlopen() of the thread library, which some do, you can't
know that it won't suddenly BECOME threaded at the most inconvenient time.)
Thread-safe code should always BEHAVE as if there were multiple threads.

The article also implies that one could use the data with volatile attributes
intact to manipulate shared data. You don't appear to intend to recommend this,
and you don't give any examples, but the tantalizing (and dangerous) suggestion
remains. You'd get less flak if you removed the suggestion, and replaced it
with a flat statement that any attempt at manipulating shared data without
using standard synchronization operations is NON-portable, and that no language
feature is sufficient. It CAN be done on any platform, but you need to
understand the software and hardware architecture fairly well to even try, and
it's rarely worth anyone's time.

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

Konrad Schwarz

unread,
Jan 16, 2001, 2:04:04 PM1/16/01
to

Ron Natalie:


>
> > However, the C language has a way of signaling that local variables
> > cannot be accessed by other threads, namely by placing them in
> > the register storage class.
>
> Huh? How is that? The C language doesn't contain the word thread
> anywhere. Register is auto + a hint to keep in a register.
>
> > I don't know about C++; if I remember
> > correctly, C++ degrades register to a mere "efficiency hint".
>
> The ONLY difference between C and C++ is that C++ allows you to
> take the address of something with a register storage class (noting
> that doing so may force it out of a register), while C prohibits
> the & operator on register-declared objects even if they weren't
> actually put in a register by the compiler.

That's the point. C register variables can't have their addresses taken,
so they can't be aliased.

Andrei Alexandrescu

unread,
Jan 16, 2001, 4:33:35 PM1/16/01
to
"John Mullins" <John.M...@crossprod.co.uk> wrote in message
news:9412sa$mdf$1...@newsreaderm1.core.theplanet.net...

> Of course, and I agree when dealing with multiple threads we're somewhat
> in the 'Twilight Zone'. I do find it worrying that we find this in an
> 'Experts' column and there is no mention that this may fail to work with a
> highly aggressive optimizing compiler. Novices tend to learn from experts
> and if you tell them it's okay to cast away volatile on a volatile variable,
> they'll believe you. FWIW I enjoyed the article and gained lots of useful
> insights but would have liked to have seen the problems acknowledged.

Point taken, thanks very much.

Andrei

John Mullins

unread,
Jan 16, 2001, 4:34:28 PM1/16/01
to

"Kenneth Chiu" <ch...@cs.indiana.edu> wrote in message
news:93sptr$o5s$1...@flotsam.uits.indiana.edu...

> It's not that volatile itself causes memory problems. It's that
> it's not sufficient (and if under POSIX should not even be used).
>
> He gives an example, which will work in practice, but if he had two
> shared variables, would fail. Code like this, for example, would
> be incorrect on an MP with a relaxed memory model. The write to flag_
> could occur before the write to data_, despite the order in which the
> assignments are written.
>
> class Gadget {
> public:
> void Wait() {
> while (!flag_) {
> Sleep(1000); // sleeps for 1000 milliseconds
> }
> do_some_work(data_);
> }
> void Wakeup() {
> data_ = ...;
> flag_ = true;
> }
> ...
> private:
> volatile bool flag_;
> volatile int data_;
> };
>
I'm not sure I agree with this; the compiler should guarantee that the
writes occur as written, since this is 'observable behaviour'.

JM
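
(On the quoted Gadget example: one portable way to make it safe on a machine
with a relaxed memory model is to pair a release store to the flag with an
acquire load of it, rather than relying on volatile. A sketch with invented
names, using C++11 atomics:)

#include <atomic>
#include <chrono>
#include <thread>

class Gadget {
public:
    void Wait() {
        while (!flag_.load(std::memory_order_acquire))
            std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        do_some_work(data_);   // guaranteed to see the value stored before the flag was set
    }
    void Wakeup(int value) {
        data_ = value;
        flag_.store(true, std::memory_order_release);  // orders the data_ write before the flag
    }
private:
    void do_some_work(int) {}
    std::atomic<bool> flag_{false};
    int data_{0};
};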

Ron Natalie

unread,
Jan 16, 2001, 4:35:24 PM1/16/01
to

Charles Bryant wrote:

> >> However, the C language has a way of signaling that local variables
> >> cannot be accessed by other threads, namely by placing them in
> >> the register storage class.
> >
>

> That difference is what prevents register variables being shared
> across threads. Each thread has its own, private, automatic
> variables, so the only way a thread could refer to an automatic
> variable in another thread is if it is passed the address of the
> variable. If you can't take the address of an automatic variable,
> then that variable cannot be shared across threads.
>

Ah, I get it: signalling to the programmer, not signalling to the
compiler.

Andrei Alexandrescu

unread,
Jan 16, 2001, 8:25:08 PM1/16/01
to
"Charles Bryant" <n142036...@chch.demon.co.uk> wrote in message
news:2001-01-1...@chch.demon.co.uk...

> In article <93vb9p$bpi8g$1...@ID-14036.news.dfncis.de>,
> Andrei Alexandrescu <andre...@hotmail.com> wrote:
> >I would be glad if someone explained to me in what situations a compiler can
> >rearrange instructions to the extent that it would invalidate the idiom that
> >the article proposes. OTOH, such compilers invalidate a number of idioms
> >anyway, such as the Double-Check Locking Pattern, used by Doug Schmidt in
> >ACE.
>
> Not having read the article, I cannot comment on it (is there a URL
> for it or for a summary?).

http://cuj.com/experts/1902/alexandr.html

I'll also provide a summary for two reasons. One is that many people don't
have the time to read the whole banana. The second reason is that a subset
of those people do have time to post an opinion about the article.

Summary:

The article describes how applying the volatile modifier to class types and
member functions, in conjunction with following a number of rules, lets the
compiler detect race conditions as type errors.

In essence, the article prescribes qualifying with volatile all user-defined
data that is shared between threads, and removing that qualification (via a
const_cast) only in conjunction with locking a synchronization object (the
article uses a mutex) that is associated with that data. This way
multithreading semantics and volatile semantics are always in sync. The
device that helps with that is a simple class template, LockingPtr.

Also, the article prescribes qualifying with volatile the member functions
that do their own internal synchronization. This way, volatile (shared)
objects will be able to invoke those member functions directly. Caller code
will be able to invoke non-synchronized (non-volatile) member functions only
after locking the object with a LockingPtr.

The article makes NO CLAIM on multithreaded programming in the ABSENCE of a
mutex. The workings of LockingPtr are based on mutex semantics. I thought
this is clear enough, but too many people gloss over the fact that the
article uses mutexes or semantically equivalent devices.
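
(For readers who haven't followed the link, the device summarized above has
roughly the shape sketched below -- invented member names, not the article's
exact code; the point is that taking the lock and casting volatile away
happen together, in one object.)

#include <mutex>

template <typename T>
class LockingPtr {
public:
    LockingPtr(volatile T& obj, std::mutex& mtx)
        : obj_(const_cast<T&>(obj)), mtx_(mtx) { mtx_.lock(); }
    ~LockingPtr() { mtx_.unlock(); }
    T& operator*()  const { return obj_; }
    T* operator->() const { return &obj_; }
private:
    LockingPtr(const LockingPtr&);            // non-copyable
    LockingPtr& operator=(const LockingPtr&);
    T& obj_;
    std::mutex& mtx_;
};

// Usage sketch: a shared object is declared volatile and touched only
// through a LockingPtr, so unsynchronized access fails to compile.
//   volatile Widget shared;    std::mutex sharedLock;
//   LockingPtr<Widget> p(shared, sharedLock);  p->DoSomething();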

> When a programmer wishes to enforce the relative ordering of memory
> accesses on such a CPU, they make it execute a 'memory barrier'
> instruction. There may be several such instructions, and in the above
> example an 'store/store' memory barrier after 'x = y' would ensure
> that the writing of '5' to 'y' could not be moved before any store
> before the barrier, so the combination x == 6 && y ==5 would not
> occur in memory.

I wonder, can memory barriers be encapsulated (like in macros) so that you
get mutex-like semantics?

> This is why the double checked locking paradigm is irrecoverably
> broken. See
> <URL:
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html >
> for a detailed explanation of why.

You know, I was thinking. In the presence of 'volatile', doesn't (or at
least shouldn't) the compiler disable reordering? For example, if the
pointer to the Singleton is volatile-qualified, then a rearranging compiler
should disable reordering instructions that involve the pointed-to object.


Andrei

Kaz Kylheku

unread,
Jan 16, 2001, 8:26:41 PM1/16/01
to
On 16 Jan 2001 14:04:04 -0500, Konrad Schwarz

<konradDO...@mchpDOTsiemens.de> wrote:
>
>
>Ron Natalie:
>>
>> > However, the C language has a way of signaling that local variables
>> > cannot be accessed by other threads, namely by placing them in
>> > the register storage class.
>>
>> Huh? How is that? The C language doesn't contain the word thread
>> anywhere. Register is auto + a hint to keep in a register.
>>
>> > I don't know about C++; if I remember
>> > correctly, C++ degrades register to a mere "efficiency hint".
>>
>> The ONLY difference between C and C++ is that C++ allows you to
>> take the address of something with a register storage class (noting
>> that doing so may force it out of a register), while C prohibits
>> the & operator on register-declared objects even if they weren't
>> actually put in a register by the compiler.
>
>That's the point. C register variables can't have their addresses
>taken,
>so they can't be aliased.

There is no difference between an auto variable that cannot have its
address taken and one that does not have its address taken. The
register keyword only adds a constraint rule to the C language, but it
is otherwise vacuous of any special semantics.

A conforming C compiler must verify that a register variable does not
in fact have its address taken, and emit a diagnostic otherwise. It
can similarly analyze *any* auto variable to determine whether or not
its address is taken. It would be a poor quality compiler that only
made optimistic aliasing assumptions about register variables.

Kaz Kylheku

unread,
Jan 16, 2001, 8:27:44 PM1/16/01
to
On 16 Jan 2001 13:57:02 -0500, Dave Butenhof <David.B...@compaq.com> wrote:
>First off, it does nothing about "detecting races". What it DOES is try to use
>(one might reasonably say "abuse") a language feature to try to detect
>unsynchronized access to shared variables. That's a noble cause, but I don't
>like the implementation much.

I find that using simple assertions is adequate for finding
unsynchronized accesses to shared variables. A function or statement
block that expects a lock to already be held simply does something like
this:

assert (lock.current_thread_is_owner());

That's it; no messing around with contortions of the C++ type system.

Andrei Alexandrescu

unread,
Jan 16, 2001, 8:27:20 PM1/16/01
to
"Dave Butenhof" <David.B...@compaq.com> wrote in message
news:3A645F38...@compaq.com...

> Andrei Alexandrescu wrote:
> > Maybe I should have made it clearer that in volatile-correct code, you never
> > operate on volatile data - you always cast volatile away and more
> > specifically, you cast it away when it is *semantically correct* to do so
> > because you locked the afferent synchronization object.
>
> Yes, you should have done so. The introduction, as has been pointed out many
> times in this overly complicated thread (my head hurts after trying to catch up
> with a couple of days' worth of posts!), oversells the technique substantially.

I disagree, but then I'm a very narrow audience :o).

> First off, it does nothing about "detecting races". What it DOES is try to use
> (one might reasonably say "abuse") a language feature to try to detect
> unsynchronized access to shared variables.

In my opinion the use of volatile as done by the article is in keeping with
the built-in meaning of the keyword and dovetails nicely with volatile's
workings. I wouldn't call that abuse, but of course I'd be glad to stand
corrected.

Volatile data is data that can be modified outside the compiler's knowledge.
This is *exactly* the meaning I keep for volatile. I have the data that's
used by multiple threads volatile-qualified. You can't touch the data that's
volatile, and so you have to lock an afferent synchronization object before
casting away its volatileness. Once you've locked the synchronization object,
the semantics of the code become single-threaded and so you can cast the
volatile away.

I find the point above quite interesting and important for understanding the
point of the article - that you can, and should, synchronize the semantics
of volatile with the semantics of locking.

> That's a noble cause, but I don't
> like the implementation much. By using volatile (which is already a subject of
> much confusion, as you may have noticed by now), you're (perhaps unintentionally)
> implying a lot that the article can't do. (Note that, as pointed out elsewhere,
> races can happen even when you use synchronization. Races due to improper
> synchronization aren't very interesting, because that's basically what they
> asked for, and you really can't stop them from getting it.)

This is trite. I'd say, you can apply any technique improperly or
mistakenly. You can say, for example, that const-correctness doesn't
necessarily lead to const-correct programs.

Volatile correctness allows you to apply a SMALL number of SIMPLE,
MECHANICAL rules that allow the compiler to detect ALL race conditions. If
you design your primitive classes wrongly, then you did not apply the rules
or you applied them incorrectly.

> The intent appears to be simply that the compiler will diagnose attempts to use
> the volatile members without the intended type casting to remove the volatile
> attribute. You supply a guard object to lock an associated mutex and provide
> the type cast.

This is essential: that const_casting volatile away is done always in
conjunction with locking the synchronization object.

> The article also implies that one could use the data with volatile attributes
> intact to manipulate shared data. You don't appear to intend to recommend this,
> and you don't give any examples, but the tantalizing (and dangerous) suggestion
> remains.

I am not sure what you are referring to here. Could you quote? I am sure
there is a misunderstanding. For now, I say that there is no dangerous
suggestion that the article makes. I stand behind what I wrote and I
consider it fundamentally correct.

I wish (I mean, I don't *really* wish) that someone would come up with a code
sample that applies the rules of volatile-correct code, yet has trouble
related to race conditions. This would lead to completing the rules. Again,
I affirm that volatile-correctness is a fundamentally sound concept, but of
course there might be loopholes that I left in defining it.

> You'd get less flak if you removed the suggestion, and replaced it
> with a flat statement that any attempt at manipulating shared data without
> using standard synchronization operations is NON-portable, and that no language
> feature is sufficient.

But doesn't the article say - and is quite noisy about that - that you
should cast volatileness away ONLY after locking the synchronization object?
Maybe your statement could have made things a tad clearer, but then the
article does not target programmers who don't know how to write
multithreaded programs. It targets those who do write multithreaded programs
and would like to get help from their compiler with that.


Andrei

David Schwartz

unread,
Jan 16, 2001, 8:37:40 PM1/16/01
to

James Kanze wrote:

> > Give an example of what you think the problem is. The
> > typical solution is to give the compiler no information at all about
> > the locking system. Since the compiler then must assume the locking
> > system could do anything, it can't optimize anything across it.
>
> In sum, you're counting on the weaknesses of the compiler.

No, I'm simply demonstrating that typical compilers have no difficulty
meeting the pthreads standard.



> I've already said that in practice, it is probably not a problem,
> since the compiler normally won't have accesses to the sources to the
> locking system, and any compiler smart enough to know that a system
> call won't modify a global variable can also know that specific system
> calls involve the locking system, and so some sort of barrier is
> necessary.
>
> What I'm complaining about is the lack of explicit guarantees
> regarding this. In the end, my previous paragraph is really just
> speculation. I think that this will be the case. But I'd feel much
> better about it if the compiler implementors specified it, so that I
> could be sure that they'd considered it. Particularly because today,
> it typically isn't a problem; as you say, the compiler has no
> information about the system, and so supposes it can do anything.

The pthreads standard provides an explicit guarantee. However, a simple
bit of logic will show you that this guarantee is trivial to implement
regardless of how aggressively the compiler optimizes -- anything another
thread could do to get access to a variable, the lock/unlock code could
do to get access to that same variable.



> > It's not clear to me what you mean by "deferring
> > writes". This could either refer to variables being cached in
> > registers and not written back or it could refer to a hardware
> > write cache not being flushed. Fortunately, neither is a
> > problem. Variables can't be cached in registers because the compiler
> > doesn't know what the lock/unlock functions do, and so must assume
> > they might access those variables from their memory
> > locations. Hardware write caches aren't a problem, because the
> > lock/unlock functions contain the appropriate memory barrier. The
> > compiler doesn't know this, but the compiler has nothing to do with
> > such hardware write reordering and so doesn't need to.

> I was mainly thinking of the compiler optimizations. The only way to
> ensure that the compiler has no knowledge of what the lock/unlock
> functions do is to not make the sources, or even the binary,
> accessible to it. Typically, this IS the case, because the
> lock/unlock functions are implemented in the system. But a good
> compiler can recognize system calls, and know that they don't change
> global variables unless the address of the global variable has been
> passed as a parameter.

For threaded code, this would be a bad, even broken, compiler.

> (I've actually used a compiler which did this.
> Twelve years ago, no less.) Of course, one would hope that a compiler
> this smart would also know which system functions involve the locking
> system, and take this into account. I just happen to prefer
> documented guarantees to just hoping.

Well the pthreads standard gives you one. It has very specific memory
visibility rules. They're documented clearly in Dr. Butenhof's book.

DS

Kaz Kylheku

unread,
Jan 17, 2001, 2:43:20 AM1/17/01
to
On 16 Jan 2001 16:34:28 -0500, John Mullins

Any optimization is valid if a correct program cannot tell the
difference; what constitutes a correct program depends on the language
standard, with whatever fine tuning added by the implementation to
allow additional non-standard programs to be correct---such as programs
that use threads, access hardware directly and so on.

The reordering of memory updates can only be observed by programs which
are far from standard C or C++. The rules which govern the correctness
of these programs, and therefore which govern what optimizations may or
may not be applied, are entirely up to the language implementors.

The system architecture of advanced multiprocessor systems typically
distinguishes different kinds of memory regions. The reordering
optimizations are not applied to all of the regions. For example,
memory-mapped hardware registers would be placed in a region to which
caching and reordering optimizations are not applied, so programs
which use volatile lvalues to access such registers will work properly.

These multiprocessors typically run POSIX environments, which dictate
that the use of volatile is not necessary in MT programming;
rather, what is mandatory is the use of the proper synchronization
functions. Thus a program which accesses shared data without using
these functions is incorrect, and its behavior may therefore vary with
the system configuration, the optimization settings of the compiler,
the phase of the moon, etc.

Martin Berger

unread,
Jan 17, 2001, 7:14:50 AM1/17/01
to
Kaz Kylheku <k...@ashi.footprints.net> wrote

> I find that using simple assertions is adequate for finding
> unsynchronized accesses to shared variables. A function or statement
> block that expects a lock to already be held simply does something like
> this:
>
> assert (lock.current_thread_is_owner());

I for one prefer to be told deterministically and at compile time
about the possibility of a race condition rather than non-deterministically
at run-time. Not to mention the overhead ...

martin

Andrei Alexandrescu

unread,
Jan 17, 2001, 7:18:09 AM1/17/01
to
"Kaz Kylheku" <k...@ashi.footprints.net> wrote in message
news:slrn9699o...@ashi.FootPrints.net...

> On 16 Jan 2001 13:57:02 -0500, Dave Butenhof <David.B...@compaq.com> wrote:
> >First off, it does nothing about "detecting races". What it DOES is try to use
> >(one might reasonably say "abuse") a language feature to try to detect
> >unsynchronized access to shared variables. That's a noble cause, but I don't
> >like the implementation much.
>
> I find that using simple assertions is adequate for finding
> unsynchronized accesses to shared variables. A function or statement
> block that expects a lock to already be held simply does something like
> this:
>
> assert (lock.current_thread_is_owner());
>
> That's it; no messing around with contortions of the C++ type system.

I guess we reached an irreducible position here. For me, transforming an
assertion into a compile-time error is a cool thing. For you, it's not cool at
all. I perfectly understand your position, but I think otherwise.


Andrei

James Kanze

unread,
Jan 17, 2001, 10:29:07 AM1/17/01
to
Charles Bryant wrote:

I don't see what mangling has to do with it either, but I certainly
don't see any problem for the compiler to generate the necessary
information when it emits the function definition and the function
calls, and for the linker to patch the code up to do the necessary
saves.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627
