
Volatile and threads.


Michael Furman

Jul 1, 2003, 4:33:36 PM
About volatile.

OK, I'll try one more time to say what volatile is for.
Let's consider that we have a shared variable that is being
accessed from more than one thread:

int vshared;

I believe everybody agrees that we need some
synchronization primitives to work with this variable
correctly. It could be mutexes or whatever else (maybe
something that will be invented in the future); what is
important is that, since we are using a procedural language (C),
these primitives look like language statements: they could be
function calls or some inline code (inline functions or
macros). I will call these primitives lock() and unlock() (in
reality they would probably take parameter(s), but I have
just one shared variable, so just one mutex or whatever
other synchronization object is enough for me).

And in my example the CPU has many registers that can hold
int values, so the compiler is allowed to optimize code by using
these registers to keep the values of some variables. Though, at
some points the compiler has to stop this optimization
and store some values to their memory locations
and/or refresh them by reading from memory (or just
forget that their values are in registers). For example, the
compiler must do this with variables X and Y if it sees a call to
some function that can use or change those variables.

Now let us look at the fragment of code:

lock();
vshared = 125;
unlock();
.........
lock();
printf("%d\n", vshared);
unlock();

The question is whether the compiler will store the assigned
value (125) to memory before unlock() and read the value of
vshared after the second lock(); otherwise the code could work
incorrectly, because another thread can change the value of
vshared between unlock() and the second lock(), even if it
uses the same lock/unlock mechanism (e.g. the same
mutex). In that case the code would print "125" rather than the
newly assigned value.

The compiler does not know about threads, so the only thing
that decides whether it drops the register optimization is
whether it can prove that the lock()/unlock() primitives cannot
change the value of vshared. That depends on what
lock()/unlock() are and how they are implemented. It could be:

1. The typical case of using mutexes. lock() and unlock() are
library functions and the compiler does not see their
implementation (or it does not see the implementation of some
internal functions, like OS kernel functions). This is the
case that my opponents keep in mind: the compiler must assume
that almost every variable could potentially be accessed,
at least all variables that could potentially be accessed by
the thread code.
In this case everything will work and volatile is neither
needed nor helpful.
The problem with this variant is that it is slow. The overhead
of storing/reloading all the register-optimized variables
(though we need it for only one variable: vshared)
is added to the overhead of calling an external or library
function.


2. The compiler sees the lock/unlock implementation, because
either the compiler is so smart that it can analyze external and
library modules, or these primitives are implemented inline.
And the compiler discovers that neither lock() nor unlock()
accesses vshared. In this case the compiler can keep its value in
a register the whole time, and the program will print the old,
possibly incorrect, value.
In this case, to make it correct, we need to declare
vshared volatile. That will not just make the program correct (in
the sense of doing what it is designed to do), but will do it w/o
imposing extra overhead: it turns off the optimization of
only a single variable rather than of all variables used in the
code.

Notes:
1. Of course nothing here is portable in any C/C++ sense
(just as we can't talk about C/C++ portability of a program
with threads). The standards have a very loose definition (I
would call it a comment rather than a definition) of the
volatile keyword.
2. I believe that what I said is consistent with Andrei's
quote that was recently posted in this newsgroup.
3. I am tired of flame wars and of trying to catch opponents
in some minor mistake. I have probably made some
mistakes in this post (besides my broken English). I am just
expressing my opinion (though AFAIK it is widely
accepted); it can't be proved, as there is no exact definition
of volatile in the standards.
What I am interested in (if you do not agree) is:
- have I correctly interpreted your arguments as variant 1 in
my text?
- do you think that variant 2 is wrong? Impossible? If so,
why?

Regards,
Michael Furman

David Schwartz

Jul 1, 2003, 5:01:01 PM

"Michael Furman" <Michae...@Yahoo.com> wrote in message
news:bdsr71$10ema1$1...@ID-122417.news.dfncis.de...

> 1. Typical case of using mutexes. lock() and unlock() are
> library functions and compiler does not see their
> implementation (or it does not see implementation of some
> internal functions, like OS kernel functions). This it the
> case that my opponents keep in mind: compiler must think
> that almost every variable could be potentially accessed -
> at least all variables that could be potentially accessed by
> thread code.
> In this case everything will work and there is no need/help
> in using volatile.
> Problem with this variant - it is slow. Overhead of
> storing/reloading all optimized by keeping in register
> variables (though we need it for only one variable: vshaere)
> will be added to overhead of using external or library
> function.

No, it's not slow. The compiler does not have to store/reload *all* the
variables it optimized. It only needs to store/reload those that could
potentially be accessed by another thread.

> 2. Compiler sees lock/unlock implementation because
> either compiler so smart that can analyze external and
> library modules or these primitives are implemented inline.
> And compiler discovers that neither lock() nor unlock()
> access vshared.

But they do, through other threads.

> In this case compiler can keep its value in
> the register all time and program will print old, possibly
> incorrect, value.
> And in this case to make it correct we need to declare
> vshared volatile. It will not just make program correct (in
> sense of doing what it is designed to do), but do it w/o
> imposing extra overhead: it will turn off optimization of
> only single variable rather then all variables used it the
> code.

*sigh* No. There is nothing in the semantics of 'volatile' that says it
guarantees memory visibility by another thread. So even though the processor
might issue an instruction to write 'vshared' to memory, there's no
guarantee that another thread will actually *see* that write when it does a
read.

So this is another case where 'volatile' does nothing, or at least, is
not guaranteed to do anything.

> Notes:
> 1. Of cause everything is not portable in any C/C++ sense
> (like we can't talk about C/C++ portability of a program
> with threads). Standards have very lose definition (I would
> call it comments rather then definition) of volatile keyword.

Right, so again, all of this might or might not apply to any particular
implementation. It would have to be specified by the implementation that
'volatile' has memory visibility semantics. I know of no platform where
this is the case.

> 2. I believe that what I said is consistent with Andrei's
> quote that was recently posted in this newsgroup.

Yes, that's why you are both equally wrong.

> 3. I am tiered of flame wars and trying to catch opponents
> in some minor mistake. I have probably make some
> mistakes in this post (besides my broken English). I am just
> expressing my opinion (though AFAIK it is widely
> accepted) - it can't be proved as ther is no exact definition
> of volatile in standards.

That's the point! Either the standard says it will do something, or it
doesn't.

> What I am interested is (if you do not agree):
> - am I correctly interpreted your arguments as variant 1 in
> my text?

Variant 1 is how most platforms happen to implement mutexes. They could
do it some other way if they wanted to, but they must prohibit mutexes from
breaking.

> - do you think that variant 2 in wrong? Impossible? If so -
> why?

It's just pointless. First, it causes mutexes to break, even if you do
use 'volatile' on systems with weak memory ordering. Second, its only
alleged advantage is as an optimization. But actually, it's a pessimization
for several reasons. The main one is that having to do everything through
memory (what 'volatile' would have to force in order to work) would slow
down the code inside the lock so much that any alleged benefit to optimizing
the locking primitives would never be repaid.

Second, there exists no platform I know of where locks actually work
that way. Even hand-coded spinlocks always use the necessary processor magic
to disable memory optimizations across the spinlock calls. Nobody has ever
had any reason to try to create a universe such as the one you imagine. If
they did, and they specifically said that 'volatile' would provide memory
visibility (because they had arranged it some way), then you'd be right for
that platform.

Yes, someone could create a platform where 'volatile' provided thread
safety, either alone or in combination with something else. But, to my
knowledge, nobody has yet done so, and for good reason. Forcing all access
to shared data to be through 'volatile' would slow things down horribly,
especially if 'volatile' had to provide atomicity, ordering, *and*
visibility guarantees.

DS


Michael Furman

Jul 1, 2003, 7:35:08 PM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bdssqe$426$1...@nntp.webmaster.com...

>
> "Michael Furman" <Michae...@Yahoo.com> wrote in message
> news:bdsr71$10ema1$1...@ID-122417.news.dfncis.de...
>
> > 1. Typical case of using mutexes. lock() and unlock() are
> > library functions and compiler does not see their
> > implementation (or it does not see implementation of some
> > internal functions, like OS kernel functions). This it the
> > case that my opponents keep in mind: compiler must think
> > that almost every variable could be potentially accessed -
> > at least all variables that could be potentially accessed by
> > thread code.
> > In this case everything will work and there is no need/help
> > in using volatile.
> > Problem with this variant - it is slow. Overhead of
> > storing/reloading all optimized by keeping in register
> > variables (though we need it for only one variable: vshaere)
> > will be added to overhead of using external or library
> > function.
>
> No, it's not slow. The compiler does not have to store/reload *all* the
> variables it optimized. It only needs to store/reload those that could
> potentially be accessed by another thread.

Yes, it is slow. The compiler does not know anything about threads, so it
has to store/reload all the variables that could potentially be accessed
by an unknown external function. That is:
1. All external variables.
2. Some of the static/automatic variables that are exposed to the external
"world", for example static variables that could be accessed by a
non-static function.
So, usually it is almost all variables.

>
> > 2. Compiler sees lock/unlock implementation because
> > either compiler so smart that can analyze external and
> > library modules or these primitives are implemented inline.
> > And compiler discovers that neither lock() nor unlock()
> > access vshared.
>
> But they do, through other threads.

No, they do not. Even with full knowledge of the program design,
lock/unlock on some mutex object is related (I can't say "accesses") only
to the shared variables protected by it, not to all of them.
But that is not relevant here: the compiler does not know anything about
threads; if it does not see the variable's name mentioned in the
implementation of the code (I am simplifying, of course) it has the right
to do the optimization.

>
> > In this case compiler can keep its value in
> > the register all time and program will print old, possibly
> > incorrect, value.
> > And in this case to make it correct we need to declare
> > vshared volatile. It will not just make program correct (in
> > sense of doing what it is designed to do), but do it w/o
> > imposing extra overhead: it will turn off optimization of
> > only single variable rather then all variables used it the
> > code.
>
> *sigh* No. There is nothing in the semantics of 'volatile' that says it
> guarantees memory visibility by another thread. So even though the
> processor might issue an instruction to write 'vshared' to memory,
> there's no guarantee that another thread will actually *see* that write
> when it does a read.

What did you mean by that first "no"? I absolutely agree with what you
said here, with the exception of that "no". I did not say anything about
any guarantee.
Could you please reread what I said? You are arguing as if I had said "it
is enough to put volatile there to make something correct and to
guarantee something". I did not say anything close to that.

>
> So this is another case where 'volatile' does nothing, or at least, is
> not guaranteed to do anything.

Again, I am trying to do my best to interpret what you said. I did not
mention any guarantee; formally nothing can be guaranteed, because there
is no strict definition of the semantics of volatile. So I could agree
with you again, though I think your sentence is somewhat misleading.

What I am saying is neither that claim nor its negation:

"Absence of 'volatile' could cause the code to work incorrectly on some
implementations."

I did not say that the presence of 'volatile' guarantees anything! That is
not true.
It does not even have a strict sense: there is no formal basis for
proving the correctness of any such code, because neither 'volatile' nor
threads have formal definitions (and IMO cannot have them, in terms
compatible with C/C++ semantics).

> > Notes:
> > 1. Of cause everything is not portable in any C/C++ sense
> > (like we can't talk about C/C++ portability of a program
> > with threads). Standards have very lose definition (I would
> > call it comments rather then definition) of volatile keyword.
>
> Right, so again, all of this might or might not apply to any particular
> implementation. It would have to be specified by the implementation that
> 'volatile' has memory visibility semantics. I know of no platform where
> this is the case.

Volatile does not have memory visibility semantics! And it is not intended
to. You are saying it as if I had claimed it does; I did not! Volatile is
just a hint to the compiler to disable some optimization.


> > 2. I believe that what I said is consistent with Andrei's
> > quote that was recently posted in this newsgroup.
>
> Yes, that's why you are both equally wrong.

Again we go down to the preschool level of discussion .....

>
> > 3. I am tiered of flame wars and trying to catch opponents
> > in some minor mistake. I have probably make some
> > mistakes in this post (besides my broken English). I am just
> > expressing my opinion (though AFAIK it is widely
> > accepted) - it can't be proved as ther is no exact definition
> > of volatile in standards.
>
> That's the point! Either the standard says it will do something, or it
> doesn't.

Are you sure? Does the standard say that using "volatile" will crash your
computer? Does it say it will not?


>
> > What I am interested is (if you do not agree):
> > - am I correctly interpreted your arguments as variant 1 in
> > my text?
>
> Variant 1 is how most platforms happen to implement mutexes. They could
> do it some other way if they wanted to, but they must prohibit mutexes
> from breaking.
>
> > - do you think that variant 2 in wrong? Impossible? If so -
> > why?
>

OK, I again try to listen.

> It's just pointless. First, it causes mutexes to break, even if you do
> use 'volatile' on systems with weak memory ordering.

Why? Are you saying that whatever the implementation of the lock is, if it
is inlined or visible it is wrong? So only a well-hidden implementation
can be correct?


> Second, its only alleged advantage is as an optimization. But actually,
> it's a pessimization for several reasons. The main one is that having to
> do everything through memory (what 'volatile' would have to force in
> order to work) would slow down the code inside the lock so much that any
> alleged benefit to optimizing the locking primitives would never be
> repaid.

No, that is not how it works. If I declare vshared as volatile, only
accesses to this variable are forced to go through memory. It does not
change the lock, because the lock does not access vshared.

I can imagine a machine where mutexes are implemented in hardware, so
lock/unlock are just one inline instruction each, or an inline access to
some special memory location.

>
> Second, there exists no platform I know of where locks actually work
> that way. Even hand-coded spinlocks always use the necessary processor
> magic to disable memory optimizations across the spinlock calls.

Sure they have to. But any processor magic can only disable processor
memory optimization. The compiler could still access a copy in a
register, and all your magic would be useless.

> Nobody has ever
> had any reason to try to create a universe such as the one you imagine. If
> they did, and they specifically said that 'volatile' would provide memory
> visiblity (because they had arranged it some way), then you'd be right for
> that platform.

Volatile does not and should not provide memory visibility! Do you hear
me? I neither said nor imagined that it does!

>
> Yes, someone could create a platform where 'volatile' provided thread
> safety, either alone or in combination with something else. But, to my
> knowledge, nobody has yet done so, and for good reason. Forcing all access
> to shared data to be through 'volatile' would slow things down horribly,
> especially if 'volatile' had to provide atomicity, ordering, *and*
> visibility guarantees.

How is that related to what I am saying?

OK, I should give up. I do not have any chance of being heard by you.
And I believe I have said enough for everybody else to either agree,
disagree, point out some error, or just comment on what I said.

Michael Furman

David Schwartz

Jul 1, 2003, 9:47:44 PM

I'll try to put this as simply as possible so there's no
misunderstanding.

You have tried to construct a theoretical implementation in which
'volatile' is useful to synchronize accesses between threads. I have
responded that this theoretical situation *could* exist if someone built it,
but nobody has and nobody will.

The reason nobody has or will is because it's very bad. The problem is
that in order for 'volatile' to work for its normal use, it must disable all
compiler optimizations. This means that all accesses to shared variables
would have to be done without optimizations because that's what making them
'volatile' would mean.

The alleged advantage is speeding up the lock/unlock functions
themselves. But any realistic program spends much more CPU time *holding*
locks than *acquiring* locks. So even if you could speed up lock acquisition
by a factor of ten, the access to 'volatile' variables while holding the
lock would increase the time you kept the lock held by a lot. This would,
on balance, be a *massive* loss.

Compare:

Lock(); 2%
DoStuff(); 96%
Unlock(); 2%

Your potential inlining of 'Lock' and 'Unlock' might, say, make them ten
times faster. But having to make every variable 'DoStuff' accesses
'volatile' will probably make all that stuff at least 20% slower. (I'm being
generous. Realistically, lock and unlock would be maybe 30% faster and all
the stuff would be half as fast because memory is *much* slower than
registers.) So your code would look like this:

Lock(); .2%
DoStuff(); 115%
Unlock(); .2%

In total, this code takes 115.4% of the time the normal implementation
took. So even under these extremely generous assumptions, your code is 15%
slower than the usual mutex implementation. Realistically, it will be so
much worse that it's unbelievable. (Try changing all your variables to
'volatile' in single-threaded code and watch what happens!)

So nobody has created an implementation like this. So there does not
exist an implementation where 'volatile' is useful in this way. And probably
nobody will ever create one, or if they do, it will not be widely used.

DS


Michael Furman

Jul 1, 2003, 9:58:27 PM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bdtdk0$e04$1...@nntp.webmaster.com...

>
> I'll try to put this as simply as possible so there's no
> misunderstanding.

No, you have missed the whole story. I was explaining (not to you anymore,
but to the other participants of this group) how it really works.

Michael Furman


David Schwartz

Jul 1, 2003, 10:13:03 PM

"Michael Furman" <Michae...@Yahoo.com> wrote in message
news:bdte84$10vvgb$1...@ID-122417.news.dfncis.de...

How *WHAT* really works? There is no platform that works the way you
suggest. If you think there is one, name the platform. VC++ is not this way.
Pthreads is not this way.

DS


Michael Furman

Jul 1, 2003, 11:15:07 PM
> How *WHAT* really works? There is no platform that works the way you
> suggest. If you think there is one, name the platform. VC++ is not this
> way. Pthreads is not this way.

You do not even try to understand what I am saying. How can you say that
something does not work in a way you don't understand?

David Schwartz

Jul 1, 2003, 11:19:45 PM

"Michael Furman" <Michae...@Yahoo.com> wrote in message
news:bdtins$105gc8$1...@ID-122417.news.dfncis.de...

Because I understand the way they actually *do* work.

DS


SenderX

Jul 1, 2003, 11:29:54 PM
> *sigh* No. There is nothing in the semantics of 'volatile' that says it
> guarantees memory visibility by another thread. So even though the
> processor might issue an instruction to write 'vshared' to memory,
> there's no guarantee that another thread will actually *see* that write
> when it does a read.

Correct; the compiler would need to put the proper barriers on volatile
accesses in order for them to provide memory visibility...

--
The designer of the SMP and HyperThread friendly, AppCore library.

http://AppCore.home.comcast.net


Momchil Velikov

Jul 2, 2003, 3:54:47 AM
"David Schwartz" <dav...@webmaster.com> wrote in message news:<bdtdk0$e04$1...@nntp.webmaster.com>...

> But any realistic program spends much more CPU time *holding*
> locks than *acquiring* locks.

I don't see how this would be bad or even relevant. While the program
is holding a lock it does meaningful processing, unlike while it's
acquiring a lock. It's meaningless to compare these times.

> So even if you could speed up lock acquisition
> by a factor of ten, the access to 'volatile' variables while holding the
> lock would increase the time you kept the lock held by a lot. This would, on
> balance be a *massive* loss.

Massive loss for which? For programs, whose scalability is constrained
by the time spent blocked on a lock? Statistically speaking, there
aren't any. :)

~velco

Momchil Velikov

Jul 2, 2003, 4:03:52 AM
"David Schwartz" <dav...@webmaster.com> wrote in message news:<bdtf3f$eo9$1...@nntp.webmaster.com>...

A number of compilers (all?) I've used define "access to an object
that has volatile-qualified type" as actually issuing memory
load/store instructions[1] before/after sequence points.
~velco

[1] Granted, in a number of cases, not documented, but observed
behavior. I guess one can nitpick a lot about that, only that it's
pointless for the discussion - we're talking about cases where this is
in fact true. Not documenting truth does not make it false.

David Schwartz

Jul 2, 2003, 4:05:17 AM

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03070...@posting.google.com...

> "David Schwartz" <dav...@webmaster.com> wrote in message
news:<bdtdk0$e04$1...@nntp.webmaster.com>...

> > But any realistic program spends much more CPU time *holding*
> > locks than *acquiring* locks.

> I don't see how this would be bad or even relevant. While the program
> is holding a lock it does meaningful processing, unlike while its
> acquiring a lock. It's meaningless to compare these times.

Huh?

> > So even if you could speed up lock acquisition
> > by a factor of ten, the access to 'volatile' variables while holding the
> > lock would increase the time you kept the lock held by a lot. This
> > would, on balance, be a *massive* loss.

> Massive loss for which? For programs, whose scalability is constrained
> by the time spent blocked on a lock? Statistically speaking, there
> aren't any. :)

For pretty much every program ever written. The time spent holding a
lock is critical for two reasons:

1) Most code in a multithreaded program spends most of its time holding
one or more locks and manipulating data that could be shared.

2) While you're holding a lock, you're reducing the ability of other
threads to do their jobs.

So a platform that required all shared variables to be 'volatile' would
be an absolute disaster. That's why there are no such platforms.

DS


David Schwartz

Jul 2, 2003, 4:12:33 AM

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03070...@posting.google.com...

> > How *WHAT* really works? There is no platform that works the way you
> > suggest. If you think there is one, name the platform. VC++ is not this
> > way. Pthreads is not this way.

> A number of compilers (all?) I've used, define "access to an object
> that has volatile-qualified type" as actually issuing memory
> load/store instructions[1] before/after sequence points.

So what? What about memory visibility? What about ordering? What about
atomicity? This is just not enough and there is no other thing you can add
to it that gets you these other things.

> [1] Granted, in a number of cases, not documented, but observed
> behavior. I guess one can nitpick a lot about that, only that it's
> pointless for the discussion - we're talking about cases where this is
> in fact true. Not documenting truth does not make it false.

You can construct code that does one thing with volatile and something
else without it, by accident, on some particular platform. You can then
pronounce what it did with volatile as "right" and what it did without
volatile as "wrong". This is far from the same thing as saying that volatile
is actually useful. Things may happen to work on these platforms until
something changes, like a newer CPU that doesn't provide the same inherent
ordering guarantees as the one you tested on.

The general belief that 'volatile' is useful for synchronizing
variables across threads was harmless when systems with weak memory ordering
semantics and no memory visibility issues either didn't exist or weren't
common. It was always wrong, it just happened to work. Now, it's not just
wrong, it's dangerous.

Specifically, it does not work for pthreads and it does not work for
WIN32. So maybe you could find some platforms where it does "work", but that
would obviously be a platform-specific thing and wouldn't apply to the most
common platforms. Anyone who says 'volatile' is for thread synchronization,
without pointing out that this is only accidentally true on obscure systems,
is not just wrong but dangerously wrong.

DS


SenderX

Jul 2, 2003, 5:18:02 AM
> For pretty much every program ever written. The time spent holding a
> lock is critical

Very critical!

;)

Momchil Velikov

Jul 2, 2003, 5:41:39 AM
"SenderX" <x...@xxx.xxx> wrote in message news:<SksMa.13375$Xm3.2077@sccrnsc02>...

> > *sigh* No. There is nothing in the semantics of 'volatile' that says
> > it guarantees memory visibility by another thread. So even though the
> > processor might issue an instruction to write 'vshared' to memory,
> > there's no guarantee that another thread will actually *see* that
> > write when it does a read.
>
> Correct, the compiler would need to put the proper barriers on volatile
> access, in order for it to overcome memory visibility...

That'd be incredibly stupid; the platform's synchronization primitives
must take care of memory visibility.

~velco

SenderX

Jul 2, 2003, 5:51:56 AM
> That'd be incredibly stupid, platform's synchronization primitives
> must take care of memory visibility.

If volatile accesses needed to provide memory visibility, then barriers
would have to be used.

David Schwartz

Jul 2, 2003, 5:51:05 AM

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03070...@posting.google.com...

> > Correct, the compiler would need to put the proper barriers on volatile


> > access, in order for it to overcome memory visibility...

> That'd be incredibly stupid, platform's synchronization primitives
> must take care of memory visibility.

Exactly. On every platform I know of:

1) The synchronization primitives are enough without volatile.

2) Volatile is not enough by itself.

3) There is nothing you can add to 'volatile' such that the combination is
enough even though none of its components alone are.

In other words, on no platform is 'volatile' useful for thread safe
variables, except for two cases:

1) Variables that change only to indicate a shutdown. In this case, you
don't need code that's guaranteed to work.

2) Using the lock-free test with thread signalling trick. In this case,
'volatile' isn't applied to the variables that are shared between threads.

DS


Momchil Velikov

Jul 2, 2003, 9:00:33 AM
"David Schwartz" <dav...@webmaster.com> wrote in message news:<bdu45i$r4d$1...@nntp.webmaster.com>...

> "Momchil Velikov" <ve...@fadata.bg> wrote in message
> news:87bded37.03070...@posting.google.com...
>
>
> > > How *WHAT* really works? There is no platform that works the way you
> > > suggest. If you think there is one, name the platform. VC++ is not
> > > this way. Pthreads is not this way.
>
> > A number of compilers (all?) I've used, define "access to an object
> > that has volatile-qualified type" as actually issuing memory
> > load/store instructions[1] before/after sequence points.
>
> So what? What about memory visibility? What about ordering? What about
> atomicity? This is just not enough and there is no other thing you can add
> to it that gets you these other things.

Sure it's not enough; did I say it is? (For that matter, did _anyone_
say it is?)

The point is that one _must_ ensure particular ordering between memory
load/store instructions and memory synchronization instructions. What
would prevent a compiler when given [1]:

volatile int y;
x = y;
sync();

to emit

sync
lw r5, y

instead of

lw r5, y
sync

eh ?

Well, with those compilers, which define ``x = y'' to actually
comprise an access to a volatile object, it's the ``volatile''
keyword.

> > [1] Granted, in a number of cases, not documented, but observed
> > behavior. I guess one can nitpick a lot about that, only that it's
> > pointless for the discussion - we're talking about cases where this is
> > in fact true. Not documenting truth does not make it false.
>
> You can construct code that does one thing with volatile and something
> else without by accident on some one particular platform. You can then
> pronounce what it did with volatile as "right" and what it did without
> volatile as "wrong". This is far from the same thing as saying that volatile
> is actually useful. Things may happen to work on these platforms until
> something changes, like a newer CPU that doesn't provide the same inherent
> ordering guarantees as the one you tested on.
>
> The general belief that 'volatile' is useful for synchronizating
> variables across threads was harmless when systems with weak memory ordering
> semantics and no memory visibility issues either didn't exist or weren't
> common. It was always wrong, it just happened to work. Now, it's not just
> wrong, it's dangerous.
>
> Specifically, it does not work for pthreads and it does not work for
> WIN32. So maybe you could find some platforms where it does "work", but that
> would obviously be a platform-specific thing and wouldn't apply to the most
> common platforms. Anyone who says 'volatile' is for thread synchronization,
> without pointing out that this is only accidentally true on obscure systems,
> is not just wrong but dangerously wrong.

Frankly, I don't see how this could be a reply to my posting. You
refute a number of apparently wrong claims that I didn't make.

~velco

[1] lw - memory load insn with apparent semantics [2]
sync - memory-barrier, e.g. PowerPC ``sync'' insn

[2] I hope folks will not nitpick about apparent lack of documentation
of ``lw'' ;)

Momchil Velikov

Jul 2, 2003, 9:14:49 AM
"David Schwartz" <dav...@webmaster.com> wrote in message news:<bdu3nt$r02$1...@nntp.webmaster.com>...

> "Momchil Velikov" <ve...@fadata.bg> wrote in message
> news:87bded37.03070...@posting.google.com...
>
> > "David Schwartz" <dav...@webmaster.com> wrote in message
> news:<bdtdk0$e04$1...@nntp.webmaster.com>...
>
> > > But any realistic program spends much more CPU time *holding*
> > > locks than *acquiring* locks.
>
> > I don't see how this would be bad or even relevant. While the program
> > is holding a lock it does meaningful processing, unlike while it's
> > acquiring a lock. It's meaningless to compare these times.
>
> Huh?
>
> > > So even if you could speed up lock acquisition
> > > by a factor of ten, the access to 'volatile' variables while holding the
> > > lock would increase the time you kept the lock held by a lot. This
> would, on
> > > balance be a *massive* loss.
>
> > Massive loss for which? For programs, whose scalability is constrained
> > by the time spent blocked on a lock? Statistically speaking, there
> > aren't any. :)
>
> For pretty much every program ever written.

In your universe maybe ...

> The time spent holding a lock is critical for two reasons:
>
> 1) Most code in multithreaded program spends most of its time holding
> one or more locks and manipulating data that could be shared.
>
> 2) While you're holding a lock, you're reducing the ability of other
> threads to do their jobs.

These ought to be some stupid multithreaded programs (assuming the
reason for them to be multithreaded is to increase performance via
parallelism).

If you go for scalable increase in performance you strive to minimize
shared memory areas, because cache issues hit _long_ before wait time
ones.

IOW, scalable multithreaded programs _do not concurrently access
memory_. (generally speaking).

Hence, threads hold locks rarely and hold them for short periods of
time. Lock performance is meaningless; it has almost no impact on
overall performance.

~velco

David Schwartz

Jul 2, 2003, 4:50:21 PM

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.0307...@posting.google.com...

> > So what? What about memory visibility? What about ordering? What about
> > atomicity? This is just not enough and there is no other thing you can add
> > to it that gets you these other things.

> Sure it's not enough, did I say it is ? (For that matter, did _anyone_
> say it is ?)

So the question is, what is the other thing you can add to it that would
make it enough?

> The point is that one _must_ ensure particular ordering between memory
> load/store instructions and memory synchronization instructions. What
> would prevent a compiler when given [1]:

> volatile int y;
> x = y;
> sync();
>
> to emit
>
> sync
> lw r5, y
>
> instead of
>
> lw r5, y
> sync
>
> eh ?

The fact that the 'sync' C/C++ invocation would be wrapped in such a way
that the compiler wouldn't order around it. VC++, for example, never orders
around inline assembly. GCC, for example, can be told not to by the 'memory'
invalidation and 'volatile' keyword. (Note that this is NOT using volatile
to protect shared data, it's a GCC platform-specific extension that uses the
keyword 'volatile' for a completely different purpose and doesn't apply it
to data but to code!)

> Well, with those compilers, which define ``x = y'' to actually
> comprise an access to a volatile object, it's the ``volatile''
> keyword.

Yes, if those compilers/platforms existed, but they don't. VC++ doesn't
work this way. GCC doesn't work this way.

I never denied that one could create a platform on which 'volatile'
provided thread safety. Heck, one could create a platform with a single mutex
and put a full mutex lock before and a full mutex unlock after every
'volatile' access.

But nobody does this.

DS


David Schwartz

Jul 2, 2003, 4:52:52 PM

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03070...@posting.google.com...

> If you go for scalable increase in performance you strive to minimize


> shared memory areas, because cache issues hit _long_ before wait time
> ones.

What does this have to do with what I said?

> IOW, scalable multithreaded programs _do not concurrently access
> memory_. (generally speaking).

Yes, but when they do, it's what limits their scalability, so it's vital
that nothing slows down those sections that do.

> Hence, threads hold locks rarely and hold them for short periods of
> time. Lock performance is meaningless, it has almost no impact on
> overall performance.

Take any real-world scalable application, add code to make it hold
every lock for about twice as long as it currently does, and watch what
happens to its scalability.

If you're correctly coded, all the limits to your scalability occur
while you're holding a lock. Lockless code should scale nearly perfectly. So
anything that slowed down code that held locks would trash your scalability.
Making shared data 'volatile' slows down the code that holds locks, because
it's while you have a lock held that you access data that is or might be
shared.

DS


Michael Furman

Jul 2, 2003, 5:36:37 PM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bdvgid$kdt$1...@nntp.webmaster.com...
> [...]

> "Momchil Velikov" <ve...@fadata.bg> wrote in message
> news:87bded37.0307...@posting.google.com...
>
> > > So what? What about memory visibility? What about ordering? What about
> > > atomicity? This is just not enough and there is no other thing you can add
> > > to it that gets you these other things.
>
> > Sure it's not enough, did I say it is ? (For that matter, did _anyone_
> > say it is ?)

Would you be so kind as to answer his question: "did _anyone_ say it is ?"
(in this discussion).

I would appreciate very much your direct answer (yes or no).

Regards,
Michael Furman


Alexander Terekhov

Jul 2, 2003, 6:41:05 PM

Michael Furman wrote:
[...]

> I would appreciate very much your direct answer (yes or no).
^^^^^^^^^^^

Well, "it depends", of course. I'm biased toward:

#include <iostream>

enum vote { no, yes };

std::ostream& operator <<(std::ostream& os, vote v) {
    return os << (!v ? "no" : "yes");
}

vote operator or(vote lhs, vote rhs) {
    return vote(lhs | rhs);
}

int main() {
    std::cout << (yes or no) << std::endl;
}

regards,
alexander.

David Schwartz

Jul 2, 2003, 6:46:33 PM

"Michael Furman" <Michae...@Yahoo.com> wrote in message
news:bdvj97$ueq99$1...@ID-122417.news.dfncis.de...

> > So what? What about memory visibility? What about ordering? What about
> > atomicity? This is just not enough and there is no other thing you can add
> > to it that gets you these other things.

> > > Sure it's not enough, did I say it is ? (For that matter, did _anyone_
> > > say it is ?)

> Would you be so kind as to answer his question: "did _anyone_ say it is ?"
> (in this discussion).

> I would appreciate very much your direct answer (yes or no).

No, nobody said it is. Did I say anyone said it was? (yes or no) See, I
can be a kindergartener too.

Let's try a nice kindergarten analogy. Suppose you need 8 tomatoes, and
there are two boxes you can take, one with 8 and one with 2. Why is the box
with 2 of no use? Because it's not enough. So if someone said the box of 2
was of some use, pointing out that it's not enough would be necessary to
show that it's of no use.

Do you get it now? This is a question of that type. You must have 8
tomatoes. Volatile provides 2. Mutexes provide 8.

So volatile is not enough, and there's nothing you can add to it that
isn't enough by itself that together is enough. The fact that it's not
enough is critical to understanding that it's of no use at all.

DS


Michael Furman

Jul 2, 2003, 7:31:00 PM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bdvnc9$oil$1...@nntp.webmaster.com...

>
> "Michael Furman" <Michae...@Yahoo.com> wrote in message
> news:bdvj97$ueq99$1...@ID-122417.news.dfncis.de...
>
> > > So what? What about memory visibility? What about ordering? What about
> > > atomicity? This is just not enough and there is no other thing you can add
> > > to it that gets you these other things.
>
> > > > Sure it's not enough, did I say it is ? (For that matter, did _anyone_
> > > > say it is ?)
>
> > Would you be so kind as to answer his question: "did _anyone_ say it is ?"
> > (in this discussion).
>
> > I would appreciate very much your direct answer (yes or no).
>
> No, nobody said it is.

Thanks.

> [...]

> You must have 8 tomatoes. Volatile provides 2. Mutexes provide 8.

Thanks - it is more than I would want. I will print it and put it on the
(some) wall!

Michael Furman


Michael Furman

Jul 2, 2003, 7:37:56 PM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3F035F81...@web.de...

>
> Michael Furman wrote:
> [...]
> > I would appreciate very much your direct answer (yes or no).
> ^^^^^^^^^^^
>
> Well, "it depends", of course. I'm biased toward:

Of course it was not a question for you. BTW, do you always prefer
"|" to "&"? And how about "lhs ^ rhs" or even "lhs | !rhs" etc. ?
Michael


Alexander Terekhov

Jul 2, 2003, 7:54:16 PM

Michael Furman wrote:
>
> "Alexander Terekhov" <tere...@web.de> wrote in message
> news:3F035F81...@web.de...
> >
> > Michael Furman wrote:
> > [...]
> > > I would appreciate very much your direct answer (yes or no).
> > ^^^^^^^^^^^
> >
> > Well, "it depends", of course. I'm biased toward:
>
> Of course it was not a question for you.

Ah. Silly me. It's time to go to bed.

> BTW, do you always prefer "|" to "&"?

yes or no?

> And how about "lhs ^ rhs"

yes.

> or even "lhs | !rhs" etc. ?

yes.

regards,
alexander.

P.S. or no.

David Schwartz

Jul 3, 2003, 2:16:17 AM

"Michael Furman" <Michae...@Yahoo.com> wrote in message
news:bdvpvm$117643$1...@ID-122417.news.dfncis.de...

> You must have 8 tomatoes. Volatile provides 2. Mutexes provide 8.

> Thanks - it is more than I would want. I will print it and put it on the
> (some) wall!

Volatile is not enough. You are too much.

DS


Momchil Velikov

Jul 3, 2003, 4:15:07 AM
"David Schwartz" <dav...@webmaster.com> wrote in message news:<bdvgid$kdt$1...@nntp.webmaster.com>...

> "Momchil Velikov" <ve...@fadata.bg> wrote in message
> news:87bded37.0307...@posting.google.com...
>
> > > So what? What about memory visibility? What about ordering? What about
> > > atomicity? This is just not enough and there is no other thing you can add
> > > to it that gets you these other things.
>
> > Sure it's not enough, did I say it is ? (For that matter, did _anyone_
> > say it is ?)
>
> So the question is, what is the other thing you can add to it that would
> make it enough?

Nope, this is not the question, not in this thread at least.

>
> > The point is that one _must_ ensure particular ordering between memory
> > load/store instructions and memory synchronization instructions. What
> > would prevent a compiler when given [1]:
>
> > volatile int y;
> > x = y;
> > sync();
> >
> > to emit
> >
> > sync
> > lw r5, y
> >
> > instead of
> >
> > lw r5, y
> > sync
> >
> > eh ?
>
> The fact that the 'sync' C/C++ invocation would be wrapped in such a way
> that the compiler wouldn't order around it. VC++, for example, never orders around inline assembly.

Some compilers may have other means to ensure the intended ordering,
yes. Nevertheless, speaking of VC (MSDN Oct 2002):

"The volatile keyword is a type qualifier used to declare that an
object can be modified in the program by something such as the
operating system, the hardware, or a concurrently executing thread."

"... The system always reads the current value of a volatile object
at the point it is requested, even if the previous instruction asked
for a value from the same object."

I interpret it as a stronger form of "before the next sequence
point."[1]

"Also, the value of the object is written immediately on assignment."

Likewise, I interpret this as a stronger form of "before the next
sequence point."

> GCC, for example, can be told not to by the 'memory'
> invalidation and 'volatile' keyword. (Note that this is NOT using volatile
> to protect shared data, it's a GCC platform-specific extension that uses the
> keyword 'volatile' for a completely different purpose and doesn't apply it
> to data but to code!)

Thanks, I'm fully aware of the GCC extended asm. Indeed, a memory
clobber would prevent reordering of the accesses across an extended asm
statement; however, that'd be unnecessarily pessimizing, because GCC
_does_ reorder across inline assembly.

"Note that even a volatile `asm' instruction can be moved in ways
that appear insignificant to the compiler, such as across jump
instructions. You can't expect a sequence of volatile `asm'
instructions to remain perfectly consecutive. If you want
consecutive output, use a single `asm'. Also, GCC will perform
some optimizations across a volatile `asm' instruction; GCC does
not "forget everything" when it encounters a volatile `asm'
instruction the way some other compilers do."

>
> > Well, with those compilers, which define ``x = y'' to actually
> > comprise an access to a volatile object, it's the ``volatile''
> > keyword.
>
> Yes, if those compilers/platforms existed, but they don't. VC++ doesn't
> work this way. GCC doesn't work this way.

Yes, VC does work this way - see the above MSDN quote.
Yes, GCC does work this way - see
http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Volatiles.html#Volatiles

And from another compiler's documentation:
"In C and C++, a volatile object is accessed if any word or byte (or
halfword on XYZ architectures with halfword support) of the object is
read or written. For volatile objects, reads and writes occur as
directly implied by the source code, in the order implied by the
source code."

And from another compiler's documentation:

"... If you have code that depends on memory accesses exactly as
written in the C/C++ code, you must use the volatile keyword to
identify these accesses."

etc, etc.

> I never denied that one could create a platform on which 'volatile'
> provided thread safety. Heck, one could create a platform with a single mutex
> and put a full mutex lock before and full mutex unlock after every
> 'volatile' access.
>
> But nobody does this.

I find your manner of making ridiculous claims that no one else made and
then refuting them contributes very little (if anything) to the
discussion. Besides ... disturbing. :)

~velco

[1] Concentrate on "instruction", not "current value".

Momchil Velikov

Jul 3, 2003, 4:37:49 AM
"David Schwartz" <dav...@webmaster.com> wrote in message news:<bdvgn5$kg2$1...@nntp.webmaster.com>...

> "Momchil Velikov" <ve...@fadata.bg> wrote in message
> news:87bded37.03070...@posting.google.com...
>
> > If you go for scalable increase in performance you strive to minimize
> > shared memory areas, because cache issues hit _long_ before wait time
> > ones.
>
> What does this have to do with what I said?
>
> > IOW, scalable multithreaded programs _do not concurrently access
> > memory_. (generally speaking).

"IOW, scalable multithreaded programs _do not concurrently access
memory_. (generally speaking)."

That's what.

>
> Yes, but when they do, it's what limits their scalability, so it's vital
> that nothing slows down those sections that do.
>
> > Hence, threads hold locks rarely and hold them for short periods of
> > time. Lock performance is meaningless, it has almost no impact on
> > overall performance.
>
> Take any real-world scalable application, add code to make it hold
> every lock for about twice as long as it currently does, and watch what
> happens to its scalability.
>
> If you're correctly coded, all the limits to your scalability occur
> while you're holding a lock.

This may have been true in 1990. But not anymore - what limits your
scalability is lock acquire time, even for uncontended locks. Get an
already reasonably scalable program written with reader-writer locks
and change each rwlock to a mutex. You'd hardly notice any decrease in
scalability.

> Lockless code should scale nearly perfectly.

No. Code that does not share memory should scale nearly perfectly.
Lock-free/wait-free mechanisms still suffer from cache effects.

> So anything that slowed down code that held locks would trash your
> scalability.

No. It'll trash performance, not scalability. In order to have a
negative scalability effect:

a) the increase in lock hold time must change the uncontended locks to
   contended ones, and

b) the time spent _blocked_ must dominate the time spent _running_
   acquiring the lock.

In most decently written programs you'd have to increase the lock hold
time by several hundred percent before you start to notice scalability
problems (performance decrease will be evident right away, of course).

> Making shared data 'volatile' slows down the code that holds locks, because
> it's while you have a lock held that you access data that is or might be
> shared.

C'mon, it's trivial to copy the volatile object to a non-volatile one
after the lock is acquired and copy it back before the lock is
released.

~velco

Alexander Terekhov

Jul 3, 2003, 5:18:05 AM

Momchil Velikov wrote:
[...]

> Some compilers may have other means to ensure the intended ordering,
> yes. Nevertheless, speaking of VC (MSDN Oct 2002):
>
> "The volatile keyword is a type qualifier used to declare that an object
> can be modified in the program by something such as the operating system,
> the hardware, or a concurrently executing thread."
>
> "... The system always reads the current value of a volatile object at
> the point it is requested, even if the previous instruction asked for a
> value from the same object."

DS, I guess it's time to apply some "CycleFree(TM)" technique/method
to sort of "clear the mess" here. ;-)

regards,
alexander.

David Schwartz

Jul 3, 2003, 5:58:52 PM

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.0307...@posting.google.com...

> C'mon, it's trivial to copy the volatile object to a non-volatile one
> after the lock is acquired and copy it back before the lock is
> released.

This is sheer insanity!

DS


David Schwartz

Jul 3, 2003, 5:57:48 PM

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03070...@posting.google.com...

> Some compilers may have other means to ensure the intended ordering,
> yes. Nevertheless, speaking of VC (MSDN Oct 2002):

> "The volatile keyword is a type qualifier used to declare that an
> object can be modified in the program by something such as the
> operating system, the hardware, or a concurrently executing thread."

> "... The system always reads the current value of a volatile object
> at the point it is requested, even if the previous instruction asked
> for a value from the same object."

> I interpret it as a stronger form of "before the next sequence
> point."[1]

> "Also, the value of the object is written immediately on assignment."

> Likewise, I interpret this as a stronger form of "before the next
> sequence point."

Look very closely at the page you cited. You'll notice it's part of the
C++ reference and it's not marked Microsoft-specific. In other words, this
is Microsoft's understanding of the C++ language. As I've already said, any
semantics for 'volatile' with regard to threads would have to be
implementation-specific.

> > > > Yes, if those compilers/platforms existed, but they don't. VC++ doesn't
> > > > work this way. GCC doesn't work this way.

> Yes, VC does work this way - see the above MSDN quote.

The above MSDN quote is not about VC, it's not Microsoft-specific. Look
at the page.

*sigh* I give up. Either I'm not capable of explaining this to you or
you're not capable of understanding it.

Note that this section says nothing about memory visibility, speculative
reads, or posted writes. And there is no gcc mechanism to fix these problems
that doesn't also fix everything else. So again, 'volatile' is not useful.

If you want to continue to insist that 'volatile' is useful for thread
synchronization with GCC, I just ask you to provide me with one single
example piece of code in which the 'volatile' is useful.

> And from another compiler's documentation:
> "In C and C++, a volatile object is accessed if any word or byte (or
> halfword on XYZ architectures with halfword support) of the object is
> read or written. For volatile objects, reads and writes occur as
> directly implied by the source code, in the order implied by the
> source code."

Same problem, that's not enough (think about word tearing, memory
visibility, posted writes, and so on) and there's nothing you can add to it
to make it enough. So it's still not useful.

> And from another compiler's documentation:
>
> "... If you have code that depends on memory accesses exactly as
> written in the C/C++ code, you must use the volatile keyword to
> identify these accesses."

This is meaningless in the context of threads. It is impossible to write
legal threaded code that depends on memory accesses exactly as written in
the C/C++ code. If you don't believe me, prove it, write such code.

> > I never denied that one could create a platform on which 'volatile'
> > provided thread safety. Heck, one could create a platform with a single
> > mutex and put a full mutex lock before and full mutex unlock after every
> > 'volatile' access.

> > But nobody does this.

> I find your manner of making ridiculous claims that no one else made and
> then refuting them contributes very little (if anything) to the
> discussion. Besides ... disturbing. :)

Then show me one code example where a variable shared between threads is
marked 'volatile', wherein the code is guaranteed to work with 'volatile'
and not guaranteed to work without it.

DS


Momchil Velikov

Jul 4, 2003, 4:02:33 AM
"David Schwartz" <dav...@webmaster.com> wrote in message news:<be28ss$7t2$1...@nntp.webmaster.com>...

> "Momchil Velikov" <ve...@fadata.bg> wrote in message
> news:87bded37.03070...@posting.google.com...
>
> > Some compilers may have other means to ensure the intended ordering,
> > yes. Nevertheless, speaking of VC (MSDN Oct 2002):
>
> > "The volatile keyword is a type qualifier used to declare that an
> > object can be modified in the program by something such as the
> > operating system, the hardware, or a concurrently executing thread."
>
> > "... The system always reads the current value of a volatile object
> > at the point it is requested, even if the previous instruction asked
> > for a value from the same object."
>
> > I interpret it as a stronger form of "before the next sequence
> > point."[1]
>
> > "Also, the value of the object is written immediately on assignment."
>
> > Likewise, I interpret this as a stronger form of "before the next
> > sequence point."
>
> Look very closely at the page you cited. You'll notice it's part of the
> C++ reference and it's not marked Microsoft-specific. In other words, this
> is Microsoft's understanding of the C++ language.

Huh? Doesn't VC implement Microsoft's understanding of the C/C++
language? Do you think volatile semantics differ in the MS C and C++
compilers?

> As I've already said, any
> semantics for 'volatile' with regard to threads would have to be
> implementation-specific.

That's what I'm trying to tell you - that the majority of the
compilers define the access to a volatile qualified object in a way
suitable to ensure certain ordering of the instructions in the
instruction stream.

>
> > > Yes, if those compilers/platforms existed, but they don't. VC++ doesn't
> > > work this way. GCC doesn't work this way.
>
> > Yes, VC does work this way - see the above MSDN quote.
>
> The above MSDN quote is not about VC, it's not Microsoft-specific. Look
> at the page.

Documentation that specifies the behavior of access to
volatile-qualified objects is implementation specific, whether or not
the documentation itself brands it as such. Period.

> > Yes, GCC does work this way - see
> > http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Volatiles.html#Volatiles
>
> *sigh* I give up. Either I'm not capable of explaining this to you or
> you're not capable of understanding it.
> Note that this section says nothing about memory visibility, speculative
> reads, or posted writes. And there is no gcc mechanism to fix these problems
> that doesn't also fix everything else. So again, 'volatile' is not useful.

You judge based on your limited experience with two compilers. I
said they may have other means to ensure instruction stream ordering;
other compilers may not have the same. But most compilers (including
these two) define "volatile" semantics in practically the same manner.

> If you want to continue to insist that 'volatile' is useful for thread
> synchronization with GCC, I just ask you to provide me with one single
> example piece of code in which the 'volatile' is useful.

I already provided it (the sync() example). Here it is again.

volatile int y;
x = y;

__asm__ ("sync":::);


If ``y'' is not volatile, the compiler is free to move it after the
``sync''. As documented. Period.

> > And from another compiler's documentation:
> > "In C and C++, a volatile object is accessed if any word or byte (or
> > halfword on XYZ architectures with halfword support) of the object is
> > read or written. For volatile objects, reads and writes occur as
> > directly implied by the source code, in the order implied by the
> > source code."
>
> Same problem, that's not enough (think about word tearing, memory
> visibility, posted writes, and so on) and there's nothing you can add to it
> to make it enough. So it's still not useful.

I'M TALKING ABOUT ORDERING OF INSTRUCTIONS IN THE INSTRUCTION STREAM !!!

Now do you hear it ?

> > And from another compiler's documentation:
> >
> > "... If you have code that depends on memory accesses exactly as
> > written in the C/C++ code, you must use the volatile keyword to
> > identify these accesses."
>
> This is meaningless in the context of threads. It is impossible to write
> legal threaded code that depends on memory accesses exactly as written in
> the C/C++ code. If you don't believe me, prove it, write such code.

What's the meaning of "legal" ? It looks like you're talking about
some normative document ? Do you, for example, rule out kernels as
"legal" code ? I assure you, it's impossible to write a kernel, which
is "strictly conforming" or even "conforming" to the C Standard.
Nevertheless, there're lots of kernels out there and they work, no
matter if you disapprove of some techniques used to make them work.

(Or you maybe you consider SMP/reentrant/preemptible kernels not to be
"multi-threaded" ?)



> > > I never denied that one could create a platform on which 'volatile'
> > > provided thread safety. Heck, one could create a platform with a single
> > > mutex and put a full mutex lock before and full mutex unlock after every
> > > 'volatile' access.
>
> > > But nobody does this.
>
> > I find your manner of making ridiculous claims that no one else made and
> > then refuting them contributes very little (if anything) to the
> > discussion. Besides ... disturbing. :)
>
> Then show me one code example where a variable shared between threads is
> marked 'volatile', wherein the code is guaranteed to work with 'volatile'
> and not guaranteed to work without it.

Already did (the sync() example). The load instruction will be emitted
at a lower address than the sync instruction, which is a _necessary_
condition for certain visibility properties to be satisfied.

Your fault is that you strive hard not to understand that I'm talking
about a certain _necessary_, but not sufficient, precondition.

~velco

Momchil Velikov

Jul 4, 2003, 4:06:50 AM
"David Schwartz" <dav...@webmaster.com> wrote in message news:<be28ut$7t3$1...@nntp.webmaster.com>...

Could you elaborate? I hope you don't think I'm talking about objects
bigger than register size?

~velco

Alexander Terekhov

Jul 4, 2003, 4:59:30 AM

Momchil Velikov wrote:
>
> "David Schwartz" <dav...@webmaster.com> wrote in message news:<be28ut$7t3$1...@nntp.webmaster.com>...
> > "Momchil Velikov" <ve...@fadata.bg> wrote in message
> > news:87bded37.0307...@posting.google.com...
> >
> > > C'mon, it's trivial to copy the volatile object to a non-volatile one
> > > after the lock is acquired and copy it back before the lock is
> > > released.
> >
> > This is sheer insanity!
>
> Could you elaborate?

Yeah. I second that request! DS, this time with potatoes, please. ;-)

regards,
alexander.

David Schwartz

Jul 4, 2003, 5:25:13 AM

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03070...@posting.google.com...

> > Look very closely at the page you cited. You'll notice it's part of the
> > C++ reference and it's not marked Microsoft-specific. In other words, this
> > is Microsoft's understanding of the C++ language.

> Huh? Doesn't VC implement Microsoft's understanding of the C/C++
> language? Do you think volatile semantics differ in the MS C and C++
> compilers?

Microsoft does not necessarily have one consistent company-wide
understanding of the C/C++ language. What the documentation says is their
understanding of the C++ standard; it need not be the way the compiler is
implemented. And, in fact, it isn't. In VC++, 'volatile' is not useful for
thread synchronization because it has no ordering or visibility guarantees.

> > As I've already said, any
> > semantics for 'volatile' with regard to threads would have to be
> > implementation-specific.

> That's what I'm trying to tell you - that the majority of the
> compilers define the access to a volatile qualified object in a way
> suitable to ensure certain ordering of the instructions in the
> instruction stream.

But that doesn't help. Any optimization the compiler could make in the
instructions could theoretically be made by the processor after the
instructions. Maybe you can't imagine any such optimizations, but prior to
the existence of processors with speculative reads, you might not have
imagined that either. That's why it's important only to use what is
guaranteed to work, that way, it won't break in the future.

> > > > Yes, if those compilers/platforms existed, but they don't. VC++ doesn't
> > > > work this way. GCC doesn't work this way.
> >
> > > Yes, VC does work this way - see the above MSDN quote.
> >
> > The above MSDN quote is not about VC, it's not Microsoft-specific. Look
> > at the page.

> Documentation that specifies the behavior of access to
> volatile-qualified objects is implementation specific, whether or not
> the documentation itself brands it as such. Period.

I don't agree. There are specific standards-compliant ways to use
volatile, for example, with respect to signal handlers. In any event, if we
agree that the page is Microsoft-specific but not so indicated, then it is
erroneous and it would be foolish to rely on something known to have one
serious error. (The scope of the authority for a comment is extremely
serious.)

> I'M TALKING ABOUT ORDERING OF INSTRUCTIONS IN THE INSTRUCTION STREAM
> !!!

> Now do you hear it ?

Any optimization the compiler could do could likewise be done by the
processor. So can we at least agree that 'volatile' is not useful unless
used in conjunction with something that stops the processor from doing
similar optimizations? That would be at least a point of agreement from
which to start.

> > > And from another compiler's documentation:
> > >
> > > "... If you have code that depends on memory accesses exactly as
> > > written in the C/C++ code, you must use the volatile keyword to
> > > identify
> > > these accesses."
> >
> > This is meaningless in the context of threads. It is impossible to write
> > legal threaded code that depends on memory accesses exactly as written in
> > the C/C++ code. If you don't believe me, prove it, write such code.

> What's the meaning of "legal" ? It looks like you're talking about
> some normative document ? Do you, for example, rule out kernels as
> "legal" code ? I assure you, it's impossible to write a kernel, which
> is "strictly conforming" or even "conforming" to the C Standard.
> Nevertheless, there're lots of kernels out there and they work, no
> matter if you disapprove some techniques used to make them work.

If they use 'volatile' for this purpose, they work temporarily. Until
the next build of the compiler or the next revision of the processor adds
some new optimization. I can cite example after example of this if you'd
like.

You have no choice when writing kernel code. You know that you're
processor-specific, compiler-specific, and many other things-specific. But
that's not what we're talking about, we're talking about threads (contexts
of execution within a process).

> (Or you maybe you consider SMP/reentrant/preemptible kernels not to be
> "multi-threaded" ?)

I do not consider them multi-threaded applications. When writing kernel
code, you often have to act without compiler/processor guarantees, it goes
with the territory. So the rules are very different, and the risks much
greater.

DS


David Schwartz

unread,
Jul 4, 2003, 5:58:08 AM7/4/03
to

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03070...@posting.google.com...

> I already provided it (the sync() example). Here's it again.


>
> volatile int y;
> x = y;
> __asm__ ("sync":::);

This deserves a post of its own, because it's wrong and dangerous and
for extremely subtle reasons.

What if you had written this code before the advent of weakly-ordered
processors? You would have left off the 'sync'. What would happen when the
next processor iteration was weakly-ordered? Your code would break. What
will happen when the next generation of processors requires something the
above code is missing? It will break.

The problem with this code is that it assumes that you know everything
that can go wrong. You know compiler optimizations can cause problems, so
you use 'volatile'. You know processor optimizations can cause problems, so
you use "sync". You assume that this is all that can go wrong. But what happens
when something else in the system starts making optimizations in the future?

The problem is, you are relying upon a guarantee you don't have. You are
guaranteed that 'volatile' disables compiler optimizations. You are
guaranteed that 'sync' solves processor ordering issues. But you are not
guaranteed that these are the only things that can go wrong.

    On the other hand, had you used the platform's mutexes, you would have
been guaranteed that nothing can go wrong. This is because only mutexes
provide the guarantee that everything necessary will be done.

This is an unacceptable piece of code because it relies upon the
assumption that everything that can possibly go wrong has been covered. The
invention of a new thing that can go wrong causes this code to break. So it
is not guaranteed to work.

As I said, it's possible to write code that happens to work with
volatile and happens to not work without it, provided you define 'work' as
"whatever it does with 'volatile'".

Now you may consider my position to be extreme, and I admit that in this
case it is a fairly radical position. But I have seen too much code fail on
newer processors or newer chipsets to take a less radical position.

Use 'volatile' as a platform and compiler-specific optimization to
optimize proven bottlenecks and I won't argue with you. But don't pretend
such code is safe.

DS


Charles Bryant

unread,
Jul 4, 2003, 9:34:35 PM7/4/03
to
In article <87bded37.0307...@posting.google.com>,

Momchil Velikov <ve...@fadata.bg> wrote:
>"David Schwartz" <dav...@webmaster.com> wrote in message news:<bdvgn5$kg2$1...@nntp.webmaster.com>...
>> Making shared data 'volatile' slows down the code that holds locks, because
>> it's while you have a lock held that you access data that is or might be
>> shared.
>
>C'mon, it's trivial to copy the volatile object to a non-volatile one
>after the lock is acquired and copy it back before the lock is
>released.

It may be trivial, but that is irrelevant. Since the only reason that
such a copy needs to be made is that the programmer stupidly made the
data 'volatile', I think it is perfectly obvious to say that making it
volatile slows down the code.

--
Eppur si muove

Michael Furman

unread,
Jul 5, 2003, 4:12:44 PM7/5/03
to

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03070...@posting.google.com...
> [...]

> That's what I'm trying to tell you - that the majority of the
> compilers define the access to a volatile qualified object in a way
> suitable to ensure certain ordering of the instructions in the
> instruction stream.

Do you know of compiler(s) that do not do it? As far as I understand,
that is the exact intent of the creators of the C (and later C++)
language - the only problem is that it is impossible to formalize it
in terms of the definition of the language.
Regards,
Michael Furman


Michael Furman

unread,
Jul 5, 2003, 4:28:40 PM7/5/03
to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:be3h5q$v9b$1...@nntp.webmaster.com...
> [...]

> In VC++, 'volatile' is not useful for
> thread synchronization because it has no ordering or visibility
> guarantees.

I hear something so familiar to me in this argumentation.... Yes:
Soviet Russia (where I spent half of my life if not more):
"Marxism-Leninism has unlimited power because it is right".
I had to memorize this sentence (as many others) to get my "C"
at college exams... (Sorry - could not resist sharing this
recollection).


> [...]


> So can we at least agree that 'volatile' is not useful unless
> used in conjunction with something that stops the processor from
> doing similar optimizations? That would be at least a point of
> agreement from which to start.

That is correct. What people are trying to tell you is that the
converse is also correct:

"something that stops the processor from doing similar optimizations
is also useless without 'volatile' to stop the compiler from such
optimizations"!

Do you hear it now?

> [...]


> > What's the meaning of "legal" ? It looks like you're talking about
> > some normative document ? Do you, for example, rule out kernels as
> > "legal" code ? I assure you, it's impossible to write a kernel, which
> > is "strictly conforming" or even "conforming" to the C Standard.
> > Nevertheless, there're lots of kernels out there and they work, no
> > matter if you disapprove some techniques used to make them work.
>
> If they use 'volatile' for this purpose, they work temporarily. Until
> the next build of the compiler or the next revision of the processor adds
> some new optimization. I can cite example after example of this if you'd
> like.

That could be interesting. Please do it.

> [...]

Michael Furman

Alexander Terekhov

unread,
Jul 7, 2003, 5:32:34 AM7/7/03
to

Michael Furman wrote:
[...]

> I hear something so familiar to me in this argumentation.... Yes:
> Soviet Russia (where I spent half of my life if not more):
> "Marxism-Leninism has unlimited power because it is right".
> I had to memorize this sentence (as many others) to get my "C"
> at college exams... (Sorry - could not resist sharing this
> recollection).

Yeah... and Microsoft is the KPSS of the twenty-first century! ;-)

regards,
alexander. < former Komsomol Committee official; elected; at school >

David Schwartz

unread,
Jul 7, 2003, 6:53:14 AM7/7/03
to

"SenderX" <x...@xxx.xxx> wrote in message
news:0XxMa.84276$R73.10535@sccrnsc04...
> > That'd be incredibly stupid, platform's synchronization primitives
> > must take care of memory visibility.
>
> If volatile access needed memory visibility, then barriers would have to
> be used.

C9X, believe it or not, says:

"At sequence points, volatile objects are stable in the sense that previous
accesses are complete and subsequent accesses have not yet occurred."

Which seems to imply that barriers *do* have to be used. You'd have to
get pretty creative in redefining what an "access" is and what it means for
one to "complete" or "occur" to get around this statement requiring barriers
before and after sequence points accessing volatile variables.

DS


Alexander Terekhov

unread,
Jul 7, 2003, 7:51:53 AM7/7/03
to

David Schwartz wrote:
[...]

> "At sequence points, volatile objects are stable in the sense that previous
> accesses are complete and subsequent accesses have not yet occurred."
>
> Which seems to imply that barriers *do* have to be used.

Well, now read the C++ Std -- 1.9/7 and 7.1.5.1/8. And I'll admit
that it's a big mess. Essentially, they talk about accesses to
*volatile* objects. It says nothing about accesses to non-volatile
data.

> You'd have to
> get pretty creative in redefining what an "access" is and what it means for
> one to "complete" or "occur" to get around this statement requiring barriers
> before and after sequence points accessing volatile variables.

C/C++ volatiles are totally brain-damaged and shall be DEPRECATED
ASAP, I believe strongly.

regards,
alexander.

--
http://www.ibm.com/servers/eserver/linux/fun

David Schwartz

unread,
Jul 7, 2003, 6:40:40 PM7/7/03
to

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3F095ED9...@web.de...

> David Schwartz wrote:

> > Which seems to imply that barriers *do* have to be used.

> Well, now read the C++ Std -- 1.9/7 and 7.1.5.1/8. And I'll admit
> that it's a big mess. Essentially, they talk about accesses to
> *volatile* objects. It says nothing about accesses to non-volatile
> data.

The biggest problem, IMO, is that the standard tries to make accesses to
volatile objects part of the observable behavior, but doesn't specify where
they are observed. For example:

volatile int i, j;
i=1;
j=2;

Not even a memory barrier can ensure that anyone observing the writes to
'i' and 'j' on the memory bus will see 'i' written before 'j'. Yet, the
standard says:

    "At sequence points, volatile objects are stable in the sense that previous
accesses are complete and subsequent accesses have not yet occurred."

This certainly implies that at the sequence point between the two
assignments, the memory write to 'i' must have occurred but not the one to
'j'. Fine, but occurred WHERE?

The second biggest problem is what "at sequence points" means. It may be
easy to find the sequence point in the assembly instructions, but it may be
impossible to find it anywhere else. A P4, for example, executes some
instructions concurrently and some out of order, so what does "at sequence
points" mean?

Certainly "at sequence points" can't mean at some particular time
(because a sequence point can be reached on the bus after it's reached in
the processor). It can't seem to mean at some particular place (because
memory operations don't take place in any one place, they take place in
memory, busses, and various caches). In fact, it doesn't seem to mean
anything at all.

And the third biggest problem is what is an "access"? Is it an assembly
instruction? A bus cycle to main physical memory? Is it a cache operation?

The fourth biggest problem is how can volatile accesses be part of the
observable behavior if:

volatile short i[2]={0, 0};
i[1]=3;

The 'i[1]=3' might require a read for 'i[0]' on some platforms but not
others. If the read is defined as observable behavior, then some platforms
will have to have different observable behavior than others.

Note that none of this even touches on the issue of 'volatile's
uselessness for multithreaded code, which the standard doesn't even attempt
to address. Thank god.

DS


Alexander Terekhov

unread,
Jul 8, 2003, 8:07:55 AM7/8/03
to

David Schwartz wrote:
[...]

> Note that none of this even touches on the issue of 'volatile's
> uselessness for multithreaded code, which the standard doesn't even attempt
> to address. Thank god.

Yeah. The following might also be kinda "illuminating":

http://groups.google.com/groups?selm=38C80A2D.C2AFEFCC%40cup.hp.com
(Subject: Re: Shortcomings of C++?)

<quote>

> > - A definition of 'volatile' that matches no hardware well
> > and serves no real purpose.
>
> Could you explain?

What does 'volatile' mean? The standard says:

8 [Note: volatile is a hint to the implementation to avoid aggressive
optimization involving the object because the value of the object
might be changed by means undetectable by an implementation. See
_intro.execution_ for detailed semantics. In general, the semantics
of volatile are intended to be the same in C++ as they are in C. ]

The truth is that what matters may be:

- Do not change load/store ordering (critical for instance for database
operations)
- Preserve multiple consecutive store or loads (useful for accessing hardware
registers)
- Preserve the access size so that it matches the declared type access size.
This one is more subtle: I remember a bug in some software I wrote that accessed
memory on a realtime card from a PC through a VME back-plane. The x86 and the
VME bus use different byte-ordering, but some hardware in the interface card
made byte swapping transparent based on the access size. Too bad, even on
volatile pointers, my C++ compiler would optimize away something like
*((volatile long *) ptr) |= 1 as a byte or (rather than long or), defeating the
byte swapping hardware. Nasty bug.
- Do not access memory outside of program control (do not perform cache prefetch
operations, for instance)
- etc. etc. etc.

</quote>

Note that the upcoming <iohw.h> and <hardware> do address some issues,
I think.

http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n972.pdf
http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1430.pdf

regards,
alexander.
