
Is this thread-safe on multi-processor Win32?


Lee Chapman

May 16, 2003, 7:20:41 AM
Hi guys and gals. I want a cheap C++ class that I can use to easily access
the Ansi or Unicode version of a string on Win32.

I've come up with the following, where the only overhead in making the class
thread-safe is initialization of a long (m_lock).

I was wondering if anyone would be kind enough to give it a quick once-over
to see if I've made any obvious mistakes w.r.t. its thread safety. The
code only has to be safe to use in a multi-threaded process on a
multi-processor machine, but only under Win32. I'm assuming that in such
scenarios, the reading and writing of char *, wchar_t * and long variables
is atomic, and that the interlocked functions are providing all the
necessary memory barriers to keep it working on a multi-processor machine.

Thanks in advance,
- Lee


class String
{
private:
char * volatile m_pAnsi;
wchar_t * volatile m_pUni;
long volatile m_lock;

public:
String()
{
m_pAnsi = NULL;
m_pUni = NULL;
m_lock = 0;
}

String(char * pAnsi)
{
m_pAnsi = _strdup(pAnsi);
m_pUni = NULL;
m_lock = 0;
}

String(wchar_t * pUni)
{
m_pAnsi = NULL;
m_pUni = _wcsdup(pUni);
m_lock = 0;
}

~String()
{
free(m_pAnsi);
free(m_pUni);
}

operator const char *()
{
if (m_pAnsi == NULL && m_pUni != NULL)
{
if (InterlockedCompareExchange(&m_lock, 1, 0) == 0)
{
/* -1 length includes the null terminator; size is in bytes */
int size = WideCharToMultiByte(CP_ACP, 0, m_pUni, -1, NULL, 0, NULL, NULL);
m_pAnsi = (char *)malloc(size);
WideCharToMultiByte(CP_ACP, 0, m_pUni, -1, m_pAnsi, size, NULL, NULL);

InterlockedExchange(&m_lock, -1);
}
else
{
while (m_lock >= 0)
{
SleepEx(0, FALSE);
}
}
}

return m_pAnsi;
}

operator const wchar_t *()
{
if (m_pUni == NULL && m_pAnsi != NULL)
{
if (InterlockedCompareExchange(&m_lock, 1, 0) == 0)
{
/* -1 length includes the null terminator; size is in wide characters */
int size = MultiByteToWideChar(CP_ACP, 0, m_pAnsi, -1, NULL, 0);
m_pUni = (wchar_t *)malloc(size * sizeof(wchar_t));
MultiByteToWideChar(CP_ACP, 0, m_pAnsi, -1, m_pUni, size);

InterlockedExchange(&m_lock, -1);
}
else
{
while (m_lock >= 0)
{
SleepEx(0, FALSE);
}
}
}

return m_pUni;
}
};


Alexander Terekhov

May 16, 2003, 7:55:57 AM

Lee Chapman wrote:
>
> Hi guys and gals. I want a cheap C++ class that I can use to easily access
> the Ansi or Unicode version of a string on Win32.
>
> I've come up with the following, where the only overhead in making the class
> thread-safe is initialization of a long (m_lock).

AFAICS, your solution is totally broken. It seems that what you need
is a "dynamic" pthread_once() that would allow you to "copy-construct"
an "extra representation" upon request if it doesn't match the initially
constructed/provided one. Welcome to The-"dynamic-pthread_once()"-Club.

regards,
alexander.

Alexander Terekhov

May 16, 2003, 8:24:00 AM

I forgot one thing. Joe, please don't confuse him with totally
braindamaged MS-interlocked stuff meant to provide

class stuff { /* ... */ };

class thing {
public:
thing(/* ... */) : stuff_ptr(0) /* ... */ { /*...*/ }
~thing() { delete stuff_ptr.load(); /* ... */ }
/* ... */
const stuff & stuff_instance();
/* ... */
private:
/* ... */
atomic<const stuff*> stuff_ptr;
/* ... */
};

const stuff & thing::stuff_instance() { // "lazy" one
stuff * ptr;
// hoist load barrier (with data dependency "hint")
if (0 == (ptr = stuff_ptr.load_ddhlb())) {
ptr = new stuff(/*...*/);
// sink store barrier
if (!stuff_ptr.attempt_update_ssb(ptr, 0)) {
delete ptr;
// hoist load barrier (with data dependency "hint")
if (0 == (ptr = stuff_ptr.load_ddhlb()))
abort();
}
}
return *ptr;
}

;-)

regards,
alexander.

Lee Chapman

May 16, 2003, 9:09:19 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EC4D1CD...@web.de...

>
> AFAICS, your solution is totally broken.
>

That sentence I understand. I don't understand what you've seen that's led
you to that conclusion, but at least I understand what you're saying... :)

> It seems that what you need is a "dynamic" pthread_once() that
> would allow you to "copy-construct" an "extra representation" upon
> request if it doesn't match the initially constructed/provided one.
> Welcome to The-"dynamic-pthread_once()"-Club.

*blink*

Okay... "pthread_once()" is something to do with the POSIX standard, right?
If so, what relevance does it have to my Win32 implementation?

If the solution is totally broken under Win32, can you give me some hints as
to why?

Thanks,
- Lee


Alexander Terekhov

May 16, 2003, 9:50:28 AM

Lee Chapman wrote:
>
> "Alexander Terekhov" <tere...@web.de> wrote in message
> news:3EC4D1CD...@web.de...
> >
> > AFAICS, your solution is totally broken.
> >
>
> That sentence I understand. I don't understand what you've seen that's led
> you to that conclusion,

That's way too long story. ;-)

> but at least I understand what you're saying... :)
>
> > It seems that what you need is a "dynamic" pthread_once() that
> > would allow you to "copy-construct" an "extra representation" upon
> > request if it doesn't match the initially constructed/provided one.
> > Welcome to The-"dynamic-pthread_once()"-Club.
>
> *blink*
>
> Okay... "pthread_once()" is something to do with the POSIX standard, right?

Right. pthreads-win32**, for example, "provides" one for win32.
It's also totally broken (no one notices it, so it's fine ;-) ).

> If so, what relevance does it have to my Win32 implementation?

The relevance is that pthread_once() is a MECHANISM to perform
thread safe lazy initialization. Well, currently, it's a "C" thing
that works for static stuff only. It shall be 'extended' to support
something along the lines of <http://tinyurl.com/7w7r> (see sort of
"illustrations" embedded in that message).

>
> If the solution is totally broken under Win32, can you give me some hints as
> to why?

Race conditions. Dying "spinlocks". Lack of memory synchronization.

regards,
alexander.

**) http://sources.redhat.com/pthreads-win32

--
"Pthreads win32 is just trying to be a general purpose condition
variable as defined by the pthreads standard, suitable for any
and all threads to go around calling pthread_cond_wait() and
pthread_cond_signal() on. As a consequence the implementation
contains several mutexes, semaphores etc, and a flow of control
that will make your brains dribble out of your ears if you stare
at the code too long (I have the stains on my collar to prove it
;-). Such are the joys of implementing condition variables using
the Win32 synchronization primitives!"

-- <http://tinyurl.com/b9vw>

Lee Chapman

May 16, 2003, 9:51:34 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EC4D860...@web.de...
>
> atomic<const stuff*> stuff_ptr;
>

What's the definition of the atomic<> template? It's not one I'm familiar
with. However, I don't see why I would need it using Visual C++ under Win32.
The documentation that comes with Visual C++ 7.0 states that "simple reads
and writes to properly-aligned 32-bit variables are atomic" and that
"without __declspec(align(#)), Visual C++ aligns data on natural boundaries
based on the size of the data, for example 4-byte integers on 4-byte
boundaries and 8-byte doubles on 8-byte boundaries".

> const stuff & thing::stuff_instance() { // "lazy" one
> stuff * ptr;
> // hoist load barrier (with data dependency "hint")
> if (0 == (ptr = stuff_ptr.load_ddhlb())) {
> ptr = new stuff(/*...*/);
> // sink store barrier
> if (!stuff_ptr.attempt_update_ssb(ptr, 0)) {
> delete ptr;
> // hoist load barrier (with data dependency "hint")
> if (0 == (ptr = stuff_ptr.load_ddhlb()))
> abort();
> }
> }
> return *ptr;
> }

A quick search on "ddhlb" has given me "Data dependency hoist load barrier"?
So I assume the atomic template is providing some sort of memory barrier
around access to the pointer. Well, I don't really see how this differs from
my use of the interlocked Win32 functions. Once again I'm relying on the
Visual C++ documentation, which states that interlocked functions will
"ensure that previous read and write requests have completed and are made
visible to other processors, and to ensure that that no subsequent read or
write requests have started", effectively avoiding a multiprocessor race
condition.

However, if I was that confident I wouldn't have posted the original
question, right... so what have I misinterpreted and got wrong?

Thanks,
- Lee


Lee Chapman

May 16, 2003, 10:04:42 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EC4ECA4...@web.de...

>
> > Okay... "pthread_once()" is something to do with the POSIX standard,
> > right?
>
> Right. pthreads-win32**, for example, "provides" one for win32.
> It's also totally broken (no one notices it, so it's fine ;-) ).
>
> > If so, what relevance does it have to my Win32 implementation?
>
> The relevance is that pthread_once() is a MECHANISM to perform
> thread safe lazy initialization. [...]

Okay, now I'm with you. I'm using the standard MS Win32 threads on W2K, so I
can't use pthread_once() even if I wanted to (and it sounds like I don't).

> > If the solution is totally broken under Win32, can you give me some
> > hints as to why?
>
> Race conditions. Dying "spinlocks". Lack of memory synchronization.

I've picked up some of these points in another reply, so I won't ask about
them here as well.

- Lee


Alexander Terekhov

May 16, 2003, 10:11:32 AM

Lee Chapman wrote:
>
> "Alexander Terekhov" <tere...@web.de> wrote in message
> news:3EC4D860...@web.de...
> >
> > atomic<const stuff*> stuff_ptr;
> >
>
> What's the definition of the atomic<> template?

I have yet to write it. I'm currently in "gathering info" stage.
You'll have to wait. You can begin with reading this:

http://www.terekhov.de/pthread_refcount_t/draft-edits.txt

and try to deduct the semantics from a non-blocking implementation
of pthread_refcount_t based on atomic<> that can be found at
<http://tinyurl.com/bwkj> and info at <http://tinyurl.com/bx6u>.

regards,
alexander.

Alexander Terekhov

May 19, 2003, 2:41:46 AM

"Oliver S." wrote:
[...]
> There are some bugs in your class I won't comment, but I re-implemented
> the class in two flavours so that they should work (mistakes you're able
> to fix for sure are still possible, but the general principle is solid):

While your "re-implemented class in the two flavours" is a bit less
broken than the OP-stuff, it's still totally broken, I'm afraid. The
general principles that you've demonstrated aren't solid at all. Your
"first flavor" is meant to be something along the lines of (variant
using TSD instead of atomic<> aside for a moment):

class stuff { /* ... */ };

class thing {
public:
thing(/* ... */) : stuff_ptr(0) /* ... */ { /*...*/ }
~thing() { delete stuff_ptr.load(); /* ... */ }
/* ... */
const stuff & stuff_instance();
/* ... */
private:
/* ... */
atomic<const stuff*> stuff_ptr;

mutex stuff_mtx;
/* ... */
};

const stuff & thing::stuff_instance() { // "lazy" one
stuff * ptr;
// hoist load barrier (with data dependency "hint")
if (0 == (ptr = stuff_ptr.load_ddhlb())) {

mutex::guard guard(stuff_mtx);
if (0 == (ptr = stuff_ptr.load())) {

ptr = new stuff(/*...*/);
// sink store barrier

stuff_ptr.store_ssb(ptr);
}
}
return *ptr;
}

Well, as for your "second flavor", you could have found this:

http://groups.google.com/groups?selm=3EC4D860.BC32E880%40web.de

regards,
alexander.

SenderX

May 19, 2003, 3:42:28 AM
This pseudo-code seems like it should work to instance something for an
IA-32:


/* Shared pointer to a C_Object */
typedef union U_Ptr
{
unsigned __int64 Value64;

struct
{
C_Object *pObj;
LONG lAba;
};

} PTR, *LPPTR;


static C_Object *pSharedPtr = NULL;


/* Instance the shared pointer */
C_Object& Instance()
{
C_Object *pLocalPtr, *pOldPtr;

/* Read shared pointer */
pLocalPtr
= InterlockedCompareExchangePointer
( &pSharedPtr,
NULL,
NULL );

__asm { mfence };

/* CAS returns the old value, so if it is NULL
another thread has not instanced */
if ( pLocalPtr == NULL )
{
pLocalPtr = new C_Object;

/* Try and update shared pointer */
pOldPtr
= InterlockedCompareExchangePointer
( &pSharedPtr,
pLocalPtr,
NULL );

__asm { mfence };

/* CAS returns the old value, so if it not NULL
another thread instanced */
if ( pOldPtr != NULL )
{
delete pLocalPtr;

return *pOldPtr;
}
}

return *pLocalPtr;
}

I know Alex will say it's wrong, but please tell us EXACTLY where and why?

=)

--
The designer of the SMP and HyperThread friendly, AppCore library.

http://AppCore.home.attbi.com


Alexander Terekhov

May 19, 2003, 4:04:56 AM

SenderX wrote:
[...]

> I know Alex will say it's wrong, but please tell us EXACTLY where and why?

Neither "__asm { mfence };" nor totally braindamaged MS-interlocked
stuff define anything specific with respect to memory access reordering
done by the COMPILER (which, absent reordering constraints, ought to
operate on "as if"-compiling-single-threaded/thread-neutral-code basis).

regards,
alexander.

SenderX

May 19, 2003, 4:19:45 AM
> Neither "__asm { mfence };" nor totally braindamaged MS-interlocked
> stuff define anything specific with respect to memory access reordering
> done by the COMPILER

What about writing the lock-free instance code in pure asm?

Alexander Terekhov

May 19, 2003, 4:47:32 AM

SenderX wrote:
>
> > Neither "__asm { mfence };" nor totally braindamaged MS-interlocked
> > stuff define anything specific with respect to memory access reordering
> > done by the COMPILER
>
> What about writing the lock-free instance code in pure asm?

and a "full-stop" compiler barrier ala gcc's "memory"/"volatile"**
[also braindead; to some extent] stuff?

Well, y'know, I just hate "asm" (both "pure" and not-so-"pure"). YMMV.

regards,
alexander.

**) "If your assembler instruction modifies memory in an unpredictable
fashion, add /memory/ to the list of clobbered registers. This will
cause GCC to not keep memory values cached in registers across the
assembler instruction. You will also want to add the volatile keyword
if the memory affected is not listed in the inputs or outputs of the
asm, as the memory clobber does not count as a side-effect of the
asm."

SenderX

May 19, 2003, 6:15:58 AM
Well, this has to work then ;)


volatile C_Object *pSharedPtr = NULL;


/* Instance the shared pointer */
C_Object& Instance()
{

volatile C_Object *pLocalPtr, *pOldPtr;

/* Read shared pointer */
pLocalPtr
= InterlockedCompareExchangePointer
( &pSharedPtr,
NULL,
NULL );

__asm { mfence };

/* CAS returns the old value, so if it is NULL
another thread has not instanced */
if ( pLocalPtr == NULL )
{
pLocalPtr = new C_Object;

/* Try and update shared pointer */
pOldPtr
= InterlockedCompareExchangePointer
( &pSharedPtr,
pLocalPtr,
NULL );

__asm { mfence };

/* CAS returns the old value, so if it not NULL
another thread instanced */
if ( pOldPtr != NULL )
{
delete pLocalPtr;

return *pOldPtr;
}
}

return *pLocalPtr;
}

Why won't this work, and what will break it?

A weak memory model, the compiler... Or both?

Alexander Terekhov

May 19, 2003, 7:02:10 AM

SenderX wrote:

[... volatile/Intel-mfence/MS-interlocked ...]

> Why won't this work, and what will break it?
>
> A weak memory model, the compiler... Or both?

Yeah, both: Intel and Microsoft. ;-) Seriously, look, brain-
dead volatile has an implementation defined semantics ["What
constitutes an access to an object that has volatile-qualified
type is implementation-defined"]. The NewHP (the old Digital)
uses it to fight word tearing, for example. Show me some
statements in the MS docs that would guarantee the semantics
that you need here and I'll concede that it "might work". ;-)
(but it's still braindamaged because what's needed here is the
upcoming atomic<>/threads_specific_ptr<>, extended "dynamic"
pthread_once() aside for a moment).

regards,
alexander.

Lee Chapman

May 19, 2003, 8:35:15 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EC89028...@web.de...

Okay, so as I'm using MS VC++ 7.0 I can put #pragma optimize("", off) before
my class definition; this will prevent the compiler reordering my C++
statements, and so should fix my original implementation?

- Lee


Alexander Terekhov

May 19, 2003, 10:03:13 AM

Maybe (but not the original one, that's for sure).

regards,
alexander.


Alexander Terekhov

May 30, 2003, 3:47:50 PM

"Oliver S." wrote:
>
> > While your "re-implemented class in the two flavours" is a bit less
> > broken than the OP-stuff, it's still totally broken, ...
>
> Where is it broken ?

SenderX, would you please explain (but wait with your reply no less
than two weeks ;-) ).

> In its general principle it's 100% correct.

Dream on.

regards,
alexander.

Alexander Terekhov

May 31, 2003, 10:12:54 AM

"Oliver S." wrote:
>
> >> In its general principle it's 100% correct.
>
> > Dream on.
>
> Typical troll-response for someone who hasn't any arguments.

If you need arguments then go and {re-}read the entire thread, comrade.

regards,
alexander.

P.S. Ziv Caspi might also help you (trivial races in your "100% correct"
solution aside, he knows MS-interlocked braindamage quite well).

SenderX

May 31, 2003, 1:30:32 PM
> SenderX, would you please explain (but wait with your reply no less
> than two weeks ;-) ).

I can't seem to find your "two flavours" in google. But I would trust
Alex and say they're broken...

However Oliver, I did get Alex to admit that the following "might" work:

Lock-Free Instance Code for...


Compiler: VC++ 6.0

Compiler docs volatile quote:

" Objects declared as volatile are not used in optimizations because their
value can change at any time. The system always reads the current value of a
volatile object at the point it is requested, even if the previous
instruction asked for a value from the same object. Also, the value of the
object is written immediately on assignment. "


Compiler docs pragma( optimize ) quote:

" Using the optimize pragma with the empty string ("") is a special form of
the directive. It either turns off all optimizations or restores them to
their original (or default) settings. "

The Code: ;)


/* The shared pointer */
volatile C_Object *pSharedObj = NULL;


/* Turn off compiler optimizer */
#pragma optimize( "", off )


/* Instance the object */
C_Object&

InstanceObject()
{
/* Load the shared object */
volatile C_Object *pLocalPtr
= InterlockedCompareExchangePointer
( &pSharedObj,
NULL,
NULL );

/* Acquire the load */
__asm { lfence };

/* If NULL, there is no object. */


if ( pLocalPtr == NULL )
{

/* Create a new object */
volatile C_Object *pNewPtr = new C_Object;

/* Load the shared object again,
and try and store the new value */
pLocalPtr =
InterlockedCompareExchangePointer
( &pSharedObj,
pNewPtr,
NULL );

/* Release the store, and acquire the load */
__asm { mfence };

/* If NULL, the object has been updated */


if ( pLocalPtr == NULL )
{

return *pNewPtr;
}
}

/* We got it! */
return *pLocalPtr;
}


/* Turn on compiler optimizer */
#pragma optimize( "", on )


This should ONLY work for the VC++ 6.0 compiler, and maybe higher.


Your comments?


;)

SenderX

May 31, 2003, 6:35:29 PM
> Forget the fence-instruction;
> it's used for weakly-ordered memory-access instructions.

You will most likely need memory barriers for the sample code I posted to
work on weak-order processors, like an Itanium II.

volatiles are for compiler orders, fences are for instruction orders.

Correct?

David Schwartz

May 31, 2003, 8:09:55 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:R6aCa.1085689$S_4.1097003@rwcrnsc53...

> > Forget the fence-instruction;
> > it's used for weakly-ordered memory-access instructions.
>
> You will most likely need memory barriers for the sample code I posted to
> work on weak-order processors, like an Itanium II.
>
> volatiles are for compiler orders, fences are for instruction orders.
>
> Correct?

The documentation for VC++ says:

"Objects declared as volatile are not used in optimizations because their
value can change at any time. The system always reads the current value of a
volatile object at the point it is requested, even if the previous
instruction asked for a value from the same object. Also, the value of the
object is written immediately on assignment."

This, at least to me, implies that 'volatile' must defeat weak ordering
with the appropriate fences. It says it's read "at the point it is
requested" and it says it's written "immediately on assignment". If this
allows weak ordering, then the documentation is incorrect or, at best,
highly misleading.

DS


SenderX

May 31, 2003, 8:35:02 PM
> This, at least to me, implies that 'volatile' must defeat weak
ordering
> with the appropriate fences. It says it's read "at the point it is
> requested" and it says it's written "immediately on assignment". If this
> allows weak ordering, then the documentation is incorrect or, at best,
> highly misleading.

volatile doesn't stop the processor from reordering the instructions, it
only stops the compiler.

You will need fences for this on weak-order systems, even with volatile
vars.

David Schwartz

May 31, 2003, 11:57:27 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:WSbCa.528169$Si4.4...@rwcrnsc51.ops.asp.att.net...

> > This, at least to me, implies that 'volatile' must defeat weak
> > ordering
> > with the appropriate fences. It says it's read "at the point it is
> > requested" and it says it's written "immediately on assignment". If this
> > allows weak ordering, then the documentation is incorrect or, at best,
> > highly misleading.

> volatile doesn't stop the processor from reordering the instructions, it
> only stops the compiler.

Are you saying it can't or that it doesn't? It certainly can -- the
compiler could, for example, wrap accesses to volatile variables in the
appropriate fences.

> You will need fences for this on weak-order systems, even with volatile
> vars.

Then the documentation is lying, which doesn't surprise me.

DS


SenderX

Jun 1, 2003, 12:23:38 AM
> It certainly can -- the
> compiler could, for example, wrap accesses to volatile variables in the
> appropriate fences.

Your compiler would have to know which processor it was building for, and
put the correct barrier opcodes in order to do that.

I as the programmer, I want to be in control of the fence instructions.

> Then the documentation is lying, which doesn't surprise me.

The docs are not lying at all.

They ONLY talk about what the compiler is going to do. Period.


It doesn't talk about what an Itanium II processor is going to do with those
instructions.

David Schwartz

Jun 1, 2003, 12:49:51 AM

"SenderX" <x...@xxx.xxx> wrote in message
news:edfCa.2083$DV....@rwcrnsc52.ops.asp.att.net...

> > It certainly can -- the
> > compiler could, for example, wrap accesses to volatile variables in the
> > appropriate fences.

> Your compiler would have to know which processor it was building for, and
> put the correct barrier opcodes in order to do that.

If the compiler doesn't know what processor it's building for, it has a
*very* serious problem! The compiler generates assembler output, so it had
better know all the semantics of the assembly language it's targeting!

> I as the programmer, I want to be in control of the fence instructions.

Then don't ask the compiler to provide you ordering guarantees. The
documentation claims 'volatile' provides such guarantees. (But I think it's
in error.)

> > Then the documentation is lying, which doesn't surprise me.

> The docs are not lying at all.
>
> They ONLY talk about what the compiler is going to do. Period.
>
> It doesn't talk about what an Itanium II processor is going to do with
those
> instructions.

There is no such distinction. When you write C code, you are writing for
the compiler. You aren't supposed to have to care what the processor does,
you're supposed to be able to rely upon the compiler to make the processor
do the right thing.

DS


SenderX

Jun 1, 2003, 2:57:46 AM
> There is no such distinction. When you write C code, you are writing
for
> the compiler. You aren't supposed to have to care what the processor does,
> you're supposed to be able to rely upon the compiler to make the processor
> do the right thing.

I do care what the processor does, I have to.

You sure you know what you're talking about?

SenderX

Jun 1, 2003, 5:49:45 AM
> If the compiler doesn't know what processor it's building for, it has
a
> *very* serious problem!

It needs to know the instruction sets it can compile, then the processor.

Although...

Itanium processors say they rely heavily on compilers written specifically
to take care of stuff that the previous Intel chips have taken care of.

Look through Intel's site for this info.

I am not sure if Itanium processors will accept the current SSE or SSE2
opcodes.

By the way ( to clarify ), you're saying that VC++ will add fences for the
following pseudo code:

volatile LPVOID *pSharedPtr;

volatile LPVOID *pLocalPtr;

load:

pLocalPtr = pSharedPtr;


store:

pSharedPtr = pLocalPtr;

and change it to this:

load:

pLocalPtr = pSharedPtr;

__asm { lfence };


store:

pSharedPtr = pLocalPtr;

__asm { sfence };

???


I don't think a compiler would do this?

David Schwartz

Jun 1, 2003, 1:20:51 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:Z_jCa.532679$Si4.4...@rwcrnsc51.ops.asp.att.net...

> By the way ( to clarify ), you're saying that VC++ will add fences for the
> following pseudo code:

> volatile LPVOID *pSharedPtr;
> volatile LPVOID *pLocalPtr;
> load:
> pLocalPtr = pSharedPtr;
> store:
> pSharedPtr = pLocalPtr;
>
> and change it to this:
>
> load:
> pLocalPtr = pSharedPtr;
> __asm { lfence };
> store:
> pSharedPtr = pLocalPtr;
> __asm { sfence };

> I don't think a compiler would do this?

The documentation specifically claims that 'volatile' localizes the
actual variable access to the code that accesses it. I pasted you the section.
If that's the only way to provide those guarantees, then the compiler had
better do that. Otherwise, either the compiler or the documentation is
broken.


DS


David Schwartz

Jun 1, 2003, 1:17:53 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:KthCa.1088771$S_4.1100152@rwcrnsc53...

> > There is no such distinction. When you write C code, you are writing
> > for
> > the compiler. You aren't supposed to have to care what the processor
does,
> > you're supposed to be able to rely upon the compiler to make the
processor
> > do the right thing.

> I do care what the processor does, I have to.

You are certainly *allowed* to care what the processor does. But if you
*have* to, then something is wrong.

> You sure you know what you're talking about?

Yes, I am most certainly sure what I'm talking about.

If the compiler documentation says that 'volatile' does something, then
the compiler had better emit whatever assembler code is necessary to do that
something. If the compiler could emit code to do what the documentation says
but does not, then either the documentation or the compiler is broken.

The documentation says that 'volatile' provides ordering guarantees. If
the compiler does not emit the necessary assembly instructions to comply
with those guarantees on any processor the compiler claims to support, then
either the documentation or the compiler is broken.

DS


SenderX

Jun 2, 2003, 12:58:25 AM
I think you are totally confusing compiler orderings with processor
orderings.

I think a Java or other high-level language compiler has to put auto
acquire/release on volatile, but not a C compiler.

If my VC++ 6.0 stuck fence opcodes in my code, I would be pissed.

;)

SenderX

Jun 2, 2003, 1:02:26 AM
> The documentation specifically claims that 'volatile' localizes the
> actual variable access to the code that accesses it. I pasted you the
> section.

You quoted the section to me?

More like, I quoted the section on volatile and pragma( optimize ) for
Oliver S. when I posted the lock-free instance once code. Which started this
discussion in the first place:

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=bbdclk%24clm%241%40nntp.webmaster.com&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26group%3Dcomp.programming.threads

I know this is very nit-picky, but I was a little curious.

;)

Alexander Terekhov

Jun 2, 2003, 2:37:26 AM

David Schwartz wrote:
[...]

> The documentation says that 'volatile' provides ordering guarantees. If
> the compiler does not emit the necessary assembly instructions to comply
> with those guarantees on any processor the compiler claims to support, then
> either the documentation or the compiler is broken.

The only thing that is really broken here is C/C++ volatile. What you're
talking about is nothing but revised Java (or C#) volatiles that provide
certain atomicity and ordering guarantees. I don't like it. More info on
this can be found (follow the links ;-) ) in the following msg of mine:

http://groups.google.com/groups?selm=3ED8AC69.A75E9B4F%40web.de
(Subject: Re: What is a "POD" type?)

regards,
alexander.

Ziv Caspi

Jun 2, 2003, 5:32:48 AM

Neither an explicit memory fence nor a call to library functions (in
our case, InterlockedWhatever) will deny the compiler the "right" to
reorder memory accesses. So far so true.

However, the language already provides another mechanism for doing
just that -- the volatile access modifier. By combining (say)
interlocked operations (which indicate volatile access to their
"active" argument, *and* have an implicit barrier semantic) and other
volatile variables/accesses you can get what you need.

(Thus, if you write:
volatile LONG x,y; InterlockedExchange( &x, 0 ); y = 0;
and use a similar barrier/volatile combination when reading x/y,
you're safe.)

For example, one could implement a Win32 equivalent to pthread_once in
this manner.

Ziv.

SenderX

Jun 2, 2003, 4:59:26 AM
> (Thus, if you write:
> volatile LONG x,y; InterlockedExchange( &x, 0 ); y = 0;
> and use a similar barrier/volatile combination when reading x/y,
> you're safe.)
>
> For example, one could implement a Win32 equivalent to pthread_once in
> this manner.

Like the sample I posted:

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3EDAF0A6.ADE3F213%40web.de&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26group%3Dcomp.programming.threads

I think this should work for VC++ 6.0?

If not, where is it broken?

Alexander Terekhov

Jun 2, 2003, 6:33:27 AM

Ziv Caspi wrote:
[...]

> >Neither "__asm { mfence };" nor totally braindamaged MS-interlocked
> >stuff define anything specific with respect to memory access reordering
> >done by the COMPILER (which, absent reordering constraints, ought to
> >operate on "as if"-compiling-single-threaded/thread-neutral-code basis).
>
> Neither an explicit memory fence nor a call to library functions (in
> our case, InterlockedWhatever) will deny the compiler the "right" to
> reorder memory accessed. So far so true.
>
> However, the language already provides another mechanism for doing
> just that -- the volatile access modifier.

Lack of atomicity aside for a moment, that would be true IFF *ALL* your
C/C++ objects were designated as volatile to preclude any reordering
(e.g. speculative loading of non-volatile objects) by the compiler, to
begin with (hardware barriers and "what constitutes an access to an
object that has volatile-qualified type is implementation-defined" bit
aside for a moment). C/C++ volatiles are brain-dead and revised Java
(and C#) ones aren't really the best way to do it in C/C++. atomic<>
template (and macros for plain C) a la the upcoming (see C++ TR) <iohw.h>
and <hardware> would solve the problem in a much more efficient and
"elegant" way, so to speak. *All* current uses of C/C++ volatiles shall
be "deprecated" in favor of:

a) atomic<> for low level non-blocking stuff with mem.sync.,

b) exceptions that shall replace setjmp/longjmp silliness (I mean
volatile auto stuff "that could be modified between the two
return from setjmp()"),

c) threads (SIGEV_THREAD delivery and sig{timed}wait{info}())
that shall replace async.signals and volatile sig_atomic_t
static ugliness.

> By combining (say)
> interlocked operations (which indicate volatile access to their
> "active" argument, *and* have an implicit barrier semantic) and other
> volatile variables/accesses you can get what you need.

Dream on.

regards,
alexander.

SenderX

Jun 2, 2003, 7:00:30 AM
> MS-interlocked braindamage quite well).

I assume your atomic<> template can do anything the Interlocked API's can
do, but in a non-brain damaged way. Your atomic<> provides a function that
can compare and swap longs, correct?

=)

If so... I really want to use your atomic<> template for my upcoming
portable lock-free API library, using my brand new algo which I recently
posted. It is working out just great.

I believe your library ( standards ) could make my lock-free stuff port
very well.

Where can we download your library? Is it even ready to be released yet?

Alexander Terekhov

Jun 2, 2003, 8:20:39 AM

SenderX wrote:
>
> > MS-interlocked braindamage quite well).
>
> I assume your atomic<> template can do anything the Interlocked API's can
> do, but in a non-brain damaged way.

Yeah, that's the intent.

> Your atomic<> provides a function that
> can compare and swap long's correct?

It certainly provides std::numeric_limits<>-like specializations
with "bool attempt_update(T old_value, T new_value)" (or something
like that) member function for the scalar types (T) that can be
manipulated atomically. I'm not sure this is what you call "swap
long's". Well, you could certainly build yourself something like
"template<typename T> T get_and_set(atomic<T>&, T new_value);"
that would return previous [old] value, but this is probably not
what you need, correct?

[...]


> Where can we download your library? Is it even ready to be released yet?

Nope. "follow the links"

http://groups.google.com/groups?selm=3ED90AE4.A211756B%40web.de
(Subject: Re: shared_ptr/weak_ptr and thread-safety)

regards,
alexander.

SenderX

Jun 2, 2003, 9:13:18 AM
> Well, you could certainly build yourself something like
> "template<typename T> T get_and_set(atomic<T>&, T new_value);"
> that would return previous [old] value, but this is probably not
> what you need, correct?

bool attempt_update(T old_value, T new_value);

Does old_value get compared to the dest value, and if they match the dest
value is updated with the new_value?

If so, then I could use that.

Just to clarify...

I need this " EXACT " functionality, in order for it to work:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/interlockedcompareexchange.asp


So, I would need something like this:

long lDest = 0;

atomic< long > AtomicLong( &lDest );

long lOldValue;

/* Atomic update */
do
{
lOldValue = AtomicLong.read_with_acquire();
}

while( ! AtomicLong.attempt_update_with_release
( lOldValue,
lOldValue + 1 ) );


Can your lib do that?


It would be very nice if your library would allow me to code a single atomic
base for my new lock-free algo system that I built. I think it will probably
work out.


=)


--
The designer of the SMP and HyperThread friendly, AppCore library.

http://AppCore.home.attbi.com


"Alexander Terekhov" <tere...@web.de> wrote in message

news:3EDB411...@web.de...

Ziv Caspi

Jun 2, 2003, 11:19:32 AM
On Sat, 31 May 2003 17:09:55 -0700, "David Schwartz"
<dav...@webmaster.com> wrote:

> The documentation for VC++ says:
>
>"Objects declared as volatile are not used in optimizations because their
>value can change at any time. The system always reads the current value of a
>volatile object at the point it is requested, even if the previous
>instruction asked for a value from the same object. Also, the value of the
>object is written immediately on assignment."
>
> This, at least to me, implies that 'volatile' must defeat weak ordering
>with the appropriate fences.

I've read this section a couple of times, and still don't understand
how your conclusion follows. What I infer from the quote above (as
well as some experience in the area, plus some discussions with the
people who built the compiler) is that when you use volatile, you
essentially tell the compiler "do what I say" -- when you set a value
to a volatile object, the compiler generates the store instruction in
the same program location you asked for.

>It says it's read "at the point it is
>requested" and it says it's written "immediately on assignment". If this
>allows weak ordering, then the documentation is incorrect or, at best,
>highly misleading.

The model you seem to be reasoning along is that of a single
synchronization point (probably the RAM) which all reads and writes
are measured against. This model is not followed by modern processors
for some time now, and is being replaced by cross-processor
observability guarantees (if you put a fence between writing to a and
writing to b on processor A, and you have a fence when reading them on
processor B, then you're guaranteed to read this value if you read
that value).

The compiler here is, in general, unable to help further. Compilers
usually target a variety of processor models (Pentium, P-whatever...)
and generate code that works on them all. Furthermore, the compiler
doesn't know when it creates programs whether they are run on
multiprocessor machines. As added bonus, I believe some processors can
switch their ordering behavior depending on the area of memory you
access. All these are in the domain of the platform, not the compiler.
If the compiler were to generate "always working" code, you wouldn't
want to use it anyway.

Ziv

Lee Chapman

Jun 2, 2003, 10:42:05 AM
> > The documentation for VC++ says:
> >

My eventual interpretation was:

a) 'volatile' can be used to prevent the compiler from reordering reads &
writes;

b) the Win32 Interlocked functions can be used to stop the hardware from
effectively reordering reads & writes on a multi-processor machine. (The MS
documentation states that the OS's implementation of these Interlocked
functions contains the necessary memory barriers etc.)

i.e. You need both.

I appreciate from the contributors to this thread that there are many subtle
problems that can arise in the general case, but with VC++ 7.0, W2K and a
quad Pentium box, it works in practice, and that's all I care about. :)

- Lee


SenderX

Jun 2, 2003, 10:50:08 AM
> b) the Win32 Interlocked functions can be used to stop the hardware from
> effectively reordering reads & writes on a multi-processor machine.

Yes, or you can use the target processor's acquire / release barrier opcodes
directly.

> i.e. You need both.

Yes, interlocked and/or memory barriers.

> I appreciate from the contributors to this thread that there are many subtle
> problems that can arise in the general case, but with VC++ 7.0, W2K and a
> quad Pentium box, it works in practice, and that's all I care about. :)

Works with VC++ 6.0 as well.

;)

David Schwartz

Jun 2, 2003, 3:03:10 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:CTACa.16331$DV.2...@rwcrnsc52.ops.asp.att.net...

> > The documentation specifically claims that 'volatile' localizes the
> > actual variable access to the code that accesses it. I pasted you the
> > section.
>
> You quoted the section to me?
>
> More like, I quoted the section on volatile and pragma( optimize ) for
> Oliver S. when I posted the lock-free instance once code. Which started this
> discussion in the first place:

I quoted this section to you:

"Objects declared as volatile are not used in optimizations because their
value can change at any time. The system always reads the current value of a
volatile object at the point it is requested, even if the previous
instruction asked for a value from the same object. Also, the value of the
object is written immediately on assignment."

Please explain to me how you can control when the current value of an
object is read without using any fences.

DS


David Schwartz

Jun 2, 2003, 3:13:21 PM

"Ziv Caspi" <zi...@netvision.net.il> wrote in message
news:3edb1fca.2111780319@newsvr...

> On Sat, 31 May 2003 17:09:55 -0700, "David Schwartz"
> <dav...@webmaster.com> wrote:

> > The documentation for VC++ says:

> >"Objects declared as volatile are not used in optimizations because their
> >value can change at any time. The system always reads the current value of a
> >volatile object at the point it is requested, even if the previous
> >instruction asked for a value from the same object. Also, the value of the
> >object is written immediately on assignment."

> > This, at least to me, implies that 'volatile' must defeat weak ordering
> >with the appropriate fences.

> I've read this section a couple of times, and still don't understand
> how your conclusion follows. What I infer from the quote above (as
> well as some experience in the area, plus some discussions with the
> people who built the compiler) is that when you use volatile, you
> essentially tell the compiler "do what I say" -- when you set a value
> to a volatile object, the compiler generates the store instruction in
> the same program location you asked for.

It says that "the value will be read" at the point it is requested. It
doesn't say that a read instruction will be emitted. Nor could it, because
it's high-level language documentation for a compiler. Such documentation
can't talk about where assembly instructions are emitted because that would
inappropriately cross domain boundaries.

> >It says it's read "at the point it is
> >requested" and it says it's written "immediately on assignment". If this
> >allows weak ordering, then the documentation is incorrect or, at best,
> >highly misleading.

> The model you seem to be reasoning along is that of a single
> synchronization point (probably the RAM) which all reads and writes
> are measured against. This model is not followed by modern processors
> for some time now, and is being replaced by cross-processor
> observability guarantees (if you put a fence between writing to a and
> writing to b on processor A, and you have a fence when reading them on
> processor B, then you're guaranteed to read this value if you read
> that value).

Then explain to me what the documentation could mean when it says "The
system always reads the current value of a volatile object at the point it
is requested, even if the previous instruction asked for a value from the
same object". If you mean that:

volatile int *a;
int i, j;
i=*a;
j=*a;

If you mean that the read for 'j=*a' could occur before the read for
'i=*a' then the documentation doesn't mean anything. Maybe you with your
secret decode ring can see that it's really about the ordering of assembly
instructions rather than about the ordering of reads, but that's sure as
hell not what it *says*.

> The compiler here is, in general, unable to help further. Compilers
> usually target a variety of processor models (Pentium, P-whatever...)
> and generate code that works on them all. Furthermore, the compiler
> doesn't know when it creates programs whether they are run on
> multiprocessor machines. As added bonus, I believe some processors can
> switch their ordering behavior depending on the area of memory you
> access. All these are in the domain of the platform, not the compiler.

Right, and it's the compiler's job to make sure I don't have to think
about that kind of thing, all I'm supposed to do is rely on the guarantees
the compiler gives me.

> If the compiler were to generate "always working" code, you wouldn't
> want to use it anyway.

I'm not sure what you mean by this. In general, I do want my compiler to
generate code that "always works" on every platform/configuration the
compiler claims to support.

DS


SenderX

Jun 2, 2003, 3:39:16 PM
> I quoted this section to you:

I quoted it to Oliver S.

You must not have read it then.

SenderX

Jun 2, 2003, 3:40:52 PM
You are confused on this issue.

David Schwartz

Jun 2, 2003, 3:51:00 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:8LNCa.1108875$S_4.1122132@rwcrnsc53...

> You are confused on this issue.

Then straighten me out.

Do we agree that the documentation says: "The system always reads the
current value of a volatile object at the point it is requested, even if the
previous instruction asked for a value from the same object".

And do we agree that if the compiler documentation makes a particular
guarantee or claim, the compiler should emit whatever assembly instruction
sequences it takes to make that guarantee on every processor the compiler
supports?

If you disagree with either of the two things I'm saying above, then we
can discuss those things. Otherwise, with those things agreed upon, my
conclusion follows immediately.

DS


Alexander Terekhov

Jun 2, 2003, 4:46:59 PM

David Schwartz wrote:
>
> "SenderX" <x...@xxx.xxx> wrote in message
> news:8LNCa.1108875$S_4.1122132@rwcrnsc53...
>
> > You are confused on this issue.
>
> Then straighten me out.
>
> Do we agree that the documentation says: "The system always reads the
> current value of a volatile object at the point it is requested, even if the
> previous instruction asked for a value from the same object".

Yes, but it's no more "current" than the value you write/read to
some stdio file... with the only difference that C/C++ volatiles
aren't synchronized ala {brain-dead} stdio files with their
implicit recursive locking scheme -- "strong" thread-safety that,
IMO, POSIX mandates rather *foolishly* (instead of providing much
more *efficient and sufficient* "basic" thread-safety guarantee
for ALL stdio operations on streams... not only *_unlocked() for
characters).

>
> And do we agree that if the compiler documentation makes a particular
> guarantee or claim, the compiler should emit whatever assembly instruction
> sequences it takes to make that guarantee on every processor the compiler
> supports?

C/C++ volatiles are single-threaded beasts. The only case that
kinda "covers" asynchrony (and atomicity) is "static volatile
sig_atomic_t"; but it's also an "unsafe" thread-safety thing; it
doesn't ensure visibility across threads [see 4.10 rules; they
don't make any exceptions for static volatile sig_atomic_t's...
and you can't really synchronize it because that would require
async-signal-safe pthread calls, to begin with].

regards,
alexander.

David Schwartz

Jun 2, 2003, 5:32:45 PM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EDBB7C3...@web.de...

> > And do we agree that if the compiler documentation makes a particular
> > guarantee or claim, the compiler should emit whatever assembly instruction
> > sequences it takes to make that guarantee on every processor the compiler
> > supports?

> C/C++ volatiles are single-threaded beasts.

If this were any situation other than VC++ on WIN32, I'd agree with you.
However, in the special case of Microsoft Visual C++ on WIN32, you should be
able to assume that everything is talking about the multithreaded case
unless they say otherwise.

With GCC, for example, the documentation was written without
multithreading in mind. So if multithreading is an exception to something in
the documentation, you wouldn't expect it to be noted there. But VC++ was
developed from the ground up with multithreading in mind. In fact, WIN32 was
developed from the ground up for multithreading. The documentation reflects
this.

In any event, this could be an issue even with single-threaded programs.
Memory can be shared across processes running concurrently on different CPUs
even by programs that aren't multi-threaded.

If 'volatile' doesn't do what it's documented to do, then the
documentation is erroneous.

DS


SenderX

Jun 2, 2003, 7:36:55 PM
> If 'volatile' doesn't do what it's documented to do, then the
> documentation is erroneous.

Nope. It's not in error.

It only talks about what the compiler is going to do, not what the hardware
is going to do.

VC++ does not wrap volatile access with acquire / release fences.

David Schwartz

Jun 2, 2003, 8:02:29 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:rcRCa.1110427$S_4.1123860@rwcrnsc53...

> > If 'volatile' doesn't do what it's documented to do, then the
> > documentation is erroneous.

> Nope. Its not in error.

> It only talks about what the compiler is going to do, not what the hardware
> is going to do.

Okay, this is just ridiculous. The documentation does not tell you what
the compiler does or what the hardware does, it tells you what the
'volatile' specifier does. It says that the volatile specifier ensures that
the load occurs where the statement is. It is the compiler's job to emit the
necessary assembly code to make the hardware do whatever it is the
compiler's documentation says the qualifier is going to do.

If the compiler does not emit the necessary assembly codes to make the
hardware do what the documentation says will happen, then either the
documentation is in error or the compiler is broken.

People trying to use *compilers* should not have to know anything about
the hardware unless they want to. If the documentation for a compiler says a
language construct will have a particular effect, then the compiler has to
make the hardware do that. End of story.

DS


SenderX

Jun 3, 2003, 3:07:25 AM
> People trying to use *compilers* should not have to know anything about
> the hardware unless they want to. If the documentation for a compiler says a
> language construct will have a particular effect, then the compiler has to
> make the hardware do that. End of story.

lol.

You really don't have a clue.

VC++ does not wrap volatile var access, with acquire / release fences.


Look at the new Java memory model coming out or .NET, it does wrap volatile
with fences because it's a high-level lang.

C is NOT a high-level lang.


I want you to show me disassembled VC++ volatile vars access, that's wrapped
with fence opcodes.


You will not be able to do this, because you're totally wrong about C/C++
volatiles.

Momchil Velikov

Jun 3, 2003, 3:28:07 AM
"David Schwartz" <dav...@webmaster.com> wrote in message news:<bbgoik$cbl$1...@nntp.webmaster.com>...

> "SenderX" <x...@xxx.xxx> wrote in message
> news:rcRCa.1110427$S_4.1123860@rwcrnsc53...
>
> > > If 'volatile' doesn't do what it's documented to do, then the
> > > documentation is erroneous.
>
> > Nope. Its not in error.
>
> > It only talks about what the compiler is going to do, not what the
> hardware
> > is going to do.
>
> Okay, this is just ridiculous. The documentation does not tell you what
> the compiler does or what the hardware does, it tells you what the
> 'volatile' specifier does. It says that the volatile specifier ensures that
> the load occurs where the statement is. It is the compiler's job to emit the
> necessary assembly code to make the hardware do whatever it is the
> compiler's documentation says the qualifier is going to do.

So, the documentation is incomplet and inkorrect. Hardly surprising.
Isn't it just clear that the documentation speaks about emitting read
instructions, and that a load does occur, only that the value read is
"current" only in certain situations, i.e. on a uniprocessor?



> People trying to use *compilers* should not have to know anything about
> the hardware unless they want to. If the documentation for a compiler says a
> language construct will have a particular effect, then the compiler has to
> make the hardware do that. End of story.

Or the documentation be corrected, because it is not what the compiler
ACTUALLY DOES. Period.

~velco

Alexander Terekhov

Jun 3, 2003, 4:02:10 AM

David Schwartz wrote:
[...]

> People trying to use *compilers* should not have to know anything about
> the hardware unless they want to. If the documentation for a compiler says a
> language construct will have a particular effect, then the compiler has to
> make the hardware do that. End of story.

DS, Microsoft simply doesn't have people that fully understand the issues
of memory ordering and synchronization. Here's an illustration:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/synchronization_and_multiprocessor_issues.asp
(Synchronization and Multiprocessor Issues)

regards,
alexander.

David Schwartz

Jun 3, 2003, 4:25:51 AM

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03060...@posting.google.com...

> "David Schwartz" <dav...@webmaster.com> wrote in message
news:<bbgoik$cbl$1...@nntp.webmaster.com>...

> So, the documentation is incomplet and inkorrect. Hardly surprising.


> Isn't it just clear that the documentation speaks about emitting read
> instructions, and that a load does occur, only that the value read is
> "current" only in certain situations, i.e. on a uniprocessor?

No, it's not clear. I shouldn't even have to know what weak ordering
*is* to understand what a qualifier in a high-level language does.

> > People trying to use *compilers* should not have to know anything about
> > the hardware unless they want to. If the documentation for a compiler says a
> > language construct will have a particular effect, then the compiler has to
> > make the hardware do that. End of story.

> Or the documentation be corrected, because it is not what the compiler
> ACTUALLY DOES. Period.

That's my point. The documentation is in error. Defending it by saying
that the compiler is not responsible for what the hardware does is utter
nonsense.

DS


David Schwartz

Jun 3, 2003, 4:27:48 AM

"SenderX" <x...@xxx.xxx> wrote in message
news:NOXCa.1113770$S_4.1148965@rwcrnsc53...

> > People trying to use *compilers* should not have to know anything about
> > the hardware unless they want to. If the documentation for a compiler says a
> > language construct will have a particular effect, then the compiler has to
> > make the hardware do that. End of story.

> lol.

> You really don't have a clue.

Then show me where i'm wrong.

> VC++ does not wrap volatile var access, with acquire / release fences.

I never said it did. I said the compiler said that it provided ordering
guarantees. How it does that is not what I'm talking about.

> I want you to show me disassembled VC++ volatile vars access, that's wrapped
> with fence opcodes.

I never said VC++ wraps volatile variable accesses with fence opcodes. I
said the documentation claims that VC++ volatile variable accesses have
ordering guarantees. If the only way to do this is by emitting fence
opcodes, then if VC++ does not emit fence opcodes, it's not doing what the
documentation says it will do.

> You will not be able to do this, cause your totally wrong about C/C++
> volatiles.

I'm not wrong. I'm not making any claims about C/C++ volatiles. The VC++
documentation is.

DS


David Schwartz

Jun 3, 2003, 4:43:12 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EDC5602...@web.de...

> DS, Microsoft simply doesn't have people that fully understand the issues
> of memory ordering and synchronization. Here's an illustration:

You know, I think you hit the nail on the head.

DS


SenderX

Jun 3, 2003, 5:40:15 AM
> No, it's not clear. I shouldn't even have to know what weak ordering
> *is* to understand what a qualifier in a high-level language does.

You better know what weak-ordering is if you plan to deploy on a non-TSO
multi-processor box!

The discussion is on C/C++ compilers, not high-level lang's anyway.

;)

SenderX

Jun 3, 2003, 5:41:00 AM
> I'm not wrong. I'm not making any claims about C/C++ volatiles. The VC++
> documentation is.

The docs assume uni-processor for mem access, and say that volatile will not
be used in compiler optimizations.

They are not wrong at all.

SenderX

Jun 3, 2003, 6:15:00 AM
They're using InterlockedExchange as a barrier, lol.

David Schwartz

Jun 3, 2003, 6:33:41 AM

"SenderX" <x...@xxx.xxx> wrote in message
news:M2_Ca.556903$Si4.5...@rwcrnsc51.ops.asp.att.net...

> > I'm not wrong. I'm not making any claims about C/C++ volatiles. The
> > VC++
> > documentation is.

> The docs assume uni-processor for mem access, and say that volatile will not
> be used in compiler optimizations.

Why do you say the docs assume uni-processor for mem access? What good
would compiler documentation be if it assumed particular hardware
configurations in the documentation of basic language structures?!

> They are not wrong at all.

If the compiler's documentation about a qualifier is correct for only
some hardware platforms that the compiler claims to support, the
documentation is badly broken. I shouldn't have to think about different
hardware platforms when I write C code -- I should be able to rely upon the
guarantees the compiler's documentation provides on all supported hardware
platforms, processors, operating systems, and so on.

Why are you trying to make excuses for something that's so obviously
wrong?

DS


SenderX

Jun 3, 2003, 6:42:00 AM
> I shouldn't have to think about different hardware platforms when I write
> C code.

lol

Lee Chapman

Jun 3, 2003, 6:30:09 AM
> There using InterlockedExchange as a barrier, lol.

I really don't understand your problem with this. The documentation simply
says that you can use the Interlocked functions to avoid race conditions
because the OS's *implementation of those functions* will contain the
necessary memory barrier opcodes to ensure predictable results on a
multiprocessor machine.

i.e. Yes, you need barriers, fences, sheep, whatever, but as a Windows
programmer you don't have to worry about the processor & memory model
specifics - instead you can rely on the OS to do this for you when you call
one of the Interlocked functions.

- Lee


SenderX

Jun 3, 2003, 7:11:56 AM
> i.e. Yes, you need barriers, fences, sheep, whatever, but as a Windows
> programmer you don't have to worry about the processor & memory model
> specifics - instead you can rely on the OS to do this for you when you call
> one of the Interlocked functions.

Well... Yeah that would give you portable barriers.

Alex's atomic<> should solve all of this, it should allow you to use
portable barriers.

Hurry up Alex!

;)

Lee Chapman

Jun 3, 2003, 7:26:31 AM
David, the C/C++ languages know nothing about multiple threads, let alone
multiple processors. Threads are not a part of the language specifications;
they are provided by operating systems in the same way that GUI functions
are provided to allow output to be sent to a monitor. I doubt you would
suggest that the C/C++ specs should be concerned with drawing polygons on a
monitor, and in the same way, they have no interest in multiple threads or
processors.

This means that a C complier is free to re-order statements (and perform
other optimisations) providing that the observable behaviour is the same
before and after, assuming a single thread of execution.

The volatile keyword is used to prevent certain optimisations by saying that
the contents of the variable can be changed by something other than the
currently executing program. This variable could be the reflection of a
hardware register or, these days, it could be a concurrently executing
thread, possibly one running on a different processor. The C language
doesn't distinguish these cases: as far as it's concerned, and hence as far
as any C compiler is concerned, volatile just means that the value might be
changed by something other than the executing program.

But there are still other optimisations that might re-order statements, even
if they access volatile variables - providing the observable behaviour is
the same assuming a single thread of execution this is perfectly legal (and,
if it improves performance, desirable).

e.g.

volatile int x;
volatile int y;

x = 1;
y = 2;

It doesn't matter whether the compiler generates code to write to x first or
to y first, the observable behaviour (assuming a single thread of execution)
is the same.

But in Visual Studio, we can use

#pragma optimize("", off)

...

#pragma optimize("", on)

to prevent any re-ordering due to optimization.

However, that's only the compiler's side of the story. On a multi-processor
machine, even if the compiler generates the machine code to read and write
the variables in the "correct order", whatever that may be, in reality the
memory model of the machine might be such that different threads see the end
results of these reads and writes at different times.

e.g.

Thread 1:

x = 1;
y = 2;

Thread 2:

z = 0;

if (x == 1)
z = y;

On a multi-processor machine, there is no guarantee that the machine
language generated by your C compiler, even with volatiles and pragmas in
place to prevent re-ordering, will ensure that z gets the value 2.

This is where memory barriers (again, nothing to do with the C language
spec) come in. I think of them as doing for the hardware what volatile and
#pragma optimize do for the compiler.

Now, if you know the opcodes necessary to implement a memory barrier on your
target machine, you can add them using whatever keyword your compiler
provides for inserting machine code into your C program, but you still have
to be careful that the compiler doesn't re-order them during optimisation!
If you're programming on Windows, there's a better way. You can use the
Interlocked functions etc. and the compiler promises not to re-order the
statements and the Windows OS promises to call the necessary opcodes to
ensure that the processors' caches are synchronised at the right point. The
end result is that you have code that does what you think it will, even on a
multi-processor machine.

- Lee


David Schwartz

Jun 3, 2003, 7:46:06 AM

"Lee Chapman" <Please Reply To Group> wrote in message
news:3edc8644$1@shknews01...

> David, the C/C++ languages know nothing about multiple threads, let alone
> multiple processors. Threads are not a part of the language specifications;
> they are provided by operating systems in the same way that GUI functions
> are provided to allow output to be sent to a monitor. I doubt you would
> suggest that the C/C++ specs should be concerned with drawing polygons on a
> monitor, and in the same way, they have no interest in multiple threads or
> processors.

We're not talking about C/C++ languages. We're talking about the
documentation for one particular C++ compiler and one that specifically was
designed to support threads from the ground up.

> This means that a C compiler is free to re-order statements (and perform
> other optimisations) providing that the observable behaviour is the same
> before and after, assuming a single thread of execution.

We're not talking about "a C compiler". We're talking about Visual C++,
a C/C++ compiler that explicitly claims to support compiling multithreaded
code.

> The volatile keyword is used to prevent certain optimisations by saying that
> the contents of the variable can be changed by something other than the
> currently executing program. This variable could be the reflection of a
> hardware register or, these days, it could be a concurrently executing
> thread, possibly one running on a different processor. The C language
> doesn't distinguish these cases: as far as it's concerned, and hence as far
> as any C compiler is concerned, volatile just means that the value might be
> changed by something other than the executing program.

You would be right if we were talking about what the standard says about
the 'volatile' qualifier, but we're not. We're talking about what the
documentation of one specific compiler says about the 'volatile' qualifier.
You are correct, the C language itself doesn't distinguish these cases, it
basically says that 'volatile' does whatever the implementation wants it to
do. That's why we were talking about the documentation of one specific
compiler.

> But there are still other optimisations that might re-order statements,
> even if they access volatile variables - providing the observable
> behaviour is the same assuming a single thread of execution this is
> perfectly legal (and, if it improves performance, desirable).

Perfectly legal according to what? The C standard? Funny, we weren't
talking about that. The C++ standard? Guess what, we weren't talking about
that either.

> e.g.
>
> volatile int x;
> volatile int y;
>
> x = 1;
> y = 2;
>
> It doesn't matter whether the compiler generates code to write to x first
> or to y first, the observable behaviour (assuming a single thread of
> execution) is the same.

Unless, of course, the compiler's documentation SPECIFICALLY SAID THAT
IT WROTE X FIRST. Which is, surprise, surprise, what happened in this case.

> But in Visual Studio, we can use
>
> #pragma optimize("", off)
>
> ...
>
> #pragma optimize("", on)
>
> to prevent any re-ordering due to optimization.

> However, that's only the compiler's side of the story. On a
> multi-processor machine, even if the compiler generates the machine code
> to read and write the variables in the "correct order", whatever that may
> be, in reality the memory model of the machine might be such that
> different threads see the end results of these reads and writes at
> different times.

Fortunately, I don't have to care about that. My compiler specifically
says that it orders the reads/writes. I pasted the section of that
documentation that makes that claim.

> e.g.
>
> Thread 1:
>
> x = 1;
> y = 2;
>
> Thread 2:
>
> z = 0;
>
> if (x == 1)
> z = y;

> On a multi-processor machine, there is no guarantee that the machine
> language generated by your C compiler, even with volatiles and pragmas
> in place to prevent re-ordering, will ensure that z gets the value 2.

Then the compiler emitted the wrong assembly code because the
documentation said that the accesses to volatile variables would take place
in the order they were coded.
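For what it's worth, the guarantee being argued over here is exactly what a release/acquire pair provides in C++11, machinery that did not exist when this thread was written. Below is a hedged sketch of the thread's x/y/z example with the ordering made explicit; the names (`x_flag`, `y_payload`, `message_pass`) are invented for the illustration, and the reader is sequenced after the writer via `join()` so the result is deterministic:

```cpp
#include <atomic>
#include <thread>

// y is the payload ("y = 2"); x is the publish flag ("x = 1").
static int y_payload = 0;
static std::atomic<int> x_flag{0};

int message_pass() {
    int z = 0;
    std::thread writer([] {
        y_payload = 2;                               // y = 2 (plain store)
        x_flag.store(1, std::memory_order_release);  // x = 1, with the fence
    });
    writer.join();  // reader runs strictly after the writer in this sketch
    if (x_flag.load(std::memory_order_acquire) == 1) // if (x == 1)
        z = y_payload;                               // z = y; sees 2
    return z;
}
```

In the real concurrent case (no `join()`), the acquire load pairing with the release store is what rules out the "z gets a stale y" outcome the thread is arguing about.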

> This is where memory barriers (again, nothing to do with the C language
> spec) come in. I think of them as doing for the hardware what volatile and
> #pragma optimize do for the compiler.

    Right, nothing to do with the C language. Just like all assembly code
has nothing to do with the C language. Programmers expect the compiler to
emit the assembly code that allows it to make the machine do what the
documentation says it will do.

> Now, if you know the opcodes necessary to implement a memory barrier on
> your target machine, you can add them using whatever keyword your compiler
> provides for inserting machine code into your C program, but you still
> have to be careful that the compiler doesn't re-order them during
> optimisation! If you're programming on Windows, there's a better way. You
> can use the Interlocked functions etc. and the compiler promises not to
> re-order the statements and the Windows OS promises to call the necessary
> opcodes to ensure that the processors' caches are synchronised at the
> right point. The end result is that you have code that does what you
> think it will, even on a multi-processor machine.

Amazingly, according to the documentation 'volatile' does this all by
itself. My bet is that this was actually intended originally, since x86
preserved ordering. (Even in the case of a prefetch! It invalidates the
prefetch if the data is no longer in the cache!) I'm not sure that's still
the case on the P4, and it's damn near impossible to do on x86-64 platforms.
So my bet is Microsoft never thought about it. The guarantee that the
documentation made, and still makes, just became false due to processor
evolution with nobody really thinking about it.

The documentation makes a guarantee and the compiler fails to emit the
assembly code to ensure the guarantee. It really is that simple. All this
bullcrap about assembly code and hardware and different processors is so
much smoke and mirrors. Someone using C language structures should be able
to rely upon the guarantees and promises the compiler documentation makes
without having to think about different hardware and assembly code and
whatnot. If the compiler can't guarantee something in the face of some
processors or hardware it claims to support, it shouldn't make that
guarantee.

DS


SenderX

unread,
Jun 3, 2003, 8:04:47 AM6/3/03
to
You are 100% wrong Dave.

Patrick Doyle

unread,
Jun 4, 2003, 8:53:27 PM6/4/03
to
David Schwartz <dav...@webmaster.com> wrote:
>
> If the compiler doesn't know what processor it's building for, it has a
>*very* serious problem! The compiler generates assembler output, so it had
>better know all the semantics of the assembly language it's targetting!

Dude, give up. He doesn't get it, and you're not going to convince him.

Patrick Doyle

unread,
Jun 4, 2003, 9:23:35 PM6/4/03
to
Ziv Caspi <zi...@netvision.net.il> wrote:
>On Sat, 31 May 2003 17:09:55 -0700, "David Schwartz"
><dav...@webmaster.com> wrote:
>
>> The documentation for VC++ says:
>>
>>"Objects declared as volatile are not used in optimizations because their
>>value can change at any time. The system always reads the current value of a
>>volatile object at the point it is requested, even if the previous
>>instruction asked for a value from the same object. Also, the value of the
>>object is written immediately on assignment."
>>
>> This, at least to me, implies that 'volatile' must defeat weak ordering
>>with the appropriate fences.
>
>I've read this section a couple of times, and still don't understand
>how your conclusion follows.

If I may leap in here uninvited...

Then I think you may need some wider experience with memory models. Have a
look at Java's, and then you ought to see how David could reach the conclusion
he reached. (You may not agree, but that's another issue.)

>What I infer from the quote above (as
>well as some experience in the area, plus some discussions with the
>people who built the compiler) is that when you use volatile, you
>essentially tell the compiler "do what I say" -- when you set a value
>to a volatile object, the compiler generates the store instruction in
>the same program location you asked for.

Is that the same as saying it "always reads the current value of a volatile
object at the point it is requested"? The answer depends on your point of
view, and I think David's is as valid as yours.

Let me ask you this: suppose you were to invent a language in which you *do*
want volatiles to exhibit sequentially-consistent memory semantics. How would
you describe this behaviour? Is it unreasonable to think it would be described
something like David's quote?

>>It says it's read "at the point it is
>>requested" and it says it's written "immediately on assignment". If this
>>allows weak ordering, then the documentation is incorrect or, at best,
>>highly misleading.
>
>The model you seem to be reasoning along is that of a single
>synchronization point (probably the RAM) which all reads and writes
>are measured against. This model is not followed by modern processors
>for some time now, and is being replaced by cross-processor
>observability guarantees [...]

Not so in Java. That kind of waffling is not allowed. When they say A should
happen before B, they mean it, and they don't let the processor decide to go
and mess things up. (I'm not saying that C/C++ should be like Java. I only
bring it up to show that your interpretation of how volatiles should behave is
not the only reasonable one.)

>The compiler here is, in general, unable to help further. Compilers
>usually target a variety of processor models (Pentium, P-whatever...)
>and generate code that works on them all.

So what? If we lived by your rules, compilers couldn't emit SSE2 instructions
in case they need to run on a P2.

>Furthermore, the compiler doesn't know when it creates programs whether they
>are run on multiprocessor machines. As added bonus, I believe some processors
>can switch their ordering behavior depending on the area of memory you access.
>All these are in the domain of the platform, not the compiler. If the
>compiler were to generate "always working" code, you wouldn't want to use it
>anyway.

This is completely fictitious. Compilers can and do deal with these issues.

Besides, even if the compiler couldn't know all these things, all it would need
to do to make volatiles correct in these situations is conservatively generate
the appropriate fences. (Nobody cares much about the performance of volatiles
anyway, right? :-)

But above all, the point (I think) is that if a sequentially-consistent memory
model is not being provided, then the wording of the description of "volatile"
should be chosen to make that clear.

Patrick Doyle

unread,
Jun 4, 2003, 9:29:45 PM6/4/03
to
David Schwartz <dav...@webmaster.com> wrote:
>
>"SenderX" <x...@xxx.xxx> wrote in message
>news:NOXCa.1113770$S_4.1148965@rwcrnsc53...
>
>> > People trying to use *compilers* should not have to know anything
>> > about the hardware unless they want to. If the documentation for a
>> > compiler says a language construct will have a particular effect, then
>> > the compiler has to make the hardware do that. End of story.
>
>> lol.
>
>> You really don't have a clue.
>
> Then show me where i'm wrong.

Man, are you new to usenet? You're never going to convince this guy. Right
now he's just having fun pushing your buttons. It takes virtually no effort on
his part to call you names. I recommend you take up a less frustrating hobby
than to try to treat him as though he were a rational, open-minded individual.

David Schwartz

unread,
Jun 4, 2003, 11:06:51 PM6/4/03
to

"Patrick Doyle" <doy...@eecg.toronto.edu> wrote in message
news:HFzIt...@ecf.utoronto.ca...

> Man, are you new to usenet? You're never going to convince this guy.
> Right now he's just having fun pushing your buttons. It takes virtually
> no effort on his part to call you names. I recommend you take up a less
> frustrating hobby than to try to treat him as though he were a rational,
> open-minded individual.

Yeah, I'll take up juggling chainsaws.

DS


SenderX

unread,
Jun 5, 2003, 3:22:37 AM6/5/03
to
We're talking about C/C++ compilers.

They do NOT wrap volatile access like revised java or .net volatiles.

So why change the subject to higher-level lang's, which HAVE to follow
David's volatile rules in the first place?

I'm only arguing that C/C++ volatiles do not work the way David says they
should/or do? Not sure what he is saying about C...

When I use C, I don't want the compiler to insert barriers opcodes
everywhere like revised java!


So, Dave is correct on high-level langs ( I already said this point some
posts back ), not with C.


=)

Patrick Doyle

unread,
Jun 5, 2003, 7:49:01 AM6/5/03
to
In article <0dCDa.575898$Si4.5...@rwcrnsc51.ops.asp.att.net>,
SenderX <x...@xxx.xxx> wrote:
>We're talking about C/C++ compilers.
>
>They do NOT wrap volatile access like revised java or .net volatiles.
>
>So why change the subject to higher-level lang's, which HAVE to follow
>David's volatile rules in the first place?

Because they are not "David's rules". They are the C/C++ compiler's own
documentation.

>I'm only arguing that C/C++ volatile do not work that way David says they
>should/or do? Not sure what he is saying about C...

Neither is he. He is arguing that the documentation should match the reality.

>When I use C, I don't want the compiler to insert barriers opcodes
>everywhere like revised java!

David has never said he wanted this either. Don't take my word for it; have
another look at his posts.

(Also, Java has always needed fences for volatiles. I'm not sure what you mean
by "revised".)

Ziv Caspi

unread,
Jun 5, 2003, 6:40:36 PM6/5/03
to
On Thu, 5 Jun 2003 01:23:35 GMT, doy...@eecg.toronto.edu (Patrick
Doyle) wrote:

>Ziv Caspi <zi...@netvision.net.il> wrote:
>>On Sat, 31 May 2003 17:09:55 -0700, "David Schwartz"
>><dav...@webmaster.com> wrote:
>>
>>> The documentation for VC++ says:
>>>
>>>"Objects declared as volatile are not used in optimizations because their
>>>value can change at any time. The system always reads the current value of a
>>>volatile object at the point it is requested, even if the previous
>>>instruction asked for a value from the same object. Also, the value of the
>>>object is written immediately on assignment."
>>>
>>> This, at least to me, implies that 'volatile' must defeat weak ordering
>>>with the appropriate fences.
>>
>>I've read this section a couple of times, and still don't understand
>>how your conclusion follows.
>
>If I may leap in here uninvited...
>
>Then I think you may need some wider experience with memory models. Have a
>look at Java's, and then you ought to see how David could reach the conclusion
>he reached. (You may not agree, but that's another issue.)

David is quoting from the documentation of one implementation of a
particular C++ compiler. Prior Java knowledge is not applicable, even
if Java chose to use terms similar to C++'s.

>>What I infer from the quote above (as
>>well as some experience in the area, plus some discussions with the
>>people who built the compiler) is that when you use volatile, you
>>essentially tell the compiler "do what I say" -- when you set a value
>>to a volatile object, the compiler generates the store instruction in
>>the same program location you asked for.
>
>Is that the same as saying it "always reads the current value of a volatile
>object at the point it is requested"? The answer depends on your point of
>view, and I think David's is as valid as yours.
>
>Let me ask you this: suppose you were to invent a language in which you *do*
>want volatiles to exhibit sequentially-consistent memory semantics. How would
>you describe this behaviour? Is it unreasonable to think it would be described
>something like David's quote?

Yes, because the description above lacks any mention of memory
visibility, which is crucial in this context. So, if you wanted to
borrow the quote to plant it into this new language manual, you'd have
to guarantee visibility, and call that out in the documentation.

>>>It says it's read "at the point it is
>>>requested" and it says it's written "immediately on assignment". If this
>>>allows weak ordering, then the documentation is incorrect or, at best,
>>>highly misleading.
>>
>>The model you seem to be reasoning along is that of a single
>>synchronization point (probably the RAM) which all reads and writes
>>are measured against. This model is not followed by modern processors
>>for some time now, and is being replaced by cross-processor
>>observability guarantees [...]
>
>Not so in Java. That kind of waffling is not allowed. When they say A should
>happen before B, they mean it, and they don't let the processor decide to go
>and mess things up. (I'm not saying that C/C++ should be like Java. I only
>bring it up to show that your interpretation of how volatiles should behave is
>not the only reasonable one.)

A) The quote does *not* say anything about A happening before B (it
only speaks about "A")

B) Again, we are not talking about Java

>>The compiler here is, in general, unable to help further. Compilers
>>usually target a variety of processor models (Pentium, P-whatever...)
>>and generate code that works on them all.
>
>So what? If we lived by your rules, compilers couldn't emit SSE2 instructions
>in case they need to run on a P2.

Note that I said "general".

>>Furthermore, the compiler doesn't know when it creates programs whether they
>>are run on multiprocessor machines. As added bonus, I believe some processors
>>can switch their ordering behavior depending on the area of memory you access.
>>All these are in the domain of the platform, not the compiler. If the
>>compiler were to generate "always working" code, you wouldn't want to use it
>>anyway.
>
>This is completely fictitious. Compilers can and do deal with these issues.

What I mean is that if the compiler behaves defensively, and always
inserts the code necessary for synchronization, you'd pay a very high
run-time price even in cases you don't need to. Paying for what you
don't use is not considered good practice in the C++ world. Based on
your next paragraph, it appears you agree :-)

>Besides, even if the compiler couldn't know all these things, all it would need
>to do to make volatiles correct in these situations is conservatively generate
>the appropriate fences. (Nobody cares much about the performance of volatiles
>anyway, right? :-)
>
>But above all, the point (I think) is that if a sequentially-consistent memory
>model is not being provided, then the wording of the description of "volatile"
>should be chosen to make that clear.

Documentation should call out for what *is* provided, not what is not,
unless there's a good reason (for example, the compiler deviates from
the standard, or it is "reasonable" to assume users would expect the
compiler to behave in a way that it doesn't). I think this is very
different than saying "the documentation lies".

Ziv


David Schwartz

unread,
Jun 5, 2003, 5:54:25 PM6/5/03
to

"Ziv Caspi" <zi...@netvision.net.il> wrote in message
news:3edf1cb5.218137645@newsvr...

> >>>"Objects declared as volatile are not used in optimizations because their
> >>>value can change at any time. The system always reads the current value of a
> >>>volatile object at the point it is requested, even if the previous
> >>>instruction asked for a value from the same object. Also, the value of the
> >>>object is written immediately on assignment."

> Yes, because the description above lacks any mention of memory
> visibility, which is crucial in this context. So, if you wanted to
> borrow the quote to plant it into this new language manual, you'd have
> to guarantee visibility, and call that out in the documentation.

You don't have to worry about memory visibility. So long as a value is
written "immediately on assignment" and the "current value" is "always read"
"at the point it is requested", memory visibility can never cause you any
problems. And the documentation says that this is exactly what happens.

> >>>It says it's read "at the point it is
> >>>requested" and it says it's written "immediately on assignment". If this
> >>>allows weak ordering, then the documentation is incorrect or, at best,
> >>>highly misleading.

> A) The quote does *not* say anything about A happening before B (it
> only speaks about "A")

    It says A happens "immediately" and "at the point it is requested". If B
also happens "immediately" and "at the point it is requested", then we can
infer the relative ordering of A and B.

SenderX

unread,
Jun 5, 2003, 7:12:39 PM6/5/03
to
> You don't have to worry about memory visibility.

You are not a C programmer at all correct?

David Schwartz

unread,
Jun 5, 2003, 7:58:35 PM6/5/03
to

"SenderX" <x...@xxx.xxx> wrote in message
news:H7QDa.55733$DV.7...@rwcrnsc52.ops.asp.att.net...

> > You don't have to worry about memory visibility.

> You are not a C programmer at all correct?

Okay, I give up. Cutting a statement off from its supporting rationale
and using it to question my competence is flat out malicious.

DS


SenderX

unread,
Jun 5, 2003, 8:13:09 PM6/5/03
to
> Okay, I give up. Cutting a statement off from its supporting rationale
> and using it to question my competence is flat out malicious.

You don't think others can read your rationale?

I have to admit, some programmers would probably like to have a C compiler
that would inject barriers on volatile access like revised java.

But I think most would not?

Maybe have upcoming C compilers give an option to inject fences on volatile.
I would support that.

David Schwartz

unread,
Jun 5, 2003, 11:45:40 PM6/5/03
to

"SenderX" <x...@xxx.xxx> wrote in message
news:p0RDa.1142022$S_4.1173362@rwcrnsc53...

> > Okay, I give up. Cutting a statement off from its supporting rationale
> > and using it to question my competence is flat out malicious.

> You don't think others can read your rationale?

I think everyone else can, it just seems that you can't.

> I have to admit, some programmers would probably like to have a C compiler
> that would inject barriers on volatile access like revised java.
>
> But I think most would not?

Most probably would not. Though some might like to have a new keyword
(reallyvolatile?) that did that.

> Maybe have upcoming C compilers give an option to inject fences on
> volatile.
> I would support that.

Wouldn't it just be easier to fix the documentation?

DS


SenderX

unread,
Jun 6, 2003, 9:54:06 PM6/6/03
to
> On an Itanium-II a write without a fence even wouldn't be immediately
> visible to the code following the write - so a volatile which does a
> write without a fence would be useless.

Current C volatiles are useless for non-TSO processors.

You need to add your own barriers.

SenderX

unread,
Jun 6, 2003, 9:55:49 PM6/6/03
to
> And you think the fence-instructions introduced with the Pentium-III
> are necessary for thread-coherency. That's even LOLler.

C apps for SMP processors, will need some fences.

SenderX

unread,
Jun 6, 2003, 9:58:54 PM6/6/03
to
> you don't even need fences for writes as they
> become virtually visible immediately after the write-instruction.

You will need them for loads, no?

Like, after you load the head of a lock-free stack you should do a load
barrier.
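SenderX's point about a load barrier after reading the head of a lock-free stack can be sketched in C++11 terms, where the barrier is spelled `memory_order_acquire` and pairs with the pusher's release. This is a minimal Treiber-stack illustration only (single-threaded driver; ABA and node reclamation deliberately ignored), and all names in it are invented for the example:

```cpp
#include <atomic>

struct Node { int value; Node* next; };

static std::atomic<Node*> stack_head{nullptr};

void push(Node* n) {
    n->next = stack_head.load(std::memory_order_relaxed);
    // Release on success: the node's fields must be visible to any
    // thread that later observes n as the head.
    while (!stack_head.compare_exchange_weak(n->next, n,
                                             std::memory_order_release,
                                             std::memory_order_relaxed)) {}
}

Node* pop() {
    // Acquire on the head load: this is the "load barrier" in question;
    // it makes the pusher's writes to *n visible before we dereference it.
    Node* n = stack_head.load(std::memory_order_acquire);
    while (n && !stack_head.compare_exchange_weak(n, n->next,
                                                  std::memory_order_acquire,
                                                  std::memory_order_acquire)) {}
    return n;
}

int demo_stack() {
    static Node a{1, nullptr}, b{2, nullptr};
    push(&a);
    push(&b);
    Node* first = pop();   // LIFO: b comes off first
    Node* second = pop();  // then a
    return first->value * 10 + second->value;
}
```

On a strongly ordered x86, the acquire typically compiles to a plain load, which is exactly the "no-op on many systems" point made elsewhere in the thread; on weakly ordered machines it is where the fence lands.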

Alexander Terekhov

unread,
Jun 7, 2003, 7:39:37 AM6/7/03
to

"Oliver S." wrote:
>
> > Current C volatiles are useless for non-TSO processors.
> > You need to add your own barriers.
>
> And why should there be a volatile-specifier even if the processor
> issuing the write can't see the effect of the write immediately ?

Such processor would only be useful as a "hi-tech" heating device.
"Oliver's Intel Inside". ;-)

regards,
alexander.

Alexander Terekhov

unread,
Jun 7, 2003, 9:17:04 AM6/7/03
to

"Oliver S." wrote:
>
> > Such processor would only be useful as a "hi-tech" heating device.
> > "Oliver's Intel Inside". ;-)
>
> Then the Itanium is a such device as it needs a fence to see its
> own writes.

That's "Oliver's Itanium Inside", I guess.

regards,
alexander.

Joseph Seigh

unread,
Jun 7, 2003, 10:12:13 AM6/7/03
to

"Oliver S." wrote:
>
> > VC++ does not wrap volatile var access, with acquire / release fences.
>
> That's because on x86-Systems even the write-queue of the CPU is
> transparent to any subsequent reads. And because of the ownership-semantics
> with MESI-coherent caches, you don't even need fences for writes as they
> become virtually visible immediately after the write-instruction.

Cache has nothing to do with the memory model. Cache is usually transparent.
Memory models that require fences, do so because of their pipelined
architecture. On such processors, fences are required whether or not
cache is enabled or present even.

>
> > Look at the new Java memory model coming out or .NET, it does wrap
> > volatile with fences cause its a high-level lang.
>
> No, not on x86-systems because it's simply not necessary. I suppose
> your x86-Knowledge isn't very deep as you implicitly claim that
> coherency wouldn't be possible without fences on x86s. So how could it
> be possible to run SMP-capable operating-systems older than this
> instruction (it was introduced with the Pentium-III's SSE-extension) on
> machines having this instruction ?

It would depend on the memory model present on those processors. If it was
TSO, fences would not be required. If it was before Intel supported SMP, the
memory model wouldn't be documented. The various system manufacturers would
have to figure out what the actual memory model was and what to do about it.

Joe Seigh

Alexander Terekhov

unread,
Jun 7, 2003, 10:13:55 AM6/7/03
to

"Oliver S." wrote:
>
> >> Then the Itanium is a such device as it needs a fence to see its
> >> own writes.
>
> > That's "Oliver's Itanium Inside", I guess.
>
> No, every Itanium - if the write buffer isn't flushed to the cache, code
> following a write isn't guaranteed to see the effect of this prior write.

Oliver, I'm slowly getting tired of this thread... hint: "uni-processor
coherence".

regards,
alexander.

Joseph Seigh

unread,
Jun 7, 2003, 11:56:33 AM6/7/03
to

"Oliver S." wrote:
>
> > Cache has nothing to do with the memory model. Cache is usually
> > transparent. Memory models that require fences, do so because of
> > their pipelined architecture. On such processors, fences are
> > required whether or not cache is enabled or present even.
>
> I didn't claim that fences are necessary for cache-consistency;
> I just wanted to oppose the potential belief that this is necessary.

We're not discussing cache-consistency. You are being OT if you are
introducing cache to the present discussion.

>
> > It would depend on the memory model present on those processors.
>
> We're talking about x86s here; SenderX seems to have checked the code
> VC++ generates on volatiles and didn't see any fences. So he claimed that
> volatile is useless with VC++.

I think the argument is that volatile should present correct memory visibility
in a threaded environment. That's not true. The C/C++ standards don't
recognise threading, so trying to interpret volatile as having some kind
of meaning in a threaded environment is a misapplication of the C/C++ standards.
It's possible that certain compilers may try to add additional semantics not
required by the standard. This would be non portable of course. So the presence
of fences wouldn't prove anything. The absence would prove something, providing
VC++ is compliant. So, I'd say SenderX is right in this case.

And VC++ targets all x86 processors. Trying to argue semantics based on a subset of
those processors is faulty logic. But I don't think you are trying to argue that,
are you?

And I strongly recommend that you read section 7.2 of Vol 3. of the IA-32 Intel
Architecture Software Developer's Manual, in particular the last paragraph of
that section.

Joe Seigh

Alexander Terekhov

unread,
Jun 7, 2003, 12:10:06 PM6/7/03
to

Joseph Seigh wrote:
[...]

> And I strongly recommend that you read section 7.2 of Vol 3. of the IA-32 Intel
> Architecture Software Developer's Manual, in particular the last paragraph of
> that section.

Joe, sorry, but I just can't resist... in the name of future
"googlers", so to speak. ;-)

http://google.com/groups?selm=3C3C9B63.DFDC9920%40web.de
(Subject: Re: Multi-Processor Concurrency Problem)

regards,
alexander.

Ziv Caspi

unread,
Jun 7, 2003, 4:05:37 PM6/7/03
to
On 7 Jun 2003 01:07:40 GMT, "Oliver S." <Foll...@gmx.net> wrote:

> On weak-ordered systems a volatile without a fence would simply put
>the write to the write queue and even the code after the write couldn't
>rely on the visibility of the write until a fence. For me this looks
>like a "feature without an effect".

If I understand you correctly, you mean something like this:

volatile int i;
int j;
// ...
i = 10;
j = i;

If, following this code, j is not 10, then your C/C++ compiler is
non-conforming regardless of whether i is declared volatile or not.

Ziv
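Ziv's fragment really is guaranteed to leave j equal to 10 on any conforming compiler, fences or none, because a processor always observes its own prior stores (the "uni-processor coherence" hinted at elsewhere in the thread); the fence debate is only about what *other* threads see and when. The fragment, wrapped in a function purely for illustration:

```cpp
// Uni-processor coherence: the read of i on the same thread must see the
// store just made, volatile or not, barriers or not.
int self_coherence() {
    volatile int i;
    int j;
    i = 10;
    j = i;
    return j;  // 10 on any conforming implementation
}
```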

Alexander Terekhov

unread,
Jun 10, 2003, 5:39:59 AM6/10/03
to

"Oliver S." wrote:
>
> > If I understand you correctly ...
>
> That's the problem ! I just opposed to Patrick why this
> volatile-"feature" wouldn't make any sense and not that the forced write
> instruction without a fence makes sense on Non-TSO-CPUs.

>
> > volatile int i;
> > int j;
> > // ...
> > i = 10;
> > j = i;
>
> In this case the compiler sees the dependency and must insert a fence,
> no matter if there's a volatile or not (assuming that the compiler would
> be so stupid to re-load the previous value with non-volatiles).

Oliver, I must say that it's a pretty wise decision on part of your
newsreader to use "X-No-Archive: yes" header.

regards,
alexander.

Balog Pal

unread,
Jun 11, 2003, 5:15:05 AM6/11/03
to
"SenderX" <x...@xxx.xxx> wrote in message
news:Ey_Ca.1115530$S_4.1149367@rwcrnsc53...

> They're using InterlockedExchange as a barrier, lol.

And what is the problem with that on a real MP IA-32 system? Let's assume
until proven otherwise that no OS sets memory to WC mode. Then what is the
problem?

Show me a scenario when something breaks.

[and please straight to the real point. no talk of generalities, fictions,
anything irrelevant.]


(Or do they use it on other achitectures?)

Paul

Balog Pal

unread,
Jun 11, 2003, 5:42:12 AM6/11/03
to
"SenderX" <x...@xxx.xxx> wrote in message
news:yFbEa.71347$DV.8...@rwcrnsc52.ops.asp.att.net...

> Like, after you load the head of a lock-free stack you should do a load
> barrier.

Yeah, you should. Then you should also know that the #LoadLoad barrier is a
no-op on so many systems.

I found no case where you'd really need a load barrier on win32/ia32; a #LOCK
at the writer side will propagate changes to all processors. [using
slightly different methods on different processors, but the end result is
the same, the 'observer' will never see changes in opposite order as they
appear around the locking instruction on the other processor. ]

Paul


Balog Pal

unread,
Jun 11, 2003, 5:58:02 AM6/11/03
to

"Oliver S." <Foll...@gmx.net> wrote in message
news:Xns9394156292D...@130.133.1.4...

> >> No, every Itanium - if the write buffer isn't flushed to the cache, code
> >> following a write isn't guaranteed to see the effect of this prior write.
>
> > Oliver, I'm slowly getting tired of this thread... hint: "uni-processor
> > coherence".
>

> A hint can't be criticized;
> so you could have no clue without being criticized.

Really, Oliver, it's confusing what you state. Do you mean code following a
write executing on the _same_ processor or on another processor? You'd be
correct for the latter, wrong for the former.

Hopefully by 'see' you mean 'read' and not speculate what could possibly
happen without actual reading. As those are different cases. Speculative
read and execute is allowed for the processor, but not without a set of
constraints. 'processor self coherency' being one of those at least if you
access memory in the same chunks.

Guess anyone'd refuse to write code for a proc-architecture where you store 1
and read back any random value written a month before or that will be
written sometime later. ;-)

Paul


Balog Pal

unread,
Jun 11, 2003, 6:28:22 AM6/11/03
to
"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EE20E5E...@web.de...

> Joe, sorry, but I just can't resist... in the name of future
> "googlers", so to speak. ;-)
>
> http://google.com/groups?selm=3C3C9B63.DFDC9920%40web.de
> (Subject: Re: Multi-Processor Concurrency Problem)

It worth quoting directly:

"IA-32 Intel ® Architecture
Software Developer's
Manual

.
.
.
Despite the fact that Pentium 4, Intel Xeon, and P6 family
processors support processor ordering, Intel does not guarantee
that future processors will support this model. To make software
portable to future processors, it is recommended that operating
systems provide critical region and resource control constructs
and API's (application program interfaces) based on I/O, locking,
and/or serializing instructions be used to synchronize access to
shared areas of memory in multiple-processor systems. "


IMHO no one claimed the opposite. Yes, everyone should know what he's doing,
and why he's doing it. The baseline rule for threading, for shared access, is
DO SYNC. Failing to think that way, one soon falls on his face even in the
most conservative environment, running a single processor.

Then, knowing what happens makes it possible to use the system specifics: to
write code for some existing system, not one that 'could possibly exist'. In
the C world, IMHO, that's a common approach. Portability is cool if obtained
without effort; otherwise you drop it unless it's required. Is stuff that
compiles and works well on both Unix and Win32 ubiquitous? Sure it exists, I
wrote such stuff myself, but it's rare. Most Win32 work is aimed squarely at
Win32, and will never leave it. (Yeah, that's lock-in. If you think it's evil,
please don't hit the messenger; I didn't create the situation, I just
describe how we live with it.)


Honestly, I seriously doubt any future Intel win32/ia32 system will break the
sync model code uses today. We expect our already-bought W95, NT4, XP or
whatever OS to execute on the next PC with the next Intel processor. Should
it crash, should it require a new OS, people would buy 3-4 orders of
magnitude fewer of those chips. Intel is aware of that, sure. ;)


If MS people use lock as a barrier, that pretty well closes the issue: it
locks in Intel's path. Why throw away the few benefits of an antipattern and
keep just the problems?

Paul


SenderX

Jun 11, 2003, 12:24:36 PM
> And what is the problem with that on a real MP IA-32 system?

Actually, I was responding to / kidding with Alex. He is supposed to have
portable atomic ops ready in the not-too-distant future.

;)


> Then what is the problem?

There is really nothing wrong with it at all. You are correct on this.


> Show me a scenario when something breaks.

It won't break if Windows runs... the Interlocked APIs that the OS supports
run.

The memory model is supposed to be irrelevant when you use them.

In fact my high-speed test library, AppCore, uses the Interlocked APIs
everywhere I need to change shared memory.


The only place where I would like to have Alex's atomics is in my lock-free
algos, to make them portable to other OSes and processors.

Alexander Terekhov

Jun 11, 2003, 12:56:08 PM

SenderX wrote:
[...]

> > Then what is the problem?
>
> There is really nothing wrong with it at all. You are correct on this.

Nope. The problem is this:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/synchronization_and_multiprocessor_issues.asp

That silly example "presented" by MS is nothing but a totally busted
DCCI (well, "atomic storage" and brain-damaged "return TRUE/return FALSE"
aside for a moment). The "right" DCCI is this:

/* ... */
  atomic<stuff *> stuff_ptr;
/* ... */
}; // class thing

const stuff & thing::stuff_instance() { // "lazy" one
  stuff * ptr;
  // hoist load barrier (with data dependency "hint")
  if (0 == (ptr = stuff_ptr.load(msync::ddhlb))) {
    ptr = new stuff(/*...*/);
    // sink store barrier
    if (!stuff_ptr.attempt_update(0, ptr, msync::ssb)) {
      delete ptr;
      // hoist load barrier (with data dependency "hint")
      if (0 == (ptr = stuff_ptr.load(msync::ddhlb)))
        abort();
    }
  }
  return *ptr;
}

More "illustrations" can be found here:

http://terekhov.de/pthread_refcount_t/experimental/refcount.cpp

regards,
alexander.

SenderX

Jun 11, 2003, 1:34:23 PM
> > There is really nothing wrong with it at all. You are correct on this.
>
> Nope. The problem is this:

The Interlocked APIs are fine for Win32/64 apps. I don't believe they were
meant to create an atomic C/C++ standard anyway...

Your atomics should be a lot better, mainly because of the portability
factor.

> The "right" DCCI is this:

..snip..


I have posted two 100% working DCCI algos on this group that do just that.
They are 100% safe when compiled with VC++ for IA-32/64.

One used Joe Seigh's atomic_ptr with my ABA CAS. The other used the
Interlocked APIs.


Now, if I used your atomic template, it should work on different compilers
and processors, like IA-32/64, PowerPC, Alpha, SPARC, etc...
Correct?


By the way...

Would I be able to access a processor's "wide" CAS or LL/SC operations using
your atomic< T >?

Like the cmpxchg8b and cmp8xchg16b IA-32/64 instructions...


I would need those if I were to use your template for some of my more
complex lock-free algos I have presented, like my fast read/write lock
or my semaphore.

Alexander Terekhov

Jun 11, 2003, 3:33:14 PM

SenderX wrote:
[...]

> I would need those if I were to use your template for some of my more
> complex lock-free algos I have presented, like my fast read/write lock
> or my semaphore.

Your "counting section" thing is busted (for that and other reasons you
should really call it "a metered" one... à la the {in}famous MS
silliness). Study condvars, really. And as for the "fast read / write
lock"...

http://groups.google.com/groups?selm=3D9196B2.9BC29299%40web.de

Well, and perhaps:

http://groups.google.com/groups?selm=3DAA9F0F.D3EA3B55%40web.de

and note that start_write() needs msync::acq semantics, end_write()
needs msync::rel semantics, start_read() needs msync::hlb semantics
and end_read() needs msync::slb thing. Did I already mention that
MS-interlocked stuff sucks miserably?

regards,
alexander.

SenderX

Jun 11, 2003, 4:15:04 PM
> Your "counting section" thing is busted

Really!?!

Na... ;)

It has been working perfectly on all the multi-processor IA-32 boxes (they
use cmpxchg8b) I've tested it on.

All the counting, for the count and the waits, is atomic. All the waiter
signaling gets queued up on the OS semaphore. That makes for a perfect
atomic predicate loop. No signals get lost; they all queue up.

Please, show me where it's messed up. Really, I would like to know!

;)


It simply has not crashed under tremendous, extended (days at a time) load
on 4-processor Xeon boxes.

If it is busted, please tell me how to reproduce a failure so I can see the
supposed bug in action.

Thanks Alex.

SenderX

Jun 11, 2003, 4:33:44 PM
> Did I already mention that
> MS-interlocked stuff sucks miserably?

So any Win32/64 app that uses the Interlocked APIs will suffer from random
crashes? Microsoft should be dismantled for that!

I just want to clarify how bad they are. I've seen thousands of secure
programs that use them without any errors, like threaded COM objects.

Are they crappy for IA-64, or both IA-32 and 64?
