Is this thread-safe on multi-processor Win32?


Lee Chapman

May 16, 2003, 7:20:41 AM
Hi guys and gals. I want a cheap C++ class that I can use to easily access
the Ansi or Unicode version of a string on Win32.

I've come up with the following, where the only overhead in making the class
thread-safe is initialization of a long (m_lock).

I was wondering if anyone would be kind enough to give it a quick once-over
to see if I've made any obvious mistakes w.r.t. its thread safety. The
code only has to be safe to use in a multi-threaded process on a
multi-processor machine, but only under Win32. I'm assuming that in such
scenarios, the reading and writing of char *, wchar_t * and long variables
is atomic, and that the interlocked functions are providing all the
necessary memory barriers to keep it working on a multi-processor machine.

Thanks in advance,
- Lee


class String
{
private:
    char * volatile m_pAnsi;
    wchar_t * volatile m_pUni;
    long volatile m_lock;

public:
    String()
    {
        m_pAnsi = NULL;
        m_pUni = NULL;
        m_lock = 0;
    }

    String(char * pAnsi)
    {
        m_pAnsi = _strdup(pAnsi);
        m_pUni = NULL;
        m_lock = 0;
    }

    String(wchar_t * pUni)
    {
        m_pAnsi = NULL;
        m_pUni = _wcsdup(pUni);
        m_lock = 0;
    }

    ~String()
    {
        free(m_pAnsi);
        free(m_pUni);
    }

    operator const char *()
    {
        if (m_pAnsi == NULL && m_pUni != NULL)
        {
            if (InterlockedCompareExchange(&m_lock, 1, 0) == 0)
            {
                // -1 length includes the null terminator in the size
                int size = WideCharToMultiByte(CP_ACP, 0, m_pUni, -1,
                                               NULL, 0, NULL, NULL);
                m_pAnsi = (char *)malloc(size);
                WideCharToMultiByte(CP_ACP, 0, m_pUni, -1, m_pAnsi, size,
                                    NULL, NULL);

                InterlockedExchange(&m_lock, -1);
            }
            else
            {
                while (m_lock >= 0)
                {
                    SleepEx(0, FALSE);
                }
            }
        }

        return m_pAnsi;
    }

    operator const wchar_t *()
    {
        if (m_pUni == NULL && m_pAnsi != NULL)
        {
            if (InterlockedCompareExchange(&m_lock, 1, 0) == 0)
            {
                // -1 length includes the null terminator in the size
                int size = MultiByteToWideChar(CP_ACP, 0, m_pAnsi, -1, NULL, 0);
                m_pUni = (wchar_t *)malloc(size * sizeof(wchar_t));
                MultiByteToWideChar(CP_ACP, 0, m_pAnsi, -1, m_pUni, size);

                InterlockedExchange(&m_lock, -1);
            }
            else
            {
                while (m_lock >= 0)
                {
                    SleepEx(0, FALSE);
                }
            }
        }

        return m_pUni;
    }
};
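[For contrast, a minimal sketch of the same lazy two-representation idea guarded by an ordinary lock, written in modern portable C++: std::mutex stands in for a Win32 CRITICAL_SECTION, a naive per-character narrowing copy stands in for WideCharToMultiByte, and the class name LazyString is invented for the illustration.]

```cpp
#include <mutex>
#include <string>

// Sketch only: the lock serializes the lazy conversion, so no interlocked
// tricks or spin states are needed. The conversion is a naive narrowing
// copy, not a real code-page conversion.
class LazyString {
public:
    explicit LazyString(std::wstring uni) : m_uni(std::move(uni)) {}

    const std::string& ansi() {
        std::lock_guard<std::mutex> guard(m_mutex);    // serialize conversion
        if (!m_haveAnsi) {
            m_ansi.assign(m_uni.begin(), m_uni.end()); // naive narrowing copy
            m_haveAnsi = true;
        }
        return m_ansi;
    }

private:
    std::mutex m_mutex;
    std::wstring m_uni;
    std::string m_ansi;
    bool m_haveAnsi = false;
};
```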


Alexander Terekhov

May 16, 2003, 7:55:57 AM

Lee Chapman wrote:
>
> Hi guys and gals. I want a cheap C++ class that I can use to easily access
> the Ansi or Unicode version of a string on Win32.
>
> I've come up with the following, where the only overhead in making the class
> thread-safe is initialization of a long (m_lock).

AFAICS, your solution is totally broken. It seems that what you need
is a "dynamic" pthread_once() that would allow you to "copy-construct"
an "extra representation" upon request if it doesn't match the initially
constructed/provided one. Welcome to The-"dynamic-pthread_once()"-Club.

regards,
alexander.

Alexander Terekhov

May 16, 2003, 8:24:00 AM

I forgot one thing. Joe, please don't confuse him with totally
braindamaged MS-interlocked stuff meant to provide

class stuff { /* ... */ };

class thing {
public:
thing(/* ... */) : stuff_ptr(0) /* ... */ { /*...*/ }
~thing() { delete stuff_ptr.load(); /* ...*/ }
/* ... */
const stuff & stuff_instance();
/* ... */
private:
/* ... */
atomic<const stuff*> stuff_ptr;
/* ... */
};

const stuff & thing::stuff_instance() { // "lazy" one
stuff * ptr;
// hoist load barrier (with data dependency "hint")
if (0 == (ptr = stuff_ptr.load_ddhlb())) {
ptr = new stuff(/*...*/);
// sink store barrier
if (!stuff_ptr.attempt_update_ssb(ptr, 0)) {
delete ptr;
// hoist load barrier (with data dependency "hint")
if (0 == (ptr = stuff_ptr.load_ddhlb()))
abort();
}
}
return *ptr;
}

;-)

regards,
alexander.
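[atomic<>, load_ddhlb and attempt_update_ssb are Alexander's then-hypothetical primitives. In post-C++11 terms the same lazy-creation pattern can be sketched with std::atomic, reading load_ddhlb as roughly an acquire load and attempt_update_ssb as roughly a release compare-and-swap; that mapping is an assumption, and the trivial `stuff` body is invented for the example.]

```cpp
#include <atomic>

struct stuff { int value = 42; };   // trivial stand-in body, invented here

class thing {
public:
    ~thing() { delete stuff_ptr.load(); }

    const stuff & stuff_instance() {   // "lazy" one
        // acquire load ~ load_ddhlb (hoist barrier with data dependency hint)
        stuff * ptr = stuff_ptr.load(std::memory_order_acquire);
        if (ptr == nullptr) {
            ptr = new stuff();
            stuff * expected = nullptr;
            // release CAS ~ attempt_update_ssb (sink store barrier)
            if (!stuff_ptr.compare_exchange_strong(expected, ptr,
                                                   std::memory_order_acq_rel,
                                                   std::memory_order_acquire)) {
                delete ptr;        // another thread won the race
                ptr = expected;    // expected now holds the winner's pointer
            }
        }
        return *ptr;
    }

private:
    std::atomic<stuff*> stuff_ptr{nullptr};
};
```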

Lee Chapman

May 16, 2003, 9:09:19 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EC4D1CD...@web.de...

>
> AFAICS, your solution is totally broken.
>

That sentence I understand. I don't understand what you've seen that's led
you to that conclusion, but at least I understand what you're saying... :)

> It seems that what you need is a "dynamic" pthread_once() that
> would allow you to "copy-construct" an "extra representation" upon
> request if it doesn't match the initialy constructed/provided one.
> Welcome to The-"dynamic-pthread_once()"-Club.

*blink*

Okay... "pthread_once()" is something to do with the POSIX standard, right?
If so, what relevance does it have to my Win32 implementation?

If the solution is totally broken under Win32, can you give me some hints as
to why?

Thanks,
- Lee


Alexander Terekhov

May 16, 2003, 9:50:28 AM

Lee Chapman wrote:
>
> "Alexander Terekhov" <tere...@web.de> wrote in message
> news:3EC4D1CD...@web.de...
> >
> > AFAICS, your solution is totally broken.
> >
>
> That sentence I understand. I don't understand what you've seen that's led
> you to that conclusion,

That's way too long story. ;-)

> but at least I understand what you're saying... :)
>
> > It seems that what you need is a "dynamic" pthread_once() that
> > would allow you to "copy-construct" an "extra representation" upon
> > request if it doesn't match the initialy constructed/provided one.
> > Welcome to The-"dynamic-pthread_once()"-Club.
>
> *blink*
>
> Okay... "pthread_once()" is something to do with the POSIX standard, right?

Right. pthreads-win32**, for example, "provides" one for win32.
It's also totally broken (no one notices it, so it's fine ;-) ).

> If so, what relevance does it have to my Win32 implementation?

The relevance is that pthread_once() is a MECHANISM to perform
thread safe lazy initialization. Well, currently, it's a "C" thing
that works for static stuff only. It shall be 'extended' to support
something along the lines of <http://tinyurl.com/7w7r> (see sort of
"illustrations" embedded in that message).
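[The pthread_once()-style lazy initialization being described here later got a standard C++ analogue in std::call_once, with a std::once_flag per object rather than pthread_once()'s static-only semantics. A sketch under that assumption; the class name and the naive narrowing conversion are invented for illustration.]

```cpp
#include <mutex>
#include <string>

// One once_flag per object gives the "dynamic" (per-instance) behaviour
// discussed above, as opposed to pthread_once()'s static-only initialization.
class TwoFaced {
public:
    explicit TwoFaced(std::wstring uni) : m_uni(std::move(uni)) {}

    const std::string& ansi() {
        std::call_once(m_once, [this] {
            m_ansi.assign(m_uni.begin(), m_uni.end()); // naive narrowing copy
        });
        return m_ansi;
    }

private:
    std::once_flag m_once;
    std::wstring m_uni;
    std::string m_ansi;
};
```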

>
> If the solution is totally broken under Win32, can you give me some hints as
> to why?

Race conditions. Dying "spinlocks". Lack of memory synchronization.

regards,
alexander.

**) http://sources.redhat.com/pthreads-win32

--
"Pthreads win32 is just trying to be a general purpose condition
variable as defined by the pthreads standard, suitable for any
and all threads to go around calling pthread_cond_wait() and
pthread_cond_signal() on. As a consequence the implementation
contains several mutexes, semaphores etc, and a flow of control
that will make your brains dribble out of your ears if you stare
at the code too long (I have the stains on my collar to prove it
;-). Such are the joys of implementing condition variables using
the Win32 synchronization primitives!"

-- <http://tinyurl.com/b9vw>

Lee Chapman

May 16, 2003, 9:51:34 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EC4D860...@web.de...
>
> atomic<const stuff*> stuff_ptr;
>

What's the definition of the atomic<> template? It's not one I'm familiar
with. However, I don't see why I would need it using Visual C++ under Win32.
The documentation that comes with Visual C++ 7.0 states that "simple reads
and writes to properly-aligned 32-bit variables are atomic" and that
"without __declspec(align(#)), Visual C++ aligns data on natural boundaries
based on the size of the data, for example 4-byte integers on 4-byte
boundaries and 8-byte doubles on 8-byte boundaries".

> const stuff & thing::stuff_instance() { // "lazy" one
> stuff * ptr;
> // hoist load barrier (with data dependency "hint")
> if (0 == (ptr = stuff_ptr.load_ddhlb())) {
> ptr = new stuff(/*...*/);
> // sink store barrier
> if (!stuff_ptr.attempt_update_ssb(ptr, 0)) {
> delete ptr;
> // hoist load barrier (with data dependency "hint")
> if (0 == (ptr = stuff_ptr.load_ddhlb()))
> abort();
> }
> }
> return *ptr;
> }

A quick search on "ddhlb" has given me "Data dependency hoist load barrier"?
So I assume the atomic template is providing some sort of memory barrier
around access to the pointer. Well, I don't really see how this differs from
my use of the interlocked Win32 functions. Once again I'm relying on the
Visual C++ documentation, which states that interlocked functions will
"ensure that previous read and write requests have completed and are made
visible to other processors, and to ensure that no subsequent read or
write requests have started", effectively avoiding a multiprocessor race
condition.

However, if I was that confident I wouldn't have posted the original
question, right... so what have I misinterpreted and got wrong?

Thanks,
- Lee


Lee Chapman

May 16, 2003, 10:04:42 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EC4ECA4...@web.de...

>
> > Okay... "pthread_once()" is something to do with the POSIX standard,
> > right?
>
> Right. pthreads-win32**, for example, "provides" one for win32.
> It's also totally broken (no one notices it, so it's fine ;-) ).
>
> > If so, what relevance does it have to my Win32 implementation?
>
> The relevance is that pthread_once() is a MECHANISM to perform
> thread safe lazy initialization. [...]

Okay, now I'm with you. I'm using the standard MS Win32 threads on W2K, so I
can't use pthread_once() even if I wanted to (and it sounds like I don't).

> > If the solution is totally broken under Win32, can you give me some
> > hints as to why?
>
> Race conditions. Dying "spinlocks". Lack of memory synchronization.

I've picked up some of these points in another reply, so I won't ask about
them here as well.

- Lee


Alexander Terekhov

May 16, 2003, 10:11:32 AM

Lee Chapman wrote:
>
> "Alexander Terekhov" <tere...@web.de> wrote in message
> news:3EC4D860...@web.de...
> >
> > atomic<const stuff*> stuff_ptr;
> >
>
> What's the definition of the atomic<> template?

I have yet to write it. I'm currently in "gathering info" stage.
You'll have to wait. You can begin with reading this:

http://www.terekhov.de/pthread_refcount_t/draft-edits.txt

and try to deduce the semantics from a non-blocking implementation
of pthread_refcount_t based on atomic<> that can be found at
<http://tinyurl.com/bwkj> and info at <http://tinyurl.com/bx6u>.

regards,
alexander.

Alexander Terekhov

May 19, 2003, 2:41:46 AM

"Oliver S." wrote:
[...]
> There are some bugs in your class I won't comment, but I re-implemented
> the class in two flavours so that they should work (mistakes you're able
> to fix for sure are still possible, but the general principle is solid):

While your "re-implemented class in the two flavours" is a bit less
broken than the OP-stuff, it's still totally broken, I'm afraid. The
general principles that you've demonstrated aren't solid at all. Your
"first flavor" is meant to be something along the lines of (variant
using TSD instead of atomic<> aside for a moment):

class stuff { /* ... */ };

class thing {
public:
thing(/* ... */) : stuff_ptr(0) /* ... */ { /*...*/ }
~thing() { delete stuff_ptr.load(); /* ... */ }
/* ... */
const stuff & stuff_instance();
/* ... */
private:
/* ... */
atomic<const stuff*> stuff_ptr;

mutex stuff_mtx;
/* ... */
};

const stuff & thing::stuff_instance() { // "lazy" one
stuff * ptr;
// hoist load barrier (with data dependency "hint")
if (0 == (ptr = stuff_ptr.load_ddhlb())) {

mutex::guard guard(stuff_mtx);
if (0 == (ptr = stuff_ptr.load())) {

ptr = new stuff(/*...*/);
// sink store barrier

stuff_ptr.store_ssb(ptr);
}
}
return *ptr;
}
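[This "first flavor" is the classic double-checked locking pattern. In post-C++11 primitives (std::atomic and std::mutex standing in for Alexander's hypothetical atomic<> and mutex, with an invented trivial `stuff`) it can be sketched as:]

```cpp
#include <atomic>
#include <mutex>

struct stuff { int value = 7; };   // trivial stand-in body, invented here

class thing {
public:
    ~thing() { delete stuff_ptr.load(); }

    const stuff & stuff_instance() {   // "lazy" one
        // fast path: acquire load avoids taking the lock once initialized
        stuff * ptr = stuff_ptr.load(std::memory_order_acquire);
        if (ptr == nullptr) {
            std::lock_guard<std::mutex> guard(stuff_mtx);
            // re-check under the lock; relaxed is enough while holding it
            ptr = stuff_ptr.load(std::memory_order_relaxed);
            if (ptr == nullptr) {
                ptr = new stuff();
                stuff_ptr.store(ptr, std::memory_order_release); // publish
            }
        }
        return *ptr;
    }

private:
    std::atomic<stuff*> stuff_ptr{nullptr};
    std::mutex stuff_mtx;
};
```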

Well, as for your "second flavor", you could have found this:

http://groups.google.com/groups?selm=3EC4D860.BC32E880%40web.de

regards,
alexander.

SenderX

May 19, 2003, 3:42:28 AM
This pseudo-code seems like it should work to instance something on an
IA-32:


/* Shared pointer to a C_Object */
typedef union U_Ptr
{
unsigned __int64 Value64;

struct
{
C_Object *pObj;
LONG lAba;
};

} PTR, *LPPTR;


static C_Object *pSharedPtr = NULL;


/* Instance the shared pointer */
C_Object& Instance()
{
C_Object *pLocalPtr, *pOldPtr;

/* Read shared pointer */
pLocalPtr
= InterlockedCompareExchangePointer
( &pSharedPtr,
NULL,
NULL );

__asm { mfence };

/* CAS returns the old value, so if it is NULL
another thread has not instanced */
if ( pLocalPtr == NULL )
{
pLocalPtr = new C_Object;

/* Try and update shared pointer */
pOldPtr
= InterlockedCompareExchangePointer
( &pSharedPtr,
pLocalPtr,
NULL );

__asm { mfence };

/* CAS returns the old value, so if it not NULL
another thread instanced */
if ( pOldPtr != NULL )
{
delete pLocalPtr;

return *pOldPtr;
}
}

return *pLocalPtr;
}

I know Alex will say it's wrong, but please tell us EXACTLY where and why?

=)

--
The designer of the SMP and HyperThread friendly, AppCore library.

http://AppCore.home.attbi.com


Alexander Terekhov

May 19, 2003, 4:04:56 AM

SenderX wrote:
[...]

> I know Alex will say its wrong, but please tell us EXACTLY where and why?

Neither "__asm { mfence };" nor totally braindamaged MS-interlocked
stuff define anything specific with respect to memory access reordering
done by the COMPILER (which, absent reordering constraints, ought to
operate on "as if"-compiling-single-threaded/thread-neutral-code basis).

regards,
alexander.

SenderX

May 19, 2003, 4:19:45 AM
> Neither "__asm { mfence };" nor totally braindamaged MS-interlocked
> stuff define anything specific with respect to memory access reordering
> done by the COMPILER

What about writing the lock-free instance code in pure asm?

Alexander Terekhov

May 19, 2003, 4:47:32 AM

SenderX wrote:
>
> > Neither "__asm { mfence };" nor totally braindamaged MS-interlocked
> > stuff define anything specific with respect to memory access reordering
> > done by the COMPILER
>
> What about writing the lock-free instance code in pure asm?

and a "full-stop" compiler barrier ala gcc's "memory"/"volatile"**
[also braindead; to some extent] stuff?

Well, y'know, I just hate "asm" (both "pure" and not-so-"pure"). YMMV.

regards,
alexander.

**) "If your assembler instruction modifies memory in an unpredictable
fashion, add /memory/ to the list of clobbered registers. This will
cause GCC to not keep memory values cached in registers across the
assembler instruction. You will also want to add the volatile keyword
if the memory affected is not listed in the inputs or outputs of the
asm, as the memory clobber does not count as a side-effect of the
asm."
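[In practice the clobber described in that quote is usually wrapped as an empty asm. Note it constrains only the compiler and emits no hardware fence; the macro name below is invented for illustration.]

```cpp
// gcc-style "full-stop" compiler barrier: the "memory" clobber forbids the
// compiler from caching memory values in registers across this point, and
// volatile keeps the empty asm from being moved or deleted. It emits no
// instructions, so it gives no ordering guarantee to the CPU itself.
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int shared_data = 0;
int shared_flag = 0;

void publish(int value)
{
    shared_data = value;
    COMPILER_BARRIER();   // compiler may not sink the store below this point
    shared_flag = 1;      // the processor, however, may still reorder these
}
```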

SenderX

May 19, 2003, 6:15:58 AM
Well, this has to work then ;)


volatile C_Object *pSharedPtr = NULL;


/* Instance the shared pointer */
C_Object& Instance()
{

volatile C_Object *pLocalPtr, *pOldPtr;

/* Read shared pointer */
pLocalPtr
= InterlockedCompareExchangePointer
( &pSharedPtr,
NULL,
NULL );

__asm { mfence };

/* CAS returns the old value, so if it is NULL
another thread has not instanced */
if ( pLocalPtr == NULL )
{
pLocalPtr = new C_Object;

/* Try and update shared pointer */
pOldPtr
= InterlockedCompareExchangePointer
( &pSharedPtr,
pLocalPtr,
NULL );

__asm { mfence };

/* CAS returns the old value, so if it not NULL
another thread instanced */
if ( pOldPtr != NULL )
{
delete pLocalPtr;

return *pOldPtr;
}
}

return *pLocalPtr;
}

Why won't this work, and what will break it?

A weak memory model, the compiler... Or both?

Alexander Terekhov

May 19, 2003, 7:02:10 AM

SenderX wrote:

[... volatile/Intel-mfence/MS-interlocked ...]

> Why won't this work, and what will break it?
>
> A weak memory model, the compiler... Or both?

Yeah, both: Intel and Microsoft. ;-) Seriously, look, brain-
dead volatile has an implementation defined semantics ["What
constitutes an access to an object that has volatile-qualified
type is implementation-defined"]. The NewHP (the old Digital)
uses it to fight word tearing, for example. Show me some
statements in the MS docs that would guarantee the semantics
that you need here and I'll concede that it "might work". ;-)
(but it's still braindamaged because what's needed here is the
upcoming atomic<>/threads_specific_ptr<>, extended "dynamic"
pthread_once() aside for a moment).

regards,
alexander.

Lee Chapman

May 19, 2003, 8:35:15 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EC89028...@web.de...

Okay, so as I'm using MS VC++ 7.0 I can put #pragma optimize("", off) before
my class definition; this will prevent the compiler reordering my C++
statements, and so should fix my original implementation?

- Lee


Alexander Terekhov

May 19, 2003, 10:03:13 AM

Maybe (but not the original one, that's for sure).

regards,
alexander.


Alexander Terekhov

May 30, 2003, 3:47:50 PM

"Oliver S." wrote:
>
> > While your "re-implemented class in the two flavours" is a bit less
> > broken than the OP-stuff, it's still totally broken, ...
>
> Where is it broken ?

SenderX, would you please explain (but wait with your reply no less
than two weeks ;-) ).

> In it's general principle it's 100% correct.

Dream on.

regards,
alexander.

Alexander Terekhov

May 31, 2003, 10:12:54 AM

"Oliver S." wrote:
>
> >> In it's general principle it's 100% correct.
>
> > Dream on.
>
> Typical troll-response for someone who hasn't any arguments.

If you need arguments then go and {re-}read the entire thread, comrade.

regards,
alexander.

P.S. Ziv Caspi might also help you (trivial races in your "100% correct"
solution aside, he knows MS-interlocked braindamage quite well).

SenderX

May 31, 2003, 1:30:32 PM
> SenderX, would you please explain (but wait with your reply no less
> than two weeks ;-) ).

I can't seem to find your "two flavours" in Google. But I would trust
Alex and say they're broken...

However Oliver, I did get Alex to admit that the following "might" work:

Lock-Free Instance Code for...


Compiler: VC++ 6.0

Compiler docs volatile quote:

" Objects declared as volatile are not used in optimizations because their
value can change at any time. The system always reads the current value of a
volatile object at the point it is requested, even if the previous
instruction asked for a value from the same object. Also, the value of the
object is written immediately on assignment. "


Compiler docs pragma( optimize ) quote:

" Using the optimize pragma with the empty string ("") is a special form of
the directive. It either turns off all optimizations or restores them to
their original (or default) settings. "

The Code: ;)


/* The shared pointer */
volatile C_Object *pSharedObj = NULL;


/* Turn off compiler optimizer */
#pragma optimize( "", off )


/* Instance the object */
C_Object&

InstanceObject()
{
/* Load the shared object */
volatile C_Object *pLocalPtr
= InterlockedCompareExchangePointer
( &pSharedObj,
NULL,
NULL );

/* Acquire the load */
__asm { lfence };

/* If NULL, there is no object. */


if ( pLocalPtr == NULL )
{

/* Create a new object */
volatile C_Object *pNewPtr = new C_Object;

/* Load the shared object again,
and try and store the new value */
pLocalPtr =
InterlockedCompareExchangePointer
( &pSharedObj,
pNewPtr,
NULL );

/* Release the store, and acquire the load */
__asm { mfence };

/* If NULL, the object has been updated */


if ( pLocalPtr == NULL )
{

return *pNewPtr;
}
}

/* We got it! */
return *pLocalPtr;
}


/* Turn on compiler optimizer */
#pragma optimize( "", on )


This should ONLY work for the VC++ 6.0 compiler, and maybe higher.


Your comments?


;)

SenderX

May 31, 2003, 6:35:29 PM
> Forget the fence-instruction;
> it's used for weakly-ordered memory-access instructions.

You will most likely need memory barriers for the sample code I posted to
work on weak-order processors, like an Itanium II.

volatiles are for compiler orders, fences are for instruction orders.

Correct?

David Schwartz

May 31, 2003, 8:09:55 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:R6aCa.1085689$S_4.1097003@rwcrnsc53...

> > Forget the fence-instruction;
> > it's used for weakly-ordered memory-access instructions.
>
> You will most likely need memory barriers for the sample code I posted to
> work on weak-order processors, like an Itanium II.
>
> volatiles are for compiler orders, fences are for instruction orders.
>
> Correct?

The documentation for VC++ says:

"Objects declared as volatile are not used in optimizations because their
value can change at any time. The system always reads the current value of a
volatile object at the point it is requested, even if the previous
instruction asked for a value from the same object. Also, the value of the
object is written immediately on assignment."

This, at least to me, implies that 'volatile' must defeat weak ordering
with the appropriate fences. It says it's read "at the point it is
requested" and it says it's written "immediately on assignment". If this
allows weak ordering, then the documentation is incorrect or, at best,
highly misleading.

DS


SenderX

May 31, 2003, 8:35:02 PM
> This, at least to me, implies that 'volatile' must defeat weak ordering
> with the appropriate fences. It says it's read "at the point it is
> requested" and it says it's written "immediately on assignment". If this
> allows weak ordering, then the documentation is incorrect or, at best,
> highly misleading.

volatile doesn't stop the processor from reordering the instructions, it
only stops the compiler.

You will need fences for this on weak-order systems, even with volatile
vars.

David Schwartz

May 31, 2003, 11:57:27 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:WSbCa.528169$Si4.4...@rwcrnsc51.ops.asp.att.net...

> > This, at least to me, implies that 'volatile' must defeat weak
> > ordering
> > with the appropriate fences. It says it's read "at the point it is
> > requested" and it says it's written "immediately on assignment". If this
> > allows weak ordering, then the documentation is incorrect or, at best,
> > highly misleading.

> volatile doesn't stop the processor from reordering the instructions, it
> only stops the compiler.

Are you saying it can't or that it doesn't? It certainly can -- the
compiler could, for example, wrap accesses to volatile variables in the
appropriate fences.

> You will need fences for this on weak-order systems, even with volatile
> vars.

Then the documentation is lying, which doesn't surprise me.

DS


SenderX

Jun 1, 2003, 12:23:38 AM
> It certainly can -- the
> compiler could, for example, wrap accesses to volatile variables in the
> appropriate fences.

Your compiler would have to know which processor it was building for, and
put the correct barrier opcodes in order to do that.

I as the programmer, I want to be in control of the fence instructions.

> Then the documentation is lying, which doesn't surprise me.

The docs are not lying at all.

They ONLY talk about what the compiler is going to do. Period.


They don't talk about what an Itanium II processor is going to do with those
instructions.

David Schwartz

Jun 1, 2003, 12:49:51 AM

"SenderX" <x...@xxx.xxx> wrote in message
news:edfCa.2083$DV....@rwcrnsc52.ops.asp.att.net...

> > It certainly can -- the
> > compiler could, for example, wrap accesses to volatile variables in the
> > appropriate fences.

> Your compiler would have to know which processor it was building for, and
> put the correct barrier opcodes in order to do that.

If the compiler doesn't know what processor it's building for, it has a
*very* serious problem! The compiler generates assembler output, so it had
better know all the semantics of the assembly language it's targeting!

> I as the programmer, I want to be in control of the fence instructions.

Then don't ask the compiler to provide you ordering guarantees. The
documentation claims 'volatile' provides such guarantees. (But I think it's
in error.)

> > Then the documentation is lying, which doesn't surprise me.

> The docs are not lying at all.
>
> They ONLY talk about what the compiler is going to do. Period.
>
> It doesn't talk about what an Itanium II processor is going to do with
those
> instructions.

There is no such distinction. When you write C code, you are writing for
the compiler. You aren't supposed to have to care what the processor does,
you're supposed to be able to rely upon the compiler to make the processor
do the right thing.

DS


SenderX

Jun 1, 2003, 2:57:46 AM
> There is no such distinction. When you write C code, you are writing for
> the compiler. You aren't supposed to have to care what the processor does,
> you're supposed to be able to rely upon the compiler to make the processor
> do the right thing.

I do care what the processor does, I have to.

You sure you know what you're talking about?

SenderX

Jun 1, 2003, 5:49:45 AM
> If the compiler doesn't know what processor it's building for, it has
> a *very* serious problem!

It needs to know the instruction sets it can compile to, not the exact processor.

Although...

Itanium processors say they rely heavily on compilers written specifically
to take care of stuff that the previous Intel chips have taken care of.

Look through Intel's site for this info.

I am not sure if Itanium processors will accept the current SSE or SSE2
opcodes.

By the way ( to clarify ), you're saying that VC++ will add fences for the
following pseudo code:

volatile LPVOID *pSharedPtr;

volatile LPVOID *pLocalPtr;

load:

pLocalPtr = pSharedPtr;


store:

pSharedPtr = pLocalPtr;

and change it to this:

load:

pLocalPtr = pSharedPtr;

__asm { lfence };


store:

pSharedPtr = pLocalPtr;

__asm { sfence };

???


I don't think a compiler would do this?

David Schwartz

Jun 1, 2003, 1:20:51 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:Z_jCa.532679$Si4.4...@rwcrnsc51.ops.asp.att.net...

> By the way ( to clarify ), your saying that VC++ will add fences for the
> following pseudo code:

> volatile LPVOID *pSharedPtr;
> volatile LPVOID *pLocalPtr;
> load:
> pLocalPtr = pSharedPtr;
> store:
> pSharedPtr = pLocalPtr;
>
> and change it to this:
>
> load:
> pLocalPtr = pSharedPtr;
> __asm { lfence };
> store:
> pSharedPtr = pLocalPtr;
> __asm { sfence };

> I don't think a compiler would do this?

The documentation specifically claims that 'volatile' localizes the
actual variable access to the code that accesses it. I pasted you the section.
If that's the only way to provide those guarantees, then the compiler had
better do that. Otherwise, either the compiler or the documentation is
broken.


DS


David Schwartz

Jun 1, 2003, 1:17:53 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:KthCa.1088771$S_4.1100152@rwcrnsc53...

> > There is no such distinction. When you write C code, you are writing
> > for the compiler. You aren't supposed to have to care what the processor
> > does, you're supposed to be able to rely upon the compiler to make the
> > processor do the right thing.

> I do care what the processor does, I have to.

You are certainly *allowed* to care what the processor does. But if you
*have* to, then something is wrong.

> You sure you know what your talking about?

Yes, I am most certainly sure what I'm talking about.

If the compiler documentation says that 'volatile' does something, then
the compiler had better emit whatever assembler code is necessary to do that
something. If the compiler could emit code to do what the documentation says
but does not, then either the documentation or the compiler is broken.

The documentation says that 'volatile' provides ordering guarantees. If
the compiler does not emit the necessary assembly instructions to comply
with those guarantees on any processor the compiler claims to support, then
either the documentation or the compiler is broken.

DS


SenderX

Jun 2, 2003, 12:58:25 AM
I think you are totally confusing compiler orderings, with processor
orderings.

I think a Java or other high-level-language compiler has to put auto
acquire/release on volatile, but not a C compiler.

If my VC++ 6.0 stuck fence opcodes in my code, I would be pissed.

;)

SenderX

Jun 2, 2003, 1:02:26 AM
> The documentation specifically claims that 'volatile' localizes the
> actual variable access to the code that accesses it. I pasted you the
> section.

You quoted the section to me?

More like, I quoted the section on volatile and pragma( optimize ) for
Oliver S. when I posted the lock-free instance once code. Which started this
discussion in the first place:

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=bbdc
lk%24clm%241%40nntp.webmaster.com&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-
8%26oe%3DUTF-8%26group%3Dcomp.programming.threads

I know this is very nit-picky, but I was a little curious.

;)

Alexander Terekhov

Jun 2, 2003, 2:37:26 AM

David Schwartz wrote:
[...]

> The documentation says that 'volatile' provides ordering guarantees. If
> the compiler does not emit the necessary assembly instructions to comply
> with those guarantees on any processor the compiler claims to support, then
> either the documentation or the compiler is broken.

The only thing that is really broken here is C/C++ volatile. What you're
talking about is nothing but revised Java (or C#) volatiles that provide
certain atomicity and ordering guarantees. I don't like it. More info on
this can be found (follow the links ;-) ) in the following msg of mine:

http://groups.google.com/groups?selm=3ED8AC69.A75E9B4F%40web.de
(Subject: Re: What is a "POD" type?)

regards,
alexander.

Ziv Caspi

Jun 2, 2003, 5:32:48 AM

Neither an explicit memory fence nor a call to library functions (in
our case, InterlockedWhatever) will deny the compiler the "right" to
reorder memory accesses. So far so true.

However, the language already provides another mechanism for doing
just that -- the volatile access modifier. By combining (say)
interlocked operations (which indicate volatile access to their
"active" argument, *and* have an implicit barrier semantic) and other
volatile variables/accesses you can get what you need.

(Thus, if you write:
volatile LONG x,y; InterlockedExchange( &x, 0 ); y = 0;
and use a similar barrier/volatile combination when reading x/y,
you're safe.)

For example, one could implement a Win32 equivalent to pthread_once in
this manner.

Ziv.
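[A once-gate of the kind Ziv describes can be sketched portably with std::atomic standing in for a volatile LONG plus InterlockedCompareExchange; a real implementation would block rather than spin, and the function name and state encoding below are invented.]

```cpp
#include <atomic>
#include <thread>

// Gate states: 0 = not started, 1 = initializer running, 2 = done.
void run_once(std::atomic<long>& gate, void (*init)())
{
    long expected = 0;
    if (gate.compare_exchange_strong(expected, 1,
                                     std::memory_order_acq_rel)) {
        init();                                    // we won the race: initialize
        gate.store(2, std::memory_order_release);  // publish "done"
    } else {
        while (gate.load(std::memory_order_acquire) != 2)
            std::this_thread::yield();             // losers wait for "done"
    }
}
```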

SenderX

Jun 2, 2003, 4:59:26 AM
> (Thus, if you write:
> volatile LONG x,y; IntelockedExchange( &x, 0 ); y = 0;
> and use a similar barrier/volatile combination when reading x/y,
> you're safe.)
>
> For example, one could implement a Win32 equivalent to pthread_once in
> this manner.

Like the sample I posted:

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3EDA
F0A6.ADE3F213%40web.de&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUT
F-8%26group%3Dcomp.programming.threads

I think this should work for VC++ 6.0?

If not, where is it broken?

Alexander Terekhov

Jun 2, 2003, 6:33:27 AM

Ziv Caspi wrote:
[...]

> >Neither "__asm { mfence };" nor totally braindamaged MS-interlocked
> >stuff define anything specific with respect to memory access reordering
> >done by the COMPILER (which, absent reordering constraints, ought to
> >operate on "as if"-compiling-single-threaded/thread-neutral-code basis).
>
> Neither an explicit memory fence nor a call to library functions (in
> our case, InterlockedWhatever) will deny the compiler the "right" to
> reorder memory accessed. So far so true.
>
> However, the language already provides another mechanism for doing
> just that -- the volatile access modifier.

Lack of atomicity aside for a moment, that would be true IFF *ALL* your
C/C++ objects were designated as volatile to preclude any reordering
(e.g. speculative loading of non-volatile objects) by the compiler, to
begin with (hardware barriers and "what constitutes an access to an
object that has volatile-qualified type is implementation-defined" bit
aside for a moment). C/C++ volatiles are brain-dead and revised Java
(and C#) ones aren't really the best way to do it in C/C++. atomic<>
template (and macros for plain C) ala the upcoming (see C++ TR) <iohw.h>
and <hardware> would solve the problem in much more efficient and
"elegant" way, so to speak. *All* current uses of C/C+ volatiles shall
be "deprecated" in favor of:

a) atomic<> for low level non-blocking stuff with mem.sync.,

b) exceptions that shall replace setjmp/longjmp silliness (I mean
volatile auto stuff "that could be modified between the two
return from setjmp()"),

c) threads (SIGEV_THREAD delivery and sig{timed}wait{info}())
that shall replace async.signals and volatile sig_atomic_t
static ugliness.

> By combining (say)
> interlocked operations (which indicate volatile access to their
> "active" argument, *and* have an implicit barrier semantic) and other
> volatile variables/accesses you can get what you need.

Dream on.

regards,
alexander.

SenderX

unread,
Jun 2, 2003, 7:00:30 AM
to
> MS-interlocked braindamage quite well).

I assume your atomic<> template can do anything the Interlocked API's can
do, but in a non-brain damaged way. Your atomic<> provides a function that
can compare and swap long's correct?

=)

If so... I really want to use your atomic<> template for my upcoming
portable lock-free API library, using my brand new algo which I recently
posted. It is working out just great.

I believe your library ( standards ) could make my lock-free stuff port
very well.

Where can we download your library? Is it even ready to be released yet?

Alexander Terekhov

unread,
Jun 2, 2003, 8:20:39 AM
to

SenderX wrote:
>
> > MS-interlocked braindamage quite well).
>
> I assume your atomic<> template can do anything the Interlocked API's can
> do, but in a non-brain damaged way.

Yeah, that's the intent.

> Your atomic<> provides a function that
> can compare and swap long's correct?

It certainly provides std::numeric_limits<>-like specializations
with "bool attempt_update(T old_value, T new_value)" (or something
like that) member function for the scalar types (T) that can be
manipulated atomically. I'm not sure this is what you call "swap
long's". Well, you could certainly build yourself something like
"template<typename T> T get_and_set(atomic<T>&, T new_value);"
that would return previous [old] value, but this is probably not
what you need, correct?
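For concreteness, the interface described above might look like this when sketched over std::atomic (the names attempt_update and get_and_set come from the post; the surrounding class is guesswork, not Terekhov's actual library):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch of the interface described above, backed by
// std::atomic; only the two member-function names are taken from the
// post, the rest is an assumption.
template <typename T>
class atomic_cell {
    std::atomic<T> value_;
public:
    explicit atomic_cell(T v) : value_(v) {}

    T load() const { return value_.load(); }

    // A single compare-and-swap: stores new_value only if the cell
    // still holds old_value; returns whether the update took effect.
    bool attempt_update(T old_value, T new_value) {
        return value_.compare_exchange_strong(old_value, new_value);
    }

    // Unconditional atomic exchange; returns the previous value.
    T get_and_set(T new_value) { return value_.exchange(new_value); }
};
```

attempt_update is exactly the InterlockedCompareExchange shape SenderX asks about below, minus the "return the old value" convention.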

[...]


> Where can we download your library? Is it even ready to be released yet?

Nope. "follow the links"

http://groups.google.com/groups?selm=3ED90AE4.A211756B%40web.de
(Subject: Re: shared_ptr/weak_ptr and thread-safety)

regards,
alexander.

SenderX

unread,
Jun 2, 2003, 9:13:18 AM
to
> Well, you could certainly build yourself something like
> "template<typename T> T get_and_set(atomic<T>&, T new_value);"
> that would return previous [old] value, but this is probably not
> what you need, correct?

bool attempt_update(T old_value, T new_value);

Does old_value get compared to the dest value, and if they match the dest
value is updated with the new_value?

If so, then I could use that.

Just to clarify...

I need this " EXACT " functionality, in order for it to work:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/interlockedcompareexchange.asp


So, I would need something like this:

long lDest = 0;

atomic< long > AtomicLong( &lDest );

long lOldValue;

/* Atomic update */
do
{
    lOldValue = AtomicLong.read_with_acquire();
}
while( ! AtomicLong.attempt_update_with_release( lOldValue, lOldValue + 1 ) );


Can your lib do that?
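That loop maps almost mechanically onto std::atomic (a sketch under that assumption, not SenderX's AppCore code; note that on failure compare_exchange_weak refreshes the expected value with what it actually observed, so no separate re-read is needed):

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// A CAS retry loop of the shape sketched above: read the current value,
// then attempt to swap in old + 1 until it sticks. Returns the prior
// value, like InterlockedCompareExchange.
long atomic_increment(std::atomic<long>& target) {
    long oldValue = target.load(std::memory_order_acquire);
    while (!target.compare_exchange_weak(oldValue, oldValue + 1,
                                         std::memory_order_release,
                                         std::memory_order_acquire)) {
        // oldValue now holds the value that was actually there; retry.
    }
    return oldValue;
}
```

Under contention the loop simply spins until one CAS wins; no thread ever blocks, which is the property the lock-free algo depends on.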


It would be very nice if your library would allow me to code a single atomic
base for my new lock-free algo system that I built. I think it will probably
work out.


=)


--
The designer of the SMP and HyperThread friendly, AppCore library.

http://AppCore.home.attbi.com


"Alexander Terekhov" <tere...@web.de> wrote in message

news:3EDB411...@web.de...

Ziv Caspi

unread,
Jun 2, 2003, 11:19:32 AM
to
On Sat, 31 May 2003 17:09:55 -0700, "David Schwartz"
<dav...@webmaster.com> wrote:

> The documentation for VC++ says:
>
>"Objects declared as volatile are not used in optimizations because their
>value can change at any time. The system always reads the current value of a
>volatile object at the point it is requested, even if the previous
>instruction asked for a value from the same object. Also, the value of the
>object is written immediately on assignment."
>
> This, at least to me, implies that 'volatile' must defeat weak ordering
>with the appropriate fences.

I've read this section a couple of times, and still don't understand
how your conclusion follows. What I infer from the quote above (as
well as some experience in the area, plus some discussions with the
people who built the compiler) is that when you use volatile, you
essentially tell the compiler "do what I say" -- when you set a value
to a volatile object, the compiler generates the store instruction in
the same program location you asked for.

>It says it's read "at the point it is
>requested" and it says it's written "immediately on assignment". If this
>allows weak ordering, then the documentation is incorrect or, at best,
>highly misleading.

The model you seem to be reasoning along is that of a single
synchronization point (probably the RAM) which all reads and writes
are measured against. This model is not followed by modern processors
for some time now, and is being replaced by cross-processor
observability guarantees (if you put a fence between writing to a and
writing to b on processor A, and you have a fence when reading them on
processor B, then you're guaranteed to read this value if you read
that value).
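The observability guarantee described here is exactly the release/acquire pairing that C++11 later standardized; a minimal sketch, assuming std::atomic in place of raw fence opcodes (illustration only, not code from the thread):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Processor A writes 'a', then publishes with a store-release of 'b';
// processor B reads 'b' with acquire, and once it sees the flag it is
// guaranteed to see A's write to 'a' as well.
int a_value = 0;                  // plain (non-atomic) data
std::atomic<bool> b_flag{false};  // publication flag

void writer() {
    a_value = 42;                                   // write to a
    b_flag.store(true, std::memory_order_release);  // fenced write to b
}

int reader() {
    while (!b_flag.load(std::memory_order_acquire)) // fenced read of b
        std::this_thread::yield();
    return a_value;                                 // observes 42
}
```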

The compiler here is, in general, unable to help further. Compilers
usually target a variety of processor models (Pentium, P-whatever...)
and generate code that works on them all. Furthermore, the compiler
doesn't know when it creates programs whether they are run on
multiprocessor machines. As added bonus, I believe some processors can
switch their ordering behavior depending on the area of memory you
access. All these are in the domain of the platform, not the compiler.
If the compiler were to generate "always working" code, you wouldn't
want to use it anyway.

Ziv

Lee Chapman

unread,
Jun 2, 2003, 10:42:05 AM
to
> > The documentation for VC++ says:
> >

My eventual interpretation was:

a) 'volatile' can be used to prevent the compiler from reordering reads &
writes;

b) the Win32 Interlocked functions can be used to stop the hardware from
effectively reordering reads & writes on a multi-processor machine. (The MS
documentation states that the OS's implementation of these Interlocked
functions contains the necessary memory barriers et al.)

i.e. You need both.

I appreciate from the contributors to this thread that there are many subtle
problems that can arise in the general case, but with VC++ 7.0, W2K and a
quad Pentium box, it works in practice, and that's all I care about. :)

- Lee


SenderX

unread,
Jun 2, 2003, 10:50:08 AM
to
> b) the Win32 Interlocked functions can be used to stop the hardware from
> effectively reordering reads & writes on a multi-processor machine.

Yes, or you can use the target processors acquire / release barrier opcodes
directly.

> i.e. You need both.

Yes, interlocked and/or memory barriers.
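Using the barrier opcodes "directly" corresponds, in later standard C++, to std::atomic_thread_fence combined with relaxed atomic accesses; a sketch of the same publish/consume pattern with explicit fences (an illustration, not code from the thread):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// The publish/consume pattern written with standalone fences, the
// closest portable C++ gets to issuing acquire/release barrier
// opcodes directly rather than via Interlocked-style operations.
int payload = 0;
std::atomic<bool> ready{false};

void publish() {
    payload = 7;
    std::atomic_thread_fence(std::memory_order_release); // release barrier
    ready.store(true, std::memory_order_relaxed);
}

int consume() {
    while (!ready.load(std::memory_order_relaxed))
        std::this_thread::yield();
    std::atomic_thread_fence(std::memory_order_acquire); // acquire barrier
    return payload;
}
```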

> I appreciate from the contributors to this thread that there are many subtle
> problems that can arise in the general case, but with VC++ 7.0, W2K and a
> quad Pentium box, it works in practice, and that's all I care about. :)

Works with VC++ 6.0 as well.

;)

David Schwartz

unread,
Jun 2, 2003, 3:03:10 PM
to

"SenderX" <x...@xxx.xxx> wrote in message
news:CTACa.16331$DV.2...@rwcrnsc52.ops.asp.att.net...

> > The documentation specifically claims that 'volatile' localizes the
> > actual variable access to the code that access it. I pasted you the
> > section.
>
> You quoted the section to me?
>
> More like, I quoted the section on volatile and pragma( optimize ) for
> Oliver S. when I posted the lock-free instance once code. Which started this
> discussion in the first place:

I quoted this section to you:

"Objects declared as volatile are not used in optimizations because their
value can change at any time. The system always reads the current value of a
volatile object at the point it is requested, even if the previous
instruction asked for a value from the same object. Also, the value of the
object is written immediately on assignment."

Please explain to me how you can control when the current value of an
object is read without using any fences.

DS


David Schwartz

unread,
Jun 2, 2003, 3:13:21 PM
to

"Ziv Caspi" <zi...@netvision.net.il> wrote in message
news:3edb1fca.2111780319@newsvr...

> On Sat, 31 May 2003 17:09:55 -0700, "David Schwartz"
> <dav...@webmaster.com> wrote:

> > The documentation for VC++ says:

> >"Objects declared as volatile are not used in optimizations because their
> >value can change at any time. The system always reads the current value
>>of a
> >volatile object at the point it is requested, even if the previous
> >instruction asked for a value from the same object. Also, the value of
>>the
> >object is written immediately on assignment."

> > This, at least to me, implies that 'volatile' must defeat weak ordering
> > with the appropriate fences.

> I've read this section a couple of times, and still don't understand
> how your conclusion follows. What I infer from the quote above (as
> well as some experience in the area, plus some discussions with the
> people who built the compiler) is that when you use volatile, you
> essentially tell the compiler "do what I say" -- when you set a value
> to a volatile object, the compiler generates the store instruction in
> the same program location you asked for.

It says that "the value will be read" at the point it is requested. It
doesn't say that a read instruction will be emitted. Nor could it, because
it's high-level language documentation for a compiler. Such documentation
can't talk about where assembly instructions are emitted because that would
inappropriately cross domain boundaries.

> >It says it's read "at the point it is
> >requested" and it says it's written "immediately on assignment". If this
> >allows weak ordering, then the documentation is incorrect or, at best,
> >highly misleading.

> The model you seem to be reasoning along is that of a single
> synchronization point (probably the RAM) which all reads and writes
> are measured against. This model is not followed by modern processors
> for some time now, and is being replaced by cross-processor
> observability guarantees (if you put a fence between writing to a and
> writing to b on processor A, and you have a fence when reading them on
> processor B, then you're guaranteed to read this value if you read
> that value).

Then explain to me what the documentation could mean when it says "The
system always reads the current value of a volatile object at the point it
is requested, even if the previous instruction asked for a value from the
same object". If you mean that:

volatile int *a;
int i, j;
i=*a;
j=*a;

If you mean that the read for 'j=*a' could occur before the read for
'i=*a' then the documentation doesn't mean anything. Maybe you with your
secret decode ring can see that it's really about the ordering of assembly
instructions rather than about the ordering of reads, but that's sure as
hell not what it *says*.

> The compiler here is, in general, unable to help further. Compilers
> usually target a variety of processor models (Pentium, P-whatever...)
> and generate code that works on them all. Furthermore, the compiler
> doesn't know when it creates programs whether they are run on
> multiprocessor machines. As added bonus, I believe some processors can
> switch their ordering behavior depending on the area of memory you
> access. All these are in the domain of the platform, not the compiler.

Right, and it's the compiler's job to make sure I don't have to think
about that kind of thing, all I'm supposed to do is rely on the guarantees
the compiler gives me.

> If the compiler were to generate "always working" code, you wouldn't
> want to use it anyway.

I'm not sure what you mean by this. In general, I do want my compiler to
generate code that "always works" on every platform/configuration the
compiler claims to support.

DS


SenderX

unread,
Jun 2, 2003, 3:39:16 PM
to
> I quoted this section to you:

I quoted it to Oliver S.

You must not have read it then.

SenderX

unread,
Jun 2, 2003, 3:40:52 PM
to
You are confused on this issue.

David Schwartz

unread,
Jun 2, 2003, 3:51:00 PM
to

"SenderX" <x...@xxx.xxx> wrote in message
news:8LNCa.1108875$S_4.1122132@rwcrnsc53...

> You are confused on this issue.

Then straighten me out.

Do we agree that the documentation says: "The system always reads the
current value of a volatile object at the point it is requested, even if the
previous instruction asked for a value from the same object".

And do we agree that if the compiler documentation makes a particular
guarantee or claim, the compiler should emit whatever assembly instruction
sequences it takes to make that guarantee on every processor the compiler
supports?

If you disagree with either of the two things I'm saying above, then we
can discuss those things. Otherwise, with those things agreed upon, my
conclusion follows immediately.

DS


Alexander Terekhov

unread,
Jun 2, 2003, 4:46:59 PM
to

David Schwartz wrote:
>
> "SenderX" <x...@xxx.xxx> wrote in message
> news:8LNCa.1108875$S_4.1122132@rwcrnsc53...
>
> > You are confused on this issue.
>
> Then straighten me out.
>
> Do we agree that the documentation says: "The system always reads the
> current value of a volatile object at the point it is requested, even if the
> previous instruction asked for a value from the same object".

Yes, but it's no more "current" than the value you write/read to
some stdio file... with the only difference that C/C++ volatiles
aren't synchronized ala {brain-dead} stdio files with their
implicit recursive locking scheme -- "strong" thread-safety that,
IMO, POSIX mandates rather *foolishly* (instead of providing much
more *efficient and sufficient* "basic" thread-safety guarantee
for ALL stdio operations on streams... not only *_unlocked() for
characters).

>
> And do we agree that if the compiler documentation makes a particular
> guarantee or claim, the compiler should emit whatever assembly instruction
> sequences it takes to make that guarantee on every processor the compiler
> supports?

C/C++ volatiles are single-threaded beasts. The only case that
kinda "covers" asynchrony (and atomicity) is "static volatile
sig_atomic_t"; but it's also an "unsafe" thread-safety thing; it
doesn't ensure visibility across threads [see 4.10 rules; they
don't make any exceptions for static volatile sig_atomic_t's...
and you can't really synchronize it because that would require
async-signal-safe pthread calls, to begin with].
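The "static volatile sig_atomic_t" case mentioned above looks like this in outline (a minimal sketch; as noted, this is the one asynchrony-related volatile the C standard blesses, and it still guarantees nothing about visibility across threads):

```cpp
#include <cassert>
#include <csignal>

// A flag of type sig_atomic_t set from a signal handler and polled by
// the interrupted code: the single "unsafe thread-safety" use of
// volatile described in the post above.
static volatile std::sig_atomic_t g_got_signal = 0;

void on_signal(int) {
    g_got_signal = 1;  // the only kind of write that is safe here
}
```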

regards,
alexander.

David Schwartz

unread,
Jun 2, 2003, 5:32:45 PM
to

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EDBB7C3...@web.de...

> > And do we agree that if the compiler documentation makes a particular
> > guarantee or claim, the compiler should emit whatever assembly instruction
> > sequences it takes to make that guarantee on every processor the compiler
> > supports?

> C/C++ volatiles are single-threaded beasts.

If this were any situation other than VC++ on WIN32, I'd agree with you.
However, in the special case of Microsoft Visual C++ on WIN32, you should be
able to assume that everything is talking about the multithreaded case
unless they say otherwise.

With GCC, for example, the documentation was written without
multithreading in mind. So if multithreading is an exception to something in
the documentation, you wouldn't expect it to be noted there. But VC++ was
developed from the ground up with multithreading in mind. In fact, WIN32 was
developed from the ground up for multithreading. The documentation reflects
this.

In any event, this could be an issue even with single-threaded programs.
Memory can be shared across processes running concurrently on different CPUs
even by programs that aren't multi-threaded.

If 'volatile' doesn't do what it's documented to do, then the
documentation is erroneous.

DS


SenderX

unread,
Jun 2, 2003, 7:36:55 PM
to
> If 'volatile' doesn't do what it's documented to do, then the
> documentation is erroneous.

Nope. It's not in error.

It only talks about what the compiler is going to do, not what the hardware
is going to do.

VC++ does not wrap volatile access with acquire / release fences.

David Schwartz

unread,
Jun 2, 2003, 8:02:29 PM
to

"SenderX" <x...@xxx.xxx> wrote in message
news:rcRCa.1110427$S_4.1123860@rwcrnsc53...

> > If 'volatile' doesn't do what it's documented to do, then the
> > documentation is erroneous.

> Nope. Its not in error.

> It only talks about what the compiler is going to do, not what the
> hardware is going to do.

Okay, this is just ridiculous. The documentation does not tell you what
the compiler does or what the hardware does, it tells you what the
'volatile' specifier does. It says that the volatile specifier ensures that
the load occurs where the statement is. It is the compiler's job to emit the
necessary assembly code to make the hardware do whatever it is the
compiler's documentation says the qualifier is going to do.

If the compiler does not emit the necessary assembly codes to make the
hardware do what the documentation says will happen, then either the
documentation is in error or the compiler is broken.

People trying to use *compilers* should not have to know anything about
the hardware unless they want to. If the documentation for a compiler says a
language construct will have a particular effect, then the compiler has to
make the hardware do that. End of story.

DS


SenderX

unread,
Jun 3, 2003, 3:07:25 AM
to
> People trying to use *compilers* should not have to know anything about
> the hardware unless they want to. If the documentation for a compiler says a
> language construct will have a particular effect, then the compiler has to
> make the hardware do that. End of story.

lol.

You really don't have a clue.

VC++ does not wrap volatile var access, with acquire / release fences.


Look at the new Java memory model coming out, or .NET; it does wrap volatile
with fences 'cause it's a high-level lang.

C is NOT a high-level lang.


I want you to show me disassembled VC++ volatile vars access, that's wrapped
with fence opcodes.


You will not be able to do this, 'cause you're totally wrong about C/C++
volatiles.

Momchil Velikov

unread,
Jun 3, 2003, 3:28:07 AM
to
"David Schwartz" <dav...@webmaster.com> wrote in message news:<bbgoik$cbl$1...@nntp.webmaster.com>...

> "SenderX" <x...@xxx.xxx> wrote in message
> news:rcRCa.1110427$S_4.1123860@rwcrnsc53...
>
> > > If 'volatile' doesn't do what it's documented to do, then the
> > > documentation is erroneous.
>
> > Nope. Its not in error.
>
> > It only talks about what the compiler is going to do, not what the
> hardware
> > is going to do.
>
> Okay, this is just ridiculous. The documentation does not tell you what
> the compiler does or what the hardware does, it tells you what the
> 'volatile' specifier does. It says that the volatile specifier ensures that
> the load occurs where the statement is. It is the compiler's job to emit the
> necessary assembly code to make the hardware do whatever it is the
> compiler's documentation says the qualifier is going to do.

So, the documentation is incomplet and inkorrect. Hardly surprising.
Isn't it just clear that the documentation speaks about emitting read
instructions, and a load does occur; it's just that the value read is
"current" only in certain situations, i.e. on a uniprocessor?



> People trying to use *compilers* should not have to know anything about
> the hardware unless they want to. If the documentation for a compiler says a
> language construct will have a particular effect, then the compiler has to
> make the hardware do that. End of story.

Or the documentation be corrected, because it is not what the compiler
ACTUALLY DOES. Period.

~velco

Alexander Terekhov

unread,
Jun 3, 2003, 4:02:10 AM
to

David Schwartz wrote:
[...]

> People trying to use *compilers* should not have to know anything about
> the hardware unless they want to. If the documentation for a compiler says a
> language construct will have a particular effect, then the compiler has to
> make the hardware do that. End of story.

DS, Microsoft simply doesn't have people that fully understand the issues
of memory ordering and synchronization. Here's an illustration:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/synchronization_and_multiprocessor_issues.asp
(Synchronization and Multiprocessor Issues)

regards,
alexander.

David Schwartz

unread,
Jun 3, 2003, 4:25:51 AM
to

"Momchil Velikov" <ve...@fadata.bg> wrote in message
news:87bded37.03060...@posting.google.com...

> "David Schwartz" <dav...@webmaster.com> wrote in message
news:<bbgoik$cbl$1...@nntp.webmaster.com>...

> So, the documentation is incomplet and inkorrect. Hardly surprising.


> Isn't it just clear that the documentation speaks about emitting read
> instructions and load does occur, only that the value read is
> "current" only in certain situations, i.e. in uniprocessor ?

No, it's not clear. I shouldn't even have to know what weak ordering
*is* to understand what a qualifier in a high-level language does.

> > People trying to use *compilers* should not have to know anything
> > about the hardware unless they want to. If the documentation for a compiler
> > says a language construct will have a particular effect, then the compiler
> > has to make the hardware do that. End of story.

> Or the documentation be corrected, because it is not what the compiler
> ACTUALLY DOES. Period.

That's my point. The documentation is in error. Defending it by saying
that the compiler is not responsible for what the hardware does is utter
nonsense.

DS


David Schwartz

unread,
Jun 3, 2003, 4:27:48 AM
to

"SenderX" <x...@xxx.xxx> wrote in message
news:NOXCa.1113770$S_4.1148965@rwcrnsc53...

> > People trying to use *compilers* should not have to know anything
> > about the hardware unless they want to. If the documentation for a compiler
> > says a language construct will have a particular effect, then the compiler
> > has to make the hardware do that. End of story.

> lol.

> You really don't have a clue.

Then show me where I'm wrong.

> VC++ does not wrap volatile var access, with acquire / release fences.

I never said it did. I said the compiler's documentation said that it provides
ordering guarantees. How it does that is not what I'm talking about.

> I want you to show me disassembled VC++ volatile vars access, that's
> wrapped with fence opcodes.

I never said VC++ wraps volatile variable accesses with fence opcodes. I
said the documentation claims that VC++ volatile variable accesses have
ordering guarantees. If the only way to do this is by emitting fence
opcodes, then if VC++ does not emit fence opcodes, it's not doing what the
documentation says it will do.

> You will not be able to do this, cause your totally wrong about C/C++
> volatiles.

I'm not wrong. I'm not making any claims about C/C++ volatiles. The VC++
documentation is.

DS


David Schwartz

unread,
Jun 3, 2003, 4:43:12 AM
to

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EDC5602...@web.de...

> DS, Microsoft simply doesn't have people that fully understand the issues
> of memory ordering and synchronization. Here's an illustration:

You know, I think you hit the nail on the head.

DS


SenderX

unread,
Jun 3, 2003, 5:40:15 AM
to
> No, it's not clear. I shouldn't even have to know what weak ordering
> *is* to understand what a qualifier in a high-level language does.

You better know what weak-ordering is if you plan to deploy on a non-TSO
multi-processor box!

The discussion is on C/C++ compilers, not high-level lang's anyway.

;)

SenderX

unread,
Jun 3, 2003, 5:41:00 AM
to
> I'm not wrong. I'm not making any claims about C/C++ volatiles. The
VC++
> documentation is.

The docs assume uni-processor for mem access, and say that volatile will not
be used in compiler optimizations.

They are not wrong at all.

SenderX

unread,
Jun 3, 2003, 6:15:00 AM
to
They're using InterlockedExchange as a barrier, lol.

David Schwartz

unread,
Jun 3, 2003, 6:33:41 AM
to

"SenderX" <x...@xxx.xxx> wrote in message
news:M2_Ca.556903$Si4.5...@rwcrnsc51.ops.asp.att.net...

> > I'm not wrong. I'm not making any claims about C/C++ volatiles. The
> > VC++
> > documentation is.

> The docs assume uni-processor for mem access, and say that volatile will
> not be used in compiler optimizations.

Why do you say the docs assume uni-processor for mem access? What good
would compiler documentation be if it assumed particular hardware
configurations in the documentation of basic language structures?!

> They are not wrong at all.

If the compiler's documentation about a qualifier is correct for only
some hardware platforms that the compiler claims to support, the
documentation is badly broken. I shouldn't have to think about different
hardware platforms when I write C code -- I should be able to rely upon the
guarantees the compiler's documentation provides on all supported hardware
platforms, processors, operating systems, and so on.

Why are you trying to make excuses for something that's so obviously
wrong?

DS


SenderX

unread,
Jun 3, 2003, 6:42:00 AM
to
> I shouldn't have to think about different hardware platforms when I write
> C code.

lol