
Win32 Interlocked API (was: Is this thread-safe on multi-processor Win32)


David Schwartz

Jun 13, 2003, 3:11:14 AM

"SenderX" <x...@xxx.xxx> wrote in message
news:KfeGa.664391$Si4.6...@rwcrnsc51.ops.asp.att.net...

You are contradicting yourself. First you say:

> The Interlocked functions are simple wrappers /w barriers, to the
> processor's atomic ops. Nothing fancy at all. Disassemble the interlocked
> functions, they're super slim.

Then you say:

> The interlocked API and microsoft will handle that. Alex's atomic template
> will make atomic ops port to different OS's and compilers.

So which is it?

> Current C volatile will not save you from mem-ordering. If you think it
> will, prepare to crash a system. C/C++ volatile save you from compiler
> ordering.

You say that. But the documentation doesn't agree.

> VC++ doesn't insert fences on volatile. You should not intermingle threads
> with current C standard. The C std. only knows of 1 single lifetime thread,
> and nothing about memory ordering.

Okay, now you're just totally not making sense. What the hell does the C
standard have to do with what the 'volatile' keyword does on VC++? The C
standard specifically says that what 'volatile' does is implementation
defined, and I've cited the implementation documentation to you.

> > It is unacceptable to me to write code that the processor
> > manufacturer specifically says may not work on future processors just
> > because "everybody else is doing it".

> The hardware is irrelevant when using the interlocked API's. All the
> correct membars and atomic ops are ensured by the OS. If windows is
> running on a computer, the interlocked api will work.

Correct TO DO WHAT? Nowhere does the documentation say that two
Interlocked* operations on different variables are ordered with respect to
each other. So why should I believe that this ordering will be respected on
weak ordered IA32 processors?
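[The pattern in dispute can be made concrete. The sketch below uses C11 seq_cst atomics as a portable stand-in for the Win32 Interlocked* calls (that correspondence is an assumption, and all names here are invented for illustration): thread A writes one variable, thread B spins on a second, distinct variable, and correctness depends on the order in which the two become visible.]

```c
/* Two threads, two distinct shared variables. C11 seq_cst atomics are
 * used as an assumed stand-in for a full-barrier interlocked operation;
 * 'payload', 'ready', and the helper names are invented. */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;      /* plain data, not atomic              */
static atomic_int ready; /* a second, distinct shared variable  */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                    /* 1: write the data              */
    atomic_fetch_add(&ready, 1);     /* 2: full-barrier RMW on 'ready' */
    return NULL;
}

static void *consumer(void *arg)
{
    while (atomic_load(&ready) == 0) /* 3: wait for the flag           */
        ;
    *(int *)arg = payload;           /* 4: must observe 42             */
    return NULL;
}

int run_once(void)
{
    pthread_t p, c;
    int seen = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

[With seq_cst ordering the consumer is guaranteed to read 42; the whole argument is whether the Interlocked* documentation promises the equivalent.]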

> If the new processor makes the interlocked api not work. Microsoft would
> have to recode / port a bunch of crap. They would do the work for you.
> When Microsoft finally got windows running for that new processor, the
> interlocked functions will work.

Yes, will work according to what the specification says they'll do. But
it doesn't say they provide ordering! If you say they do, please cite me
where it says so.

> > So the fact that there are, essentially, no memory visibility
> > guarantees

> The interlocked api's work, the proper barriers will be there.

And I should just take your word for it? You have figured out what they
do *TODAY*. The documentation says what they'll do forever.

DS

SenderX

Jun 13, 2003, 3:35:18 AM
> Yes, will work according to what the specification says they'll do. But
> it doesn't say they provide ordering! If you say they do, please cite me
> where it says so.

They do provide ordering for memory visibility.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/interlockedincrement.asp

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/interlockedcompareexchangerelease64.asp

Read the part on the membars for weak order systems.


> The C
> standard specifically says that what 'volatile' does is implementation
> defined, and I've cited the implementation documentation to you.

I cited the VC++ docs to Oliver S. in the previous thread.


> > Current C volatile will not save you from mem-ordering. If you think it
> > will, prepare to crash a system. C/C++ volatile save you from compiler
> > ordering.
>
> You say that. But the documentation doesn't agree.

Sure, go on thinking that volatile will save you from weak ordering...

;)


> Nowhere does the documentation say that two
> Interlocked* operations on different variables are ordered with respect to
> each other.

They are ordered with respect to loads and stores. The membars ensure that.


> And I should just take your word for it? You have figured out what they
> do *TODAY*. The documentation says what they'll do forever.

Take Intel and Microsoft's word for it.


Did you even read the devcon presentation I cited?

Lock-free is in for scalable servers. You are a bit ignorant when it comes
to lock-free algos.

What do you think of the lock-free Windows SList API? I suppose you think it
will crash windows kernels developed for new processors!!!

lol!

--
The designer of the SMP and HyperThread friendly, AppCore library.

http://AppCore.home.attbi.com


David Schwartz

Jun 13, 2003, 3:42:48 AM

"SenderX" <x...@xxx.xxx> wrote in message
news:W8fGa.927488$OV.855036@rwcrnsc54...

> > Yes, will work according to what the specification says they'll do. But
> > it doesn't say they provide ordering! If you say they do, please cite me
> > where it says so.

> They do provide ordering for memory visibility.
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/interlockedincrement.asp
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/interlockedcompareexchangerelease64.asp
>
> Read the part on the membars for weak order systems.

There is no such section. There is a section about IA-64 processors, but
I'm only concerned with IA-32.

> > The C
> > standard specifically says that what 'volatile' does is implementation
> > defined, and I've cited the implementation documentation to you.

> I cited the VC++ docs to Oliver S. in the previous thread.

Good, so we both know that they *say* that 'volatile' provides ordering
guarantees.

> > > Current C volatile will not save you from mem-ordering. If you think
> > > it will, prepare to crash a system. C/C++ volatile save you from
> > > compiler ordering.

> > You say that. But the documentation doesn't agree.

> Sure, go on thinking that volatile will save you from weak ordering...

It comes down to whether I should trust you or the documentation. How
can you know what will happen on future IA-32 processors?

> > Nowhere does the documentation say that two
> > Interlocked* operations on different variables are ordered with respect
> > to each other.

> They are ordered with respect to loads and stores. The membars ensure
> that.

What membars? Are you thinking IA-64 again?

> > And I should just take your word for it? You have figured out what
> > they
> > do *TODAY*. The documentation says what they'll do forever.

> Take Intel and Microsoft's word for it.

Fine, show me where they say this.

> Did you even read the devcon presentation I cited?

You are not an authoritative source for what InterlockedIncrement will
do on future IA-32 processors. Intel is an authoritative source for what
future IA-32 processors will do with the code InterlockedIncrement generates
and Microsoft is an authoritative source for what InterlockedIncrement will
do on future processors.

Let's forget about what 'volatile' and the 'Interlocked*' functions
actually do, because we can't rely on that for future IA-32 processors.
Let's stick to the documentation.

> What do you think of the lock-free Windows SList API? I suppose you think
> it will crash windows kernels developed for new processors!!!
>
> lol!

It's easy to laugh at people when you get to decide what they say. There
is no way we can discuss something as complex as the SList API when you
can't even grasp what I'm saying about simple things like 'volatile'.

DS


SenderX

Jun 13, 2003, 4:27:20 AM
> It's easy to laugh at people when you get to decide what they say. There
> is no way we can discuss something as complex as the SList API when you
> can't even grasp what I'm saying about simple things like 'volatile'.

If you think the SList API is complex, that explains why your arguments
criticizing lock-free algos are ignorant. Really. Lock-free is safe,
simple, and real.

Linux uses lock-free; call l.t. and yell at him.


The SList happens to use one of the simplest lock-free algos out there! Ever
heard of the IBM FreeList?


http://AppCore.home.attbi.com/src/FreeStack.c

That was my SList algo.


Here is another one, for midishare:

http://cvs.grame.fr/cgi-bin/midishare-cvs/src/common/Headers/lflifo.h?rev=1.5&co

Please don't tell me you think those are complex!!


Also, you are oh so wrong on C volatiles. Wow.


> > Take Intel and Microsoft's word for it.
>
> Fine, show me where they say this.

Contact Microsoft ( msdn member ) and / or Intel.


> > They are ordered with respect to loads and stores. The membars ensure
> > that.
>
> What membars? Are you thinking IA-64 again?

The membar opcodes for the processor windows is running on!!!!!!!!!!!!!!

The interlocked apis work, and you are totally out of order telling windows
programmers they will crash on windows running on new processors.

Would a windows kernel programmer please share his views on the interlocked
api's, to ease Dave's mind.

SenderX

Jun 13, 2003, 4:47:19 AM
> http://AppCore.home.attbi.com/src/FreeStack.c

> That was my SList algo.

I meant:

That is my SList, using the very well known IBM FreeList algo.

David Schwartz

Jun 13, 2003, 6:21:17 AM

"SenderX" <x...@xxx.xxx> wrote in message
news:IVfGa.18600$YZ2.13746@rwcrnsc53...

> > It's easy to laugh at people when you get to decide what they say.
> > There
> > is no way we can discuss something as complex as the SList API when you
> > can't even grasp what I'm saying about simple things like 'volatile'.

> If you think the SList API is complex, that explains why your arguments
> criticizing lock-free algos are ignorant. Really. Lock-free is safe,
> simple, and real.

You think the SList API is simpler than the 'volatile' keyword?!

> Linux uses lock-free, call l.t. and yell at him.

Why you rant off-topic is beyond me. This conversation is about the
WIN32 API, right?

> Also, you are oh so wrong on C volatiles. Wow.

I'm not making any argument about C volatiles, I'm simply quoting the
documentation to you. You are telling me that 'volatile' on vc++ means
something other than what the vc++ documentation says. I say, if that's
true, then the documentation is broken and that makes the platform very hard
to use. You say that I should just use it your way and ignore the
documentation. That doesn't address my point at all.

> > > Take Intel and Microsoft's word for it.

> > Fine, show me where they say this.

> Contact Microsoft ( msdn member ) and / or Intel.

Look, do you agree that a multithreading API is broken if it doesn't
have rigid rules regarding memory visibility sufficient to allow people to
design correct programs without having to guess or assume? Have you read the
pthreads API's section on memory visibility? Do you understand what it is I
want?

> > > They are ordered with respect to loads and stores. The membars ensure
> > > that.

> > What membars? Are you thinking IA-64 again?

> The membars opcodes for the processor windows is running on!!!!!!!!!!!!!!

Where does the documentation say that Interlocked operations use membars
on IA-32 processors that don't even exist yet but might require them?! How
could it possibly say that?

I don't want membars, because I don't know what future processors will
need. Maybe it'll be membars, maybe it'll be something else. I'm a
programmer writing C code, I don't care about opcodes and membars, what I
care about are memory visibility guarantees. With those, I can write C code
that I know will work.

> The interlocked apis work, and you are totally out of order telling
> windows programmers they will crash on windows running on new processors.

Show me where the Microsoft documentation says that Interlocked*
operations assure sequential visibility if two different threads call
Interlocked operations on two different variables. You can't do it. You know
it works from your own testing, and you hope it will work on future
processors, but you are just hoping.

> Would a windows kernel programmer please share his views on the
> interlocked api's, to ease daves mind.

If this is needed, then the API documentation is hopelessly broken. What
are the memory visibility rules on WIN32 for VC++? And don't tell me what you
have observed them to be, I need guarantees to write code.

DS


SenderX

Jun 13, 2003, 12:37:15 PM
> I'm not making any argument about C volatiles, I'm simply quoting the
> documentation to you. You are telling me that 'volatile' on vc++ means
> something other than what the vc++ documentation says.

I challenge you to show me some disassembles of VC++ volatile access output.

Currently, there will be no fences. If there are no fences where there
should be, your code will not work on a non-TSO system. Volatiles are not
safe for weak memory order in C. I repeat, not safe for memory ordering.

;)


> You think the SList API is simpler than the 'volatile' keyword?!

SList uses a straightforward, simple algo. C volatiles are virtually
meaningless on current C compilers.


> Look, do you agree that a multithreading API is broken if it doesn't
> have rigid rules regarding memory visibility sufficient to allow people
> to design correct programs without having to guess or assume?

The Windows Threading API takes care of that for you. Events, Critical
Sections, Semaphores are protected by memory barriers on weak-order systems.

Microsoft even had to tweak their kernels to get windows to be hyperthread
friendly.

If a new IA-32 processor that breaks the rules comes out, your apps won't
work because windows won't work. Microsoft will recode some kernel
internals, and ship the new version of windows for the new IA-32. So your
argument that Interlocked API's will break on upcoming processors is
ridiculous!


> If this is needed, then the API documentation is hopelessly broken. What
> are the memory visibility rules on WIN32 for VC++? And don't tell me what
> you have observed them to be, I need guarantees to write code.


I wanted some kernel guys to tell you that the Interlocked API's are ok to
use. You are wrong when you say they will crash the many thousands of apps
that use them, on upcoming processors that windows will be tweaked for.


lol.

Microsoft ensures that their critical sections provided by semaphores,
events, etc... have the proper acquire / release semantics.

Critical section semantics:

load:

    lock();
    b = a;
    unlock();

store:

    lock();
    a = b;
    unlock();

On a weak-order system, these amount to:

load:

    b = a;
    /* Acquire barrier */

store:

    a = b;
    /* Release barrier */
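[A runnable rendering of that acquire/release pairing, using C11 atomics rather than any Win32 primitive (an assumed model of the semantics, not a particular implementation; the names `publish`/`consume`/`guard` are invented for illustration):]

```c
/* Acquire/release pairing: the release store publishes 'a', and any
 * reader that acquires guard == 1 is guaranteed to observe it. C11
 * atomics used as an assumed analogue of the barriers discussed above. */
#include <stdatomic.h>

static int a;            /* the protected datum             */
static atomic_int guard; /* plays the role of the lock word */

/* store side: write the datum, then release */
void publish(int value)
{
    a = value;
    atomic_store_explicit(&guard, 1, memory_order_release);
}

/* load side: acquire, then read the datum */
int consume(void)
{
    while (atomic_load_explicit(&guard, memory_order_acquire) == 0)
        ;      /* spin until the release store is visible */
    return a;  /* sees the value written before the release */
}
```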

David Schwartz

Jun 13, 2003, 2:19:30 PM

You are just hopelessly unable to stay on topic, like trying to nail
jello to a wall. I give up.

DS


SenderX

Jun 13, 2003, 6:34:08 PM
> You don't need fences on IA-32.

I think David is talking about Intel developing a new weak-order IA-32 in
the future, under Microsoft's nose. Thereby breaking all of the windows apps
that use Interlocked API's.

David must think Intel and Microsoft are scum, because if they do what David
thinks they may do...

That will make them scum indeed!!


Anyway, you're correct on IA-32. But it's good to show where to put fences in
your code.

Like:


do
{
    /* Load lock-free stack head */
    LocalStack = SharedStack;

    /* Acquire fence on non-TSO */

} while( ! CAS64( SharedStack,
                  LocalStack,
                  LocalStack.pFront->pNext,
                  LocalStack.lAba + 1 ) );

/* Release fence on non-TSO */


It makes it easier to port.
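[For comparison, a compilable sketch of the same pop loop in C11, where `atomic_compare_exchange_weak` stands in for CAS64. The ABA counter is deliberately omitted, so this is illustrative only, not a production-safe free list; all names are invented:]

```c
/* Lock-free stack push/pop via compare-and-swap. On CAS failure the
 * expected value ('old') is reloaded automatically and the loop retries.
 * No ABA protection here: a sketch, not the Win32 SList implementation. */
#include <stdatomic.h>
#include <stddef.h>

struct node { struct node *next; int value; };

static _Atomic(struct node *) top;

void push(struct node *n)
{
    struct node *old = atomic_load(&top);
    do {
        n->next = old;                 /* link to current head         */
    } while (!atomic_compare_exchange_weak(&top, &old, n));
}

struct node *pop(void)
{
    struct node *old = atomic_load(&top);      /* load stack head      */
    while (old != NULL &&
           !atomic_compare_exchange_weak(&top, &old, old->next))
        ;                              /* CAS failed: old was reloaded */
    return old;
}
```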

David Schwartz

Jun 14, 2003, 7:43:45 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:AjsGa.145791$DV.1...@rwcrnsc52.ops.asp.att.net...

> > You don't need fences on IA-32.

> I think David is talking about Intel developing a new weak-order IA-32 in
> the future, under Microsoft's nose. Thereby breaking all of the windows
> apps that use Interlocked API's.

Since I've made my point absolutely clear so many times, why do you
continue to opine about what you think I'm talking about? The most obvious
error in your summation above is that I'm not talking about "breaking all of
the windows apps that use Interlocked API's", just those that rely on the
Interlocked API's alleged undocumented behavior regarding serialization.

> David must think Intel and Microsoft are scum, because if they do what
> David thinks they may do...
>
> That will make them scum indeed!!

Then show me where the documentation says that the Interlocked* APIs
provide memory ordering for different variables. I've shown you where the
kernel API documentation goes out of its way to say this, and how the user
API goes out of its way not to.

> do
> {
>     /* Load lock-free stack head */
>     LocalStack = SharedStack;
>
>     /* Acquire fence on non-TSO */
>
> } while( ! CAS64( SharedStack,
>                   LocalStack,
>                   LocalStack.pFront->pNext,
>                   LocalStack.lAba + 1 ) );
>
> /* Release fence on non-TSO */
>
> It makes it easier to port.

How can you know what fences will or won't be necessary on future
processors? All you can do is rely upon the specified behavior not to
change, because that would be scummy. What you are doing, however, is
advocating that others rely on *un*specified behavior.

DS


David Schwartz

Jun 14, 2003, 7:50:41 PM

"Oliver S." <Foll...@gmx.net> wrote in message
news:Xns939AF212F461...@130.133.1.4...

> > Correct TO DO WHAT? Nowhere does the documentation say that two
> > Interlocked* operations on different variables are ordered with respect
> > to each other.

> "On IA-64 processor architectures, this function will generate a memory
> barrier (or fence) followed by the exchange operation with acquire
> semantics. This ensures the strict memory access ordering that is
> necessary, but it can be slower than necessary. For performance-critical
> applications, consider using InterlockedExchangeAcquire64." -
> documentation of InterlockedExchange, InterlockedIncrement and
> InterlockedDecrement.

Except I'm not talking about IA-64, so why do you keep quoting me a
section about what they do on IA-64 processors? I'm only talking about the
WIN32 API.

In any event, the phrase "strict memory access ordering that is
necessary" can only mean necessary for the defined purpose and guarantees of
InterlockedIncrement, which is exactly what we're arguing about.

> And as Win32/IA32 uses the lock-prefix with InterlockedCompareExchange
> operations (since Win2K even on Non-SMP-Machines, although that's not
> necessary), I highly doubt that M$ would change this with
> Non-IA32-versions.

Why should I care what assembly instructions InterlockedCompareExchange
generates on an existent platform? I'm concerned about what they'll do on
future processors, and I have no way to know what those instructions will do
on future processors except to the extent that Intel makes guarantees.

I'm limited to the API guarantees that Microsoft makes, because
Microsoft would be breaking the API contract if they violated those
guarantees. Relying on undocumented behavior (such as Interlocked* functions
serializing across distinct variables) is the way you get into trouble on
future implementations of the API.

> > So why should I believe that this ordering will be respected on
> > weak ordered IA32 processors?

> Do you really think that Intel/AMD would produce a weak-ordered x86
> processor that f.e. wouldn't be self-consistent so that most software
> would fail? I think they don't have the option of breaking that
> backward-compatibility.

Intel has expressly stated that they may do so. They've made it clear
what behavior you can rely on and what behavior you can't. I trust Microsoft
to rely only on documented behavior so that my software will continue to
work so long as I rely only on Microsoft's documented behavior.

But the fact is that you are asking me to rely on undocumented behavior.

DS


SenderX

Jun 14, 2003, 8:58:20 PM
> The most obvious
> error in your summation above is that I'm not talking about "breaking all
> of the windows apps that use Interlocked API's", just those that rely on
> the Interlocked API's alleged undocumented behavior regarding
> serialization.

You refuse to learn:

INTERLOCKED API's ORDER ON LOADS AND STORES!!!!!!!!

So:

QUIT TELLING WINDOWS PEOPLE THEY'RE UNSAFE!!!!!!!!!

But:

It's amusing to hear your misguided logic on this issue.

=)

SenderX

Jun 14, 2003, 9:06:39 PM
> Except I'm not talking about IA-64, so why do you keep quoting me a
> section about what they do on IA-64 processors? I'm only talking about the
> WIN32 API.

The hardware doesn't matter. Microsoft puts the correct opcodes in there.
How many times do I have to lay this out for you.

I'm getting pissed off at the fact that you are telling people the
Interlocked API's are unsafe. They're not unsafe.


Quit using the Win32 API if you don't like them.


90% OF WIN32 SYNC OBJECTS USE Interlocked API's:

Semaphores

CriticalSections

Spinlocks

QueuedSpinLocks

Etc...


So if I listen to your logic, they're all broken on future IA-32 processors
that might implement a weak order???????

lol!


> In any event, the phrase "strict memory access ordering that is
> necessary" can only mean necessary for the defined purpose and guarantees
> of InterlockedIncrement, which is exactly what we're arguing about.

Wow.

Learn to read. The proper barrier opcodes are included in ALL the
interlocked API's. Again, read the docs.

David Schwartz

Jun 14, 2003, 9:14:24 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:MwPGa.32948$YZ2.197617@rwcrnsc53...

> You refuse to learn:
>
> INTERLOCKED API's ORDER ON LOADS AND STORES!!!!!!!!

Where did you get this from? All I want to know is where you found this
out. I showed you where the documentation says "volatile" does this:

"Objects declared as volatile are not used in optimizations because their
value can change at any time. The system always reads the current value of a
volatile object at the point it is requested, even if the previous
instruction asked for a value from the same object. Also, the value of the
object is written immediately on assignment."

So where does the Interlocked API say anything like this for WIN32? And
do you claim the quotation above isn't about "order[ing] on loads and
stores"?

I understand that this is what the interlocked APIs happen to do, but I
prefer to rely on documented behavior, such as the paragraph above.
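[The distinction being argued can be sketched in code. Both functions below publish a value and then a flag; only the second uses a construct whose specification actually talks about cross-variable ordering. C11 release semantics are used here as an assumed portable analogue of a hardware barrier, and every name is invented for illustration:]

```c
/* 'volatile' pins down individual accesses at the compiler level; a
 * release store constrains the order in which ANOTHER thread may observe
 * two different variables. These are different guarantees. */
#include <stdatomic.h>

volatile int vol_data, vol_flag;

static int plain_data;
static atomic_int ord_flag;

void publish_volatile(int x)
{
    vol_data = x;   /* compiler must emit both stores, in this order... */
    vol_flag = 1;   /* ...but nothing constrains the CPU's store order  */
}

void publish_ordered(int x)
{
    plain_data = x;
    /* release: a thread that acquires ord_flag == 1 is guaranteed to
     * observe plain_data == x */
    atomic_store_explicit(&ord_flag, 1, memory_order_release);
}
```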

DS


David Schwartz

Jun 14, 2003, 9:20:34 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:zEPGa.948321$OV.1056776@rwcrnsc54...

> > Except I'm not talking about IA-64, so why do you keep quoting me a
> > section about what they do on IA-64 processors? I'm only talking about
> > the WIN32 API.

> The hardware doesn't matter. Microsoft puts the correct opcodes in there.
> How many times do I have to lay this out for you.

The correct opcodes to get the documented behavior. So saying I should
rely on Microsoft to put the correct opcodes in there is the same as saying
I should rely on documented behavior.

> I'm getting pissed off at the fact that you are telling people the
> Interlocked API's are unsafe. They're not unsafe.

I'm saying they are not documented as providing ordering for concurrent
access between different variables, whereas the 'volatile' keyword *is*
documented as providing this.

> > In any event, the phrase "strict memory access ordering that is
> > necessary" can only mean necessary for the defined purpose and
> > guarantees of InterlockedIncrement, which is exactly what we're arguing
> > about.
>
> Wow.
>
> Learn to read. The proper barrier opcodes are included in ALL the
> interlocked API's. Again, read the docs.

I'm not interested in what they actually do on the platforms you've
tested them on, I'm interested in what they're documented to do. They are
not documented to provide ordering between accesses to different variables.
Really.

DS


SenderX

Jun 14, 2003, 9:22:56 PM
> Where did you get this from? All I want to know is where you found
> this out. I showed you where the documentation says "volatile" does this:

Go on thinking that C volatiles handle memory ordering. It's all good.

=)


> So where does the Interlocked API say anything like this for WIN32? And
> do you claim the quotation above isn't about "order[ing] on loads and
> stores"?

I already cited the msdn docs a couple of times, why should I do it again?

Interlocked API's ( besides the acquire / release versions ) are protected
by barriers.


That means they are ordered with respect to loads and stores.

SenderX

Jun 14, 2003, 9:24:59 PM
> I'm saying they are not documented as providing ordering for concurrent
> access between different variables, whereas the 'volatile' keyword *is*
> documented as providing this.

You're comparing C volatile to the Interlocked API's???

rofl

> They are
> not documented to provide ordering between accesses to different
> variables. Really.

Oh, yes they are.

David Schwartz

Jun 14, 2003, 9:27:32 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:QTPGa.948406$OV.1057968@rwcrnsc54...

> Interlocked API's ( besides the acquire / release versions ) are
> protected by barriers.
>
> That means they are ordered with respect to loads and stores.

I understand that you drew the (correct) conclusion that they order on
loads and stores for current hardware. I want to know where they are
*documented* to order on loads and stores. Otherwise, I can't rely on them
ordering on loads and stores on future processors.

Remember this paragraph:

"Objects declared as volatile are not used in optimizations because their
value can change at any time. The system always reads the current value of a
volatile object at the point it is requested, even if the previous
instruction asked for a value from the same object. Also, the value of the
object is written immediately on assignment."

Let me ask you one very simple yes or no question: Does this paragraph
say that 'volatile' provides ordering for loads and stores?

DS


David Schwartz

Jun 15, 2003, 6:06:04 PM

"Oliver S." <Foll...@gmx.net> wrote in message
news:Xns939B33C128DE...@130.133.1.4...

> Even Intel can't change that because of existing software. The
> relaxations current x86s do are the largest extent by which relaxations
> on x86s could go without conflicting with a lot of software.

So basically, to rely on the Interlocked* APIs (to provide ordering
semantics across distinct variables), I have to disregard what Intel and
Microsoft say and instead go by the hope that they wouldn't introduce an
incompatible change.

The problem is, history is filled with stories of people who got bitten
by assuming that undocumented behavior would persist. Rather than be the
next victim, where I need such serialization, I'll just use a mutex,
critical section, or spinlock. Thank you very much.

It doesn't even perform better. If you care about serialization, you
have to be using at least two Interlocked* operations. Since acquiring a
spinlock requires only one locked bus cycle and releasing one requires, at
worst, one locked bus cycle, acquiring and releasing a spinlock is probably
only very slightly slower than two interlocked operations.

So if I rely on the undocumented behavior, I gain essentially nothing.

DS


SenderX

Jun 15, 2003, 6:32:57 PM
> It doesn't even perform better.


If you think this:

    MutexLock();
    lVar++;
    MutexUnlock();

Is faster than this:

    InterlockedIncrement( &lVar );


You are a certified win32 moron.

ROFL!

You are 100% wrong.

You simply don't have a clue. We try to teach you, but you fail to learn.

David Schwartz

Jun 15, 2003, 9:30:30 PM

"SenderX" <x...@xxx.xxx> wrote in message
news:tu6Ha.686339$Si4.7...@rwcrnsc51.ops.asp.att.net...

This will be my last response to SenderX on this topic, or probably any
topic.

> > It doesn't even perform better.

> If you think this:

> MutexLock();
> lVar++;
> MutexUnlock();

> Is faster than this:
>
> InterlockedIncrement( &lVar );
>

I explained in great detail what I was comparing to what; SenderX chose
to ignore my detailed explanation and instead take my claim out of context.
That is the third time he has done that in this thread and I can no longer
believe it's accidental.

For the record, I was comparing:

    SpinLock();
    var1++;
    var2++;
    SpinUnlock();

to

    InterlockedIncrement(&var1);
    InterlockedIncrement(&var2);

With a single operation, there are no ordering issues. This whole thread
is about ordering issues.
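[A minimal spinlock of the kind being compared can be sketched with one atomic read-modify-write per acquire attempt. C11 `atomic_flag` is used here as an assumed stand-in for a Win32 interlocked exchange, and the names are invented for illustration:]

```c
/* Test-and-set spinlock: one atomic RMW to acquire, a release-ordered
 * clear to release. Inside the lock, updates to two different variables
 * are serialized as a unit, which is the point of the comparison above. */
#include <stdatomic.h>

static atomic_flag lock_word = ATOMIC_FLAG_INIT;
static int var1, var2;

static void spin_lock(void)
{
    /* one locked RMW per attempt; acquire ordering on success */
    while (atomic_flag_test_and_set_explicit(&lock_word,
                                             memory_order_acquire))
        ;
}

static void spin_unlock(void)
{
    /* release: both increments become visible before the lock word
     * is observed clear */
    atomic_flag_clear_explicit(&lock_word, memory_order_release);
}

void bump_both(void)
{
    spin_lock();
    var1++;
    var2++;
    spin_unlock();
}
```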

> You are a certified win32 moron.

Sure, if you get to change my claims you can spin them as moronic.

DS


Alexander Terekhov

Jun 16, 2003, 2:49:22 AM

"Oliver S." wrote:
>
> > Except I'm not talking about IA-64, so why do you keep quoting me a
> > section about what they do on IA-64 processors?
>
> The fencing is mentioned because the IA-64 platform is currently the
> only one where an explicit fence is necessary.

Wrong. IA-32 does speculative reads, to begin with.

> I'd bet my right hand that M$ would preserve that behaviour ...

I'll bet both my hands and head that MS doesn't care at all... as long
as it continues to sell. They'll issue a whole bunch of silly patches
(breaking all sort of other things) in the event once/when/if they'll
"perceive some problem".

regards,
alexander.

Alexander Terekhov

Jun 16, 2003, 6:17:15 AM

"Oliver S." wrote:
>
> > Wrong. IA-32 does speculative reads, to begin with.
>
> Yes, but the reads are sequential across synchronizing instructions
> with a lock-prefix. And the InterlockedXxx-APIs use these prefixes.

Really? Where did you get that information? Well, I guess, I know...

and as for me being "polemic"... well, take a look at the following
(reviewed 4/4/2003) MS-DOC "kbfix":

http://support.microsoft.com/default.aspx?scid=kb%3Ben-us%3B316145
(DOC: Confusion in the Threading Design Guideline Help Document)

I'll even quote it completely here: <(C) 2003 Microsoft Corporation.
All rights reserved.>

<quote>

DOC: Confusion in the Threading Design Guideline Help Document

The information in this article applies to:
Microsoft .NET Framework SDK 1.0

This article was previously published under Q316145

SUMMARY

In the Microsoft Developer Network (MSDN) Documentation that installs
with Visual Studio .NET, part of the following help topic is not very
clear and may cause some confusion:

Threading Design Guidelines

STATUS

This bug was corrected in the Microsoft .NET Framework SDK 1.1.

MORE INFORMATION

The "Threading Design Guidelines" help topic contains the following
text:

"Be aware of issues with the lock statement (SyncLock in Visual Basic).
It is tempting to use the lock statement to solve all threading problems.
However, the System.Threading.Interlocked Class is superior for updates
that must be made atomically. It executes a single lock prefix if there
is no contention. In a code review, you should watch out for instances
like the one shown in the following example.

[Visual Basic]
SyncLock Me
    myField += 1
End SyncLock

[C#]
lock(this)
{
    myField++;
}

Alternatively, it might be better to use more elaborate code to create rhs
outside of the lock, as in the following example. Then, you can use an
interlocked compare exchange to update x only if it is still null. This
assumes that creation of duplicate rhs values does not cause negative side
effects.

[Visual Basic]
If x Is Nothing Then
    SyncLock Me
        If x Is Nothing Then
            ' Perform some elaborate code to create rhs.
            x = rhs
        End If
    End SyncLock
End If

[C#]
if (x == null)
{
    lock (this)
    {
        if (x == null)
        {
            // Perform some elaborate code to create rhs.
            x = rhs;
        }
    }
}"

To make the topic more clear, replace the previous text with the
following:

"Be aware of issues with the lock statement (SyncLock in Visual Basic).
It is tempting to use the lock statement to solve all threading problems.
However, the System.Threading.Interlocked Class is superior for updates
that must be made automatically. This class executes a single lock prefix
if there is no contention. For example, in a code review, you should
watch out for instances like the one shown in the following example:

[Visual Basic]
SyncLock Me
myField += 1
End SyncLock

[C#]
lock(this)
{
myField++;
}

Replace the previous sample code with the following sample code to
improve performance:

[Visual Basic]
System.Threading.Interlocked.Increment(myField)

[C#]
System.Threading.Interlocked.Increment(ref myField);

Another example is to update an object type variable only if the variable
is null. You can use the following code to update the variable and to make
the code thread safe:

[Visual Basic]
If x Is Nothing Then
SyncLock Me
If x Is Nothing Then
x = y
End If
End SyncLock
End If

[C#]
if (x == null)
{
lock (this)
{
if (x == null)
{
x = y;
}
}
}

For this sample, you can improve the performance if you use the following
code to replace the previous code:

[Visual Basic]
System.Threading.Interlocked.CompareExchange(x, y, Nothing)

[C#]
System.Threading.Interlocked.CompareExchange(ref x, y, null);"

Last Reviewed: 4/4/2003
Keywords: kbfix kbbug kbdocerr KB316145

</quote>

regards,
alexander.

Michael Furman

unread,
Jun 16, 2003, 2:55:42 PM6/16/03
to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bcgi24$547$1...@nntp.webmaster.com...
> [...]

> "Objects declared as volatile are not used in optimizations because their
> value can change at any time. The system always reads the current value of a
> volatile object at the point it is requested, even if the previous
> instruction asked for a value from the same object. Also, the value of the
> object is written immediately on assignment."
>
> Let me ask you one very simple yes or no question: Does this paragraph
> say that 'volatile' provides ordering for loads and stores?

No, it does not. It only talks about read and write instructions rather
than accesses to memory.
Regards,
Michael


>
> DS
>
>


David Schwartz

unread,
Jun 16, 2003, 4:58:42 PM6/16/03
to

"Michael Furman" <Michae...@Yahoo.com> wrote in message
news:bcl3rg$kci7c$1...@ID-122417.news.dfncis.de...

> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:bcgi24$547$1...@nntp.webmaster.com...

> > "Objects declared as volatile are not used in optimizations because their
> > value can change at any time. The system always reads the current value of
> > a volatile object at the point it is requested, even if the previous
> > instruction asked for a value from the same object. Also, the value of the
> > object is written immediately on assignment.""

> > Let me ask you one very simple yes or no question: Does this paragraph
> > say that 'volatile' provides ordering for loads and stores?

> No it is not. It is only saying about read and write instructions rather
> then accesses to
> the memory.

Instructions? This is a description of a C language construct with
respect to other C language constructs. What do "read and write
instructions" have to do with anything? You are crossing levels.

DS


Michael Furman

unread,
Jun 16, 2003, 6:05:46 PM6/16/03
to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bclb22$13o$1...@nntp.webmaster.com...

No - you do! You posted a text fragment with the words:


"The system always reads the current value of a volatile object at the
point it is requested, even if the previous instruction asked for a value

from the same object....."
and asked a question about it.
And, BTW, instructions (which are mentioned in the text) have much more
to do with the C language than "loads and stores" (which are not
mentioned in the text), simply because instructions are what the
compiler generates.
Michael


>
> DS
>
>


SenderX

unread,
Jun 16, 2003, 7:58:39 PM6/16/03
to
> > Yes, but the reads are sequential across synchronizing instructions
> > with a lock-prefix. And the InterlockedXxx-APIs use this prefixes.
>
> Really? Where did get that information? Well, I guess, I know...

I got it from disassembling the Interlocked API's.

David Schwartz

unread,
Jun 16, 2003, 8:14:55 PM6/16/03
to

"SenderX" <x...@xxx.xxx> wrote in message
news:PQsHa.966001$OV.1070196@rwcrnsc54...

> > > Yes, but the reads are sequential across synchronizing instructions
> > > with a lock-prefix. And the InterlockedXxx-APIs use this prefixes.
> >
> > Really? Where did get that information? Well, I guess, I know...
>
> I got it from disassembling the Interlocked API's.

Which tells you nothing about what they'll do on future processors or
even future versions of the compiler or implementation. Relying on what the
current implementations happen to do is a recipe for disaster.

DS


David Schwartz

unread,
Jun 16, 2003, 9:56:18 PM6/16/03
to

"Oliver S." <Foll...@gmx.net> wrote in message
news:Xns939D205D1F1B...@130.133.1.4...
> > Which tells you nothing about what they'll do on future processors ...

> Intel guarantees that instructions which use the lock-prefix
> are serializing. It would be suicide for intel to break this.

This is ass-backwards engineering though. You should read the
documentation to find out what a function does, not reverse-engineer the
results.

> > ... or even future versions of the compiler or implementation.

> > Relying on what the current implementations happen to do is a
> > recipe for disaster.

> This might be right in 99,9% of all cases; but here it's even more
> than paranoid.

The point is, I shouldn't have to reverse-engineer a function, see what
it happens to do, and then try to rely on that. Worse, there are cases where
the documentation says you can rely on something, but the instructions
emitted don't provide that guarantee. Even worse, there are no memory
visibility rules, so you never know what you can rely on.

Where can I find the memory visibility rules for the WIN32
multithreading API?

DS


David Schwartz

unread,
Jun 16, 2003, 11:56:32 PM6/16/03
to

"Oliver S." <Foll...@gmx.net> wrote in message
news:Xns939D3482381A...@130.133.1.4...

I don't think we disagree on any facts, we just disagree on what they
mean.

> > This is ass-backwards engineering though. You should read the
> > documentation to find out what a function does, not reverse-engineer
> > the results.

> No, that behaviour is specified for "Pentium 4, Intel Xeon, and P6 Family
> Processors" in chapter 7.2.2 of the 'IA-32 Architecture Software Developer's
> Guide, Volume 3, System Programming Guide': "Reads or writes cannot pass
> (be carried out ahead of) I/O instructions, locked instructions, or
> serializing instructions"

So I should look at the IA-32 Architecture Software Developer's Guide to
discover what the Interlocked* functions do for WIN32 C programs? The
problem is, the Interlocked* functions should do the same thing on every
architecture, they're C functions. They should not be platform-specific.

So if you're right that this is where I need to look, it would be
because the WIN32 documentation for the Interlocked* API is broken.

> All current OSes which provide multithreading won't be safe from
> crashing if Intel would change that behaviour. That's because user-mode
> mutexes like the CRITICAL_SECTION in Win32 are always implemented with
> locked read-modify-write operations (pure kernel synchronization wouldn't
> give a competitive performance here).

Look, the C language documentation for the Interlocked* APIs shouldn't
talk about read-modify-write operations and locked busses. It should say
what effect the Interlocked* functions have, not how they achieve that
effect.

> > The point is, I shouldn't have to reverse-engineer a function, see
> > what it happens to do, and then try to rely on that.

> Spoken more generally, I agree, but in this case I'm very sure that a
> lot of programmers rely on the serializing behaviour of the InterLocked*
> -APIs because they're serializing on all x86 Win32-systems.

So they deduce that behavior happens, and rely on it to continue to
happen in the future. The hope is that if enough people rely on it, it can't
possibly change, even though the behavior is not guaranteed. In fact, the
documentation goes out of its way to not provide this guarantee.

> > Worse, there are cases where the documentation says you can rely on
> > something, but the instructions emitted don't provide that guarantee.
>

> But that's not the case here.

No, but that is the case with 'volatile'. My point is that the
documentation is badly broken.

> > Even worse, there is no memory visibility rules, so you never know what
> > you can rely on.

> With all SMP-machines running from NT 3.1 to Windows Server 2003, the
> InterlockedXxx-APIs use read-modify-write operations with a lock-prefix
> on x86-machines; this is self-evident because these APIs wouldn't be
> SMP-safe if the prefix were omitted. And with this lock-prefix, you'll
> get a guaranteed bidirectional memory-barrier even with future x86-CPUs
> because of the circumstances described above.

I agree with this. But this not documented behavior of the Interlocked*
APIs.

> The IA-64 versions of Windows also maintain this behaviour according to
> the MSDN library, and this, plus the fact that there are InterlockedXxx-APIs
> with acquire and release semantics, shows that M$ is aware of the potential
> problem which versions of the InterlockedXxx-APIs without membars would
> conjure up.

So why don't they just document the semantics of the Interlocked*
functions and write a section on memory visibility? Why put their developers
through all this pain and uncertainty?

DS


Joseph Seigh

unread,
Jun 17, 2003, 9:46:09 AM6/17/03
to

David Schwartz wrote:
> The point is, I shouldn't have to reverse-engineer a function, see what
> it happens to do, and then try to rely on that. Worse, there are cases where
> the documentation says you can rely on something, but the instructions
> emitted don't provide that guarantee. Even worse, there is no memory
> visibility rules, so you never know what you can rely on.
>
> Where can I find the memory visibility rules for the WIN32
> multithreading API?
>

I don't think there are any explicit memory visibility rules in the external
Posix documentation. That doesn't seem to be a problem for many people. For
most, mnemonics seem to be enough.

Joe Seigh

SenderX

unread,
Jun 18, 2003, 3:29:16 PM6/18/03
to
> SpinLock();
> var1++;
> var2++;
> SpinUnlock();
>
> to
>
> InterlockedIncrement(&var1);
> InterlockedIncrement(&var2);

You're comparing a single atomic write on two variables to a race condition?

LOL! You have got to be kidding. You need to learn about race conditions,
really!

If you expect:

LONG l = 0;

InterlockedIncrement( &l );
InterlockedIncrement( &l );

to be ( l == 2 ), when multi-threads can touch l ...


Then " you " should stay away from the Interlocked API's. You have to learn
how to use them first. This behavior is documented inside the Intel docs; I
can find and cite it if you want. QUIT saying that the Interlocked API's are
bad when you don't have a clue about them.

P.S.


To fix your race infested Interlocked code:

typedef union U_Vars
{
    LONGLONG Value64; /* signed, to match the Interlocked*64 signatures */

    struct
    {
        LONG var1;
        LONG var2;
    };

} VARS, *LPVARS;

VARS Shared = { 0 }; /* var1 == 0, var2 == 0 */

VARS Local, New;

do
{
    New = Local = Shared; /* snapshot; a torn read just makes the CAS fail */

    /* Acquire barrier */

    New.var1++;
    New.var2++;
}
while( InterlockedCompareExchangeRelease64( &Shared.Value64,
                                            New.Value64,
                                            Local.Value64 ) != Local.Value64 );

Alexander Terekhov

unread,
Jun 18, 2003, 3:50:36 PM6/18/03
to

SenderX wrote:
[...]
> QUIT saying that the Interlocked API's are bad, ...

Yeah. You should say instead that MS-Interlocked stuff is
*BRAIN DEAD*.

regards,
alexander.

--
"SCO to sue Al Gore

In a dramatic move, SCO today filed suit against Al Gore
alleging misappropriations of trade secrets and unfair
competition for inventing the Internet. SCO Executive
Chris Sontag was quoted as saying "This internet thing
is a Unix derivative and our contracts from AT&T clearly
give us all IP rights to Unix derivatives. What Al Gore
did is to transfer our rights to the world. Our lawsuit
is aimed at preserving our intellectual property rights."

When asked if he was on crack, Sontag replied, "No, I'm
on SCO, but crack is clearly a derivative of Unix and we
intend to sue the Colombian cocaine cartel next."."

-- bryan_w_taylor, SCOX board at yahoo

SenderX

unread,
Jun 18, 2003, 3:59:34 PM6/18/03
to
> Yeah. You should say instead that MS-Interlocked stuff is
> *BRAIN DEAD*.

;)

Anyway, they're nothing but a simple API that wraps the processor's atomic
ops. They're not meant to provide a global C atomic standard.

Do you ever think Microsoft will morph the Interlocked API's into an atomic
standard? If they do, how will that affect your upcoming atomic< T >?

I would like to be able to use your code for a portable compare-exchange (
attempt_update? ).

Ziv Caspi

unread,
Jun 18, 2003, 4:25:53 PM6/18/03
to
On Fri, 13 Jun 2003 00:42:48 -0700, "David Schwartz"
<dav...@webmaster.com> wrote:

> You are not an authoritative source for what InterlockedIncrement will
>do on future IA-32 processors. Intel is an authoritative source for what
>future IA-32 processors will do with the code InterlockedIncrement generates
>and Microsoft is an authoritative source for what InterlockedIncrement will
>do on future processors.

This shows you do realize Microsoft can't say what the current
implementation of InterlockedIncrement will do on a future IA32
processor, because this depends on the behavior of such processors and
Intel isn't providing any guarantees.

Given that, if Microsoft provided you with documentation that says
"the InterlockedIncrement implementation in Windows XP (no SP) is
guaranteed to work on all future IA32 processors", you would be right
to be suspicious (or at least surmise that there are some guarantees
Intel provided Microsoft that were not made public).

It is understandable that you want a spec for Interlocked* that is
robust to *any* future hardware changes. However, Microsoft, unlike
POSIX, does not provide a rigorous specification in this area. You get
some documentation (which is open for interpretation) and a wink that
says "so much code in Windows and other server software Microsoft has
written actually *does* rely on this to work, you can be sure Intel
wouldn't dare change it in future processors, and Microsoft would make
damn sure it'll work on future architectures".

[Disclaimer: This posting is provided "AS IS" with no warranties, and
confers no rights. Opinions said here represent my own, and not my
employer's. In addition, they are based only on publicly available
information.]

Alexander Terekhov

unread,
Jun 18, 2003, 4:31:33 PM6/18/03
to

SenderX wrote:
[...]

> I would like to be able to use your code for a portable compare-exchange (
> attempt_update? ).

First, you'd have to convince Butenhof that neither pthread_cond_signal()
nor pthread_cond_broadcast() need to impose ANY restrictions on COMPILER
induced reordering. The DR of mine seeking [trillions.. err. ok ;-) ] to
remove CV signaling functions from XBD/4.10 was also rejected -- result
of the DRB's "verdict", IIRC.

regards,
alexander.

Alexander Terekhov

unread,
Jun 18, 2003, 4:49:51 PM6/18/03
to

Ziv Caspi wrote:
[...]

> It is understandable that you want a spec for Interlocked* that is
> robust to *any* future hardware changes. However, Microsoft, unlike
> POSIX, does not provide a rigorous specification in this area. You get
> some documentation (which is open for interpretation) and a wink that
> says "so much code in Windows and other server software Microsoft has
> written actually *does* rely on this to work, you can be sure Intel
> wouldn't dare change it in future processors, and Microsoft would make
> damn sure it'll work on future architectures".

Intel already has MTRR stuff. Sure they'll preserve compatibility
for the "old" binary stuff. The point is that SOURCE CODE [things
ala broken DCSI/DCCI] "might" NOT work on the "fully-relaxed by
default" future IA-32 systems.

http://groups.google.com/groups?selm=3C615A7E.C61764C7%40web.de
http://groups.google.com/groups?selm=3C628158.69EA424F%40web.de

>
> [Disclaimer: This posting is provided "AS IS" with no warranties, and
> confers no rights. Opinions said here represent my own, and not my
> employer's. In addition, they are based only on publicly available
> information.]

Interesting. You're working for Wintel GmbH?

regards,
alexander.

SenderX

unread,
Jun 18, 2003, 5:47:45 PM6/18/03
to
> Intel already has MTRR stuff. Sure they'll preserve compatibility
> for the "old" binary stuff. The point is that SOURCE CODE [things
> ala broken DCSI/DCCI] "might" NOT work on the "fully-relaxed by
> default" future IA-32 systems.

If Intel designed a non-TSO IA-32, they would have the thousands and
thousands of current apps that use IA-32 instructions in mind... Microsoft,
and every other OS, would have to ensure that their critical section API's
provide the correct visibility on such an IA-32.

The point is, if future IA-32 systems break the current rules... Your Win32
code won't work, because Windows won't work! So, you would have to wait for
Microsoft to get things running.

;)


I would also bet that the opcode names would be the same for most atomic ops
on future IA-32's.

cmpxchg
cmpxchg8b
xadd

etc...

Joseph Seigh

unread,
Jun 18, 2003, 6:16:10 PM6/18/03
to

SenderX wrote:
>
> > Yeah. You should say instead that MS-Interlocked stuff is
> > *BRAIN DEAD*.
>
> ;)
>
> Anyway, there nothing but a simple API that wrappers the processors atomic
> ops. There no meant to provide a global C atomic standard.
>
> Do you ever think Microsoft will morph the Interlocked API's into an atomic
> standard? If they do, how will that affect your upcoming atomic< T >?
>

Microsoft's stuff is too platform specific. They might be useful to implement
a standard but not to define it.

SenderX

unread,
Jun 18, 2003, 6:30:00 PM6/18/03
to
> Microsoft's stuff is too platform specific. They might be useful to
> implement a standard but not to define it.

I see.

Joseph Seigh

unread,
Jun 18, 2003, 7:07:33 PM6/18/03
to

SenderX wrote:
>
> > Microsoft's stuff is too platform specific. They might be useful to
> > implement a standard but not to define it.
>
> I see.
>

I'd do it (define a standard of sorts) but I'm a little side tracked now.
I did a reader lock-free algorithm, which works really well compared to
just using mutexes. To be fair I need to compare it to a conventional
reader writer lock which of course does not exist in windows. So I've
been trying to write a quick and dirty rwlock which doesn't suck compared to
a plain mutex. The best so far is 20% better compared with the reader
lock-free version which is 500% better than the mutex version.

Joe Seigh

SenderX

unread,
Jun 18, 2003, 7:17:34 PM6/18/03
to
> I'd do it (define a standard of sorts) but I'm a little side tracked now.
> I did a reader lock-free algorithm, which works really well compared to
> just using mutexes.

Cool. I was wondering if your reader uses events or semaphores to wake
waiting threads? Or does it just spin them?

> The best so far is 20% better compared with the reader
> lock-free version which is 500% better than the mutex version.

The lock-free stats are impressive, as always. ;)


My lock-free reader / writer avoids the kernel 100% of the time, if there
are only reads. So it performs great as a reader lock. Have you looked at
it? I can't get it to crash on me yet. =)

Joseph Seigh

unread,
Jun 18, 2003, 7:44:36 PM6/18/03
to

SenderX wrote:
>
> > I'd do it (define a standard of sorts) but I'm a little side tracked now.
> > I did a reader lock-free algorithm, which works really well compared to
> > just using mutexes.
>
> Cool. I was wondering if your reader uses events or semaphores to wake
> waiting threads? Or does it just spin them?

The reader lock-free doesn't use any windows synchronization. The rwlock
uses an event and a semaphore.

>
> > The best so far is 20% better compared with the reader
> > lock-free version which is 500% better than the mutex version.
>
> The lock-free stats are impressive, as always. ;)
>
> My lock-free reader / writer avoids the kernel 100% of the time, if there
> are only reads. So it performs great as a reader lock. Have you looked at
> it? I can't get it to crash on me yet. =)
>

Actually, if you have no writes going on, the reader lock-free performs no better
than a simple mutex or conventional rwlock. On a uniprocessor anyway.
But that appears to be worst case for it.

It's a linked list implementation along the lines of AWTEventMulticaster kind
of lock-free stuff. You can iterate through the list while nodes are being
deleted. You can even delete nodes without having to reiterate through the
list with the lock held.

Joe Seigh

David Schwartz

unread,
Jun 18, 2003, 8:08:35 PM6/18/03
to

"SenderX" <x...@xxx.xxx> wrote in message
news:f43Ia.65723$YZ2.227410@rwcrnsc53...

> > SpinLock();
> > var1++;
> > var2++;
> > SpinUnlock();
> >
> > to
> >
> > InterlockedIncrement(&var1);
> > InterlockedIncrement(&var2);

> Your comparing a single atomic write on two variables, to a race
> condition?
>
> LOL! You have got to be kidding. You need to learn about race conditions,
> really!

Okay, you're a troll. You are the only person in the world who couldn't
tell that the exact operations weren't important, it was just comparing two
non-interlocked operations protected by a spinlock to two interlocked
operations for performance reasons.

DS

David Schwartz

unread,
Jun 18, 2003, 8:10:57 PM6/18/03
to

"Ziv Caspi" <zi...@netvision.net.il> wrote in message
news:3eeaf935.3541292@newsvr...

> On Fri, 13 Jun 2003 00:42:48 -0700, "David Schwartz"
> <dav...@webmaster.com> wrote:

> > You are not an authoritative source for what InterlockedIncrement will
> > do on future IA-32 processors. Intel is an authoritative source for what
> > future IA-32 processors will do with the code InterlockedIncrement generates
> > and Microsoft is an authoritative source for what InterlockedIncrement will
> > do on future processors.

> This shows you do realize Microsoft can't say what the current
> implementation of InterlockedIncrement will do on a future IA32
> processor, because this depends on the behavior of such processors and
> Intel isn't providing any guarantees.

That is not true, Intel provides many guarantees. This is one case where
they not only have not provided a guarantee but have explicitly stated that
there is no such guarantee.

> Given that, if Microsoft provided you with documentation that says
> "the InterlockedIncrement implementation in Windows XP (no SP) is
> guaranteed to work on all future IA32 processors", you would be right
> to be suspicious (or at least surmise that there are some guarantees
> Intel provided Microsoft that were not made public).

I don't doubt that it's guaranteed to work for its documented purpose.
But I'm being asked to use it for purposes for which it's not documented.

> It is understandable that you want a spec for Interlocked* that is
> robust to *any* future hardware changes. However, Microsoft, unlike
> POSIX, does not provide a rigorous specification in this area. You get
> some documentation (which is open for interpretation) and a wink that
> says "so much code in Windows and other server software Microsoft has
> written actually *does* rely on this to work, you can be sure Intel
> wouldn't dare change it in future processors, and Microsoft would make
> damn sure it'll work on future architectures".

Exactly. And that's what I'm complaining about. I want a memory
visibility specification, and there is none.

DS


David Schwartz

unread,
Jun 18, 2003, 8:12:12 PM6/18/03
to

"SenderX" <x...@xxx.xxx> wrote in message
news:565Ia.713709$Si4.8...@rwcrnsc51.ops.asp.att.net...

> If Intel designed a non-TSO IA-32, they would have the thousands and
> thousands of current apps that use IA-32 instructions in mind... Microsoft,
> and every other OS, would have to ensure that their critical section API's
> provide the correct visibility on such a IA-32.

EXACTLY! And the definition of 'correct' is "such that the operations
continue to work as they are documented to work". The problem is, they're
not documented as providing ordering between different variables, they are
only documented as providing ordering between the same variable accessed in
different threads.

DS


SenderX

unread,
Jun 18, 2003, 8:41:03 PM6/18/03
to
> Okay, you're a troll.

You say that the Interlocked API's are unsafe, that they may break on
upcoming IA-32, that they're slower than critical sections, etc. ... which is
all totally false.

You call me a troll? I was trying to tell you how current C volatile and the
Interlocked API's act, in the beginning of the discussion. You would not
learn, and kept telling programmers that the Interlocked API's are bad. You
even said people could use current C volatile ( VC++ docs describe it ) to
overcome a non-TSO processor.

Why should I let you tell other Windows programmers a bunch of crap like
this?


There are a bunch of open source win32 apps that use the Interlocked API's
in an effective manner. Like, every single-threaded COM component out there.
See how they use the Interlocked API's, and learn to use them in your code.
They will increase your apps throughput.


P.S.

Do you realize that CRITICAL_SECTIONS, and many other kernel internals use
the Interlocked API's?

I can see you shaking your head now, but try to fight the ignorance... Your
codes performance depends on it.

;)

SenderX

unread,
Jun 18, 2003, 8:59:51 PM6/18/03
to
> Actually if you no writes going on, the reader lock-free performs no better
> than a simple mutex or conventional rwlock. On a uniprocessor anyway.
> But that appears to be worst case for it.

Yes, any read ( reader ) / write lock under heavy write contention should
perform as well as an OS-provided mutex. I know mine does.


> It's a linked list implementation along the lines of AWTEventMulticaster
> kind of lock-free stuff.

A lock-free list for the waiters?


> You can iterate through the list while nodes are being
> deleted.

Do you achieve this without using atomic_ptr?

I have been tinkering with doubly-linked lists. I can atomically CAS 2
adjacent nodes when I use cmpxchg8b, since my new lock-free system allows
a node w/ its ABA count to reside in a single DWORD. This makes lock-free
linked lists much easier, because I can CAS a node's next and prev pointers
atomically.


> You can even delete nodes without having to reiterate throught the
> list with the lock held.

Nice.

Can you please cite some docs on AWTEventMulticaster internals?

SenderX

unread,
Jun 18, 2003, 9:09:10 PM6/18/03
to
> The problem is, they're
> not documented as providing ordering between different variables

Interlocked API's are not designed to prevent such race conditions. When
you say this race condition is not documented, you are wrong.


/* Init */

LONG l1 = 0, l2 = 2;


/* Thread A: */
InterlockedIncrement( &l1 );
InterlockedDecrement( &l2 );
InterlockedIncrement( &l1 );
InterlockedDecrement( &l2 );


/* Threads B-D:
   various Interlocked ops on l1 and l2 */


If you expect that Thread A will see that l1 == 2 && l2 == 0, then you need
to learn about race-conditions!

This behavior is documented in Intel systems docs and in other various
places, I can cite some if you want.

David Schwartz

unread,
Jun 18, 2003, 10:33:00 PM6/18/03
to

"SenderX" <x...@xxx.xxx> wrote in message
news:zE7Ia.982664$OV.1084077@rwcrnsc54...

> > Okay, you're a troll.

> You say that the Interlocked API's are unsafe, and may break on upcoming
> IA-32, there slower than critical sections, ect... Which is all totally
> false.

No, you imagine I say these things. Please cite me, word for word, where
I say that the Interlocked API's are unsafe or where I say they are slower
than critical sections.

You are totally imagining that I'm saying these things.

Question for you: How many cycles do you think this takes on a P4 (user
mode code, SMP):

optimized_spinlock_acquire();
i++;
j++;
optimized_spinlock_release();

And how many do you think this takes:

InterlockedIncrement(&i);
InterlockedIncrement(&j);

Note that my spinlock code provides an additional guarantee that the two
InterlockedIncrement's do not. But assume you don't need that guarantee --
we just care which is faster.

> You call me a troll? I was trying to tell you how current C volatile and the
> Interlocked API's act, in the beginning of the discussion.

How much clearer can I make it? I don't care how they act. I care how
they're documented to act.

What they just happen to do on a current processor you happened to test
on is of absolutely no interest to me. I'd have to be an idiot to rely on
it. I care about what it does on processors you haven't tested on.

> You would not
> learn, and kept telling programmers that the Interlocked API's are bad. You
> even said people could use current C volatile ( VC++ docs describe it ) to
> overcome a non-TSO processor.

*I* didn't say that, Microsoft did. I don't believe that Microsoft is
telling the truth. Please, quote me where I said that I believed that
volatiles are safe for non-TSO processors.

> Why should I let you tell other Windows programmers a bunch of crap like
> this?

I haven't said any of the things you accuse me of saying. Perhaps you
have a reading comprehension problem.

> There are a bunch of open source win32 apps that use the Interlocked API's
> in an effective manor. Like, every single threaded COM component out there.
> See how they use the Interlocked API's, and learn to use them in your code.
> They will increase your apps throughput.

Assuming those other applications are using them *correctly*. Where
'correctly' means, 'only relying on documented behavior'.

> Do you realize that CRITICAL_SECTIONS, and many other kernel internals use
> the Interlocked API's?

How does that help me? I presume that microsoft will re-implement these
things if that's required by a future processor. Perhaps on a future
processor it won't be possible to implement CRITICAL_SECTIONS in terms of
the Interlocked API's, and Microsoft will do it some other way. How things
happen to work now is not good enough for me, I need API guarantees so that
my code will work in the future.

> I can see you shaking your head now, but try to fight the ignorance... Your
> codes performance depends on it.

You are so completely full of yourself that you can't see what's right
in front of you no matter how many times I rub your nose in it.

DS


Frank Cusack

unread,
Jun 19, 2003, 4:39:26 AM6/19/03
to
On Thu, 19 Jun 2003 01:09:10 GMT "SenderX" <x...@xxx.xxx> wrote:
>> The problem is, they're
>> not documented as providing ordering between different variables
>
> Interlocked API's are not designed to prevent such a race-conditions.

Thank god. You finally get it.

Now all you have to understand is that C volatile does guarantee ordering.

/fc

Joseph Seigh

unread,
Jun 19, 2003, 4:53:17 AM6/19/03
to

It doesn't. You are misapplying the C standard to situations not covered by
the standard.

Joe Seigh

David Schwartz

unread,
Jun 19, 2003, 4:55:03 AM6/19/03
to

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3EF17B7A...@xemaps.com...

> It doesn't. You are misapplying the C standard to situations not covered
by
> the standard.

What the hell does the C standard have to do with anything? We're not
talking about a generic platform, we're talking about WIN32 and VC++ where
'volatile' is specifically documented as providing this. (See the many times
I quoted the section.)

DS


Joseph Seigh

unread,
Jun 19, 2003, 5:30:12 AM6/19/03
to

SenderX wrote:
>
> > It's a linked list implementation along the lines of AWTEventMulticaster
> kind
> > of lock-free stuff.
>
> A lock-free list for the waiters?

Lock-free for the readers.

>
> > You can iterate through the list while nodes are being
> > deleted.
>
> Do you achieve this without using atomic_ptr?

Standard technique of using GC to make linked list traversal safe.
Almost any form of GC will work. Java GC, atomic_ptr, RCU, etc...


>
> I have been tinkering with double-linked lists. I can atomically CAS 2
> adjacent nodes when I use cmpxchg8b since my new lock-free system allows for
> a node /w it's aba count to reside in a single DWORD. This makes lock-free
> linked lists much easier, because I can CAS a nodes next and prev pointer
> atomically.
>
> > You can even delete nodes without having to reiterate throught the
> > list with the lock held.
>
> Nice.
>

Actually, that's a result of using a doubly linked list internally. GC
guarantees that the reference is still valid when the list mutex is
acquired, even if the referenced item is no longer in the list. You can't
do that with conventional read/write locks. Even for a promoteable lock,
if the promote from read to write fails, the reference becomes invalid and
your iterator becomes invalid as well.


> Can you please cite some docs on AWTEventMulticaster internals?

It's in the source for the Java JDK but you don't want to look at it
unless you want to see what a linked list implemented by Lisp programmers
looks like.


Here's some of the test case source for the reader lock-free stuff
to give you an idea of what using it is like.

DWORD WINAPI ftest2(void * xxx) {
    int j, k;
    lfq_iterator_raw<t1> ndx1;

    for (j = 0; j < n; j++) {
        (j%m == 0) ? rw.wrlock() : rw.rdlock();
        k = 0;
        for (ndx1 = *q; ndx1.exists(); ndx1.next()) {
            if (j%m == 0 && k++ == 5) {
                InterlockedIncrement(&match1);
                if (q->dequeue(ndx1)) {
                    InterlockedIncrement(&match2);
                    q->push(t1(-1));
                }
                Sleep(10);
            }
        }
        rw.unlock();
    }

    return 0;
}

DWORD WINAPI ftest1(void * xxx) {
    int j, k;
    lfq_iterator<t1> ndx1;

    for (j = 0; j < n; j++) {
        k = 0;
        for (ndx1 = *q; ndx1.exists(); ndx1.next()) {
            if (j%m == 0 && k++ == 5) {
                InterlockedIncrement(&match1);
                if (q->dequeue(ndx1)) {
                    InterlockedIncrement(&match2);
                    q->push(t1(-1));
                }
                Sleep(10);
            }
        }
    }

    return 0;
}


ftest1 uses the lock-free iterator and ftest2 uses the non lock-free iterator.
The threads were multiplexed between reading and writing (every m'th iteration).
Iterators were not based on conventional C++ iterator design, which is all but
useless for multi-threading. It's a little closer to Java style, without its
"autoincrement" semantics. Also, the dequeue method is a queue method, not an
iterator method like Java's remove method.

Joe Seigh

Frank Cusack

unread,
Jun 19, 2003, 5:39:33 AM6/19/03
to
On Thu, 19 Jun 2003 08:53:17 GMT Joseph Seigh <jsei...@xemaps.com> wrote:
> It doesn't. You are misapplying the C standard to situations not covered by
> the standard.

Here we go again ... I should have remained cloaked! shields up

/fc

Joseph Seigh

unread,
Jun 19, 2003, 5:47:25 AM6/19/03
to

But we haven't seen any concrete evidence that VC++ does do this to support
your interpretation of the documentation.

Joe Seigh

David Schwartz

unread,
Jun 19, 2003, 5:50:44 AM6/19/03
to

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3EF18829...@xemaps.com...

> But we haven't seen any concrete evidence that VC++ does do this to
> support your interpretation of the documentation.

More ass-backwards reasoning. If the implementation doesn't do what the
documentation says, then either the implementation or the documentation is
in error. Yes, you can insist that somehow the documentation must be correct
no matter what we observe, we just have to figure out the right way to
interpret it. So whatever behavior we observe, we then interpret the
documentation to have meant that.

Perhaps you were a biblical literalist in a past life.

I will say this, if the documentation doesn't allow us to predict what
the product does, then the documentation is either inadequate or erroneous.
In any event, the documentation clearly states that reads and writes take
place in the order they're written.

DS


Joseph Seigh

unread,
Jun 19, 2003, 6:41:21 AM6/19/03
to

It's your interpretation of what is meant by reads and writes that is in question.
We don't interpret read and write as meaning the same thing that you do. And
you haven't convinced us that your interpretation is correct and ours is not.

One of us is right and one of us is wrong in the interpretation. But if it turns
out we are wrong, our code does not break.

Joe Seigh

David Schwartz

unread,
Jun 19, 2003, 6:55:45 AM6/19/03
to

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3EF194CD...@xemaps.com...

> > It's your interpretation of what is meant by reads and writes that is
> > in question.

From the perspective of a C programmer, a 'read' or 'write' can't mean
anything else. Tell me what you think the documentation for a C/C++ language
construct could mean by a "read" or "write" to an object.

> We don't interpret read and write and meaning the same thing that you do.
> And you haven't convinced us that your interpretation is correct and ours
> is not.

Tell me, in the context of a C-language construct, what is a 'read' and
what is a 'write'?

> One of us is right and one of us is wrong in the interpretation. But if
> it turns out we are wrong, our code does not break.

Maybe I'm not explaining myself well, because this is yet another in a
long series of misunderstandings over things I really thought I made clear.

Suppose somebody says "2+2=5". My position would be "No, 2 plus 2 is not
5, this person is wrong". Would you reply "No, he must have meant something
else by 2 or by 5, you are misunderstanding his definition of 5. And in any
event, if you are wrong, all your addition will come out wrong."

I don't *believe* the documentation. I know what it's saying isn't true.
But I am quite certain that I know what it says.

Let me ask you a question. Suppose you have some new super volatile
qualifier that used the appropriate processor voodoo magic such that it did
ensure ordering on reads and writes across multiple threads. Wouldn't a good
way to document it be:

"With the 'supervolatile' keyword, the current value of an object is
always read at the point it is requested. In addition, the value of the
object is written immediately upon assignment."

If you'd like, I can quote the C standard on what a 'read' and 'write'
is, but somehow, I don't think it matters. Wouldn't a good way to describe a
system that never re-ordered memory requests be "reads are always performed
at the point they are requested, values are always written immediately upon
assignment."

Wouldn't you describe a weak ordered system as "reads are not always
performed at the point they are requested and values are not always written
immediately upon assignment". What the hell else could it possibly mean --
to a C/C++ programmer?

DS


Joseph Seigh

unread,
Jun 19, 2003, 7:21:07 AM6/19/03
to

David Schwartz wrote:
>
> "Joseph Seigh" <jsei...@xemaps.com> wrote in message
> news:3EF194CD...@xemaps.com...
>
> > It's your interpretation of what is meant by reads and writes that is
> > in question.
>
> From the perspective of a C programmer, a 'read' or 'write' can't mean
> anything else. Tell me what you think the documentation for a C/C++ language
> construct could mean by a "read" or "write" to an object.
>

...


> Wouldn't you describe a weak ordered system as "reads are not always
> performed at the point they are requested and values are not always written
> immediately upon assignment". What the hell else could it possibly mean --
> to a C/C++ programmer?
>

It doesn't mean anything unless it says under what conditions, and how, the
effects of the first thread's actions (reads and writes) are observable to
other threads.

You could do this with a formal memory model, which would define the effects
of formally defined "read" and "write" terms on memory state and use that
to make programmatic inferences. It's more indirect and difficult to use,
but it would work. Java does that. But I haven't seen memory models defined
anywhere else, and that includes Posix and win32.

Joe Seigh

David Schwartz

unread,
Jun 19, 2003, 7:26:26 AM6/19/03
to

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3EF19E20...@xemaps.com...

> It doesn't mean anything unless it says under what conditions, and how,
> the effects of the first thread's actions (reads and writes) are
> observable to other threads.

So long as the read is performed at the point it is requested and the
write is also performed at the point it is requested, this is not an issue.
This is only an issue if reads can be performed earlier than the point they
were requested and writes can be performed later than the point they were
requested.

Please explain to me how it's possible to have memory visibility issues
if all reads and all writes are performed at the point they're requested.

DS


David Butenhof

unread,
Jun 19, 2003, 7:58:40 AM6/19/03
to
Alexander Terekhov wrote:

> SenderX wrote:
> [...]
>> I would like to be able to use your code for a portable compare-exchange
>> ( attempt_update? ).
>
> First, you'd have to convince Butenhof that neither pthread_cond_signal()
> nor pthread_cond_broadcast() need to impose ANY restrictions on COMPILER
> induced reordering. The DR of mine seeking [trillions.. err. ok ;-) ] to
> remove CV signaling functions from XBD/4.10 was also rejected -- result
> of the DRB's "verdict", IIRC.

Sometimes, Alexander, you can be such a... well, OK, I won't use any of the
appropriate words here. And I was starting to think that perhaps you'd
"mellowed" a little over the past year or so...

First off, you act as if this were my personal decision, and that's silly.
While I grant my opinion is influential in the working group, I do NOT make
decisions for them. I am merely a consultant.

Second, I don't object in principle to the idea you raised. In fact, I
initially supported your suggestion. However, the working group had
significant concerns about any possible (currently unscoped) effects of
this change on the portability of existing applications. There is, after
all, no point in MAKING such a change if it cannot affect the behavior of
an application.

Like several other POSIX issues that have arisen lately, had this come up
during the initial POSIX working group discussions, I would have agreed. I
don't see any benefit, and some risk, to making such a change at this late
date.

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

David Schwartz

unread,
Jun 19, 2003, 8:04:47 AM6/19/03
to

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EED6872...@web.de...

> Wrong. IA-32 does speculative reads, to begin with.

Yes, it does, but it 'magically' acts as if it doesn't. Exactly how is
processor voodoo, but the general idea is that the speculative read is
pegged to the L1 cache line(s?) it read from. If the cache line is
invalidated, so is the speculative read (causing a stall and a new read to
be issued to reacquire the cache line). Intel has specifically stated that
this behavior may disappear in a future IA-32 processor, so you can only
rely on it as a processor-specific optimization for existing processors.

(I'm not 100% sure I have this correct, so please correct me if I'm
wrong.)

DS


David Butenhof

unread,
Jun 19, 2003, 8:32:11 AM6/19/03
to
David Schwartz wrote:

>> You call me a troll? I was trying to tell you how current C volatile and
>> the Interlocked API's act, in the beginning of the discussion.
>
> How much clearer can I make it, I don't care how they act. I care how
> they're documented to act.
>
> What they just happen to do on a current processor you happened to
> test on is of absolutely no interest to me. I'd have to be an idiot to
> rely on it. I care about what it does on processors you haven't tested
> on.

Idiocy, David, is very much in the eyes of the beholder. I'm afraid you need
to step back and realize who you're talking to.

"SenderX" is no deliberate troll, but merely an inexperienced "seat of the
pants" Windows programmer. Have you been reading the "SenderX files" in
this newsgroup? SenderX is extremely clever and creative, but actually
believes (or at least writes and acts as if believing) that one can PROVE
the correctness of a parallel algorithm by TESTING on available systems.
This should really explain it all, yes?

You live in different realities, and in the long and rambling course of this
exchange it's become clear that SenderX will leave the SenderX dimension
only "in the fullness of time" as (and if) programming maturity arrives,
whereas you're unwilling to enter the SenderX dimension even for a brief
visit. You won't agree, because your frames of reference don't overlap.
Live with it. ;-)

And, oh my g*d, I can't believe I actually put my foot into this particular
discussion. I'm quite sure I'll never get the gook off my shoe...

Joseph Seigh

unread,
Jun 19, 2003, 9:05:00 AM6/19/03
to
It might help to show an example of what explicit thread semantics looks like.
The general process is that you formally define "visibility". You then define
rules, ie. actions which have an effect on visibility. If your rules are
sufficient and necessary then they will cover all conceivable uses of your
construct.

So, for atomic<T> as I would do it, first I probably use the definition of
visibility I defined for formal mutex semantics way back (cleaned up a
little). Possibly define atomicity if that is needed.

One rule (an easy one) would be: If thread A sees a value stored into
an atomic<T> variable by thread B, then all values, visible to thread B
before that store, are visible to thread A after its load.

Note that threads are explicitly mentioned, and so is the conditional "if". You
are not guaranteed to see the store in any fixed amount of time, just
that if you do see it then you can make certain inferences.

If all you are doing is immutable objects then that is probably the only
rule that you need. You could do DCL with that.

Joe Seigh

David Schwartz

unread,
Jun 19, 2003, 1:58:37 PM6/19/03
to

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3EF1ABDB...@xemaps.com...

> It has to explicitly say other threads.

It does. Here's what the same page I've been quoting says right at the
top: "The volatile keyword is a type qualifier used to declare that an
object can be modified in the program by something such as the operating
system, the hardware, or a concurrently executing thread."

> Volatile in C has classically been
> defined with respect to a single thread, signals running in that threads
> signal handler (which is really on the same thread) and with respect to
> i/o memory* where accesses have to be in order and not combined (by
> optimization).
> Unfortunately it never said anything about other threads and since there
> was a workaround in the form of Posix and other mutexes, there has never
> been a huge amount of pressure to fix this.

How many times have I said, we are not talking about classic C. We are
talking specifically about VC++ and WIN32.

You are defending the indefensible. I understand, you don't want to
believe that it really is as bad as I'm saying it is. But it is. Really.

DS


SenderX

unread,
Jun 19, 2003, 3:14:56 PM6/19/03
to
> "SenderX" is no deliberate troll, but merely an inexperienced "seat of the
> pants" Windows programmer.

I am not inexperienced when it comes to Microsoft programming; I have many
years of Win32-IOCP/COM under my belt.


> that one can PROVE
> the correctness of a parallel algorithm by TESTING on available systems.

The test loads I put my algos under are extremely high on SMP IA-32, and I
have a friend who will let me test an algo for weeks on end under extreme
load.

Plus, the library I am working on is a "test" library. I thought by
posting the source, others would test and tinker, and give me some real good
feedback.


> SenderX will leave the SenderX dimension
> only "in the fullness of time" as (and if) programming maturity arrives,

Not exactly sure if I want to leave the lock-free world yet. ;)

My test library has improved the thread to thread communication for many app
sources that I have access to. Many people have been very impressed with it.


I just don't like Dave telling Win32 people that the Interlocked API's might
not work on upcoming processors.

SenderX

unread,
Jun 19, 2003, 3:37:24 PM6/19/03
to
> It does. Here's what the same page I've been quoting says right at the
> top: "The volatile keyword is a type qualifier used to declare that an
> object can be modified in the program by something such as the operating
> system, the hardware, or a concurrently executing thread."

In order for you to be correct, VC++ would have to inject fences on volatile
var access. It currently does not do that.

If you can provide us with some disassemblies of VC++ volatile access that
proves otherwise, please do so.
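The two sides here are arguing about two different layers: what the compiler may reorder versus what the hardware may reorder. In later (C++11) vocabulary the distinction looks like this; the snippet is a modern illustration, not a claim about what the VC++ of that era actually emitted:

```cpp
#include <atomic>
#include <cassert>

// 'volatile' constrains only the compiler: it may not elide or reorder the
// access, but no hardware fence is implied by the language.
volatile int v_flag = 0;

// std::atomic also constrains the CPU: the store below gets whatever
// barrier or locked instruction the target needs.
std::atomic<int> a_flag{0};

void publish_volatile(int x) {
    v_flag = x;   // typically compiles to a plain store; no barrier emitted
}

void publish_atomic(int x) {
    a_flag.store(x, std::memory_order_seq_cst); // e.g. XCHG or MOV+MFENCE on x86
}

int main() {
    publish_volatile(7);
    publish_atomic(7);
    assert(v_flag == 7);
    assert(a_flag.load() == 7);
    return 0;
}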

SenderX

unread,
Jun 19, 2003, 4:08:19 PM6/19/03
to
It looks like the threads are going through a shared linked list, and
pushing and popping on write access, or at any time using lock-free. Is that
right?

Is the list singly-linked?


> Almost any form of GC will work. Java GC, atomic_ptr, RCU, etc...

If you use atomic_ptr for the linked-list internals, would you need to use
an ABA proof CAS? Or can you bypass that?


I could use my lock-free stuff for this, but it would have to get nodes from
static pools. So the total nodes in the list would be limited to 3 or 4
hundred thousand. However, since it uses static storage a node can never be
deleted when other threads are using it...

David Schwartz

unread,
Jun 19, 2003, 4:20:29 PM6/19/03
to

"SenderX" <x...@xxx.xxx> wrote in message
news:UhoIa.74801$YZ2.245413@rwcrnsc53...

> > It does. Here's what the same page I've been quoting says right at the
> > top: "The volatile keyword is a type qualifier used to declare that an
> > object can be modified in the program by something such as the operating
> > system, the hardware, or a concurrently executing thread."

> In order for you to be correct, VC++ would have to inject fences on
> volatile var access. It currently does not do that.

What I am saying is totally independent of any particular
implementation. I am talking about the consequences of what the
documentation says. I am not saying "this is what VC++ does", I am saying,
"this is what the documentation says VC++ does".

Suppose the manual for my car says "turning the key to the off position
will shut the engine off". So we go out to the car and try it, and turning
the key to the off position leaves the engine running. I would conclude that
either the manual is wrong or the car is broken. You would, apparently,
conclude that I must have misunderstood what the manual meant, and perhaps
the engine being off doesn't mean it stops running, after all, every master
mechanic will tell you that an engine can continue running even if it's shut
off.

The point is, manuals aren't written for master mechanics. If you have
to be a master mechanic to understand the manual, the manual is broken. And
even a master mechanic could reason, "just cutting the power isn't enough to
shut the engine off, so if the manual is right, the key must do something
more than turning off the power". The mechanic's reasoning is correct even
though the conclusion he draws is false, but that's because his conclusion
is qualified by a premise "if the manual is right".

You would conclude the master mechanic was wrong. I would conclude the
documentation is wrong.

> If you can provide us with some disassemblies of VC++ volatile access that
> proves otherwise, please do so.

*sigh* You just don't get it. The methodology you are suggesting is just
plain wrong.

You are saying that I should find out what 'volatile' is *supposed* to
do by seeing what it actually does. I am saying that you should find out
what it's supposed to do from the documentation. If it doesn't actually do
what the documentation says it does, then either the documentation or the
implementation is broken.

If we figured out what things were supposed to do based upon what they
actually did, we would wind up writing code that broke with every update.
Things should continue to do what they're supposed to do, but may not
continue to do what they did.

DS


Joseph Seigh

unread,
Jun 19, 2003, 4:43:07 PM6/19/03
to

SenderX wrote:
>
> It looks like the threads are going through a shared linked list, and
> pushing and popping on write access, or at any time using lock-free. Is that
> right?
>
> Is the list singly-linked?
>
> > Almost any form of GC will work. Java GC, atomic_ptr, RCU, etc...
>
> If you use atomic_ptr for the linked-list internals, would you need to use
> an ABA proof CAS? Or can you bypass that?
>

Just read access does not require a lock. Write access uses a conventional
mutex.

For linked data structures, the basic technique for removing a node is to make the
node unreachable (remove the links to it but not from it), wait until the node
is no longer being referenced and then delete it (GC).

If the structure is not being updated that often and most of the access is
read access, then using this technique can have considerable advantages even
though mutexes are still being used for the update.

For linux (see http://lse.sourceforge.net/locking/rcupdate.html Scaling the
dentry cache) "Using RCU (and lazy-lru algorithm), we could do lock-less lookup
for dentries and bring down the contention for dcache_lock while running dbench
considerably from 16.5% to 0.95% on an 8-way SMP box."

Of course, that 16.5% might not have been as bad if they were using conventional
reader/writer locks to begin with. I came up with a reader/writer spin lock
that was as efficient as a simple spin lock and had fifo service order. But
there was no need for it because, not having a reader/writer lock to begin
with, even read access was kept short. This was for VM but linux was
probably the same.

Of course, even if your mutex contention is low, with p being the probability of
contention for a single processor, contention for n processors is roughly
1 - (1 - p)**n, which gets large quickly as n increases.

Joe Seigh

SenderX

unread,
Jun 19, 2003, 4:49:33 PM6/19/03
to
" The basic technique is to use regular pointers for the data structure in
question, e.g. linked list, and a gc pointer based collector object. "

Very interesting...


" For write access, in addition to getting write synchronization if
lock-free isn't being used, the
writer updates the data structure (the techniques for updating the data
structure are discussed
here. They're data structure dependent), allocates a new collector object
and swaps it for the
old collector object, queues any nodes removed from the data structure onto
the old collector
object, and drops the reference to the old collector object. "


The collector reference counting would use atomic_ptr, correct?

Would this mimic your algo?


/* Global */

atomic_ptr< C_Collector > g_Collector( new C_Collector );

STACK g_Stack;


Stack pop operation:

/* Local */

STACK LocalNode;

local_ptr< C_Collector > pNewCollector( new C_Collector );

local_ptr< C_Collector > pLocalCollector = g_Collector;


/* Pop a node */
do
{
    LocalNode = g_Stack;
}
while( ! CAS64( &g_Stack,
                LocalNode,
                LocalNode.pNode->pNext,
                LocalNode.lAba + 1 ) );


/* Swap the collectors */
while( ! g_Collector.cas( pLocalCollector,
                          pNewCollector ) )
{
    pLocalCollector = g_Collector;
}


/* Queue the popped node on the old collector */
pLocalCollector->QueueNode( LocalNode.pNode );


/* Remove references */
pLocalCollector = NULL;
pNewCollector = NULL;


Is that even on the right track?

Michael Furman

unread,
Jun 19, 2003, 5:46:08 PM6/19/03
to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bct5ud$jcg$1...@nntp.webmaster.com...

>
> "SenderX" <x...@xxx.xxx> wrote in message
> news:UhoIa.74801$YZ2.245413@rwcrnsc53...
>
> > > It does. Here's what the same page I've been quoting says right at the
> > > top: "The volatile keyword is a type qualifier used to declare that an
> > > object can be modified in the program by something such as the
operating
> > > system, the hardware, or a concurrently executing thread."
>
> > In order for you to be correct, VC++ would have to inject fences on
> > volatile var access.
>
> What I am saying is totally independent of any particular
> implementation. I am talking about the consequences of what the
> documentation says. I am not saying "this is what VC++ does", I am saying,
> "this is what the documentation says VC++ does".

It does not say that:
> [...]
> "Objects declared as volatile are not used in optimizations because their
> value can change at any time. The system always reads the current value of
> a volatile object at the point it is requested, even if the previous
> instruction asked for a value from the same object. Also, the value of the
> object is written immediately on assignment."

It is talking about the instructions that are generated by the compiler.
"system always reads the current value..." means that the compiler generates
instructions that access the address. It cannot mean more, because more is
out of the scope of the compiler. For example, some memory locations
could be mapped to something other than memory (memory-mapped I/O),
so there can be no guarantee about the value that is being read.

What it says is that, in the case of volatile, the compiler is not allowed
to move the instruction that accesses the variable (or just eliminate it,
using a value that was read earlier).

Regards,
Michael

[I've added comp.lang.c++ group]

David Schwartz

unread,
Jun 19, 2003, 5:54:32 PM6/19/03
to

"Michael Furman" <Michae...@Yahoo.com> wrote in message
news:bctav2$m8brm$1...@ID-122417.news.dfncis.de...

> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:bct5ud$jcg$1...@nntp.webmaster.com...

> > What I am saying is totally independent of any particular
> > implementation. I am talking about the consequences of what the
> > documentation says. I am not saying "this is what VC++ does", I am
> > saying, "this is what the documentation says VC++ does".

> It does not say that:
> > [...]
> > "Objects declared as volatile are not used in optimizations because their
> > value can change at any time. The system always reads the current value
> > of a volatile object at the point it is requested, even if the previous
> > instruction asked for a value from the same object. Also, the value of
> > the object is written immediately on assignment."

> It is talking about instructions that are generating by compiler.

NO. A C/C++ document is not allowed to do this. It must say what the
instructions actually *DO*, not what the instructions *ARE*.

> "system always reads the current value... " means that compiler generates
> instructions that access the address.

In other words, it doesn't mean that the system reads the current value.

> It cannot mean more because it
> is out of scope of the compiler.

No, it's not. The compiler could use locks or fences or otherwise ensure
that "the value ... is read at the point it is requested". This is not an
impossible thing to do.

> For example, some memory locations
> could be mapped to something else then memory (memory mapped I/O)
> so it could not be any guarantee about value that is being read.

I don't know what you mean. How can something qualified 'volatile' to a
C/C++ compiler be memory mapped unless I do something outside the scope of
the compiler? I'm talking about well-behaved applications that do no magic
behind the compiler's back. Obviously, if you do something crazy, then the
guarantees might not hold.

> What it says that in case of volatile compiler is not allowed to move the
> instruction that accessing the variable (or just eliminate it using value
> that was read earlier).

How can it possibly mean that? This is C/C++ documentation. The only
instructions are C/C++ instructions. This is not an assembler document, this
is an explanation of what the C/C++ 'volatile' qualifier does to/for C/C++
code on this platform.

DS


David Schwartz

unread,
Jun 19, 2003, 5:55:39 PM6/19/03
to

Oh, one other thing: NO MORE FOLLOWUPS TO COMP.LANG.C++! This is not a
C++ issue, this is a WIN32/VC++ issue!

Please do not be misled by the subject! This is about the WIN32
Interlocked API and specific guarantees about 'volatile' that VC++
documentation makes. I know what 'volatile' means in standard C++ and it's
something totally different!

DS


SenderX

unread,
Jun 19, 2003, 5:56:10 PM6/19/03
to
> And how do you think you'll get those "VARS" QWORD-aligned
> on a 32bit machine (ignoring that this isn't C-code) ?

ULARGE_INTEGER, SLIST_HEADER will work with cmpxchg8b on SMP systems ( which
InterlockedCompareExchange64 uses on 32-bit systems ).

I believe it's safe to use malloc for the SLIST_ENTRY and SLIST_HEADERS on
SMP systems.

They only need to be QWORD aligned if you're running on SMP systems.


> Why do you think an acquire-barrier is necessary here ?

This is a lock-free algo; I don't want to swap stale data on weak-order
systems.


> - Why do you think the CAS-function with release-semantics
> is necessary here ? Or did you simply use it because there
> isn't any non-release variant of this function ?

It's better than using the normal 64-bit CAS Server 2003 API, which uses a full
mfence. All I want is a guaranteed store (release) on non-TSO
processors.


> - Sure you would use this function which is available only in
> Windows Server 2003 ?

It is not only available in Windows Server 2003, because you can code this
exact function on your own using the same CAS opcode. ( cmpxchg8b on IA-32,
cmpxchg on IA-64 ).

There are several CAS64 functions posted on this news group.

Joseph Seigh

unread,
Jun 19, 2003, 6:01:48 PM6/19/03
to

SenderX wrote:
>
> " The basic technique is to use regular pointers for the data structure in
> question, e.g. linked list, and a gc pointer based collector object. "
>
> Very interesting...
>
> " For write access, in addition to getting write synchronization if
> lock-free isn't being used, the
> writer updates the data structure (the techniques for updating the data
> structure are discussed
> here. They're data structure dependent), allocates a new collector object
> and swaps it for the
> old collector object, queues any nodes removed from the data structure onto
> the old collector
> object, and drops the reference to the old collector object. "
>
> The collector reference counting would use atomic_ptr correct?

ummm... yeah, but it wasn't that simple as it turned out.

>
> Would this mimic your algo?
>

(snip)
no.


Actually, you have to back link the collector objects so that newer collector objects
which get created by subsequent node deletions stay alive as long as the oldest
collector. So it would look something like this

I2 I1
| |
V V
X ==> Q3 <== Q2 <== Q1
| |
v v
n1 n2...

Q3 is the current collector object; X is a reference counted pointer, atomic_ptr, to
it from the parent object. Q2 and Q1 are older collector objects back linked
with atomic_ptr and with deleted nodes n1, n2, etc., queued using normal ptrs. The
destructors for the collector objects know to dequeue and delete the nodes. I2 and I1
are local_ptr references from active iterators.

If the iterator for I1 finishes and drops (in this case the only) reference to Q1,
Q1 and n2 etc. all get deleted.

If the iterator for I2 finishes before I1, I2 drops its reference but Q2 stays alive
until Q1 gets deleted, so the I1 iterator can safely access n1 if it needs to.

When an iterator gets initialized, it simply gets a reference to the current collector
object from X.

There are some optimizations in the current prototype not shown here, but that's the
general logic.

Joe Seigh

Stephen Howe

unread,
Jun 19, 2003, 10:37:32 PM6/19/03
to
> > > The C
> > > standard specifically says that what 'volatile' does is implementation
> > > defined, and I've cited the implementation documentation to you.
>
> > I cited the VC++ docs to Oliver S. in the previous thread.
>
> Good, so we both know that they *say* that 'volatile' provides
> ordering guarantees.

Microsoft can claim what they like. The fact is that "volatile" does not
provide sufficiently strong guarantees for ordering.
This has been discussed to death on comp.lang.c++.moderated.
Both ISO C99 and ISO C++98 are too weak to guarantee strong enough ordering,
and the presence of "volatile" does not ensure it either.

See thread (no pun intended)
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=MPG.191d9257dfa5ebf79896db%40news.hevanet.com&rnum=1&prev=/groups%3Fq%3DScott%2BMeyers%2Bdouble%2Bchecked%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3DMPG.191d9257dfa5ebf79896db%2540news.hevanet.com%26rnum%3D1

Watch the thread as people suggest volatile as an "answer".

The bottom line of this is that it is not possible to write portable C or
C++ code that has sufficiently strong guarantees to be useful for threading.
volatile provides some guarantees but not enough. And that means using
assembler to make such guarantees.

Some of the standards guys have wondered whether C and C++ should be
tightened up to make it useful. But doing so would "break" perfectly useful
compilers.

Stephen Howe


Stephen Howe

Jun 19, 2003, 10:48:43 PM
to
> More ass-backwards reasoning. If the implementation doesn't do what the
> documentation says, then either the implementation or the documentation is
> in error. Yes, you can insist that somehow the documentation must be
> correct no matter what we observe, we just have to figure out the right
> way to interpret it. So whatever behavior we observe, we then interpret
> the documentation to have meant that.

I doubt whether Microsoft document "volatile" to have a stronger meaning
than the one the C and C++ standards give it.
If you infer that from Microsoft's documentation, then, as you say, their
documentation is inadequate.

Stephen Howe


Stephen Howe

Jun 19, 2003, 10:53:12 PM
to
> How many times have I said, we are not talking about classic C. We are
> talking specifically about VC++ and WIN32.

Do you mean Microsoft's C compiler or C++ compiler? Not many people know
that they ship 2 compilers with VC, a C compiler (which has not had an
amazing amount of work on it to bring it near ISO C99) and a C++ compiler
(which has had a lot of work on it, pretty close to ISO C++98).

The only claims I am interested in is the claims you are making on
"volatile"

Stephen Howe


Stephen Howe

Jun 19, 2003, 11:01:51 PM
to
> > If you can provide us with some disassemblies of VC++ volatile access
> > that proves otherwise, please do so.
>
> *sigh* You just don't get it. The methodology you are suggesting is just
> plain wrong.
>
> You are saying that I should find out what 'volatile' is *supposed* to
> do by seeing what it actually does. I am saying that you should find out
> what it's supposed to do from the documentation. If it doesn't actually do
> what the documentation says it does, then either the documentation or the
> implementation is broken.

Or another possibility is that you are reading more into Microsoft's
documentation than is there.
Their documentation is not wordsmithed to the same quality as an ISO
document, where every comma, every word, every reference is checked. So you
might be inferring things that the imprecise author never intended to convey.

I am 100% certain that if Microsoft meant that their "volatile" provided
extra guarantees on ordering more than the C and C++ standard does on
"volatile" they would have said so specifically. The fact that they are
silent means that their C++ compiler provides no additional guarantees
beyond what standard C++ does.

> If we figured out what things were supposed to do based upon what they
> actually did, we would wind up writing code that broke with every update.

Well you are right there. Code is supposed to work based on the abstract
machine in the standards. If you peered at the assembler code generated, how
would that guide you? The code generator could have a bug.

Stephen Howe


David Schwartz

Jun 19, 2003, 11:39:30 PM
to

"Stephen Howe" <NOSPAM...@dial.pipex.com> wrote in message
news:3ef276e5$0$11385$cc9e...@news.dial.pipex.com...

> > How many times have I said, we are not talking about classic C. We are
> > talking specifically about VC++ and WIN32.

> Do you mean Microsoft's C compiler or C++ compiler? Not many people know
> that they ship 2 compilers with VC, a C compiler (which has not had an
> amazing amount of work on it to bring it near ISO C99) and a C++ compiler
> (which has had a lot of work on it, pretty close to ISO C++98).

C++.

> The only claims I am interested in is the claims you are making on
> "volatile"

My claims are specifically limited to what Microsoft *says* volatile
does in C++ code on WIN32 compiled with VC++. I am not talking about what it
should do or what it actually does.

DS


David Schwartz

Jun 19, 2003, 11:37:21 PM
to

"Stephen Howe" <NOSPAM...@dial.pipex.com> wrote in message
news:3ef27339$0$11377$cc9e...@news.dial.pipex.com...

> > > > The C
> > > > standard specifically says that what 'volatile' does is implementation
> > > > defined, and I've cited the implementation documentation to you.

> > > I cited the VC++ docs to Oliver S. in the previous thread.

> > Good, so we both know that they *say* that 'volatile' provides
> > ordering
> > guarantees.

> Microsoft can claim what they like.

Microsoft is the sole authoritative source for what 'volatile' does on
VC++ for WIN32.

> The fact is that "volatile" does not provide
> sufficiently strong guarantees for ordering.
> This has been discussed to death on comp.lang.c++.moderated
> Both ISO C99 and ISO C++98 are too weak to guarantee strong enough
> ordering, and the presence of "volatile" does not ensure that either.

Nobody was talking about ISO C99 or ISO C++98. We were specifically
talking about what 'volatile' does on VC++ for WIN32. The standards all say
that it's implementation defined, so the authoritative source is the
implementation documentation which specifically says that volatile, on that
platform with that compiler, is suitable for synchronization across threads
and provides ordering guarantees.

However:

1) The documentation is probably in error.

2) Even if it wasn't, using 'volatile' for synchronization between
threads is an extremely bad habit and will definitely not work on large
number of real, existent platforms.

> The bottom line of this is that it is not possible to write portable C or
> C++ code that has sufficiently strong guarantees to be useful for
> threading. volatile provides some guarantees but not enough. And that
> means using assembler to make such guarantees.

Fascinating, except we weren't talking about writing portable code.

DS


David Schwartz

Jun 19, 2003, 11:38:39 PM
to

"Stephen Howe" <NOSPAM...@dial.pipex.com> wrote in message
news:3ef275d7$0$11385$cc9e...@news.dial.pipex.com...

> I doubt whether Microsoft document "volatile" to have a stronger meaning
> than the one the C and C++ standards give it.
> If you infer that from Microsoft's documentation, then, as you say, their
> documentation is inadequate.

Here is the main page on 'volatile' for C++:

The volatile keyword is a type qualifier used to declare that an object can
be modified in the program by something such as the operating system, the
hardware, or a concurrently executing thread.

volatile declarator;

The following example declares a volatile integer nVint whose value can be
modified by external processes:

int volatile nVint;

Objects declared as volatile are not used in optimizations because their
value can change at any time. The system always reads the current value of a
volatile object at the point it is requested, even if the previous
instruction asked for a value from the same object. Also, the value of the
object is written immediately on assignment.

One use of the volatile qualifier is to provide access to memory locations
used by asynchronous processes such as interrupt handlers.

My apologies for any formatting brain damage.

DS


David Schwartz

Jun 19, 2003, 11:42:32 PM
to

"Stephen Howe" <NOSPAM...@dial.pipex.com> wrote in message
news:3ef278ec$0$11382$cc9e...@news.dial.pipex.com...

> I am 100% certain that if Microsoft meant that their "volatile" provided
> extra guarantees on ordering more than the C and C++ standard does on
> "volatile" they would have said so specifically. The fact that they are
> silent means that their C++ compiler provides no additional guarantees
> beyond what standard C++ does.

See the excerpt that specifically says that reads do not take place
before they're requested and writes do not take place after they're
requested. It also specifically says 'volatile' is suitable for concurrent
access from multiple threads.

> > If we figured out what things were supposed to do based upon what they
> > actually did, we would wind up writing code that broke with every
> > update.

> Well you are right there. Code is supposed work based on the abstract
> machine in the standards. If you peered at assembler code generated, how
> would that guide you? The code generator could have a bug.

Well, at least it would tell you if the code generator had a bug. ;) But
you'd be a fool to rely on quirks of the code generated that weren't part of
the documented behavior.

DS

SenderX

Jun 20, 2003, 1:49:12 AM
to
> Where did you find something like that in the Win32 specification; if it
> would exist, it should be named InterlockedCompareExchange64, but there's
> only an undocumented _InterlockedCompareExchange64-intrinsic of the MSC++
> compiler.

You're correct; I confused this with an API.

However, if I did use the 2003 Interlocked API's, I would have to choose
between acquire and release?


> Stale data ??? An acquire barrier simply forces the reads and writes
> following in the instruction-stream to be ordered behind a single
> operation - but that's not necessary here as you're doing something
> completely different than acquiring a lock-variable.

So, if I were popping the front off a lock-free stack:

do
{
    LocalStack = SharedStack;

    /* I would not need acquire, or a load fence here? */
}
while( ! CAS64( &SharedStack,
                LocalStack,
                LocalStack.pNode->pNext,
                LocalStack.lAba + 1 ) );

/* And I would not need release, or a store fence here? */

Ziv Caspi

Jun 20, 2003, 4:02:37 AM
to
On Wed, 18 Jun 2003 22:49:51 +0200, Alexander Terekhov
<tere...@web.de> wrote:

>The point is that SOURCE CODE [things
>ala broken DCSI/DCCI] "might" NOT work on the "fully-relaxed by
>default" future IA-32 systems.

I fully appreciate that.

>> [Disclaimer: This posting is provided "AS IS" with no warranties, and
>> confers no rights. Opinions said here represent my own, and not my
>> employer's. In addition, they are based only on publicly available
>> information.]
>
>Interesting. You're working for Wintel GmbH?

To save you the trouble of Googling my name ;-): I work for Microsoft
(but not on Windows or any compiler).

Ziv.

Ziv Caspi

Jun 20, 2003, 4:02:40 AM
to
On Wed, 18 Jun 2003 17:10:57 -0700, "David Schwartz"
<dav...@webmaster.com> wrote:

>> This shows you do realize Microsoft can't say what the current
>> implementation of InterlockedIncrement will do on a future IA32
>> processor, because this depends on the behavior of such processors and
>> Intel isn't providing any guarantees.
>
> That is not true, Intel provides many guarantees. This is one case where
>they not only have not provided a guarantee but have explicitly stated that
>there is no such guarantee.

I fail to see what isn't true in the statement you've quoted.

>> Given that, if Microsoft provided you with documentation that says
>> "the InterlockedIncrement implementation in Windows XP (no SP) is
>> guaranteed to work on all future IA32 processors", you would be right
>> to be suspicious (or at least surmise that there are some guarantees
>> Intel provided Microsoft that were not made public).
>
> I don't doubt that it's guaranteed to work for it's documented purpose.
>But I'm being asked to use it for purposes for which it's not documented.

Again, I don't understand how this relates to my words.

Ziv

Alexander Terekhov

Jun 20, 2003, 4:48:37 AM
to

David Schwartz wrote:
>
> "Alexander Terekhov" <tere...@web.de> wrote in message
> news:3EED6872...@web.de...
>
> > Wrong. IA-32 does speculative reads, to begin with.
>
> Yes, it does, but it 'magically' acts as if it doesn't. ...

I mean that (AFAICS -- see 7.2.2)

r1 = m1
r2 = m2

may be executed out of order (with respect to each other). The cache
effects (speculative prefetching, whatever) are totally irrelevant
once you have the loaded value in the register.

Right?

regards,
alexander.

Alexander Terekhov

Jun 20, 2003, 4:50:28 AM
to

David Butenhof wrote:
[...]
> Sometimes, Alexander, you can be such a... well, OK, I won't use any of the
> appropriate words here. And I was starting to think that perhaps you'd
> "mellowed" a little over the past year or so...

Never.

>
> First off, you act as if this were my personal decision, and that's silly.
> While I grant my opinion is influential in the working group, I do NOT make
> decisions for them. I am merely a consultant.

Understatement, again.

>
> Second, I don't object in principle to the idea you raised. In fact, I
> initially supported your suggestion. However, the working group had
> significant concerns about any possible (currently unscoped) effects of
> this change on the portability of existing applications. There is, after
> all, no point in MAKING such a change if it cannot affect the behavior of
> an application.

It helps neither implementors nor application programmers. It does
kinda irritate both. Yes. I'm sure.

>
> Like several other POSIX issues that have arisen lately, had this come up
> during the initial POSIX working group discussions, I would have agreed. I
> don't see any benefit, and some risk, to making such a change at this late
> date.

"Status quo, you know, that is Latin for ``the mess we're in.''"

regards,
alexander.

Alexander Terekhov

Jun 20, 2003, 5:07:33 AM
to

SenderX wrote:
>
> > Intel already has MTRR stuff. Sure they'll preserve compatibility
> > for the "old" binary stuff. The point is that SOURCE CODE [things

> > ala broken DCSI/DCCI] "might" NOT work on the "fully-relaxed by
> > default" future IA-32 systems.
>
> If Intel designed a non-TSO IA-32, they would have the thousands and
> thousands of current apps that use IA-32 instructions in mind... Microsoft,
> and every other OS, would have to insure that their critical section API's
> provide the correct visibility on such a IA-32.

MS critical section is also brain-dead. No one in his right mind
would want to delay the allocation of resources (silly MS-event)
beyond the initial lock operation (waiting until "contention" arises).
The "problem" is that MS folks have no clue not only with respect
to "thread-safety"... they also have no idea WRT exception-safety.

regards,
alexander.

SenderX

Jun 20, 2003, 5:26:49 AM
to
> Actually, you have to back link the collector objects so that newer
> collector objects which get created by subsequent node deletions stay
> alive as long as the oldest collector. So it would look something like this

If you ran ( pushed - popped ) hundreds-of-thousands of nodes concurrently
through a linked-list... Would the collector back list start to get big, and
lag on freeing nodes?

Test code is saying that I would not need to back link old collectors, if
and only if I used this algo strictly for the lock-free LIFO and FIFO's,
since they do not iterate through themselves...

I will post the test code if it continues to work well. This algo has
already shown a performance improvement over a FIFO whose nodes are
protected by atomic_ptrs.

Joseph Seigh

Jun 20, 2003, 6:24:19 AM
to

SenderX wrote:
>
> > Actually, you have to back link the collector objects so that newer
> > collector objects which get created by subsequent node deletions stay
> > alive as long as the oldest collector. So it would look something like this
>
> If you ran ( pushed - popped ) hundreds-of-thousands of nodes concurrently
> through a linked-list... Would the collector back list start to get big, and
> lag on freeing nodes?

Yes. Stack overflow is a real risk here since the dtors are recursive. There's
stuff you can do to prevent that from happening. But these kinds of algorithms
are meant for data structures with relatively low levels of modification.

Joe Seigh

David Butenhof

Jun 20, 2003, 7:39:16 AM
to
Alexander Terekhov wrote:

> David Butenhof wrote:
> [...]
>> Sometimes, Alexander, you can be such a... well, OK, I won't use any of
>> the appropriate words here. And I was starting to think that perhaps
>> you'd "mellowed" a little over the past year or so...
>
> Never.

;-)

>> Like several other POSIX issues that have arisen lately, had this come up
>> during the initial POSIX working group discussions, I would have agreed.
>> I don't see any benefit, and some risk, to making such a change at this
>> late date.
>
> "Status quo, you know, that is Latin for ``the mess we're in.''"

Yep. And the more successful and widespread the mess, the stickier it gets.

When we defined the original standard we could change most anything at the
drop of a proverbial hat (which is probably why we never wore hats to the
meetings... except perhaps in the Snowbird ski resort... do ski caps
count?) because there were no official/acknowledged existing
implementations nor applications depending on the current wording.

This is no longer true. There are many implementations, and many
applications. We DO make incompatible changes -- but with great care. The
change needs to be an important fix, and ideally we should be fairly
confident that the potential impact on real world code is small.

Furthermore, there are two quite different revision cycles in the current
process. In a major revision of the standard we have substantially greater
freedom; but right now we're not working on a major revision, but only a
"Technical Corrigenda". We need to be a lot more careful right now.

The benefit of this change would be to allow a small increment of
performance on SOME implementations, with no substantial functional
improvement, and the risk of breaking existing applications that depend on
the currently stated memory visibility guarantees. This is something that
might be reconsidered for the next major revision cycle, but is definitely
not appropriate for a TC.

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

David Butenhof

Jun 20, 2003, 7:57:01 AM
to
SenderX wrote:

>> "SenderX" is no deliberate troll, but merely an inexperienced "seat of
>> the pants" Windows programmer.
>
> I am not inexperienced when it comes to Microsoft programming, I have many
> years of Win32-IOCP/COM under my belt.

Yes, and you don't really BELIEVE there's anything out there but Win32 and
X86. That will come, perhaps, in time. ;-)

>> that one can PROVE
>> the correctness of a parallel algorithm by TESTING on available systems.
>
> The test loads I put my algos under are extremely high on SMP IA-32, and I
> have a friend who will let me test an algo for weeks on end under extreme
> load.

But you're missing the point. No amount of testing PROVES anything except
that you haven't yet exposed any flaws that exist. This is enough for a
"seat of the pants Windows" programmer, as I said, but not nearly enough to
satisfy someone writing code that must be both correct and portable. (Or at
least, as correct and portable as can be practically achieved.)

Your testing doesn't mean "nothing" -- but it doesn't mean nearly enough,
either.

> Plus, the library I am working on is a " test " library. I thought by
> posting the source, others would test and tinker, and give me some real
> good feedback.

Absolutely, and there's nothing wrong at all. You're sharing, you've even
solicited input. It's your choice whether to TAKE that input, and what to
do with it, because you wrote the code. On the other hand, your loud and
constant advice and implications that "this is right and good and universal
because I've seen it work on one or two existing systems" isn't really
doing anybody any favors.

>> SenderX will leave the SenderX dimension
>> only "in the fullness of time" as (and if) programming maturity arrives,
>
> Not exactly sure if I want to leave the lock-free world yet. ;)

No reason to. It's not a bad place to be, and I'm sure your work has already
helped, and will continue to help, a number of people.

But I'm not talking about your code. I'm talking about your attitude.

> My test library has improved the thread to thread communication for many
> app sources that I have access to. Many people have been very impressed
> with it.

No doubt. And with a more balanced attitude and a more open mind, you could
do even more. ;-)

> I just don't like Dave telling Win32 people that the Interlocked API's
> might not work on upcoming processors.

Yes. I know. Whereas Dave doesn't like you telling people that they're
guaranteed to work on upcoming processors simply because you've seen them
work in the past on existing processors. You see the problem? You're most
likely correct, that Microsoft will in the end do whatever is necessary to
make those functions behave "correctly". They COULD recode all their own
uses to avoid old functions that no longer work on a new processor. But
many Microsoft developers must know, and a few may even care, that many
applications might have dependencies on them as well. The point, however,
is that Microsoft has made no explicit promises. You are INFERRING both
intended function AND "good intentions" from your experience with the
current implementation, and, like Dave Schwartz, I can entertain the
thought that you might be inferring too much.

Joseph Seigh

Jun 20, 2003, 8:48:21 AM
to

David Butenhof wrote:
>
> SenderX wrote:
...


> > The test loads I put my algos under are extremely high on SMP IA-32, and I
> > have a friend who will let me test an algo for weeks on end under extreme
> > load.
>
> But you're missing the point. No amount of testing PROVES anything except
> that you haven't yet exposed any flaws that exist. This is enough for a
> "seat of the pants Windows" programmer, as I said, but not nearly enough to
> satisfy someone writing code that must be both correct and portable. (Or at
> least, as correct and portable as can be practically achieved.)
>

I agree that testing can't be used to prove correctness. However, there are no
formal specs for win32 or Posix, so those can't be used to prove correctness. So
what do we have? Well, what we have here in c.p.t. at least is a bunch of
experts who, while not having a common formal framework to discuss these
issues, do have a common enough understanding to come to some sort of consensus
most of the time. Which we use in lieu of formal proofs.

But outside of c.p.t this doesn't work. Take the last discussion of DCL or
whatever you call it on c.l.c++.m. Most people there believe DCL will work
if you use volatile. You can't "prove" that it won't work because nobody
there (or here) will accept formal reasoning as an argument, even if you
had something to formally reason with. And they (c.l.c++.m) certainly won't
recognise you as authoritative (meaning you can win arguments simply because
you say so).

Joe Seigh

Balog Pal

Jun 20, 2003, 10:52:36 AM
to
"David Schwartz" <dav...@webmaster.com> wrote in message
news:bcr7cs$f7u$1...@nntp.webmaster.com...

> Assuming those other applications are using them *correctly*. Where
> 'correctly' means, 'only relying on documented behavior'.
>
> > Do you realize that CRITICAL_SECTIONS, and many other kernel internals
> > use the Interlocked API's?
>
> How does that help me? I presume that microsoft will re-implement these
> things if that's required by a future processor. Perhaps on a future
> processor it won't be possible to implement CRITICAL_SECTIONS in terms of
> the Interlocked API's, and Microsoft will do it some other way. How things
> happen to work now is not good enough for me, I need API guarantees so that
> my code will work in the future.

Well, and after a good dream/nightmare let's wake up.

Microsoft already released like a dozen versions of OS-es. The world has
like a billion copies of those. We users expect them to run on the next PC
with the slightly faster processor. Not a new OS, but the existing one. MS
can write new stuff, but can't swap the old stuff out. So Intel is no more
likely to want to dump incompatible chips on the market than it did those
Pentiums with the FP calculation bug.

The market is a much bigger guarantee than any piece of dox. In this
particular situation.

I write software for the existing systems. I target some select set -- and
say it works for those I actually tested. I say it might work on other
systems too (say W95). I can never claim it will work on some NEXT system.
Actually I know pretty many programs that do not work correctly on the next
version of the OS. Like work on W2k and partially fail on XP. Some next,
fixed version will work there.

So what is the reason to give up current knowledge and well predictable
constraints to gain, in practice, nothing? Should the impossible happen,
Intel breaks #LOCK, MS issues a new OS WIN43 that can run on that chip.
Then what? I go, rewrite, test my program and release a WIN43 version of
that too.


The important thing is to DOCUMENT what the actual program relies upon. So
that knowledge, that design consideration is not lost. The real problem
comes not from using a feature, documented or not, but from the loss of
documenting constraints. Or from biting too big a chunk, or claiming 'I
cover anything'. Having solid constraints and checking them on a new target
is much easier, and more practical. I'd say much safer too.

Paul


Balog Pal

Jun 20, 2003, 11:21:03 AM
to
"David Butenhof" <David.B...@hp.com> wrote in message
news:3ef2...@usenet01.boi.hp.com...

> Yes, and you don't really BELIEVE there's anything out there but Win32 and
> X86. That will come, perhaps, in time. ;-)

That is an unsupported conclusion. The scope of this conversation is WIN32.
That does not mean anything outside does not exist, but it means we have
this scope, conclusions can be used in that scope, and anything outside is
irrelevant.

> This is enough for a
> "seat of the pants Windows" programmer, as I said, but not nearly enough
> to satisfy someone writing code that must be both correct and portable.
> (Or at least, as correct and portable as can be practically achieved.)

Huh, what does that 'portable' actually mean? That you grab your source,
blindly compile on any given platform, and expect the result to work?
Guess not. Then that statement is nothing but a flame.

Portability is not an end, it is a means to achieve some goals. You go
for it if you need it, and you leave it alone if not. As it has quite a price,
in development resources and also in possible waste in the end-product.

When I install my freshly bought CoolWare 10.0 I expect it to perform well
in my environment. I, as a user, don't care if the developer can port it to
something else. And I don't accept the program crawling or being ugly
either -- with the excuse that it's the way that best fits the issuing company.

The 'put great efforts to make 5% of your program portable' approach, when
you know it will not be ported anyway, or 95% would be rewritten in that
case, is just bad.

> Yes. I know. Whereas Dave doesn't like you telling people that they're
> guaranteed to work on upcoming processors simply because you've seen them
> work in the past on existing processors.

The claim was not that. But that it will work _as long as Windows runs_.
Pretty different thing.

Even in the security field we talk relative. Like 'using this method is no
more practical than brute forcing the 56 bit DES key', etc. Absolute
statements may be silly and unfounded. But a relative one is okay.

When I buy a key I don't expect it to unlock any possible lock, but the one
it is intended to.

Just like that, programs correctly using the Interlocked stuff will work
correctly everywhere as long as, say, the current Windows NT 4.0 runs. On
any system we can claim they both run or both break. No more, no less.

> You see the problem? You're most
> likely correct, that Microsoft will in the end do whatever is necessary to
> make those functions behave "correctly". They COULD recode all their own
> uses to avoid old functions that no longer work on a new processor.

You very well know that's not true. Even MS can't go back in time.

> But
> many Microsoft developers must know, and a few may even care, that many
> applications might have dependencies on them as well. The point, however,
> is that Microsoft has made no explicit promises.

Well put. So you shall not give any _more_ than that. Say just what you
know. That you can't state it will work anywhere either, any more than MS
could promise.

Paul


Alexander Terekhov

Jun 20, 2003, 11:12:19 AM
to

Balog Pal wrote:
[...]
> Just like that, programs, correctly using Interlocked stuff will work ...

Sure. Except that you just can't use it "correctly" because
even MS doesn't know how to use it "correctly". All you can do
is try to INFER the correct usage based on reverse
engineering (the "ass-backwards" approach) and the Intel specs.
Both have really nothing to do with using them "correctly".
That's the problem. That's why the MS-interlocked stuff is brain
dead.

regards,
alexander.

Michael Furman

Jun 20, 2003, 1:51:08 PM
to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bctbeo$muo$1...@nntp.webmaster.com...
>
> "Michael Furman" <Michae...@Yahoo.com> wrote in message
> news:bctav2$m8brm$1...@ID-122417.news.dfncis.de...

>
> > "David Schwartz" <dav...@webmaster.com> wrote in message
> > news:bct5ud$jcg$1...@nntp.webmaster.com...
>
> > > What I am saying is totally independent of any particular
> > > implementation. I am talking about the consequences of what the
> > > documentation says. I am not saying "this is what VC++ does", I am
> > > saying, "this is what the documentation says VC++ does".
>
> > It does not say that:
> > > [...]

> > > "Objects declared as volatile are not used in optimizations because
> > > their value can change at any time. The system always reads the
> > > current value of a volatile object at the point it is requested, even
> > > if the previous instruction asked for a value from the same object.
> > > Also, the value of the object is written immediately on assignment."
>
> > It is talking about instructions that are generated by the compiler.
>
> NO. A C/C++ document is not allowed to do this. It must say what the
> instructions actually *DO*, not what the instructions *ARE*.

1. Yes and no: I am commenting on the text fragment that you posted earlier;
AFAIK it is not an official C/C++ document. The C++ standard does not say
anything about instructions:

(from 7.1.5.1):
78 [Note: volatile is a hint to the implementation to avoid aggressive
optimization involving the object because
the value of the object might be changed by means undetectable by an
implementation. See 1.9 for detailed semantics.
In general, the semantics of volatile are intended to be the same in C++ as
they are in C. ]

Paragraph [1.9] is too long, but it does not say anything about instructions
either.


>
> > "system always reads the current value... " means that the compiler
> > generates instructions that access the address.
>
> In other words, it doesn't mean that the system reads the current
value.
>
> > It cannot mean more because it
> > is out of scope of the compiler.
>
> No, it's not. The compiler could use locks or fences or otherwise ensure
> that "the value ... is read at the point it is requested". This is not an
> impossible thing to do.

But it is not about a C++ compiler that must conform to the standard.


>
> > For example, some memory locations
> > could be mapped to something else then memory (memory mapped I/O)
> > so it could not be any guarantee about value that is being read.
>
> I don't know what you mean. How can something qualified 'volatile' to a
> C/C++ compiler be memory mapped unless I do something outside the scope of
> the compiler? I'm talking about well-behaved applications that do no magic
> behind the compiler's back. Obviously, if you do something crazy, then the
> guarantees might not hold.

>
> > What it says is that in the case of volatile the compiler is not allowed
> > to move the instruction that accesses the variable (or just eliminate it
> > using a value that was read earlier).
>
> How can it possibly mean that? This is C/C++ documentation. The only
> instructions are C/C++ instructions. This is not an assembler document;
> this is an explanation of what the C/C++ 'volatile' qualifier does to/for
> C/C++ code on this platform.

No, it is not formal C/C++ documentation; it is just informal comments.

Michael

>
> DS
>
>

