Threads and initialization only once

44 views
Skip to first unread message

Thomas Segev

unread,
Sep 21, 2003, 7:16:08 AM9/21/03
to
In a multi-threaded application that runs on UNIX and NT, I need to
use a parameter which may be defined by an environment variable. If no
environment variable exists, I use a hard coded default.

The environment variables are not changed by this application.

Each time I need to use that parameter I could first try getenv(), and
then depending on the result I could use it, or use my default.
However I would like to make it more efficient and determine the value
to use only once (by the first thread). I can use the pthread_once
routine for this purpose.

I have two questions:
1. Is there some eqvivalent to pthread_once on NT ?

2. What can be wrong if instead I use the following (non)solution on
UNIX:

static int var = DEFAULT;
static int initialized = 0;


if (initialized)
{
/* use var */
...
}
else
if(pe = getenv(ENV_VARIALBE))
{
/* convert and assign the value of the environment variable to
var */
...
initialized = 1;
}

I realize that in this way multiple threads may call getenv() in
parallel and then set the value of var and initialized.
Although getenv()is not thread safe, there should not be a problem
with it as long as the application does not change or add environment
variables. Correct ?

Is there a danger in the above (other then being not formally thread
safe) ?
Can the value of var be unpredictable because multiple threads assign
it the same value in parallel ?

Thanks,
Thomas

Thomas Segev

unread,
Sep 21, 2003, 7:39:39 AM9/21/03
to
I would like to correct the program in my first post:

if (initialized)
{
/* use var */
...
}
else
{
pe = getenv(ENV_VARIALBE);
if (pe)

{
/* convert and assign the value of the
environment variable to var */
...
}
initialized = 1;
}

Thanks,
Thomas
"Thomas Segev" <thomas...@bmc.com> wrote in message
news:83b78f96.03092...@posting.google.com...

Patrick TJ McPhee

unread,
Sep 21, 2003, 10:09:34 PM9/21/03
to
In article <83b78f96.03092...@posting.google.com>,
Thomas Segev <thomas...@bmc.com> wrote:

% In a multi-threaded application that runs on UNIX and NT, I need to
% use a parameter which may be defined by an environment variable. If no
% environment variable exists, I use a hard coded default.
%
% The environment variables are not changed by this application.

Why not initialise a variable at the very start, when you're still
single-threaded? You don't need any synchronisation at all if you do
it that way.

--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

Thomas Segev

unread,
Sep 21, 2003, 11:28:23 PM9/21/03
to
Thanks Patrick, but my code is a dynamic load library and I am restricted
now to changes only there.
Unfortunately there is no process initialization call provided by this
library.
Even if I add such a call now, the calling thread code can not be modified
at present (I need a solution which is backwards compatible).

Thomas
"Patrick TJ McPhee" <pt...@interlog.com> wrote in message
news:bkllkt$e9a$3...@news.eusc.inter.net...

Alexander Terekhov

unread,
Sep 22, 2003, 2:32:59 AM9/22/03
to

Thomas Segev wrote:
[...]

> I have two questions:
> 1. Is there some eqvivalent to pthread_once on NT ?

http://groups.yahoo.com/group/boost/message/15442
("named mutex" thing)

http://groups.google.com/groups?selm=3ECBD751.8DE8AED5%40web.de
(DCSI stuff; you can use a "named mutex" instead of static one)

regards,
alexander.

David Schwartz

unread,
Sep 22, 2003, 5:44:17 AM9/22/03
to

"Thomas Segev" <thomas...@bmc.com> wrote in message
news:vmr3joh...@corp.supernews.com...

> I would like to correct the program in my first post:
> if (initialized)
> {
> /* use var */
> ...
> }
> else
> {
> pe = getenv(ENV_VARIALBE);
> if (pe)
> {
> /* convert and assign the value of the
> environment variable to var */
> ...
> }
> initialized = 1;
> }

The problem occurs in the first part of the 'if' block. Even though a
thread sees 'initialized' as one, there is no guarantee that it sees the
correct value for the variable.

I'm really curious why you are going to all this craziness. Why not just
use a CRITICAL_SECTION? Or why not use a CRITICAL_SECTION in combination
with an Interlocked* function? Or some other sane solution rather than
relying upon the compiler to 'just happen to' get it right?

Has profiling proven that using normal locking is a bottlenect?

DS


Thomas Segev

unread,
Sep 22, 2003, 6:49:21 AM9/22/03
to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bkmg9p$bcg$1...@nntp.webmaster.com...

>
> "Thomas Segev" <thomas...@bmc.com> wrote in message
> news:vmr3joh...@corp.supernews.com...
>
> > I would like to correct the program in my first post:
> > if (initialized)
> > {
> > /* use var */
> > ...
> > }
> > else
> > {
> > pe = getenv(ENV_VARIALBE);
> > if (pe)
> > {
> > /* convert and assign the value of the
> > environment variable to var */
> > ...
> > }
> > initialized = 1;
> > }
>
> The problem occurs in the first part of the 'if' block. Even though a
> thread sees 'initialized' as one, there is no guarantee that it sees the
> correct value for the variable.

Could you explain why not ? Assuming the environment variable does
not change, several thread may attempt to set to the same value in
"parallel". What can go wrong here ?

>
> I'm really curious why you are going to all this craziness. Why not
just
> use a CRITICAL_SECTION? Or why not use a CRITICAL_SECTION in combination
> with an Interlocked* function? Or some other sane solution rather than
> relying upon the compiler to 'just happen to' get it right?

The application needs to be efficient. I don't want to add unnecessary
overhead.
>

> Has profiling proven that using normal locking is a bottlenect?

No, but why use it if it is not a must ?
So I am just wondering what can go wrong. How can a compiler implementation
corrupt the value of var ?

>
> DS
>
>
Thomas


David Butenhof

unread,
Sep 22, 2003, 7:26:23 AM9/22/03
to
Thomas Segev wrote:

> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:bkmg9p$bcg$1...@nntp.webmaster.com...
>>
>> "Thomas Segev" <thomas...@bmc.com> wrote in message
>> news:vmr3joh...@corp.supernews.com...
>>
>> > I would like to correct the program in my first post:
>> > if (initialized)
>> > {
>> > /* use var */
>> > ...
>> > }
>> > else
>> > {
>> > pe = getenv(ENV_VARIALBE);
>> > if (pe)
>> > {
>> > /* convert and assign the value of the
>> > environment variable to var */
>> > ...
>> > }
>> > initialized = 1;
>> > }
>>
>> The problem occurs in the first part of the 'if' block. Even though a
>> thread sees 'initialized' as one, there is no guarantee that it sees the
>> correct value for the variable.
>
> Could you explain why not ? Assuming the environment variable does
> not change, several thread may attempt to set to the same value in
> "parallel". What can go wrong here?

Efficient multiprocessor architectures, like Alpha and IA64, allow reads and
writes to be REORDERED to take best advantage of cache and interprocessor
memory protocols.

Without explicit synchronization control, there is no need to ensure that
one processor sees the writes of another processor in any particular order;
and therefore it makes no sense to slow down the pipeline to ensure that
everything occurs in order.

There have been many discussions on this newsgroup about the sort of "short
cut" technique you're trying to apply. On a machine without strict
cross-processor ordering, it just won't work because another processor may
see your write to "initialized" before it sees prior writes. Which means
that your 'var' may be uninitialized when some thread tries to read it,
even after having checked 'initialized'.

The only PORTABLE solution is to use proper synchronization around the whole
block -- e.g., locking a mutex before you test the value of initialized,
and unlocking it only after writing initialized.

There are more efficient non-portable solutions, generally involving memory
barriers or fences (depending on what machine you use), to ensure that both
reads and writes occur in the order you expect. That sort of
machine-specific coding generally makes sense only when the code will be
executed a lot and needs to be fast.

Do a google search for "DCL" or the less misleading term (coined by our
Alexander Terekhov) "DCSI" (double-checked serialized initialization)
should turn up a bunch of info. (You should restrict the search to
comp.programming.threads, because the abbreviations have been used for
various other things in other contexts.)

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

Thomas Segev

unread,
Sep 22, 2003, 9:04:14 AM9/22/03
to
Thanks David, this clarifies the issue.
Is there no simple "trick" to force order ?

Will the following work ?:

struct var
{
int value;
int initialized;
};

static struct var globalvar ={DEFAULT,0};

................
...............

struct init var localvar = {DEFAULT,0};

if (globalvar.initialized)
{
/* use globalvar.value */


...
}
else
{
pe = getenv(ENV_VARIALBE);
if (pe)
{
/* convert and assign the value of the

environment variable to localvar.value */
...
}
localvar.initialized = 1;
globalvar = localvar;
}

Thanks,
Thomas


"David Butenhof" <David.B...@hp.com> wrote in message
news:3f6e...@usenet01.boi.hp.com...

Patrick TJ McPhee

unread,
Sep 22, 2003, 2:15:52 PM9/22/03
to
In article <vmsr626...@corp.supernews.com>,
Thomas Segev <thomas...@bmc.com> wrote:

% Unfortunately there is no process initialization call provided by this
% library.
% Even if I add such a call now, the calling thread code can not be modified
% at present (I need a solution which is backwards compatible).

Depending on how it's loaded, you may be able to do it in an _init function.
If the library load itself is synchronised or happens when the process
has only one thread, the _init function will be synchronised.

Hovik Melikyan

unread,
Sep 23, 2003, 4:45:01 AM9/23/03
to
Alexander Terekhov <tere...@web.de> wrote in message news:<3F6E979B...@web.de>...

> Thomas Segev wrote:
> [...]
> > I have two questions:
> > 1. Is there some eqvivalent to pthread_once on NT ?
>
> http://groups.yahoo.com/group/boost/message/15442
> ("named mutex" thing)
>

You wrote there: "Interlocked stuff does not formally provide memory
synch guaranties...". Can you please explain what's wrong with LOCK
XCHG?

On the original question: I still think the best way to do such things
is to use atomic exchange:

int initialized = 0;

if (atomic_exchange(&initialized, 1) == 0)
{
// ... initialization here ...
}

Look for a library that does portable atomic operations on both
Windows and Unix.

--
Hovik Melikyan

SenderX

unread,
Sep 23, 2003, 4:50:02 AM9/23/03
to
> You wrote there: "Interlocked stuff does not formally provide memory
> synch guaranties...".

Alex is wrong on this.

Interlocked API's do contain memory barriers on non-TSO systems.

--
The designer of the experimental, SMP and HyperThread friendly, AppCore
library.

http://AppCore.home.comcast.net


Alexander Terekhov

unread,
Sep 23, 2003, 5:20:23 AM9/23/03
to

Hovik Melikyan wrote:
[...]

> You wrote there: "Interlocked stuff does not formally provide memory
> synch guaranties...". Can you please explain what's wrong with LOCK
> XCHG?

Nothing wrong except that LOCK XCHG is IA32 instruction, not MS-WIN
interlocked stuff.

>
> On the original question: I still think the best way to do such things
> is to use atomic exchange:
>
> int initialized = 0;
>
> if (atomic_exchange(&initialized, 1) == 0)

This isn't the best way.

> {
> // ... initialization here ...
> }
>
> Look for a library that does portable atomic operations on both
> Windows and Unix.

You'll have to stay in this threading business until at least 2010
(or so), you'll have a chance to see initial draft then, I guess.

regards
alexander.

Alexander Terekhov

unread,
Sep 23, 2003, 5:28:01 AM9/23/03
to

SenderX wrote:
>
> > You wrote there: "Interlocked stuff does not formally provide memory
> > synch guaranties...".
>
> Alex is wrong on this.

I wrote that statement long before MS added acquire and release calls
and sort of "clarified" that old stuff is meant to provide both. They
are still defined rather poorly. I have to say that their "definition"
remains totally brain-dead... and, as you know, we already had a long
long thread on this topic here, Sender.

regards,
alexander.

David Butenhof

unread,
Sep 23, 2003, 9:10:49 AM9/23/03
to
Thomas Segev wrote:

> Thanks David, this clarifies the issue.
> Is there no simple "trick" to force order ?

No. You can't force the compiler to generate the memory instructions you'd
like; and even if you could that won't force the hardware to issue (and
more importantly, retire) them in the order you'd like.

> Will the following work ?:

Nope. Packaging the data in a structure doesn't guarantee they'll be written
in a single atomic instruction -- which is the guarantee you'd need. (And
if both values are 32 bit on a 32 bit machine, or 64 bits on a 64 bit
machine, it may not even HAVE the single atomic instruction you'd like.) If
the structure is copied in two instructions, then those two writes can be
reordered. And even if the writes weren't reordered, you're still issuing
separate reads, and those could be reordered. You could go one step further
in your attempt and copy globalvar into localvar and then read the fields
of localvar... except that still doesn't guarantee the compiler will
generate a single copy operation, so you'd be no better off.

DCSI is a machine-specific optimization. To avoid a lock you need to include
specialized machine instructions.

Hovik Melikyan

unread,
Sep 23, 2003, 9:26:31 AM9/23/03
to
Alexander Terekhov <tere...@web.de> wrote in message news:<3F701057...@web.de>...

> Hovik Melikyan wrote:
> [...]
> > You wrote there: "Interlocked stuff does not formally provide memory
> > synch guaranties...". Can you please explain what's wrong with LOCK
> > XCHG?
>
> Nothing wrong except that LOCK XCHG is IA32 instruction, not MS-WIN
> interlocked stuff.
>

Windows NT/2000 InterlockedXXX stuff uses LOCK XCHG (also XADD). It is
different on Windows95/98 family since those systems can run on i386
which lacks XADD.

> >
> > int initialized = 0;
> >
> > if (atomic_exchange(&initialized, 1) == 0)
>
> This isn't the best way.
>

And the best way is ... ? (provided that you do have a portable atomic
exchange function handy)

>
> You'll have to stay in this threading business until at least 2010
> (or so), you'll have a chance to see initial draft then, I guess.
>

You mean, standardized atomic functions? This is tough, because you
need to have hardware support for it. Every engineering company has
its own way of doing threading and SMP. You'd never convince Sun to
add atomic increment (not compare-and-smth), for example. OTOH you
don't have to, I think. Sometimes by standardizing things you limit
creativity and the possibility of finding new solutions.

--
Hovik Melikyan

Luca

unread,
Sep 23, 2003, 9:36:03 AM9/23/03
to

Maybe I'm saying something stupid, but why don't you avoid the whole problem
safelly calling getenv(), reading a configuration file or settings default
config values at the beginning of your app, maybe filling a structure with
the program configuration? Then start all your threads passing the pointer
to the configuration strucure (or making it globally available).

Luca


Joe Seigh

unread,
Sep 23, 2003, 9:39:49 AM9/23/03
to

Hovik Melikyan wrote:
>
> Alexander Terekhov <tere...@web.de> wrote in message news:<3F701057...@web.de>...
>
> >

> > You'll have to stay in this threading business until at least 2010
> > (or so), you'll have a chance to see initial draft then, I guess.
> >
>
> You mean, standardized atomic functions? This is tough, because you
> need to have hardware support for it. Every engineering company has
> its own way of doing threading and SMP. You'd never convince Sun to
> add atomic increment (not compare-and-smth), for example. OTOH you
> don't have to, I think. Sometimes by standardizing things you limit
> creativity and the possibility of finding new solutions.
>

There are some interlocked functions that can be considered standard,
even on architectures that don't have them. Compare and swap is on
or can be simulated on almost every architecture. From there you
can construct fetch and op primatives, etc...

I think you'll find more creativity on platforms that do provide these
as programming api's than on platforms that do not.

Joe Seigh

Alexander Terekhov

unread,
Sep 23, 2003, 9:50:06 AM9/23/03
to

Hovik Melikyan wrote:
[...]
> InterlockedXXX stuff

http://groups.google.com/groups?threadm=3EC89028.7B60D559%40web.de

[...]


> You mean, standardized atomic functions? This is tough, because you
> need to have hardware support for it. Every engineering company has
> its own way of doing threading and SMP. You'd never convince Sun

http://gee.cs.oswego.edu/dl/concurrency-interest
http://gee.cs.oswego.edu/dl/concurrent/dist/docs/java/util/concurrent/atomic/AtomicInteger.html
http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000023.html
http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000024.html

regards,
alexander.

P.S. http://gee.cs.oswego.edu/dl/concurrency-interest/jsr166

"Preliminary versions of JSR166 concurrency utilities are available
for you to try out. This release contains versions that run natively
on solaris-sparc and linux-x86 using addons to hotspot 1.4.1, as
well as an "emulation mode" that should run on any 1.4.x JVM (but
slowly! at least 5 to 15 times slower)."

Thomas Segev

unread,
Sep 23, 2003, 10:02:19 AM9/23/03
to
"Luca" <n...@mail.net> wrote in message
news:bkpi72$nm9$1...@galileo.it.ip-plus.net...

> Thomas
>
> Maybe I'm saying something stupid, but why don't you avoid the whole
problem
> safelly calling getenv(), reading a configuration file or settings default
> config values at the beginning of your app, maybe filling a structure with
> the program configuration? Then start all your threads passing the pointer
> to the configuration strucure (or making it globally available).
>
> Luca
>
>

Luca,

Because my code is a dynamic load library and I am restricted now to changes
only there.


Unfortunately there is no process initialization call provided by this

library.


Even if I add such a call now, the calling thread code can not be modified

at present (I need a solution which is backwards compatible).


Thomas


Hovik Melikyan

unread,
Sep 23, 2003, 1:31:03 PM9/23/03
to
ho...@melikyan.com (Hovik Melikyan) wrote in message news:<ad206972.03092...@posting.google.com>...

> Alexander Terekhov <tere...@web.de> wrote in message > > >
> > > int initialized = 0;
> > >
> > > if (atomic_exchange(&initialized, 1) == 0)
> >
> > This isn't the best way.
> >
>
> And the best way is ... ? (provided that you do have a portable atomic
> exchange function handy)
>

The best way is tO^8###*&JKnssd&^ LOST CARRIER_

:)

So ... ? I'm still waiting and I'm impatient. I'd even accept your
recursive references, Alexander. :)

--
Hovik Melikyan

Hovik Melikyan

unread,
Sep 23, 2003, 1:37:28 PM9/23/03
to
Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F704FB1...@xemaps.com>...

>
> There are some interlocked functions that can be considered standard,
> even on architectures that don't have them. Compare and swap is on
> or can be simulated on almost every architecture. From there you
> can construct fetch and op primatives, etc...
>

I have a feeling that `weak' atomic functions (i.e.
compare-and-dosmth) can actually be replaced with a simple group of
functions that exchange, increment (and decrement) counters without
comparing with an expected value.

Ok, this is only a feeling :) but can anyone (1) explain why
algorithms based on CAS can't be replaced with algorithms that use
plain atomic exchange and (2) what does it mean in general and what
are typical programming patterns for CAS?

This feeling (of uselessness? superfluity? of CAS) comes from the fact
that I've never used any compare-and-... logic and I always managed to
solve the problem with plain exchange/inc/dec family of functions.
Basically you need them when you want to do performance optimization
where otherwise you'd use standard sync objects (mutex, sema, ...).

So the question is do plain atomic exchange/inc/dec replace CAS and
Co.?

Sorry for [slight] off-topic wrt this thread.

--
Hovik Melikyan

Joe Seigh

unread,
Sep 23, 2003, 2:30:32 PM9/23/03
to

Hovik Melikyan wrote:
>
> > Alexander Terekhov <tere...@web.de> wrote in message > > >

> > And the best way is ... ? (provided that you do have a portable atomic
> > exchange function handy)
> >
>
> The best way is tO^8###*&JKnssd&^ LOST CARRIER_
>
> :)
>
> So ... ? I'm still waiting and I'm impatient. I'd even accept your
> recursive references, Alexander. :)
>

Alexander is still waiting and impatient with the rate of progress
of the Posix standards committee on getting the interlocked
functions in the standard. :)

Joe Seigh

Joe Seigh

unread,
Sep 23, 2003, 2:34:50 PM9/23/03
to

Hovik Melikyan wrote:
>
> Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F704FB1...@xemaps.com>...

...


>
> So the question is do plain atomic exchange/inc/dec replace CAS and
> Co.?

There are things that you can do with CAS that you cannot do with
exchange/inc/dec. Pushing and popping from a stack for example.
Conditional things like decrement only if positive, etc ...

Joe Seigh

SenderX

unread,
Sep 23, 2003, 2:53:41 PM9/23/03
to
> Alexander is still waiting and impatient with the rate of progress
> of the Posix standards committee on getting the interlocked
> functions in the standard. :)

Alex calls the Interlocked API "totally brain-damaged"...

Why would Alex want POSIX to contain these horrible functions?

=P

David Schwartz

unread,
Sep 23, 2003, 2:53:09 PM9/23/03
to

"Thomas Segev" <thomas...@bmc.com> wrote in message
news:vmtl1fs...@corp.supernews.com...

> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:bkmg9p$bcg$1...@nntp.webmaster.com...

> > > if (initialized)


> > > {
> > > /* use var */
> > > ...
> > > }
> > > else
> > > {
> > > pe = getenv(ENV_VARIALBE);
> > > if (pe)
> > > {
> > > /* convert and assign the value of the
> > > environment variable to var */
> > > ...
> > > }
> > > initialized = 1;
> > > }

> > The problem occurs in the first part of the 'if' block. Even though
a
> > thread sees 'initialized' as one, there is no guarantee that it sees the
> > correct value for the variable.

> Could you explain why not ?

I just did.

> Assuming the environment variable does
> not change, several thread may attempt to set to the same value in
> "parallel". What can go wrong here ?

I just told you, just because a thread sees 'initialized' as one, it
does not follow that it must see the correct value for the variable.

Here's an example, support we start with:

int i=1, j=1;

And another thread does this:

i=2;
j=2;

Does it follow that another thread that sees 'j' as 2 must also see 'i'
as 2? The answer is no. Nothing prevents the compiler, processor, memory
hardware, or who knows what from reordering those two writes. In fact, 'i'
could be completely optimied into a register by the writing thread, so that
the 'i=2;' write has no effect on another thread that reads the 'i'
variable.

> > I'm really curious why you are going to all this craziness. Why not
> > just
> > use a CRITICAL_SECTION? Or why not use a CRITICAL_SECTION in combination
> > with an Interlocked* function? Or some other sane solution rather than
> > relying upon the compiler to 'just happen to' get it right?

> The application needs to be efficient. I don't want to add unnecessary
> overhead.

I don't know what to say to this. This is just so wrong for so many
reasons.

> > Has profiling proven that using normal locking is a bottlenect?

> No, but why use it if it is not a must ?
> So I am just wondering what can go wrong. How can a compiler
implementation
> corrupt the value of var ?

You are trying to rely upon guarantees that you don't have. What good is
an efficient program that is broken? Please google for 'premature
optimization'.

DS


David Schwartz

unread,
Sep 23, 2003, 2:54:35 PM9/23/03
to

"Thomas Segev" <thomas...@bmc.com> wrote in message
news:vmtsud5...@corp.supernews.com...

> Thanks David, this clarifies the issue.
> Is there no simple "trick" to force order ?

No, there is not.

> Will the following work ?:

No, it won't.

Stop this, please. Premature optimization is just wrong, plain and
simple. What you are doing won't work.

DS


SenderX

unread,
Sep 23, 2003, 2:58:54 PM9/23/03
to
> Stop this, please. Premature optimization is just wrong, plain and
> simple. What you are doing won't work.

I bet he can get it to work...

He has to use specific atomic-ops.

Not very portable, but they will work.

SenderX

unread,
Sep 23, 2003, 4:17:29 PM9/23/03
to
> Nope. Packaging the data in a structure doesn't guarantee they'll be
written
> in a single atomic instruction -- which is the guarantee you'd need. (And
> if both values are 32 bit on a 32 bit machine, or 64 bits on a 64 bit
> machine, it may not even HAVE the single atomic instruction you'd like.)

x86 has 64-bit swaps. Two pointers can be cas'd.

IA-64 has 64-bit compare and 128-bit swap. You can compare a single pointer,
and swap two.

IA-64 has release/acquire barriers built into the CAS.

AMD-64 can only CAS a single pointer.

PowerPC has LL/SC, any other processor with LL/SC can do it as well.


> DCSI is a machine-specific optimization. To avoid a lock you need to
include
> specialized machine instructions.

Its more than machine-specific?

What about the compiler... If the compiler were to interfere with the
ordering of instructions, odd things might happen.

;)

Here is some pseudo-code of an initialize-once algo...

Lock-Free Instance Code for...

Compiler: VC++ 6.0


// The shared pointer
volatile C_Object *pSharedObj = NULL;


// Turn off compiler optimizer
#pragma optimize( "", off )


// Instance the object
C_Object&

InstanceObject()
{
// Load the shared object
volatile C_Object* pLocalPtr
= InterlockedCompareExchangePointer
( &pSharedObj,
NULL, // Swap NULL
NULL ); // Compare with NULL

// If NULL, there is no object
if ( pLocalPtr == NULL )
{
// Create a new object
volatile C_Object* pNewPtr = new C_Object;

// Load the shared object again,
// and try and store the new one
pLocalPtr =
InterlockedCompareExchangePointer
( &pSharedObj,
pNewPtr, // Swap new object
NULL ); // Compare with NULL

// If NULL, the object has been updated by us
if ( pLocalPtr == NULL )
{
pLocalPtr = pNewPtr;
}

else
{
// Another thread swapped before us
delete pNewPtr;
}

// IA-64 Release
}

// IA-64 Acquire

// We got it!
return *pLocalPtr;
}


// Turn on compiler optimizer
#pragma optimize( "", on )


This should ONLY work for the VC++ 6.0 compiler, and maybe higher.

SenderX

unread,
Sep 23, 2003, 4:20:18 PM9/23/03
to
> What about the compiler... If the compiler were to interfere with the
> ordering of instructions, odd things might happen.

You could just code the whole thing in assembler. You don't have to worry
about your assembler instructions get reordered by the compiler.

=)

Thomas Segev

unread,
Sep 24, 2003, 12:05:09 AM9/24/03
to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bkq4qr$7do$1...@nntp.webmaster.com...

>
> "Thomas Segev" <thomas...@bmc.com> wrote in message
> news:vmtl1fs...@corp.supernews.com...
>
> > "David Schwartz" <dav...@webmaster.com> wrote in message
> > news:bkmg9p$bcg$1...@nntp.webmaster.com...
>
> > > > if (initialized)
> > > > {
> > > > /* use var */
> > > > ...
> > > > }
> > > > else
> > > > {
> > > > pe = getenv(ENV_VARIALBE);
> > > > if (pe)
> > > > {
> > > > /* convert and assign the value of the
> > > > environment variable to var */
> > > > ...
> > > > }
> > > > initialized = 1;
> > > > }
>
> > > The problem occurs in the first part of the 'if' block. Even
though
> a
> > > thread sees 'initialized' as one, there is no guarantee that it sees
the
> > > correct value for the variable.
>
> > Could you explain why not ?
>
> I just did.

I did not understand it the first time.

>
> > Assuming the environment variable does
> > not change, several thread may attempt to set to the same value in
> > "parallel". What can go wrong here ?
>
> I just told you, just because a thread sees 'initialized' as one, it
> does not follow that it must see the correct value for the variable.
>
> Here's an example, support we start with:
>
> int i=1, j=1;
>
> And another thread does this:
>
> i=2;
> j=2;
>
> Does it follow that another thread that sees 'j' as 2 must also see
'i'
> as 2? The answer is no. Nothing prevents the compiler, processor, memory
> hardware, or who knows what from reordering those two writes. In fact, 'i'
> could be completely optimied into a register by the writing thread, so
that
> the 'i=2;' write has no effect on another thread that reads the 'i'
> variable.
>

What about volatile ? Why won't it help here ?

> > > I'm really curious why you are going to all this craziness. Why
not
> > > just
> > > use a CRITICAL_SECTION? Or why not use a CRITICAL_SECTION in
combination
> > > with an Interlocked* function? Or some other sane solution rather than
> > > relying upon the compiler to 'just happen to' get it right?
>
> > The application needs to be efficient. I don't want to add unnecessary
> > overhead.
>
> I don't know what to say to this. This is just so wrong for so many
> reasons.
>

I just want to understand why to do or not to do something.

> > > Has profiling proven that using normal locking is a bottlenect?
>
> > No, but why use it if it is not a must ?
> > So I am just wondering what can go wrong. How can a compiler
> implementation
> > corrupt the value of var ?
>
> You are trying to rely upon guarantees that you don't have. What good
is
> an efficient program that is broken? Please google for 'premature
> optimization'.
>

Ok, ok, I promise to behave.

> DS
>
>

Thomas


Thomas Segev

unread,
Sep 24, 2003, 12:12:54 AM9/24/03
to
Please explain why volatile will not help here.

Thomas
"SenderX" <x...@xxx.xxx> wrote in message
news:OJ0cb.561994$uu5.92097@sccrnsc04...

David Schwartz

unread,
Sep 24, 2003, 2:19:23 AM9/24/03
to

"Thomas Segev" <thomas...@bmc.com> wrote in message
news:vn26326...@corp.supernews.com...

> What about volatile ? Why won't it help here ?

Because 'volatile' has nothing to do with threads.

> Ok, ok, I promise to behave.

You'll be glad you did.

DS


David Schwartz

unread,
Sep 24, 2003, 2:20:43 AM9/24/03
to

"Thomas Segev" <thomas...@bmc.com> wrote in message
news:vn26hj9...@corp.supernews.com...

> Please explain why volatile will not help here.

Because 'volatile' is part of the C/C++ standard which says nothing
about what will or won't work in a multithreaded program. If the threading
standard you're using says it will work, then it will. It is known for a
fact that:

1) It does not work on WIN32.

2) It is not guaranteed to work by the pthreads standard.

DS


Alexander Terekhov

unread,
Sep 24, 2003, 3:41:30 AM9/24/03
to

Hovik Melikyan wrote:
[...]

> So ... ? I'm still waiting and I'm impatient.

I already provided a sort of answer. In my first reply.

> I'd even accept your recursive references, Alexander. :)

http://groups.google.com/groups?selm=3F6E979B.AE5A2BB9%40web.de
(click on the second link... and so forth)

regards,
alexander.

Alexander Terekhov

unread,
Sep 24, 2003, 5:17:58 AM9/24/03
to

SenderX wrote:
>
> > Alexander is still waiting and impatient with the rate of progress
> > of the Posix standards committee on getting the interlocked
> > functions in the standard. :)
>
> Alex calls the Interlocked API "totally brain-damaged"...
>
> Why would Alex want POSIX to contain these horrible functions?

Right.

http://terekhov.de/pthread_refcount_t/experimental/refcount.cpp
(see msync and atomic<> sketch)

I, for one, want POSIX to contain something along the lines of atomic<>
with "a whole bunch" of implementation provided specializations (for
scalars that can be made atomic). Plain C "layer" would have to resort
to some silly macros (or bizillion of atomic calls)... "implemented" on
top of C++ stuff, of course. A sort of first main problem is that POSIX
folks just keep telling me that "C++ bindings" isn't their business, so
to speak. For some reason, they just fail to realize that what they
have currently, can (and ought to) be "expressed" along the lines of:

typedef std::thread * pthread_t;

extern "C" pthread_t pthread_self() throw() {
return std::thread_self().raw_ptr();
}

typedef thread_specific_ptr<void, __c_TSD_cleanup> * pthread_key_t;

extern "C" int pthread_key_create(pthread_key_t * key,
void ( * dtor)(void *)) throw() {
try {
// can throw "shall fail" stuff only (std::bad_alloc/try_again)
*key = new pthread_key_t(__c_TSD_cleanup(dtor));
// "may fail" shall be caught in the std::unexpected() handler
}
catch(...) {
return __translate_exception_to_error_code_using_throw_and_catch();
}
return 0;
}

// PODs
typedef std::aligned_storage<std::mutex> pthread_mutex_t;
typedef std::aligned_storage<std::mutexattr_t> pthread_mutexattr_t;

extern "C" int pthread_mutex_init(pthread_mutex_t * mutex_storage,
const pthread_mutexattr_t * attr_storage) throw() {
try {
// can throw "shall fail" stuff only
new (mutex_storage->place()) std::mutex(attr_storage->object());
// "may fail" shall be caught in the std::unexpected() handler
}
catch(...) {
return __translate_exception_to_error_code_using_throw_and_catch();
}
return 0;
}

extern "C" int pthread_mutex_destroy(pthread_mutex_t * mutex_storage)
throw() {
// destructor with implicit throw()-nothing ES
mutex_storage->object().~object();
// "may fail" shall be caught in the std::unexpected() handler
return 0;
}

#define PTHREAD_CANCELED std::thread_canceled()

extern "C" void pthread_exit(void * ptr)
throw(std::thread_termination_request) {
ptr == PTHREAD_CANCELED ? std::thread_cancel() :
std::thread_exit(ptr);
}

struct thread_canceled {
operator void * () { return &unique; }
static thread_canceled unique;
};

template<typename T>
void thread_exit(T value) {
assert(std::thread_self().can_exit_with<T>());
throw thread_exit_value(value);
}

template<>
void thread_exit(std::thread_canceled) {
thread_cancel();
}

void thread_cancel() {
throw std::thread_cancel_request();
}

and so forth.

regards,
alexander.

Hovik Melikyan

unread,
Sep 24, 2003, 5:31:22 AM9/24/03
to
Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F7094D7...@xemaps.com>...

> Hovik Melikyan wrote:
> >
> > So the question is do plain atomic exchange/inc/dec replace CAS and
> > Co.?
>
> There are things that you can do with CAS that you cannot do with
> exchange/inc/dec. Pushing and popping from a stack for example.

I forgot to tell that I mean inc/dec that return values either before
or after the inc/dec operation (actually it doesn't matter), not just
negative/zero/positive result.

If so, I can push to the stack using a plain atomic increment: it
gives me the stack pointer where I should store stuff and increments
the SP atomically. So it works fine without CAS.

>
> Conditional things like decrement only if positive, etc ...
>

That's what I'm asking: what are those "conditional things" for? Show
me an example and I'll try to give a different way of doing it.

--
Hovik Melikyan

Hovik Melikyan

unread,
Sep 24, 2003, 6:00:12 AM9/24/03
to
Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F7093D5...@xemaps.com>...

>
> Alexander is still waiting and impatient with the rate of progress
> of the Posix standards committee on getting the interlocked
> functions in the standard. :)
>

http://groups.google.com/groups?selm=3EC108B8.980ACE53%40web.de

You mean this?

Great. But first, his functions not always have the 'return' statement
where they should :) and I assume that their return values indicate
`signness' or 'zeroness' of the resulting value. It's ok, but I have
several algorithms that use the exact value after the inc/dec
operation. One example is pushing to the stack (in the other porting).

Another example is a dynamic string class with copy-on-write that I
implemented [ http://tinyurl.com/oh2e ] The multithreaded version of
unique() has an atomic increment and comparison with 2, also some
other parts of the module has comparisons with 0. Signess won't help
here, or maybe there can be some other algorithm I'm not aware of.

These atomic functions are powerful and they can do many things, if
not any thing. So I'd propose to make pthread_refcount_* return the
values, not just signess.

--
Hovik Melikyan

Alexander Terekhov

unread,
Sep 24, 2003, 6:35:28 AM9/24/03
to

Hovik Melikyan wrote:
>
> Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F7093D5...@xemaps.com>...
> >
> > Alexander is still waiting and impatient with the rate of progress
> > of the Posix standards committee on getting the interlocked
> > functions in the standard. :)
> >
>
> http://groups.google.com/groups?selm=3EC108B8.980ACE53%40web.de
>
> You mean this?

Yeah.

>
> Great. But first, his functions not always have the 'return' statement
> where they should :)

Nobody's perfect. Google is no exception. Try

http://google.com/groups?selm=3EC108B8.980ACE53%40web.de&output=gplain

> and I assume that their return values indicate

> `signness' or 'zeroness' of the resulting value. It's ok, ...

return values indicate errors or '0K' (zero is OK).

> Another example is a dynamic string class with copy-on-write ...

http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
http://groups.google.com/groups?selm=3EC0F1F1.B78AA0DA%40web.de

regards,
alexander.

--
'YHBT' is a transliteration of an ancient Hebrew phrase
meaning 'There is no God but Allah'. 'HAND' is an ancient
VAX error message meaning 'Halting All Network Devices'.

Joe Seigh

unread,
Sep 24, 2003, 7:14:46 AM9/24/03
to

Hovik Melikyan wrote:
>
> Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F7094D7...@xemaps.com>...
> > Hovik Melikyan wrote:
> > >
> > > So the question is do plain atomic exchange/inc/dec replace CAS and
> > > Co.?
> >
> > There are things that you can do with CAS that you cannot do with
> > exchange/inc/dec. Pushing and popping from a stack for example.
>
> I forgot to tell that I mean inc/dec that return values either before
> or after the inc/dec operation (actually it doesn't matter), not just
> negative/zero/positive result.
>
> If so, I can push to the stack using a plain atomic increment: it
> gives me the stack pointer where I should store stuff and increments
> the SP atomically. So it works fine without CAS.
>

I was thinking of linked list stacks, not contiguous stacks. Besides,
I don't think you can push on or pop off a contiguous stack without
blocking. Something about atomically accessing the stack data and
manipulating the stack pointer at the same time. Maybe with DCAS.

> >
> > Conditional things like decrement only if positive, etc ...
> >
>
> That's what I'm asking: what are those "conditional things" for? Show
> me an example and I'll try to give a different way of doing it.
>

Search google for "lock-free" and study some the algorithms that it comes
up with.

Joe Seigh

Joe Seigh

unread,
Sep 24, 2003, 7:23:54 AM9/24/03
to

Alexander Terekhov wrote:


>
> SenderX wrote:
>
> > Alex calls the Interlocked API "totally brain-damaged"...
> >
> > Why would Alex want POSIX to contain these horrible functions?
>
> Right.
>
> http://terekhov.de/pthread_refcount_t/experimental/refcount.cpp
> (see msync and atomic<> sketch)
>
> I, for one, want POSIX to contain something along the lines of atomic<>
> with "a whole bunch" of implementation provided specializations (for
> scalars that can be made atomic). Plain C "layer" would have to resort
> to some silly macros (or bizillion of atomic calls)... "implemented" on
> top of C++ stuff, of course. A sort of first main problem is that POSIX
> folks just keep telling me that "C++ bindings" isn't their business, so
> to speak. For some reason, they just fail to realize that what they
> have currently, can (and ought to) be "expressed" along the lines of:
>

I would forget about C++. C++ is a compromise based on another compromise, C.
It is profoundly beyond recoverability. Somebody will have to design another
language from scratch. Not Java though. Not Aspect. Nice tries but they
fall rather short.

Joe Seigh

Hovik Melikyan

unread,
Sep 24, 2003, 9:36:24 AM9/24/03
to
Alexander Terekhov <tere...@web.de> wrote in message news:<3F717370...@web.de>...

> Hovik Melikyan wrote:
> >
> > Great. But first, his functions not always have the 'return' statement
> > where they should :)
>
> Nobody's perfect. Google is no exception. Try
>
> http://google.com/groups?selm=3EC108B8.980ACE53%40web.de&output=gplain
>

Ok :) Hope you are not compiling through google (if you know what it
is, I mean compiling) :)

>
> http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
>

Hmmmm... a couple of race conditions and also a memory leak bug. And
in fact, if you fix that bug you will get one more race condition,
unless you change the whole design. And I think the whole thing is
just wrong. Well, perhaps this code demonstrates something... other
than a working refcounted string.

--
Hovik Melikyan

Hovik Melikyan

unread,
Sep 24, 2003, 9:53:01 AM9/24/03
to
Alexander Terekhov <tere...@web.de> wrote in message news:<3F714AAA...@web.de>...

The question was: is there a better way of doing safe init-once than
this:

int initialized = 0;

if (atomic_exchange(&initialized, 1) == 0)

// ...

_provided that you do have a portable atomic exchange function handy_,
you answered "no, this isn't the best way" and then you say atomic
exchange can't be portable (from the indirect link above). I'm afraid
this is not an answer.


--
Hovik Melikyan

Joe Seigh

unread,
Sep 24, 2003, 10:19:31 AM9/24/03
to

Hovik Melikyan wrote:
>
> Alexander Terekhov <tere...@web.de> wrote in message news:<3F717370...@web.de>...
>
> >

> > http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
> >
>
> Hmmmm... a couple of race conditions and also a memory leak bug. And
> in fact, if you fix that bug you will get one more race condition,
> unless you change the whole design. And I think the whole thing is
> just wrong. Well, perhaps this code demonstrates something... other
> than a working refcounted string.
>

That's not a memory leak bug if you are referring to what I think you
are. That's the String uncontrolled reference to the COW buffer thing
where you don't know when or if the uncontrolled reference ever gets
dropped. Basically you can't share the COW buffer at that point and
you kind of hack that info in the refcount state to deal with it. So
basically you are not looking at a pure refcounting solution.

Joe Seigh

Alexander Terekhov

unread,
Sep 24, 2003, 10:22:53 AM9/24/03
to

Hovik Melikyan wrote:
[...]

> > http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
> >
>
> Hmmmm... a couple of race conditions and also a memory leak bug. And
> in fact, if you fix that bug you will get one more race condition,
> unless you change the whole design. And I think the whole thing is
> just wrong. Well, perhaps this code demonstrates something... other
> than a working refcounted string.

Brave fellow. Respect, respect. Care to elaborate?

regards,
alexander.

Alexander Terekhov

unread,
Sep 24, 2003, 10:48:08 AM9/24/03
to

Hovik Melikyan wrote:
[...]

> int initialized = 0;
>
> if (atomic_exchange(&initialized, 1) == 0)
> // ...

This is crapola and hence I wrote "This isn't the best way" in reply
to your statement "On the original question: I still think the best
way to do such things is to use atomic exchange: <crapola>". Kapis?

regards,
alexander.

Hovik Melikyan

unread,
Sep 24, 2003, 1:11:57 PM9/24/03
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:3F717F36...@xemaps.com...

>
> > >
> > > There are things that you can do with CAS that you cannot do with
> > > exchange/inc/dec. Pushing and popping from a stack for example.
> >
> > I forgot to tell that I mean inc/dec that return values either before
> > or after the inc/dec operation (actually it doesn't matter), not just
> > negative/zero/positive result.
> >
> > If so, I can push to the stack using a plain atomic increment: it
> > gives me the stack pointer where I should store stuff and increments
> > the SP atomically. So it works fine without CAS.
> >
>
> I was thinking of linked list stacks, not contiguous stacks.

Ok, linked list stack add/remove, lock-free wherever possible. First, I'd do
pop with atomic exchange:

item* stack_ptr;

item* stack_pop()
{
return atomic_exchange(&stack_ptr, stack_ptr->next);
}

This implies that the top stack item can't be accessed and popped
concurrently, and I think that would be a result of poor design. In typical
applications you access data and then pop it off the stack synchronously.

As for safe stack_push() for linked lists, it is tough and I can't think of
any lock-free implementation [yet] which is without a loop. A loop (wait
until you get what you expected) is possible both with the CAS scenario and
with XCHG, btw, so even here I can get rid of CAS. Besides, I'd prefer
locking over looping in this case, since loops are less predictable.

What I'm trying to say in general is that CAS _forces_ you to loop until
there are no concurrent attempts to do the same thing. So CAS-based
scenarios can be quite inefficient if the point of conflict is crowded. I'd
say looping is `built' into the concept of compare-and-swap(-or-whatever)
and that's what I dislike about it.

> Besides,
> I don't think you can push on or pop off a contiguous stack without
> blocking. Something about atomically accessing the stack data and
> manipulating the stack pointer at the same time.

Pushing with XCHG like I proposed is absolutely safe, it'd never interfere
with pop or access operations. Also, you can make pop operation lock-free as
well, provided that what is said for stack_pop() above is true for
contiguous stacks too.

>
> Search google for "lock-free" and study some the algorithms that it comes
> up with.
>

Thanks, but I'd also appreciate if you point out a couple of specific (and
clean) examples.

--
Hovik Melikyan

Hovik Melikyan

unread,
Sep 24, 2003, 1:55:14 PM9/24/03
to
Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F71AA82...@xemaps.com>...

I know that I'm looking at something that was posted in reply to my
reference to a refcounted, copy-on-write string implementation
(lock-free and CAS-free, BTW). If his code is not a RC/COW thing
[despite its name] then it was not relevant as a response to mine.

There are race conditions in his code wherever you see this:

if ( 2 > IntAtomicGet( data_->refs ) ||
1 > IntAtomicDecrement( data_->refs ) )

like in String::~String(), or this:

switch ( IntAtomicGet( data_->refs ) ) {
case 01: ...
case 00: ...
default: ...


The conclusion is that an atomic refcount that only returns signness
or even worse, only zeroness is at least useless and at most
completely brain-d... ok, ok, ill-designed. As far as I remember Alex
was saying the same thing, though with regard to Windows
InterlockedXXX, not his own code. Anyway, I think we reached agreement
on this matter.

And one more thing: if you use atomic.get_value() it means most likely
your code is erroneous (with extremely rare exceptions which I think
can be cleanly described; Alex's String class is definitely not among
those exceptions).

--
Hovik Melikyan

Hovik Melikyan

unread,
Sep 24, 2003, 2:01:29 PM9/24/03
to
Alexander Terekhov <tere...@web.de> wrote in message news:<3F71AEA8...@web.de>...
>
> Kapis?
>

Net :)

> regards,
> alexander.

Joe Seigh

unread,
Sep 24, 2003, 2:28:43 PM9/24/03
to

Hovik Melikyan wrote:
>
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:3F717F36...@xemaps.com...

> > I was thinking of linked list stacks, not contiguous stacks.
>
> Ok, linked list stack add/remove, lock-free wherever possible. First, I'd do
> pop with atomic exchange:
>
> item* stack_ptr;
>
> item* stack_pop()
> {
> return atomic_exchange(&stack_ptr, stack_ptr->next);
> }
>
> This implies that the top stack item can't be accessed and popped
> concurrently, and I think that would be a result of poor design. In typical
> applications you access data and then pop it off the stack synchronously.

So it's broken.


>
> As for safe stack_push() for linked lists, it is tough and I can't think of
> any lock-free implementation [yet] which is without a loop. A loop (wait
> until you get what you expected) is possible both with the CAS scenario and
> with XCHG, btw, so even here I can get rid of CAS. Besides, I'd prefer
> locking over looping in this case, since loops are less predictable.
>
> What I'm trying to say in general is that CAS _forces_ you to loop until
> there are no concurrent attempts to do the same thing. So CAS-based
> scenarios can be quite inefficient if the point of conflict is crowded. I'd
> say looping is `built' into the concept of compare-and-swap(-or-whatever)
> and that's what I dislike about it.

Usually the hardware implementation will provide fair forward progress for
CAS loops. They don't explicitly guarantee it usually but almost all
synchronization api's don't explicitly guarantee fair forward progress.
If not, in the case of a SMP hardware vendor with a death wish, you can
always provide a back off outer loop to reduce the contention level.

Some of the fetch and op implementations aren't any better since the loop
logic is just in microcode. Some of the new architectures have moved
fetch and op stuff to the memory controller, e.g. fetchadd on Itanium,
so they will have slightly better contention.

>
> > Besides,
> > I don't think you can push on or pop off a contiguous stack without
> > blocking. Something about atomically accessing the stack data and
> > manipulating the stack pointer at the same time.
>
> Pushing with XCHG like I proposed is absolutely safe, it'd never interfere
> with pop or access operations. Also, you can make pop operation lock-free as
> well, provided that what is said for stack_pop() above is true for
> contiguous stacks too.

Pushing with xchg requires two memory locations to be locked by the hardware.
AFAIK there are no machines with that apart from some ancient uniprocessor
Burroughs computers where that trick worked. But a lot of CISC instructions
that are atomic on uniprocessors, aren't on multiprocessors.


>
> >
> > Search google for "lock-free" and study some the algorithms that it comes
> > up with.
> >
>
> Thanks, but I'd also appreciate if you point out a couple of specific (and
> clean) examples.
>

Try http://citeseer.nj.nec.com/valois95lockfree.html for a start. CiteSeer
has a lot of stuff but their search sucks. Google works better.

Joe Seigh

Alexander Terekhov

unread,
Sep 24, 2003, 2:45:16 PM9/24/03
to

Hovik Melikyan wrote:
[...]

> There are race conditions in his code wherever you see this:
>
> if ( 2 > IntAtomicGet( data_->refs ) ||
> 1 > IntAtomicDecrement( data_->refs ) )
>
> like in String::~String(), or this:
>
> switch ( IntAtomicGet( data_->refs ) ) {
> case 01: ...
> case 00: ...
> default: ...

Both are fine. Read up on "basic" thread-safety (also known as
"thread safe as an int").

regards,
alexander.

Joe Seigh

unread,
Sep 24, 2003, 3:00:55 PM9/24/03
to

It should be "thread safe as char[]". Int is actually atomic under
some operations.

(to Hovik) In C++ land the convention is that thread-safe
doesn't mean atomic. You have to own the String object to
access it. This means the reference count cannot go to
zero except if you are dropping a reference by deleting
the String object. This avoids a lot of race conditions
that would otherwise exist if you were writing atomically
thread-safe objects.

Joe Seigh

Alexander Terekhov

unread,
Sep 24, 2003, 3:13:48 PM9/24/03
to

Hovik Melikyan wrote:
>
> Alexander Terekhov <tere...@web.de> wrote in message news:<3F71AEA8...@web.de>...
> >
> > Kapis?
> >
>
> Net :)

atomic_exchange(&initialized, 1) can't be used prior to init. The
next thread can go boom immediately after you exchange 0 with 1.

regards,
alexander.

--
"otvali, durnaya ptica,
uletaj s peredovoj.
zdes' tebe ne pozhevit'sa.
fuck you, voron! ja ne tvoj!"

-- http://www.stihi.ru/poems/2001/10/03-362.html

SenderX

unread,
Sep 24, 2003, 3:39:50 PM9/24/03
to
> Ok, linked list stack add/remove, lock-free wherever possible. First, I'd
do
> pop with atomic exchange:
>
> item* stack_ptr;
>
> item* stack_pop()
> {
> return atomic_exchange(&stack_ptr, stack_ptr->next);
> }

Nope.

Use cmpxchg for push, and cmpxchg8b for pop.


> Thanks, but I'd also appreciate if you point out a couple of specific (and
> clean) examples.

I've posted many.

Here is the code I use for a lock-free stack:


LockFree_IA32.asm
---------------------

.586
.MODEL FLAT, C
.CODE

; Lock-Free Stack Push
; extern void acLfStackPush( ac_LfStack* pStack, ac_LfNode* pNode );
acLfStackPush PROC

; Load the front
mov edx, [esp + 4]
mov eax, [edx]

; Load the new node
mov ecx, [esp + 8]


acLfStackPush_Retry:

; Prepare the Cas
mov [ecx], eax

; Try and link the new node
cmpxchg [edx], ecx

jne acLfStackPush_Retry

ret

acLfStackPush ENDP


; Lock-Free Stack Pop
; extern ac_LfNode* acLfStackPop( ac_LfStack* pStack );
acLfStackPop PROC

; Preserve
push esi
push ebx

; Load the front
mov esi, [esp + 12]
mov edx, [esi + 4]
mov eax, [esi]


acLfStackPop_Retry:

; Validate
test eax, eax

je acLfStackPop_Fail

; Prepare the Cas
mov ebx, [eax]
lea ecx, [edx + 1]

; Try and unlink the front
cmpxchg8b qword ptr [esi]

jne acLfStackPop_Retry


acLfStackPop_Fail:

; Restore
pop ebx
pop esi

ret

acLfStackPop ENDP

SenderX

unread,
Sep 24, 2003, 3:46:17 PM9/24/03
to
> Use cmpxchg for push, and cmpxchg8b for pop.

cmpxchg = CAS
cmpxchg8b = CAS2 ( not DCAS )

SenderX

unread,
Sep 24, 2003, 4:56:23 PM9/24/03
to
> Nope.
>
> Use cmpxchg for push, and cmpxchg8b for pop.

ll/sc can be used as well, and it eliminates the need for an ABA sequence
number.

PowerPC has one.

Mike Mowbray

unread,
Sep 24, 2003, 9:37:33 PM9/24/03
to
Hovik Melikyan wrote:

> What I'm trying to say in general is that CAS _forces_ you to
> loop until there are no concurrent attempts to do the same thing.
> So CAS-based scenarios can be quite inefficient if the point of
> conflict is crowded. I'd say looping is `built' into the concept
> of compare-and-swap(-or-whatever) and that's what I dislike
> about it.

Youch! Just a minute... In practical cases, the looping is not
a problem because mostly the thread succeeds on the first try.
And even if it has to try a second time, that's rare. Also, if
you're comparing mutex-locking against a (well-written) CAS loop,
the CAS loop will win easily every time. Modern mutex-lock code
tends to be adaptive: if thread-A owning the mutex is on-CPU, then
thread-B spins trying to acquire the mutex and only sleeps if
thread-A goes off-CPU. Try some performance tests (on a multi-CPU
machine) of a CAS-based atomic-increment versus a mutex-based
increment and you'll see that the CAS version gets exponentially
better as the number of contending threads increases. Even with
a single thread, the CAS version is always better in my experience.

> [...] I'd also appreciate if you point out a couple of specific
> (and clean) examples [of conditional-CAS applications]

OK, how about this...

Imagine a multi-threaded database server engine. It has a pool
of threads that handle transactions, involving disk access through
a shared cache of blocks. We wish to know dynamically (i.e: in
production) how well our cache algorithms are performing. As one
part of this, we wish to take samples every minute (say) of the
maximum time that any thread was kept waiting by the shared cache,
(i.e: to help diagnose and improve any long-delay transaction
problems.)

To implement this measurement, each transaction thread notes the
(hi-res) time in nanosecs before and after each access, and assigns
the difference to a global integer atomically but *only* if the
difference is *greater* than the value currently stored in the
global integer. Independently, every minute a "sampler" thread wakes
up and performs an atomic exchange to grab the global integer's value
and replace it with 0, before saving the sampled value somehow.

This gives a cheap way of keeping a simple statistic without less
overhead than a mutex-based solution. I doubt you could achieve
this with merely atomic inc/dec.


- MikeM.

SenderX

unread,
Sep 25, 2003, 12:24:13 AM9/25/03
to
> Youch! Just a minute... In practical cases, the looping is not
> a problem because mostly the thread succeeds on the first try.

Yep.


> And even if it has to try a second time, that's rare.

"Mostly", taking SMP into consideration. CAS is fair though...

I haven't seen live-lock on properly built CAS sections......


> Also, if
> you're comparing mutex-locking against a (well-written) CAS loop,
> the CAS loop will win easily every time.

Yep.


> Also, if
> you're comparing mutex-locking against a (well-written) CAS loop,
> the CAS loop will win easily every time.

For sure.


> Try some performance tests (on a multi-CPU
> machine) of a CAS-based atomic-increment versus a mutex-based
> increment and you'll see that the CAS version gets exponentially
> better as the number of contending threads increases. Even with
> a single thread, the CAS version is always better in my experience.

Yep.


> This gives a cheap way of keeping a simple statistic without less
> overhead than a mutex-based solution. I doubt you could achieve
> this with merely atomic inc/dec.

Yep...

SenderX

unread,
Sep 25, 2003, 12:56:29 AM9/25/03
to
> > Also, if
> > you're comparing mutex-locking against a (well-written) CAS loop,
> > the CAS loop will win easily every time.
>
> Yep.
>
>
> > Also, if
> > you're comparing mutex-locking against a (well-written) CAS loop,
> > the CAS loop will win easily every time.
>
> For sure.

Sorry about that double quote.

=)


Hovik Melikyan

unread,
Sep 25, 2003, 1:39:00 AM9/25/03
to
Alexander Terekhov <tere...@web.de> wrote in message news:<3F71ECE...@web.de>...

>
> atomic_exchange(&initialized, 1) can't be used prior to init. The
> next thread can go boom immediately after you exchange 0 with 1.
>

It won't go boom, it will see 1. All threads other than the first one
will exchange 1 with 1.

> regards,
> alexander.
>
> --
> "otvali, durnaya ptica,
> uletaj s peredovoj.
> zdes' tebe ne pozhevit'sa.
> fuck you, voron! ja ne tvoj!"
>

He-he, "Nevermore!", kak govoritsya. :))

--
Hovik Melikyan

Hovik Melikyan

unread,
Sep 25, 2003, 2:18:29 AM9/25/03