Threads and initialization only once

Thomas Segev

unread,

Sep 21, 2003, 7:16:08 AM9/21/03

to

In a multi-threaded application that runs on UNIX and NT, I need to
use a parameter which may be defined by an environment variable. If no
environment variable exists, I use a hard coded default.

The environment variables are not changed by this application.

Each time I need to use that parameter I could first try getenv(), and
then depending on the result I could use it, or use my default.
However I would like to make it more efficient and determine the value
to use only once (by the first thread). I can use the pthread_once
routine for this purpose.

I have two questions:
1. Is there some eqvivalent to pthread_once on NT ?

2. What can be wrong if instead I use the following (non)solution on
UNIX:

static int var = DEFAULT;
static int initialized = 0;

if (initialized)
{
/* use var */
...
}
else
if(pe = getenv(ENV_VARIALBE))
{
/* convert and assign the value of the environment variable to
var */
...
initialized = 1;
}

I realize that in this way multiple threads may call getenv() in
parallel and then set the value of var and initialized.
Although getenv()is not thread safe, there should not be a problem
with it as long as the application does not change or add environment
variables. Correct ?

Is there a danger in the above (other then being not formally thread
safe) ?
Can the value of var be unpredictable because multiple threads assign
it the same value in parallel ?

Thanks,
Thomas

Thomas Segev

unread,

Sep 21, 2003, 7:39:39 AM9/21/03

to

I would like to correct the program in my first post:

if (initialized)
{
/* use var */
...
}
else

{
pe = getenv(ENV_VARIALBE);
if (pe)

{
/* convert and assign the value of the
environment variable to var */
...
}
initialized = 1;
}

Thanks,
Thomas
"Thomas Segev" <thomas...@bmc.com> wrote in message
news:83b78f96.03092...@posting.google.com...

Patrick TJ McPhee

unread,

Sep 21, 2003, 10:09:34 PM9/21/03

to

In article <83b78f96.03092...@posting.google.com>,
Thomas Segev <thomas...@bmc.com> wrote:

% In a multi-threaded application that runs on UNIX and NT, I need to
% use a parameter which may be defined by an environment variable. If no
% environment variable exists, I use a hard coded default.
%
% The environment variables are not changed by this application.

Why not initialise a variable at the very start, when you're still
single-threaded? You don't need any synchronisation at all if you do
it that way.

--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

Thomas Segev

unread,

Sep 21, 2003, 11:28:23 PM9/21/03

to

Thanks Patrick, but my code is a dynamic load library and I am restricted
now to changes only there.
Unfortunately there is no process initialization call provided by this
library.
Even if I add such a call now, the calling thread code can not be modified
at present (I need a solution which is backwards compatible).

Thomas
"Patrick TJ McPhee" <pt...@interlog.com> wrote in message
news:bkllkt$e9a$3...@news.eusc.inter.net...

Alexander Terekhov

unread,

Sep 22, 2003, 2:32:59 AM9/22/03

to

Thomas Segev wrote:
[...]

> I have two questions:
> 1. Is there some eqvivalent to pthread_once on NT ?

http://groups.yahoo.com/group/boost/message/15442
("named mutex" thing)

http://groups.google.com/groups?selm=3ECBD751.8DE8AED5%40web.de
(DCSI stuff; you can use a "named mutex" instead of static one)

regards,
alexander.

David Schwartz

unread,

Sep 22, 2003, 5:44:17 AM9/22/03

to

"Thomas Segev" <thomas...@bmc.com> wrote in message

news:vmr3joh...@corp.supernews.com...

> I would like to correct the program in my first post:
> if (initialized)
> {
> /* use var */
> ...
> }
> else
> {
> pe = getenv(ENV_VARIALBE);
> if (pe)
> {
> /* convert and assign the value of the
> environment variable to var */
> ...
> }
> initialized = 1;
> }

The problem occurs in the first part of the 'if' block. Even though a
thread sees 'initialized' as one, there is no guarantee that it sees the
correct value for the variable.

I'm really curious why you are going to all this craziness. Why not just
use a CRITICAL_SECTION? Or why not use a CRITICAL_SECTION in combination
with an Interlocked* function? Or some other sane solution rather than
relying upon the compiler to 'just happen to' get it right?

Has profiling proven that using normal locking is a bottlenect?

DS

Thomas Segev

unread,

Sep 22, 2003, 6:49:21 AM9/22/03

to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:bkmg9p$bcg$1...@nntp.webmaster.com...

>
> "Thomas Segev" <thomas...@bmc.com> wrote in message
> news:vmr3joh...@corp.supernews.com...
>
> > I would like to correct the program in my first post:
> > if (initialized)
> > {
> > /* use var */
> > ...
> > }
> > else
> > {
> > pe = getenv(ENV_VARIALBE);
> > if (pe)
> > {
> > /* convert and assign the value of the
> > environment variable to var */
> > ...
> > }
> > initialized = 1;
> > }
>
> The problem occurs in the first part of the 'if' block. Even though a
> thread sees 'initialized' as one, there is no guarantee that it sees the
> correct value for the variable.

Could you explain why not ? Assuming the environment variable does
not change, several thread may attempt to set to the same value in
"parallel". What can go wrong here ?

>
> I'm really curious why you are going to all this craziness. Why not
just
> use a CRITICAL_SECTION? Or why not use a CRITICAL_SECTION in combination
> with an Interlocked* function? Or some other sane solution rather than
> relying upon the compiler to 'just happen to' get it right?

The application needs to be efficient. I don't want to add unnecessary
overhead.
>

> Has profiling proven that using normal locking is a bottlenect?

No, but why use it if it is not a must ?
So I am just wondering what can go wrong. How can a compiler implementation
corrupt the value of var ?

>
> DS
>
>
Thomas

David Butenhof

unread,

Sep 22, 2003, 7:26:23 AM9/22/03

to

Thomas Segev wrote:

> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:bkmg9p$bcg$1...@nntp.webmaster.com...
>>
>> "Thomas Segev" <thomas...@bmc.com> wrote in message
>> news:vmr3joh...@corp.supernews.com...
>>
>> > I would like to correct the program in my first post:
>> > if (initialized)
>> > {
>> > /* use var */
>> > ...
>> > }
>> > else
>> > {
>> > pe = getenv(ENV_VARIALBE);
>> > if (pe)
>> > {
>> > /* convert and assign the value of the
>> > environment variable to var */
>> > ...
>> > }
>> > initialized = 1;
>> > }
>>
>> The problem occurs in the first part of the 'if' block. Even though a
>> thread sees 'initialized' as one, there is no guarantee that it sees the
>> correct value for the variable.
>
> Could you explain why not ? Assuming the environment variable does
> not change, several thread may attempt to set to the same value in
> "parallel". What can go wrong here?

Efficient multiprocessor architectures, like Alpha and IA64, allow reads and
writes to be REORDERED to take best advantage of cache and interprocessor
memory protocols.

Without explicit synchronization control, there is no need to ensure that
one processor sees the writes of another processor in any particular order;
and therefore it makes no sense to slow down the pipeline to ensure that
everything occurs in order.

There have been many discussions on this newsgroup about the sort of "short
cut" technique you're trying to apply. On a machine without strict
cross-processor ordering, it just won't work because another processor may
see your write to "initialized" before it sees prior writes. Which means
that your 'var' may be uninitialized when some thread tries to read it,
even after having checked 'initialized'.

The only PORTABLE solution is to use proper synchronization around the whole
block -- e.g., locking a mutex before you test the value of initialized,
and unlocking it only after writing initialized.

There are more efficient non-portable solutions, generally involving memory
barriers or fences (depending on what machine you use), to ensure that both
reads and writes occur in the order you expect. That sort of
machine-specific coding generally makes sense only when the code will be
executed a lot and needs to be fast.

Do a google search for "DCL" or the less misleading term (coined by our
Alexander Terekhov) "DCSI" (double-checked serialized initialization)
should turn up a bunch of info. (You should restrict the search to
comp.programming.threads, because the abbreviations have been used for
various other things in other contexts.)

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

Thomas Segev

unread,

Sep 22, 2003, 9:04:14 AM9/22/03

to

Thanks David, this clarifies the issue.
Is there no simple "trick" to force order ?

Will the following work ?:

struct var
{
int value;
int initialized;
};

static struct var globalvar ={DEFAULT,0};

................
...............

struct init var localvar = {DEFAULT,0};

if (globalvar.initialized)
{
/* use globalvar.value */

...
}
else
{
pe = getenv(ENV_VARIALBE);
if (pe)
{
/* convert and assign the value of the

environment variable to localvar.value */
...
}
localvar.initialized = 1;
globalvar = localvar;
}

Thanks,
Thomas

"David Butenhof" <David.B...@hp.com> wrote in message
news:3f6e...@usenet01.boi.hp.com...

Patrick TJ McPhee

unread,

Sep 22, 2003, 2:15:52 PM9/22/03

to

In article <vmsr626...@corp.supernews.com>,
Thomas Segev <thomas...@bmc.com> wrote:

% Unfortunately there is no process initialization call provided by this
% library.
% Even if I add such a call now, the calling thread code can not be modified
% at present (I need a solution which is backwards compatible).

Depending on how it's loaded, you may be able to do it in an _init function.
If the library load itself is synchronised or happens when the process
has only one thread, the _init function will be synchronised.

Hovik Melikyan

unread,

Sep 23, 2003, 4:45:01 AM9/23/03

to

Alexander Terekhov <tere...@web.de> wrote in message news:<3F6E979B...@web.de>...

> Thomas Segev wrote:
> [...]
> > I have two questions:
> > 1. Is there some eqvivalent to pthread_once on NT ?
>
> http://groups.yahoo.com/group/boost/message/15442
> ("named mutex" thing)
>

You wrote there: "Interlocked stuff does not formally provide memory
synch guaranties...". Can you please explain what's wrong with LOCK
XCHG?

On the original question: I still think the best way to do such things
is to use atomic exchange:

int initialized = 0;

if (atomic_exchange(&initialized, 1) == 0)
{
// ... initialization here ...
}

Look for a library that does portable atomic operations on both
Windows and Unix.

--
Hovik Melikyan

SenderX

unread,

Sep 23, 2003, 4:50:02 AM9/23/03

to

> You wrote there: "Interlocked stuff does not formally provide memory
> synch guaranties...".

Alex is wrong on this.

Interlocked API's do contain memory barriers on non-TSO systems.

--
The designer of the experimental, SMP and HyperThread friendly, AppCore
library.

http://AppCore.home.comcast.net

Alexander Terekhov

unread,

Sep 23, 2003, 5:20:23 AM9/23/03

to

Hovik Melikyan wrote:
[...]

> You wrote there: "Interlocked stuff does not formally provide memory
> synch guaranties...". Can you please explain what's wrong with LOCK
> XCHG?

Nothing wrong except that LOCK XCHG is IA32 instruction, not MS-WIN
interlocked stuff.

>
> On the original question: I still think the best way to do such things
> is to use atomic exchange:
>
> int initialized = 0;
>
> if (atomic_exchange(&initialized, 1) == 0)

This isn't the best way.

> {
> // ... initialization here ...
> }
>
> Look for a library that does portable atomic operations on both
> Windows and Unix.

You'll have to stay in this threading business until at least 2010
(or so), you'll have a chance to see initial draft then, I guess.

regards
alexander.

Alexander Terekhov

unread,

Sep 23, 2003, 5:28:01 AM9/23/03

to

SenderX wrote:
>
> > You wrote there: "Interlocked stuff does not formally provide memory
> > synch guaranties...".
>
> Alex is wrong on this.

I wrote that statement long before MS added acquire and release calls
and sort of "clarified" that old stuff is meant to provide both. They
are still defined rather poorly. I have to say that their "definition"
remains totally brain-dead... and, as you know, we already had a long
long thread on this topic here, Sender.

regards,
alexander.

David Butenhof

unread,

Sep 23, 2003, 9:10:49 AM9/23/03

to

Thomas Segev wrote:

> Thanks David, this clarifies the issue.
> Is there no simple "trick" to force order ?

No. You can't force the compiler to generate the memory instructions you'd
like; and even if you could that won't force the hardware to issue (and
more importantly, retire) them in the order you'd like.

> Will the following work ?:

Nope. Packaging the data in a structure doesn't guarantee they'll be written
in a single atomic instruction -- which is the guarantee you'd need. (And
if both values are 32 bit on a 32 bit machine, or 64 bits on a 64 bit
machine, it may not even HAVE the single atomic instruction you'd like.) If
the structure is copied in two instructions, then those two writes can be
reordered. And even if the writes weren't reordered, you're still issuing
separate reads, and those could be reordered. You could go one step further
in your attempt and copy globalvar into localvar and then read the fields
of localvar... except that still doesn't guarantee the compiler will
generate a single copy operation, so you'd be no better off.

DCSI is a machine-specific optimization. To avoid a lock you need to include
specialized machine instructions.

Hovik Melikyan

unread,

Sep 23, 2003, 9:26:31 AM9/23/03

to

Alexander Terekhov <tere...@web.de> wrote in message news:<3F701057...@web.de>...

> Hovik Melikyan wrote:
> [...]
> > You wrote there: "Interlocked stuff does not formally provide memory
> > synch guaranties...". Can you please explain what's wrong with LOCK
> > XCHG?
>
> Nothing wrong except that LOCK XCHG is IA32 instruction, not MS-WIN
> interlocked stuff.
>

Windows NT/2000 InterlockedXXX stuff uses LOCK XCHG (also XADD). It is
different on Windows95/98 family since those systems can run on i386
which lacks XADD.

> >
> > int initialized = 0;
> >
> > if (atomic_exchange(&initialized, 1) == 0)
>
> This isn't the best way.
>

And the best way is ... ? (provided that you do have a portable atomic
exchange function handy)

>
> You'll have to stay in this threading business until at least 2010
> (or so), you'll have a chance to see initial draft then, I guess.
>

You mean, standardized atomic functions? This is tough, because you
need to have hardware support for it. Every engineering company has
its own way of doing threading and SMP. You'd never convince Sun to
add atomic increment (not compare-and-smth), for example. OTOH you
don't have to, I think. Sometimes by standardizing things you limit
creativity and the possibility of finding new solutions.

--
Hovik Melikyan

Luca

unread,

Sep 23, 2003, 9:36:03 AM9/23/03

to

Maybe I'm saying something stupid, but why don't you avoid the whole problem
safelly calling getenv(), reading a configuration file or settings default
config values at the beginning of your app, maybe filling a structure with
the program configuration? Then start all your threads passing the pointer
to the configuration strucure (or making it globally available).

Luca

Joe Seigh

unread,

Sep 23, 2003, 9:39:49 AM9/23/03

to

Hovik Melikyan wrote:
>
> Alexander Terekhov <tere...@web.de> wrote in message news:<3F701057...@web.de>...
>
> >

> > You'll have to stay in this threading business until at least 2010
> > (or so), you'll have a chance to see initial draft then, I guess.
> >
>
> You mean, standardized atomic functions? This is tough, because you
> need to have hardware support for it. Every engineering company has
> its own way of doing threading and SMP. You'd never convince Sun to
> add atomic increment (not compare-and-smth), for example. OTOH you
> don't have to, I think. Sometimes by standardizing things you limit
> creativity and the possibility of finding new solutions.
>

There are some interlocked functions that can be considered standard,
even on architectures that don't have them. Compare and swap is on
or can be simulated on almost every architecture. From there you
can construct fetch and op primatives, etc...

I think you'll find more creativity on platforms that do provide these
as programming api's than on platforms that do not.

Joe Seigh

Alexander Terekhov

unread,

Sep 23, 2003, 9:50:06 AM9/23/03

to

Hovik Melikyan wrote:
[...]
> InterlockedXXX stuff

http://groups.google.com/groups?threadm=3EC89028.7B60D559%40web.de

[...]

> You mean, standardized atomic functions? This is tough, because you
> need to have hardware support for it. Every engineering company has
> its own way of doing threading and SMP. You'd never convince Sun

http://gee.cs.oswego.edu/dl/concurrency-interest
http://gee.cs.oswego.edu/dl/concurrent/dist/docs/java/util/concurrent/atomic/AtomicInteger.html
http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000023.html
http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000024.html

regards,
alexander.

P.S. http://gee.cs.oswego.edu/dl/concurrency-interest/jsr166

"Preliminary versions of JSR166 concurrency utilities are available
for you to try out. This release contains versions that run natively
on solaris-sparc and linux-x86 using addons to hotspot 1.4.1, as
well as an "emulation mode" that should run on any 1.4.x JVM (but
slowly! at least 5 to 15 times slower)."

Thomas Segev

unread,

Sep 23, 2003, 10:02:19 AM9/23/03

to

"Luca" <n...@mail.net> wrote in message
news:bkpi72$nm9$1...@galileo.it.ip-plus.net...

> Thomas
>
> Maybe I'm saying something stupid, but why don't you avoid the whole
problem
> safelly calling getenv(), reading a configuration file or settings default
> config values at the beginning of your app, maybe filling a structure with
> the program configuration? Then start all your threads passing the pointer
> to the configuration strucure (or making it globally available).
>
> Luca
>
>

Luca,

Because my code is a dynamic load library and I am restricted now to changes
only there.

Unfortunately there is no process initialization call provided by this

library.

Even if I add such a call now, the calling thread code can not be modified

at present (I need a solution which is backwards compatible).

Thomas

Hovik Melikyan

unread,

Sep 23, 2003, 1:31:03 PM9/23/03

to

ho...@melikyan.com (Hovik Melikyan) wrote in message news:<ad206972.03092...@posting.google.com>...

> Alexander Terekhov <tere...@web.de> wrote in message > > >

> > > int initialized = 0;
> > >
> > > if (atomic_exchange(&initialized, 1) == 0)
> >
> > This isn't the best way.
> >
>
> And the best way is ... ? (provided that you do have a portable atomic
> exchange function handy)
>

The best way is tO^8###*&JKnssd&^ LOST CARRIER_

:)

So ... ? I'm still waiting and I'm impatient. I'd even accept your
recursive references, Alexander. :)

--
Hovik Melikyan

Hovik Melikyan

unread,

Sep 23, 2003, 1:37:28 PM9/23/03

to

Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F704FB1...@xemaps.com>...

>
> There are some interlocked functions that can be considered standard,
> even on architectures that don't have them. Compare and swap is on
> or can be simulated on almost every architecture. From there you
> can construct fetch and op primatives, etc...
>

I have a feeling that `weak' atomic functions (i.e.
compare-and-dosmth) can actually be replaced with a simple group of
functions that exchange, increment (and decrement) counters without
comparing with an expected value.

Ok, this is only a feeling :) but can anyone (1) explain why
algorithms based on CAS can't be replaced with algorithms that use
plain atomic exchange and (2) what does it mean in general and what
are typical programming patterns for CAS?

This feeling (of uselessness? superfluity? of CAS) comes from the fact
that I've never used any compare-and-... logic and I always managed to
solve the problem with plain exchange/inc/dec family of functions.
Basically you need them when you want to do performance optimization
where otherwise you'd use standard sync objects (mutex, sema, ...).

So the question is do plain atomic exchange/inc/dec replace CAS and
Co.?

Sorry for [slight] off-topic wrt this thread.

--
Hovik Melikyan

Joe Seigh

unread,

Sep 23, 2003, 2:30:32 PM9/23/03

to

Hovik Melikyan wrote:
>
> > Alexander Terekhov <tere...@web.de> wrote in message > > >

> > And the best way is ... ? (provided that you do have a portable atomic
> > exchange function handy)
> >
>
> The best way is tO^8###*&JKnssd&^ LOST CARRIER_
>
> :)
>
> So ... ? I'm still waiting and I'm impatient. I'd even accept your
> recursive references, Alexander. :)
>

Alexander is still waiting and impatient with the rate of progress
of the Posix standards committee on getting the interlocked
functions in the standard. :)

Joe Seigh

unread,

Sep 23, 2003, 2:34:50 PM9/23/03

to

Hovik Melikyan wrote:
>
> Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F704FB1...@xemaps.com>...

...

>
> So the question is do plain atomic exchange/inc/dec replace CAS and
> Co.?

There are things that you can do with CAS that you cannot do with
exchange/inc/dec. Pushing and popping from a stack for example.
Conditional things like decrement only if positive, etc ...

Joe Seigh

SenderX

unread,

Sep 23, 2003, 2:53:41 PM9/23/03

to

> Alexander is still waiting and impatient with the rate of progress
> of the Posix standards committee on getting the interlocked
> functions in the standard. :)

Alex calls the Interlocked API "totally brain-damaged"...

Why would Alex want POSIX to contain these horrible functions?

=P

David Schwartz

unread,

Sep 23, 2003, 2:53:09 PM9/23/03

to

"Thomas Segev" <thomas...@bmc.com> wrote in message

news:vmtl1fs...@corp.supernews.com...

> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:bkmg9p$bcg$1...@nntp.webmaster.com...

> > > if (initialized)

> > > {
> > > /* use var */
> > > ...
> > > }
> > > else
> > > {
> > > pe = getenv(ENV_VARIALBE);
> > > if (pe)
> > > {
> > > /* convert and assign the value of the
> > > environment variable to var */
> > > ...
> > > }
> > > initialized = 1;
> > > }

> > The problem occurs in the first part of the 'if' block. Even though
a
> > thread sees 'initialized' as one, there is no guarantee that it sees the
> > correct value for the variable.

> Could you explain why not ?

I just did.

> Assuming the environment variable does
> not change, several thread may attempt to set to the same value in
> "parallel". What can go wrong here ?

I just told you, just because a thread sees 'initialized' as one, it
does not follow that it must see the correct value for the variable.

Here's an example, support we start with:

int i=1, j=1;

And another thread does this:

i=2;
j=2;

Does it follow that another thread that sees 'j' as 2 must also see 'i'
as 2? The answer is no. Nothing prevents the compiler, processor, memory
hardware, or who knows what from reordering those two writes. In fact, 'i'
could be completely optimied into a register by the writing thread, so that
the 'i=2;' write has no effect on another thread that reads the 'i'
variable.

> > I'm really curious why you are going to all this craziness. Why not
> > just
> > use a CRITICAL_SECTION? Or why not use a CRITICAL_SECTION in combination
> > with an Interlocked* function? Or some other sane solution rather than
> > relying upon the compiler to 'just happen to' get it right?

> The application needs to be efficient. I don't want to add unnecessary
> overhead.

I don't know what to say to this. This is just so wrong for so many
reasons.

> > Has profiling proven that using normal locking is a bottlenect?

> No, but why use it if it is not a must ?
> So I am just wondering what can go wrong. How can a compiler
implementation
> corrupt the value of var ?

You are trying to rely upon guarantees that you don't have. What good is
an efficient program that is broken? Please google for 'premature
optimization'.

DS

David Schwartz

unread,

Sep 23, 2003, 2:54:35 PM9/23/03

to

"Thomas Segev" <thomas...@bmc.com> wrote in message

news:vmtsud5...@corp.supernews.com...

> Thanks David, this clarifies the issue.
> Is there no simple "trick" to force order ?

No, there is not.

> Will the following work ?:

No, it won't.

Stop this, please. Premature optimization is just wrong, plain and
simple. What you are doing won't work.

DS

SenderX

unread,

Sep 23, 2003, 2:58:54 PM9/23/03

to

> Stop this, please. Premature optimization is just wrong, plain and
> simple. What you are doing won't work.

I bet he can get it to work...

He has to use specific atomic-ops.

Not very portable, but they will work.

SenderX

unread,

Sep 23, 2003, 4:17:29 PM9/23/03

to

> Nope. Packaging the data in a structure doesn't guarantee they'll be
written
> in a single atomic instruction -- which is the guarantee you'd need. (And
> if both values are 32 bit on a 32 bit machine, or 64 bits on a 64 bit
> machine, it may not even HAVE the single atomic instruction you'd like.)

x86 has 64-bit swaps. Two pointers can be cas'd.

IA-64 has 64-bit compare and 128-bit swap. You can compare a single pointer,
and swap two.

IA-64 has release/acquire barriers built into the CAS.

AMD-64 can only CAS a single pointer.

PowerPC has LL/SC, any other processor with LL/SC can do it as well.

> DCSI is a machine-specific optimization. To avoid a lock you need to
include
> specialized machine instructions.

Its more than machine-specific?

What about the compiler... If the compiler were to interfere with the
ordering of instructions, odd things might happen.

;)

Here is some pseudo-code of an initialize-once algo...

Lock-Free Instance Code for...

Compiler: VC++ 6.0

// The shared pointer
volatile C_Object *pSharedObj = NULL;

// Turn off compiler optimizer
#pragma optimize( "", off )

// Instance the object
C_Object&

InstanceObject()
{
// Load the shared object
volatile C_Object* pLocalPtr
= InterlockedCompareExchangePointer
( &pSharedObj,
NULL, // Swap NULL
NULL ); // Compare with NULL

// If NULL, there is no object
if ( pLocalPtr == NULL )
{
// Create a new object
volatile C_Object* pNewPtr = new C_Object;

// Load the shared object again,
// and try and store the new one
pLocalPtr =
InterlockedCompareExchangePointer
( &pSharedObj,
pNewPtr, // Swap new object
NULL ); // Compare with NULL

// If NULL, the object has been updated by us
if ( pLocalPtr == NULL )
{
pLocalPtr = pNewPtr;
}

else
{
// Another thread swapped before us
delete pNewPtr;
}

// IA-64 Release
}

// IA-64 Acquire

// We got it!
return *pLocalPtr;
}

// Turn on compiler optimizer
#pragma optimize( "", on )

This should ONLY work for the VC++ 6.0 compiler, and maybe higher.

SenderX

unread,

Sep 23, 2003, 4:20:18 PM9/23/03

to

> What about the compiler... If the compiler were to interfere with the
> ordering of instructions, odd things might happen.

You could just code the whole thing in assembler. You don't have to worry
about your assembler instructions get reordered by the compiler.

=)

Thomas Segev

unread,

Sep 24, 2003, 12:05:09 AM9/24/03

to

"David Schwartz" <dav...@webmaster.com> wrote in message

news:bkq4qr$7do$1...@nntp.webmaster.com...

>
> "Thomas Segev" <thomas...@bmc.com> wrote in message
> news:vmtl1fs...@corp.supernews.com...
>
> > "David Schwartz" <dav...@webmaster.com> wrote in message
> > news:bkmg9p$bcg$1...@nntp.webmaster.com...
>
> > > > if (initialized)
> > > > {
> > > > /* use var */
> > > > ...
> > > > }
> > > > else
> > > > {
> > > > pe = getenv(ENV_VARIALBE);
> > > > if (pe)
> > > > {
> > > > /* convert and assign the value of the
> > > > environment variable to var */
> > > > ...
> > > > }
> > > > initialized = 1;
> > > > }
>
> > > The problem occurs in the first part of the 'if' block. Even
though
> a
> > > thread sees 'initialized' as one, there is no guarantee that it sees
the
> > > correct value for the variable.
>
> > Could you explain why not ?
>
> I just did.

I did not understand it the first time.

>
> > Assuming the environment variable does
> > not change, several thread may attempt to set to the same value in
> > "parallel". What can go wrong here ?
>
> I just told you, just because a thread sees 'initialized' as one, it
> does not follow that it must see the correct value for the variable.
>
> Here's an example, support we start with:
>
> int i=1, j=1;
>
> And another thread does this:
>
> i=2;
> j=2;
>
> Does it follow that another thread that sees 'j' as 2 must also see
'i'
> as 2? The answer is no. Nothing prevents the compiler, processor, memory
> hardware, or who knows what from reordering those two writes. In fact, 'i'
> could be completely optimied into a register by the writing thread, so
that
> the 'i=2;' write has no effect on another thread that reads the 'i'
> variable.
>

What about volatile ? Why won't it help here ?

> > > I'm really curious why you are going to all this craziness. Why
not
> > > just
> > > use a CRITICAL_SECTION? Or why not use a CRITICAL_SECTION in
combination
> > > with an Interlocked* function? Or some other sane solution rather than
> > > relying upon the compiler to 'just happen to' get it right?
>
> > The application needs to be efficient. I don't want to add unnecessary
> > overhead.
>
> I don't know what to say to this. This is just so wrong for so many
> reasons.
>

I just want to understand why to do or not to do something.

> > > Has profiling proven that using normal locking is a bottlenect?
>
> > No, but why use it if it is not a must ?
> > So I am just wondering what can go wrong. How can a compiler
> implementation
> > corrupt the value of var ?
>
> You are trying to rely upon guarantees that you don't have. What good
is
> an efficient program that is broken? Please google for 'premature
> optimization'.
>

Ok, ok, I promise to behave.

> DS
>
>

Thomas

Thomas Segev

unread,

Sep 24, 2003, 12:12:54 AM9/24/03

to

Please explain why volatile will not help here.

Thomas
"SenderX" <x...@xxx.xxx> wrote in message
news:OJ0cb.561994$uu5.92097@sccrnsc04...

David Schwartz

unread,

Sep 24, 2003, 2:19:23 AM9/24/03

to

"Thomas Segev" <thomas...@bmc.com> wrote in message

news:vn26326...@corp.supernews.com...

> What about volatile ? Why won't it help here ?

Because 'volatile' has nothing to do with threads.

> Ok, ok, I promise to behave.

You'll be glad you did.

DS

David Schwartz

unread,

Sep 24, 2003, 2:20:43 AM9/24/03

to

"Thomas Segev" <thomas...@bmc.com> wrote in message

news:vn26hj9...@corp.supernews.com...

> Please explain why volatile will not help here.

Because 'volatile' is part of the C/C++ standard which says nothing
about what will or won't work in a multithreaded program. If the threading
standard you're using says it will work, then it will. It is known for a
fact that:

1) It does not work on WIN32.

2) It is not guaranteed to work by the pthreads standard.

DS

Alexander Terekhov

unread,

Sep 24, 2003, 3:41:30 AM9/24/03

to

Hovik Melikyan wrote:
[...]

> So ... ? I'm still waiting and I'm impatient.

I already provided a sort of answer. In my first reply.

> I'd even accept your recursive references, Alexander. :)

http://groups.google.com/groups?selm=3F6E979B.AE5A2BB9%40web.de
(click on the second link... and so forth)

regards,
alexander.

Alexander Terekhov

unread,

Sep 24, 2003, 5:17:58 AM9/24/03

to

SenderX wrote:
>
> > Alexander is still waiting and impatient with the rate of progress
> > of the Posix standards committee on getting the interlocked
> > functions in the standard. :)
>
> Alex calls the Interlocked API "totally brain-damaged"...
>
> Why would Alex want POSIX to contain these horrible functions?

Right.

http://terekhov.de/pthread_refcount_t/experimental/refcount.cpp
(see msync and atomic<> sketch)

I, for one, want POSIX to contain something along the lines of atomic<>
with "a whole bunch" of implementation provided specializations (for
scalars that can be made atomic). Plain C "layer" would have to resort
to some silly macros (or bizillion of atomic calls)... "implemented" on
top of C++ stuff, of course. A sort of first main problem is that POSIX
folks just keep telling me that "C++ bindings" isn't their business, so
to speak. For some reason, they just fail to realize that what they
have currently, can (and ought to) be "expressed" along the lines of:

typedef std::thread * pthread_t;

extern "C" pthread_t pthread_self() throw() {
return std::thread_self().raw_ptr();
}

typedef thread_specific_ptr<void, __c_TSD_cleanup> * pthread_key_t;

extern "C" int pthread_key_create(pthread_key_t * key,
void ( * dtor)(void *)) throw() {
try {
// can throw "shall fail" stuff only (std::bad_alloc/try_again)
*key = new pthread_key_t(__c_TSD_cleanup(dtor));
// "may fail" shall be caught in the std::unexpected() handler
}
catch(...) {
return __translate_exception_to_error_code_using_throw_and_catch();
}
return 0;
}

// PODs
typedef std::aligned_storage<std::mutex> pthread_mutex_t;
typedef std::aligned_storage<std::mutexattr_t> pthread_mutexattr_t;

extern "C" int pthread_mutex_init(pthread_mutex_t * mutex_storage,
const pthread_mutexattr_t * attr_storage) throw() {
try {
// can throw "shall fail" stuff only
new (mutex_storage->place()) std::mutex(attr_storage->object());
// "may fail" shall be caught in the std::unexpected() handler
}
catch(...) {
return __translate_exception_to_error_code_using_throw_and_catch();
}
return 0;
}

extern "C" int pthread_mutex_destroy(pthread_mutex_t * mutex_storage)
throw() {
// destructor with implicit throw()-nothing ES
mutex_storage->object().~object();
// "may fail" shall be caught in the std::unexpected() handler
return 0;
}

#define PTHREAD_CANCELED std::thread_canceled()

extern "C" void pthread_exit(void * ptr)
throw(std::thread_termination_request) {
ptr == PTHREAD_CANCELED ? std::thread_cancel() :
std::thread_exit(ptr);
}

struct thread_canceled {
operator void * () { return &unique; }
static thread_canceled unique;
};

template<typename T>
void thread_exit(T value) {
assert(std::thread_self().can_exit_with<T>());
throw thread_exit_value(value);
}

template<>
void thread_exit(std::thread_canceled) {
thread_cancel();
}

void thread_cancel() {
throw std::thread_cancel_request();
}

and so forth.

regards,
alexander.

Hovik Melikyan

unread,

Sep 24, 2003, 5:31:22 AM9/24/03

to

Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F7094D7...@xemaps.com>...

> Hovik Melikyan wrote:
> >
> > So the question is do plain atomic exchange/inc/dec replace CAS and
> > Co.?
>
> There are things that you can do with CAS that you cannot do with
> exchange/inc/dec. Pushing and popping from a stack for example.

I forgot to tell that I mean inc/dec that return values either before
or after the inc/dec operation (actually it doesn't matter), not just
negative/zero/positive result.

If so, I can push to the stack using a plain atomic increment: it
gives me the stack pointer where I should store stuff and increments
the SP atomically. So it works fine without CAS.

>
> Conditional things like decrement only if positive, etc ...
>

That's what I'm asking: what are those "conditional things" for? Show
me an example and I'll try to give a different way of doing it.

--
Hovik Melikyan

Hovik Melikyan

unread,

Sep 24, 2003, 6:00:12 AM9/24/03

to

Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F7093D5...@xemaps.com>...

>
> Alexander is still waiting and impatient with the rate of progress
> of the Posix standards committee on getting the interlocked
> functions in the standard. :)
>

http://groups.google.com/groups?selm=3EC108B8.980ACE53%40web.de

You mean this?

Great. But first, his functions not always have the 'return' statement
where they should :) and I assume that their return values indicate
`signness' or 'zeroness' of the resulting value. It's ok, but I have
several algorithms that use the exact value after the inc/dec
operation. One example is pushing to the stack (in the other porting).

Another example is a dynamic string class with copy-on-write that I
implemented [ http://tinyurl.com/oh2e ] The multithreaded version of
unique() has an atomic increment and comparison with 2, also some
other parts of the module has comparisons with 0. Signess won't help
here, or maybe there can be some other algorithm I'm not aware of.

These atomic functions are powerful and they can do many things, if
not any thing. So I'd propose to make pthread_refcount_* return the
values, not just signess.

--
Hovik Melikyan

Alexander Terekhov

unread,

Sep 24, 2003, 6:35:28 AM9/24/03

to

Hovik Melikyan wrote:
>
> Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F7093D5...@xemaps.com>...
> >
> > Alexander is still waiting and impatient with the rate of progress
> > of the Posix standards committee on getting the interlocked
> > functions in the standard. :)
> >
>
> http://groups.google.com/groups?selm=3EC108B8.980ACE53%40web.de
>
> You mean this?

Yeah.

>
> Great. But first, his functions not always have the 'return' statement
> where they should :)

Nobody's perfect. Google is no exception. Try

http://google.com/groups?selm=3EC108B8.980ACE53%40web.de&output=gplain

> and I assume that their return values indicate

> `signness' or 'zeroness' of the resulting value. It's ok, ...

return values indicate errors or '0K' (zero is OK).

> Another example is a dynamic string class with copy-on-write ...

http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
http://groups.google.com/groups?selm=3EC0F1F1.B78AA0DA%40web.de

regards,
alexander.

--
'YHBT' is a transliteration of an ancient Hebrew phrase
meaning 'There is no God but Allah'. 'HAND' is an ancient
VAX error message meaning 'Halting All Network Devices'.

Joe Seigh

unread,

Sep 24, 2003, 7:14:46 AM9/24/03

to

Hovik Melikyan wrote:
>
> Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F7094D7...@xemaps.com>...
> > Hovik Melikyan wrote:
> > >
> > > So the question is do plain atomic exchange/inc/dec replace CAS and
> > > Co.?
> >
> > There are things that you can do with CAS that you cannot do with
> > exchange/inc/dec. Pushing and popping from a stack for example.
>
> I forgot to tell that I mean inc/dec that return values either before
> or after the inc/dec operation (actually it doesn't matter), not just
> negative/zero/positive result.
>
> If so, I can push to the stack using a plain atomic increment: it
> gives me the stack pointer where I should store stuff and increments
> the SP atomically. So it works fine without CAS.
>

I was thinking of linked list stacks, not contiguous stacks. Besides,
I don't think you can push on or pop off a contiguous stack without
blocking. Something about atomically accessing the stack data and
manipulating the stack pointer at the same time. Maybe with DCAS.

> >
> > Conditional things like decrement only if positive, etc ...
> >
>
> That's what I'm asking: what are those "conditional things" for? Show
> me an example and I'll try to give a different way of doing it.
>

Search google for "lock-free" and study some the algorithms that it comes
up with.

Joe Seigh

unread,

Sep 24, 2003, 7:23:54 AM9/24/03

to

Alexander Terekhov wrote:

>
> SenderX wrote:
>
> > Alex calls the Interlocked API "totally brain-damaged"...
> >
> > Why would Alex want POSIX to contain these horrible functions?
>
> Right.
>
> http://terekhov.de/pthread_refcount_t/experimental/refcount.cpp
> (see msync and atomic<> sketch)
>
> I, for one, want POSIX to contain something along the lines of atomic<>
> with "a whole bunch" of implementation provided specializations (for
> scalars that can be made atomic). Plain C "layer" would have to resort
> to some silly macros (or bizillion of atomic calls)... "implemented" on
> top of C++ stuff, of course. A sort of first main problem is that POSIX
> folks just keep telling me that "C++ bindings" isn't their business, so
> to speak. For some reason, they just fail to realize that what they
> have currently, can (and ought to) be "expressed" along the lines of:
>

I would forget about C++. C++ is a compromise based on another compromise, C.
It is profoundly beyond recoverability. Somebody will have to design another
language from scratch. Not Java though. Not Aspect. Nice tries but they
fall rather short.

Joe Seigh

Hovik Melikyan

unread,

Sep 24, 2003, 9:36:24 AM9/24/03

to

Alexander Terekhov <tere...@web.de> wrote in message news:<3F717370...@web.de>...

> Hovik Melikyan wrote:
> >
> > Great. But first, his functions not always have the 'return' statement
> > where they should :)
>
> Nobody's perfect. Google is no exception. Try
>
> http://google.com/groups?selm=3EC108B8.980ACE53%40web.de&output=gplain
>

Ok :) Hope you are not compiling through google (if you know what it
is, I mean compiling) :)

>
> http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
>

Hmmmm... a couple of race conditions and also a memory leak bug. And
in fact, if you fix that bug you will get one more race condition,
unless you change the whole design. And I think the whole thing is
just wrong. Well, perhaps this code demonstrates something... other
than a working refcounted string.

--
Hovik Melikyan

Hovik Melikyan

unread,

Sep 24, 2003, 9:53:01 AM9/24/03

to

Alexander Terekhov <tere...@web.de> wrote in message news:<3F714AAA...@web.de>...

The question was: is there a better way of doing safe init-once than
this:

int initialized = 0;

if (atomic_exchange(&initialized, 1) == 0)

// ...

_provided that you do have a portable atomic exchange function handy_,
you answered "no, this isn't the best way" and then you say atomic
exchange can't be portable (from the indirect link above). I'm afraid
this is not an answer.

--
Hovik Melikyan

Joe Seigh

unread,

Sep 24, 2003, 10:19:31 AM9/24/03

to

Hovik Melikyan wrote:
>
> Alexander Terekhov <tere...@web.de> wrote in message news:<3F717370...@web.de>...
>
> >

> > http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
> >
>
> Hmmmm... a couple of race conditions and also a memory leak bug. And
> in fact, if you fix that bug you will get one more race condition,
> unless you change the whole design. And I think the whole thing is
> just wrong. Well, perhaps this code demonstrates something... other
> than a working refcounted string.
>

That's not a memory leak bug if you are referring to what I think you
are. That's the String uncontrolled reference to the COW buffer thing
where you don't know when or if the uncontrolled reference ever gets
dropped. Basically you can't share the COW buffer at that point and
you kind of hack that info in the refcount state to deal with it. So
basically you are not looking at a pure refcounting solution.

Joe Seigh

Alexander Terekhov

unread,

Sep 24, 2003, 10:22:53 AM9/24/03

to

Hovik Melikyan wrote:
[...]

> > http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
> >
>
> Hmmmm... a couple of race conditions and also a memory leak bug. And
> in fact, if you fix that bug you will get one more race condition,
> unless you change the whole design. And I think the whole thing is
> just wrong. Well, perhaps this code demonstrates something... other
> than a working refcounted string.

Brave fellow. Respect, respect. Care to elaborate?

regards,
alexander.

Alexander Terekhov

unread,

Sep 24, 2003, 10:48:08 AM9/24/03

to

Hovik Melikyan wrote:
[...]

> int initialized = 0;
>
> if (atomic_exchange(&initialized, 1) == 0)
> // ...

This is crapola and hence I wrote "This isn't the best way" in reply
to your statement "On the original question: I still think the best
way to do such things is to use atomic exchange: <crapola>". Kapis?

regards,
alexander.

Hovik Melikyan

unread,

Sep 24, 2003, 1:11:57 PM9/24/03

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:3F717F36...@xemaps.com...

>
> > >
> > > There are things that you can do with CAS that you cannot do with
> > > exchange/inc/dec. Pushing and popping from a stack for example.
> >
> > I forgot to tell that I mean inc/dec that return values either before
> > or after the inc/dec operation (actually it doesn't matter), not just
> > negative/zero/positive result.
> >
> > If so, I can push to the stack using a plain atomic increment: it
> > gives me the stack pointer where I should store stuff and increments
> > the SP atomically. So it works fine without CAS.
> >
>
> I was thinking of linked list stacks, not contiguous stacks.

Ok, linked list stack add/remove, lock-free wherever possible. First, I'd do
pop with atomic exchange:

item* stack_ptr;

item* stack_pop()
{
return atomic_exchange(&stack_ptr, stack_ptr->next);
}

This implies that the top stack item can't be accessed and popped
concurrently, and I think that would be a result of poor design. In typical
applications you access data and then pop it off the stack synchronously.

As for safe stack_push() for linked lists, it is tough and I can't think of
any lock-free implementation [yet] which is without a loop. A loop (wait
until you get what you expected) is possible both with the CAS scenario and
with XCHG, btw, so even here I can get rid of CAS. Besides, I'd prefer
locking over looping in this case, since loops are less predictable.

What I'm trying to say in general is that CAS _forces_ you to loop until
there are no concurrent attempts to do the same thing. So CAS-based
scenarios can be quite inefficient if the point of conflict is crowded. I'd
say looping is `built' into the concept of compare-and-swap(-or-whatever)
and that's what I dislike about it.

> Besides,
> I don't think you can push on or pop off a contiguous stack without
> blocking. Something about atomically accessing the stack data and
> manipulating the stack pointer at the same time.

Pushing with XCHG like I proposed is absolutely safe, it'd never interfere
with pop or access operations. Also, you can make pop operation lock-free as
well, provided that what is said for stack_pop() above is true for
contiguous stacks too.

>
> Search google for "lock-free" and study some the algorithms that it comes
> up with.
>

Thanks, but I'd also appreciate if you point out a couple of specific (and
clean) examples.

--
Hovik Melikyan

Hovik Melikyan

unread,

Sep 24, 2003, 1:55:14 PM9/24/03

to

Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F71AA82...@xemaps.com>...

I know that I'm looking at something that was posted in reply to my
reference to a refcounted, copy-on-write string implementation
(lock-free and CAS-free, BTW). If his code is not a RC/COW thing
[despite its name] then it was not relevant as a response to mine.

There are race conditions in his code wherever you see this:

if ( 2 > IntAtomicGet( data_->refs ) ||
1 > IntAtomicDecrement( data_->refs ) )

like in String::~String(), or this:

switch ( IntAtomicGet( data_->refs ) ) {
case 01: ...
case 00: ...
default: ...

The conclusion is that an atomic refcount that only returns signness
or even worse, only zeroness is at least useless and at most
completely brain-d... ok, ok, ill-designed. As far as I remember Alex
was saying the same thing, though with regard to Windows
InterlockedXXX, not his own code. Anyway, I think we reached agreement
on this matter.

And one more thing: if you use atomic.get_value() it means most likely
your code is erroneous (with extremely rare exceptions which I think
can be cleanly described; Alex's String class is definitely not among
those exceptions).

--
Hovik Melikyan

Hovik Melikyan

unread,

Sep 24, 2003, 2:01:29 PM9/24/03

to

Alexander Terekhov <tere...@web.de> wrote in message news:<3F71AEA8...@web.de>...
>
> Kapis?
>

Net :)

> regards,
> alexander.

Joe Seigh

unread,

Sep 24, 2003, 2:28:43 PM9/24/03

to

Hovik Melikyan wrote:
>
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:3F717F36...@xemaps.com...

> > I was thinking of linked list stacks, not contiguous stacks.
>
> Ok, linked list stack add/remove, lock-free wherever possible. First, I'd do
> pop with atomic exchange:
>
> item* stack_ptr;
>
> item* stack_pop()
> {
> return atomic_exchange(&stack_ptr, stack_ptr->next);
> }
>
> This implies that the top stack item can't be accessed and popped
> concurrently, and I think that would be a result of poor design. In typical
> applications you access data and then pop it off the stack synchronously.

So it's broken.

>
> As for safe stack_push() for linked lists, it is tough and I can't think of
> any lock-free implementation [yet] which is without a loop. A loop (wait
> until you get what you expected) is possible both with the CAS scenario and
> with XCHG, btw, so even here I can get rid of CAS. Besides, I'd prefer
> locking over looping in this case, since loops are less predictable.
>
> What I'm trying to say in general is that CAS _forces_ you to loop until
> there are no concurrent attempts to do the same thing. So CAS-based
> scenarios can be quite inefficient if the point of conflict is crowded. I'd
> say looping is `built' into the concept of compare-and-swap(-or-whatever)
> and that's what I dislike about it.

Usually the hardware implementation will provide fair forward progress for
CAS loops. They don't explicitly guarantee it usually but almost all
synchronization api's don't explicitly guarantee fair forward progress.
If not, in the case of a SMP hardware vendor with a death wish, you can
always provide a back off outer loop to reduce the contention level.

Some of the fetch and op implementations aren't any better since the loop
logic is just in microcode. Some of the new architectures have moved
fetch and op stuff to the memory controller, e.g. fetchadd on Itanium,
so they will have slightly better contention.

>
> > Besides,
> > I don't think you can push on or pop off a contiguous stack without
> > blocking. Something about atomically accessing the stack data and
> > manipulating the stack pointer at the same time.
>
> Pushing with XCHG like I proposed is absolutely safe, it'd never interfere
> with pop or access operations. Also, you can make pop operation lock-free as
> well, provided that what is said for stack_pop() above is true for
> contiguous stacks too.

Pushing with xchg requires two memory locations to be locked by the hardware.
AFAIK there are no machines with that apart from some ancient uniprocessor
Burroughs computers where that trick worked. But a lot of CISC instructions
that are atomic on uniprocessors, aren't on multiprocessors.

>
> >
> > Search google for "lock-free" and study some the algorithms that it comes
> > up with.
> >
>
> Thanks, but I'd also appreciate if you point out a couple of specific (and
> clean) examples.
>

Try http://citeseer.nj.nec.com/valois95lockfree.html for a start. CiteSeer
has a lot of stuff but their search sucks. Google works better.

Joe Seigh

Alexander Terekhov

unread,

Sep 24, 2003, 2:45:16 PM9/24/03

to

Hovik Melikyan wrote:
[...]

> There are race conditions in his code wherever you see this:
>
> if ( 2 > IntAtomicGet( data_->refs ) ||
> 1 > IntAtomicDecrement( data_->refs ) )
>
> like in String::~String(), or this:
>
> switch ( IntAtomicGet( data_->refs ) ) {
> case 01: ...
> case 00: ...
> default: ...

Both are fine. Read up on "basic" thread-safety (also known as
"thread safe as an int").

regards,
alexander.

Joe Seigh

unread,

Sep 24, 2003, 3:00:55 PM9/24/03

to

It should be "thread safe as char[]". Int is actually atomic under
some operations.

(to Hovik) In C++ land the convention is that thread-safe
doesn't mean atomic. You have to own the String object to
access it. This means the reference count cannot go to
zero except if you are dropping a reference by deleting
the String object. This avoids a lot of race conditions
that would otherwise exist if you were writing atomically
thread-safe objects.

Joe Seigh

Alexander Terekhov

unread,

Sep 24, 2003, 3:13:48 PM9/24/03

to

Hovik Melikyan wrote:
>
> Alexander Terekhov <tere...@web.de> wrote in message news:<3F71AEA8...@web.de>...
> >
> > Kapis?
> >
>
> Net :)

atomic_exchange(&initialized, 1) can't be used prior to init. The
next thread can go boom immediately after you exchange 0 with 1.

regards,
alexander.

--
"otvali, durnaya ptica,
uletaj s peredovoj.
zdes' tebe ne pozhevit'sa.
fuck you, voron! ja ne tvoj!"

-- http://www.stihi.ru/poems/2001/10/03-362.html

SenderX

unread,

Sep 24, 2003, 3:39:50 PM9/24/03

to

> Ok, linked list stack add/remove, lock-free wherever possible. First, I'd
do
> pop with atomic exchange:
>
> item* stack_ptr;
>
> item* stack_pop()
> {
> return atomic_exchange(&stack_ptr, stack_ptr->next);
> }

Nope.

Use cmpxchg for push, and cmpxchg8b for pop.

> Thanks, but I'd also appreciate if you point out a couple of specific (and
> clean) examples.

I've posted many.

Here is the code I use for a lock-free stack:

LockFree_IA32.asm
---------------------

.586
.MODEL FLAT, C
.CODE

; Lock-Free Stack Push
; extern void acLfStackPush( ac_LfStack* pStack, ac_LfNode* pNode );
acLfStackPush PROC

; Load the front
mov edx, [esp + 4]
mov eax, [edx]

; Load the new node
mov ecx, [esp + 8]

acLfStackPush_Retry:

; Prepare the Cas
mov [ecx], eax

; Try and link the new node
cmpxchg [edx], ecx

jne acLfStackPush_Retry

ret

acLfStackPush ENDP

; Lock-Free Stack Pop
; extern ac_LfNode* acLfStackPop( ac_LfStack* pStack );
acLfStackPop PROC

; Preserve
push esi
push ebx

; Load the front
mov esi, [esp + 12]
mov edx, [esi + 4]
mov eax, [esi]

acLfStackPop_Retry:

; Validate
test eax, eax

je acLfStackPop_Fail

; Prepare the Cas
mov ebx, [eax]
lea ecx, [edx + 1]

; Try and unlink the front
cmpxchg8b qword ptr [esi]

jne acLfStackPop_Retry

acLfStackPop_Fail:

; Restore
pop ebx
pop esi

ret

acLfStackPop ENDP

SenderX

unread,

Sep 24, 2003, 3:46:17 PM9/24/03

to

> Use cmpxchg for push, and cmpxchg8b for pop.

cmpxchg = CAS
cmpxchg8b = CAS2 ( not DCAS )

SenderX

unread,

Sep 24, 2003, 4:56:23 PM9/24/03

to

> Nope.
>
> Use cmpxchg for push, and cmpxchg8b for pop.

ll/sc can be used as well, and it eliminates the need for an ABA sequence
number.

PowerPC has one.

Mike Mowbray

unread,

Sep 24, 2003, 9:37:33 PM9/24/03

to

Hovik Melikyan wrote:

> What I'm trying to say in general is that CAS _forces_ you to
> loop until there are no concurrent attempts to do the same thing.
> So CAS-based scenarios can be quite inefficient if the point of
> conflict is crowded. I'd say looping is `built' into the concept
> of compare-and-swap(-or-whatever) and that's what I dislike
> about it.

Youch! Just a minute... In practical cases, the looping is not
a problem because mostly the thread succeeds on the first try.
And even if it has to try a second time, that's rare. Also, if
you're comparing mutex-locking against a (well-written) CAS loop,
the CAS loop will win easily every time. Modern mutex-lock code
tends to be adaptive: if thread-A owning the mutex is on-CPU, then
thread-B spins trying to acquire the mutex and only sleeps if
thread-A goes off-CPU. Try some performance tests (on a multi-CPU
machine) of a CAS-based atomic-increment versus a mutex-based
increment and you'll see that the CAS version gets exponentially
better as the number of contending threads increases. Even with
a single thread, the CAS version is always better in my experience.

> [...] I'd also appreciate if you point out a couple of specific
> (and clean) examples [of conditional-CAS applications]

OK, how about this...

Imagine a multi-threaded database server engine. It has a pool
of threads that handle transactions, involving disk access through
a shared cache of blocks. We wish to know dynamically (i.e: in
production) how well our cache algorithms are performing. As one
part of this, we wish to take samples every minute (say) of the
maximum time that any thread was kept waiting by the shared cache,
(i.e: to help diagnose and improve any long-delay transaction
problems.)

To implement this measurement, each transaction thread notes the
(hi-res) time in nanosecs before and after each access, and assigns
the difference to a global integer atomically but *only* if the
difference is *greater* than the value currently stored in the
global integer. Independently, every minute a "sampler" thread wakes
up and performs an atomic exchange to grab the global integer's value
and replace it with 0, before saving the sampled value somehow.

This gives a cheap way of keeping a simple statistic without less
overhead than a mutex-based solution. I doubt you could achieve
this with merely atomic inc/dec.

- MikeM.

SenderX

unread,

Sep 25, 2003, 12:24:13 AM9/25/03

to

> Youch! Just a minute... In practical cases, the looping is not
> a problem because mostly the thread succeeds on the first try.

Yep.

> And even if it has to try a second time, that's rare.

"Mostly", taking SMP into consideration. CAS is fair though...

I haven't seen live-lock on properly built CAS sections......

> Also, if
> you're comparing mutex-locking against a (well-written) CAS loop,
> the CAS loop will win easily every time.

Yep.

> Also, if
> you're comparing mutex-locking against a (well-written) CAS loop,
> the CAS loop will win easily every time.

For sure.

> Try some performance tests (on a multi-CPU
> machine) of a CAS-based atomic-increment versus a mutex-based
> increment and you'll see that the CAS version gets exponentially
> better as the number of contending threads increases. Even with
> a single thread, the CAS version is always better in my experience.

Yep.

> This gives a cheap way of keeping a simple statistic without less
> overhead than a mutex-based solution. I doubt you could achieve
> this with merely atomic inc/dec.

Yep...

SenderX

unread,

Sep 25, 2003, 12:56:29 AM9/25/03

to

> > Also, if
> > you're comparing mutex-locking against a (well-written) CAS loop,
> > the CAS loop will win easily every time.
>
> Yep.
>
>
> > Also, if
> > you're comparing mutex-locking against a (well-written) CAS loop,
> > the CAS loop will win easily every time.
>
> For sure.

Sorry about that double quote.

=)

Hovik Melikyan

unread,

Sep 25, 2003, 1:39:00 AM9/25/03

to

Alexander Terekhov <tere...@web.de> wrote in message news:<3F71ECE...@web.de>...

>
> atomic_exchange(&initialized, 1) can't be used prior to init. The
> next thread can go boom immediately after you exchange 0 with 1.
>

It won't go boom, it will see 1. All threads other than the first one
will exchange 1 with 1.

> regards,
> alexander.
>
> --
> "otvali, durnaya ptica,
> uletaj s peredovoj.
> zdes' tebe ne pozhevit'sa.
> fuck you, voron! ja ne tvoj!"
>

He-he, "Nevermore!", kak govoritsya. :))

--
Hovik Melikyan

Hovik Melikyan

unread,

Sep 25, 2003, 2:18:29 AM9/25/03

to

Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F71EC71...@xemaps.com>...

> >
> > Hovik Melikyan wrote:
> > [...]
> > > There are race conditions in his code wherever you see this:
> > >
> > > if ( 2 > IntAtomicGet( data_->refs ) ||
> > > 1 > IntAtomicDecrement( data_->refs ) )
> > >
> > > like in String::~String(), or this:
> > >
> > > switch ( IntAtomicGet( data_->refs ) ) {
> > > case 01: ...
> > > case 00: ...
> > > default: ...
> >
>

> (to Hovik) In C++ land the convention is that thread-safe
> doesn't mean atomic. You have to own the String object to
> access it. This means the reference count cannot go to
> zero except if you are dropping a reference by deleting
> the String object.

Absolutely. But does it explain the race conditions [in Alex's code]
between getting the refcount value and inc/decrementing it depending
on that value? His String objects are not shared and can be accessed
only from an owning thread, but StringBuf is shared, isn't it?

--
Hovik Melikyan

Mike Mowbray

unread,

Sep 25, 2003, 2:51:56 AM9/25/03

to

Hovik Melikyan wrote:

> [...] is there a better way of doing safe
> init-once than this:
> [...]

Let me flesh out your example a little more:

SomeClass the_global;
int initialized = 0;

if (atomic_exchange(&initialized, 1) == 0)

{
...initialize <the_global>...
}

...use <the_global>, now that it's been
initialized...

If that's what you meant, then unfortunately it's not
threadsafe. Thread#1 comes along, gets a result of 0
from atomic_exchange() and inits <the_global>. But
Thread#2 could come along, get a non-zero result from
atomic_exchange() and jump straight to the line
"...use <the_global>..." before Thread#1 has finished
init'ing it.

I suspect that's why Alexander said "crapola".

- MikeM.

SenderX

unread,

Sep 25, 2003, 3:08:32 AM9/25/03

to

> Absolutely. But does it explain the race conditions [in Alex's code]
> between getting the refcount value and inc/decrementing it depending
> on that value?

atomic_ptr solves all...

> His String objects are not shared and can be accessed
> only from an owning thread, but StringBuf is shared, isn't it?

atomic_ptr my friend...

http://groups.google.com/groups?&selm=3DE393B0.39A4EEC1%40xemaps.com

atomic_ptr..................

wonderful code!

P.S.

Joe has posted membars for this as well!

Hovik Melikyan

unread,

Sep 25, 2003, 5:15:22 AM9/25/03

to

Mike Mowbray <mi...@despammed.com> wrote in message news:<3F7246C8...@despammed.com>...

> Hovik Melikyan wrote:
>
> > What I'm trying to say in general is that CAS _forces_ you to
> > loop until there are no concurrent attempts to do the same thing.
> > So CAS-based scenarios can be quite inefficient if the point of
> > conflict is crowded. I'd say looping is `built' into the concept
> > of compare-and-swap(-or-whatever) and that's what I dislike
> > about it.
>
> Youch! Just a minute... In practical cases, the looping is not
> a problem because mostly the thread succeeds on the first try.
> And even if it has to try a second time, that's rare. Also, if
> you're comparing mutex-locking against a (well-written) CAS loop,
> the CAS loop will win easily every time. Modern mutex-lock code
> tends to be adaptive: if thread-A owning the mutex is on-CPU, then
> thread-B spins trying to acquire the mutex and only sleeps if
> thread-A goes off-CPU. Try some performance tests (on a multi-CPU
> machine) of a CAS-based atomic-increment versus a mutex-based
> increment and you'll see that the CAS version gets exponentially
> better as the number of contending threads increases. Even with
> a single thread, the CAS version is always better in my experience.
>

Agreed, CAS may be better against mutex, but we were talking about CAS
vs. something that doesn't loop at all. Loops (be it your own
`handicrafted' loop or in the mutex implementation) waste time on
uniprocessor systems and it is not any better on SMP systems. You have
100 threads on 2 CPU's which means it's just 50/50 whether you will or
you won't waste time by spinning around the condition.

(There are smarter mutex implementations that are trying to take
advantage of SMP, btw, and I don't see anything like that in, f.ex.,
SenderX's code.)

>
>
> > [...] I'd also appreciate if you point out a couple of specific
> > (and clean) examples [of conditional-CAS applications]
>
> OK, how about this...
>

> To implement this measurement, each transaction thread notes the
> (hi-res) time in nanosecs before and after each access, and assigns
> the difference to a global integer atomically but *only* if the
> difference is *greater* than the value currently stored in the
> global integer. Independently, every minute a "sampler" thread wakes
> up and performs an atomic exchange to grab the global integer's value
> and replace it with 0, before saving the sampled value somehow.
>

You described a task that requires comparing and setting _literally_.
And what if I need average, not max? Or a quadratic bias? :) Are you
saying that there should be a CPU instruction for doing that too?

--
Hovik Melikyan

Alexander Terekhov

unread,

Sep 25, 2003, 5:33:20 AM9/25/03

to

Hovik Melikyan wrote:
>
> Joe Seigh <jsei...@xemaps.com> wrote in message news:<3F71EC71...@xemaps.com>...
> > >
> > > Hovik Melikyan wrote:
> > > [...]
> > > > There are race conditions in his code wherever you see this:
> > > >
> > > > if ( 2 > IntAtomicGet( data_->refs ) ||
> > > > 1 > IntAtomicDecrement( data_->refs ) )
> > > >
> > > > like in String::~String(), or this:
> > > >
> > > > switch ( IntAtomicGet( data_->refs ) ) {
> > > > case 01: ...
> > > > case 00: ...
> > > > default: ...
> > >
> >
> > (to Hovik) In C++ land the convention is that thread-safe
> > doesn't mean atomic. You have to own the String object to
> > access it. This means the reference count cannot go to
> > zero except if you are dropping a reference by deleting
> > the String object.
>
> Absolutely. But does it explain the race conditions [in Alex's code]
> between getting the refcount value and inc/decrementing it depending
> on that value?

What race conditions? Show me some scenarios illustrating what
you mean saying "race conditions".

> His String objects are not shared and can be accessed
> only from an owning thread, but StringBuf is shared, isn't it?

First off, strings object can be shared "just like ints". If you
only call *const* stuff on them (and it's a good idea to make all
such immutable instances const qualified), then you don't need to
worry about any sync (except lifetime management, of course). If
you call any mutators (non-const stuff), then any access to such
(non-const; mutable) shared string object needs to by synchronized
with respect to "writes" (non-const stuff), but you can still have
concurrent "reads" (you'd need a read-write lock or something like
that). And, of course, you'll have to "pray" that those mysterious
"memory locations" won't stay in your way (breaking everything...
false-sharing aside for a moment).

regards,
alexander.

Alexander Terekhov

unread,

Sep 25, 2003, 6:04:25 AM9/25/03

to

SenderX wrote:
>
> > Absolutely. But does it explain the race conditions [in Alex's code]
> > between getting the refcount value and inc/decrementing it depending
> > on that value?
>
> atomic_ptr solves all...

Joe's atomic_ptr can solve only one "problem". The "problem" of
providing "strong" (atomic) AND "non-blocking" (do it *without*
locks) thread-safety with respect to operations on smart pointers
managing lifetime of objects "owned" (ownership can be either
exclusive or shared) by the smart pointer instances. That's it.

regards,
alexander.

Hovik Melikyan

unread,

Sep 25, 2003, 6:14:39 AM9/25/03

to

Mike Mowbray <mi...@despammed.com> wrote in message news:<3F729077...@despammed.com>...

> Hovik Melikyan wrote:
>
> > [...] is there a better way of doing safe
> > init-once than this:
> > [...]
>
> Let me flesh out your example a little more:
>
> SomeClass the_global;
> int initialized = 0;
>
> if (atomic_exchange(&initialized, 1) == 0)
> {
> ...initialize <the_global>...
> }
>

Ok, I surrender! :)

But not completely:

SomeClass* the_global;
int initialized;

while (!the_global)
{

if (atomic_exchange(&initialized, 1) == 0)
{

the_global = new SomeClass; // maybe atomic_set()
}
}

I think the loop above is not bad since the chances that you'll be
caught are very small (depending on the speed of the initialization
part, but that would be a drawback in any algorithm.) After the global
is initialized nobody would even reach atomic_exchange(), they'd just
pass one single comparison (!the_global).

>
> I suspect that's why Alexander said "crapola".
>

That was a very descriptive message, you know...

--
Hovik Melikyan

David Butenhof

unread,

Sep 25, 2003, 6:48:34 AM9/25/03

to

Alexander Terekhov wrote:

> For some reason, they just fail to realize that what they
> have currently, can (and ought to) be "expressed" along the lines of:

Sometimes, Alexander, I think you're a fairly smart person who's just not
very good at expressing yourself.

Other times... well, let's just say that I think the person who "fails to
realize" is a bit closer to you than you think.

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

Joe Seigh

unread,

Sep 25, 2003, 6:49:52 AM9/25/03

to

Hovik Melikyan wrote:
>
> Agreed, CAS may be better against mutex, but we were talking about CAS
> vs. something that doesn't loop at all. Loops (be it your own
> `handicrafted' loop or in the mutex implementation) waste time on
> uniprocessor systems and it is not any better on SMP systems. You have
> 100 threads on 2 CPU's which means it's just 50/50 whether you will or
> you won't waste time by spinning around the condition.

There's no spin waiting involved in using CAS to implement interlocked
updates. There's no waiting for a value, it merely grabs the most
recent value.

>
> (There are smarter mutex implementations that are trying to take
> advantage of SMP, btw, and I don't see anything like that in, f.ex.,
> SenderX's code.)

AFAIK that's because he hasn't implemented mutexes. Lock-free generally
involves avoiding mutexes and waiting.

Joe Seigh

David Butenhof

unread,

Sep 25, 2003, 6:53:56 AM9/25/03

to

SenderX wrote:

>> What about the compiler... If the compiler were to interfere with the
>> ordering of instructions, odd things might happen.
>
> You could just code the whole thing in assembler. You don't have to worry
> about your assembler instructions get reordered by the compiler.

Well, some assemblers "optimize", too, but there's always a way to turn it
off.

In any case, my point was that you can't do atomic initialization in
portable C code without putting the whole sequence inside an unconditional
lock.

Most C compilers provide enough machine-specific support (asms, or builtins)
that you CAN do something like this in C rather than resorting to assembly.
But it's still not portable.

Casper H.S. Dik

unread,

Sep 25, 2003, 7:04:56 AM9/25/03

to

Joe Seigh <jsei...@xemaps.com> writes:

>AFAIK that's because he hasn't implemented mutexes. Lock-free generally
>involves avoiding mutexes and waiting.

Lock-free moves the waiting/locking completely to the hardware so that
is isn't obvious from the software perspective; it is not magically cheap.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Joe Seigh

unread,

Sep 25, 2003, 7:34:03 AM9/25/03

to

"Casper H.S. Dik" wrote:
>
> Joe Seigh <jsei...@xemaps.com> writes:
>
> >AFAIK that's because he hasn't implemented mutexes. Lock-free generally
> >involves avoiding mutexes and waiting.
>
> Lock-free moves the waiting/locking completely to the hardware so that
> is isn't obvious from the software perspective; it is not magically cheap.

It isn't magic? :)

The waiting that I am refering to is waiting for another thread to set
a particular value vs. the most recent value no matter what it is. Interlocked
instructions aren't instrinsic to this. You could do lock-free with
distributed algorigthms. RCU and hazard pointers are partly distributed
algorithms.

Anytime you have synchronization or coordination among threads you are going
to have some overhead. The trick is to do less of it or be more efficient
at it when you do.

Joe Seigh

Alexander Terekhov

unread,

Sep 25, 2003, 8:24:07 AM9/25/03

to

David Butenhof wrote:
>
> Alexander Terekhov wrote:
>
> > For some reason, they just fail to realize that what they
> > have currently, can (and ought to) be "expressed" along the lines of:
>
> Sometimes, Alexander, I think you're a fairly smart person who's just not
> very good at expressing yourself.
>
> Other times... well, let's just say that I think the person who "fails to
> realize" is a bit closer to you than you think.

typedef std::thread_specific_ptr<void, __c_TSD_cleanup> * pthread_key_t;

regards,
alexander.

Attila Feher

unread,

Sep 25, 2003, 8:29:16 AM9/25/03

to

Not this "__c_TSD_cleanup" in C++. Any name with a double underscore in it
is reserved. So unless you whish for a collision with name mangling...

--
Attila aka WW

Alexander Terekhov

unread,

Sep 25, 2003, 8:49:39 AM9/25/03

to

Attila Feher wrote:
[...]

> Not this "__c_TSD_cleanup" in C++. Any name with a double underscore in it
> is reserved.

It's reserved for implementation, Attila. pthread_key_t is the
implementation thing... just like:

typedef std::aligned_storage<std::mutex> pthread_mutex_t;

with its corresponding PTHREAD_MUTEX_INITIALIZER "magic". Anyway, in the
name of pedantry,

typedef void * __TSD_key_t; // whatever

int __tsd_key_create(
__TSD_key_t *
, void (* dtor)(void *, void *), void *
) throw();

int __tsd_key_delete(
__TSD_key_t
) throw();

int __tsd_getspecific(
__TSD_key_t
, void * *
) throw();

int __tsd_setspecific(
__TSD_key_t
, void *
) throw();

/* "native" release/dispose/reset aside for a moment */

namespace std {

struct no_cleanup {
void operator()(void *) {
// NOOP
}
};

template<typename cleanup>
bool no_TSD_cleanup(const cleanup &) throw() {
return false;
}

template<>
bool no_TSD_cleanup(const no_cleanup &) throw() {
return true;
}

template<typename T, typename cleanup>
class thread_specific_ptr : cleanup /* noncopyable */ {

__TSD_key_t __key;

static void dtor(void * data, void * THIS) {
static_cast<thread_specific_ptr*>(THIS)->
operator()(static_cast<T*>(data));
}

public:

thread_specific_ptr()
throw(std::bad_alloc, std::try_again);

thread_specific_ptr(const cleanup&)
throw(std::bad_alloc, std::try_again);

~thread_specific_ptr() throw();

T * get() throw();

void set(T *) throw(std::bad_alloc);

T * operator->() throw();

T * release() throw();

void dispose() throw();

void reset(T *) throw(std::bad_alloc);

};

}

Now, here's its possible implementation:

namespace std {

void throw_errno_exception(int error) { /* ... */ }

template<typename T, typename cleanup>
thread_specific_ptr<T, cleanup>::thread_specific_ptr()
throw(std::bad_alloc, std::try_again) {
int status = __tsd_key_create(&__key, no_TSD_cleanup(
*static_cast<cleanup*> (this)) ? 0 : &dtor, this);
if (status) throw_errno_exception(status);
}

template<typename T, typename cleanup>
thread_specific_ptr<T, cleanup>::thread_specific_ptr(
const cleanup& __cleanup) throw(std::bad_alloc,
std::try_again) : cleanup(__cleanup) {
int status = __tsd_key_create(&__key, no_TSD_cleanup(
__cleanup) ? 0 : &dtor, this);
if (status) throw_errno_exception(status);
}

template<typename T, typename cleanup>
thread_specific_ptr<T, cleanup>::~thread_specific_ptr()
throw() {
int status = __tsd_key_delete(__key);
if (status) throw_errno_exception(status);
}

template<typename T, typename cleanup>
T * thread_specific_ptr<T, cleanup>::get() throw() {
void * p;
int status = __tsd_getspecific(__key, &p);
if (status) throw_errno_exception(status);
return static_cast<T *>(p);
}

template<typename T, typename cleanup>
void thread_specific_ptr<T, cleanup>::set(T* p)
throw(std::bad_alloc) {
int status = __tsd_setspecific(__key, p);
if (status) throw_errno_exception(status);
}

template<typename T, typename cleanup>
T * thread_specific_ptr<T, cleanup>::release() throw() {
void * p = get();
set(0);
return static_cast<T *>(p);
}

template<typename T, typename cleanup>
void thread_specific_ptr<T, cleanup>::dispose() throw() {
this->cleanup::operator()(release());
}

template<typename T, typename cleanup>
void thread_specific_ptr<T, cleanup>::reset(T* p)
throw(std::bad_alloc) {
void * old_p = get();
if (old_p != p) {
set(p);
this->cleanup::operator()(old_p);
}
}

}

// Possible implementation for <cthread>
#include <thread>

namespace std {

extern "C" typedef void (* __c_TSD_dtor_t)(void *);
extern "C++" typedef void (* __cpp_TSD_dtor_t)(void *);

struct __cthread_TSD_cleanup {

__cthread_TSD_cleanup(__c_TSD_dtor_t __c_TSD_dtor_) :
__func(__c_TSD_dtor_ ? c : null),
__c_TSD_dtor(__c_TSD_dtor_) {
}

__cthread_TSD_cleanup(__cpp_TSD_dtor_t __cpp_TSD_dtor_) :
__func(__cpp_TSD_dtor_ ? cpp : null),
__cpp_TSD_dtor(__cpp_TSD_dtor_) {
}

void operator()(void * __data) {
if (__data) switch(__func) {
case c: __c_TSD_dtor(__data); break;
case cpp: __cpp_TSD_dtor(__data); break;
}
}

enum { null, c, cpp } __func;

union {
__c_TSD_dtor_t __c_TSD_dtor;
__cpp_TSD_dtor_t __cpp_TSD_dtor;
};

};

template<>
bool no_TSD_cleanup(const __cthread_TSD_cleanup & __cleanup)
throw() {
return __cleanup.__func == __cthread_TSD_cleanup::null;
}

typedef std::thread_specific_ptr<void, __cthread_TSD_cleanup> *
pthread_key_t;

// try { throw; } catch... "idiom"
int __translate_exception_to_error_code() throw();

extern "C" int pthread_key_create(pthread_key_t * key,
void ( * dtor)(void *)) throw() {
try {
// can throw "shall fail" stuff only (std::bad_alloc and
// std::try_again)
*key = new std::thread_specific_ptr<void,
__cthread_TSD_cleanup>(__cthread_TSD_cleanup(dtor));
// "may fail" shall be caught in the std::unexpected() handler
// for details... please click here: <http://tinyurl.com/cu9k>
}
catch(...) {
return __translate_exception_to_error_code();
}
return 0;
}

extern "C++" int pthread_key_create(pthread_key_t * key,
void ( * dtor)(void *)) throw() {
try {
// can throw "shall fail" stuff only (std::bad_alloc and
// std::try_again)
*key = new std::thread_specific_ptr<void,
__cthread_TSD_cleanup>(__cthread_TSD_cleanup(dtor));
// "may fail" shall be caught in the std::unexpected() handler
// for details... please click here: <http://tinyurl.com/cu9k>
}
catch(...) {
return __translate_exception_to_error_code();
}
return 0;
}

extern "C" int pthread_key_delete(pthread_key_t key) throw() {
// "may fail" shall be caught in the std::unexpected() handler
// for details... please click here: <http://tinyurl.com/cu9k>
delete key;
return 0;
}

extern "C" void * pthread_getspecific(pthread_key_t key) throw() {
// "may fail" shall be caught in the std::unexpected() handler
// for details... please click here: <http://tinyurl.com/cu9k>
return key->get();
}

extern "C" int pthread_setspecific(pthread_key_t key,
const void * p) throw(std::bad_alloc) {
try {
// can throw "shall fail" stuff only (std::bad_alloc)
key->set(const_cast<void *>(p));
// "may fail" shall be caught in the std::unexpected() handler
// for details... please click here: <http://tinyurl.com/cu9k>
}
catch(...) {
return __translate_exception_to_error_code();
}
return 0;
}

extern "C" int pthread_resetspecific(pthread_key_t key,
const void * p) throw(std::bad_alloc) {
try {
// can throw "shall fail" stuff only (std::bad_alloc)
key->reset(const_cast<void *>(p));
// "may fail" shall be caught in the std::unexpected() handler
// for details... please click here: <http://tinyurl.com/cu9k>
}
catch(...) {
return __translate_exception_to_error_code();
}
return 0;
}

extern "C" void * pthread_releasespecific(pthread_key_t key) throw() {
// "may fail" shall be caught in the std::unexpected() handler
// for details... please click here: <http://tinyurl.com/cu9k>
return key->release();
}

extern "C" void pthread_disposespecific(pthread_key_t key) throw() {
// "may fail" shall be caught in the std::unexpected() handler
// for details... please click here: <http://tinyurl.com/cu9k>
return key->dispose();
}

}

Kapis?

regards,
alexander.

Hovik Melikyan

unread,

Sep 25, 2003, 9:45:29 AM9/25/03

to

Alexander Terekhov <tere...@web.de> wrote in message news:<3F71E63C...@web.de>...
> Hovik Melikyan wrote:
> [...]

>
> Both are fine. Read up on "basic" thread-safety (also known as
> "thread safe as an int").
>

void String::Append( char c ) {
A: switch ( IntAtomicGet( data_->refs ) ) {
case 00: pthread_refcount_setvalue( &data_->refs, 1 );
case 01: data_->Reserve( data_->used+1 ); break;
B: default: data_ = new StringBuf( AppendStringBufPtr( data_ ) );
} data_->buf[data_->used++] = c;
}

One race condition is here, between A and B: you may end up by copying
a string buffer when it is no longer necessary. If refs > 1 after A,
it may become 1 before you reach B.

This is not fatal, of course, just extra unnecessary copying of string
data. But it is a race condition that results in a performance
penalty.

I think I saw one more that results in memory leaks but am having
trouble finding it again :(

--
Hovik Melikyan

Alexander Terekhov

unread,

Sep 25, 2003, 10:27:12 AM9/25/03

to

Hovik Melikyan wrote:
>
> Alexander Terekhov <tere...@web.de> wrote in message news:<3F71E63C...@web.de>...
> > Hovik Melikyan wrote:
> > [...]
> >
> > Both are fine. Read up on "basic" thread-safety (also known as
> > "thread safe as an int").
> >
>
> void String::Append( char c ) {
> A: switch ( IntAtomicGet( data_->refs ) ) {
> case 00: pthread_refcount_setvalue( &data_->refs, 1 );
> case 01: data_->Reserve( data_->used+1 ); break;
> B: default: data_ = new StringBuf( AppendStringBufPtr( data_ ) );
> } data_->buf[data_->used++] = c;
> }
>
> One race condition is here, between A and B: you may end up by copying
> a string buffer when it is no longer necessary. If refs > 1 after A,
> it may become 1 before you reach B.
>
> This is not fatal, of course, just extra unnecessary copying of string
> data. But it is a race condition that results in a performance
> penalty.

There's no way to get rid of that race... and IntAtomicGet(data_->refs)
helps to AVOID making unnecessary copy that you'd have to make without
looking at reference count prior to copying. The older copy (note that
it can be smaller than "needed") is discarded by the ctor:

inline StringBuf::StringBuf( AppendStringBufPtr other )
: len(0), used(0), buf(0)
{
pthread_refcount_init( &refs, 1 );
Reserve( max( other->len, other->used+1 ) );
memcpy( buf, other->buf, used = other->used );
if ( 0 == IntAtomicDecrement( other->refs ) )
other.destruct();
}

(well, to tell the truth, my "production version" is sort of a bit
more sophisticated with respect to choosing which copy to discard)

>
> I think I saw one more that results in memory leaks but am having
> trouble finding it again :(

Try harder. ;-)

regards,
alexander.

SenderX

unread,

Sep 25, 2003, 6:06:09 PM9/25/03

to

> Agreed, CAS may be better against mutex, but we were talking about CAS
> vs. something that doesn't loop at all. Loops (be it your own
> `handicrafted' loop or in the mutex implementation) waste time on
> uniprocessor systems and it is not any better on SMP systems.

Your making a classic mistake. Lock-free CAS sections are not spin-locks...

Each time a CAS section loops, is means that another thread has made forward
progress.

> (There are smarter mutex implementations that are trying to take
> advantage of SMP, btw, and I don't see anything like that in, f.ex.,
> SenderX's code.)

Probably because I have not implemented a Mutex API.

Joe has posted an elegant lock-free sempahore, I have posted one as well.
( there very different )

You can use those for a fast-pathed ( SMP friendly ) thread and/or process
mutex. The lock-free semaphores can exist in shared-memory.

P.S.

I am thinking about posting my new read/write mutex, that allows for
lock-free reads, and a fast-pathed write.

--
The designer of the experimental, SMP and HyperThread friendly, AppCore
library.

http://AppCore.home.comcast.net
"Hovik Melikyan" <ho...@melikyan.com> wrote in message
news:ad206972.03092...@posting.google.com...

Mike Mowbray

unread,

Sep 25, 2003, 9:21:13 PM9/25/03

to

Hovik Melikyan asked:

>>> [...] I'd also appreciate if you point out a couple of
>>> specific (and clean) examples
>>> [of conditional-CAS applications]

I ventured:

>> OK, how about this...
>>
>> To implement this measurement, each transaction thread notes the
>> (hi-res) time in nanosecs before and after each access, and assigns
>> the difference to a global integer atomically but *only* if the
>> difference is *greater* than the value currently stored in the
>> global integer. Independently, every minute a "sampler" thread wakes
>> up and performs an atomic exchange to grab the global integer's value
>> and replace it with 0, before saving the sampled value somehow.

Hovik Melikyan objected:

> You described a task that requires comparing and setting _literally_.

Sorry, I thought that was what you were asking for - a example that
involved CAS with conditions. But maybe I misunderstood what you
wanted.

> And what if I need average, not max? Or a quadratic bias? :)
> Are you saying that there should be a CPU instruction for
> doing that too?

Zounds! I have no clue how you got the idea I was saying
anything like that.

I now return to lurking.

- MikeM.

Mike Mowbray

unread,

Sep 25, 2003, 9:38:30 PM9/25/03

to

Hovik Melikyan wrote:

> SomeClass* the_global;
> int initialized;
>
> while (!the_global)
> {
> if (atomic_exchange(&initialized, 1) == 0)
> {
> the_global = new SomeClass; // maybe atomic_set()
> }
> }
>
> I think the loop above is not bad since the chances
> that you'll be caught are very small (depending on
> the speed of the initialization part, but that would
> be a drawback in any algorithm.) After the global is
> initialized nobody would even reach atomic_exchange(),
> they'd just pass one single comparison (!the_global).

Let's see... you're relying on construction of the
"new SomeClass" reaching global visibility before the
assignment to <the_global>. But that's exactly the sort
of thing which is invalid to assume on a non-TSO machine.

So in fact, the chances you'll be caught are low in
development and testing, but very high in production.
You're racing your threads - which is always asking
for trouble.

The crucial thing to remember about memory visibility
is that the perceived *sequence* of two operations is
not guaranteed to be the same as seen from two different
threads unless membars are used to guarantee it. In
that case, things like "speed of initialization" are
*not* "a drawback in any algorithm".

- MikeM.

David Schwartz

unread,

Sep 25, 2003, 9:42:47 PM9/25/03

to

"Hovik Melikyan" <ho...@melikyan.com> wrote in message
news:ad206972.03092...@posting.google.com...

> The question was: is there a better way of doing safe init-once than
> this:
>
> int initialized = 0;

>
> if (atomic_exchange(&initialized, 1) == 0)

> // ...
>
> _provided that you do have a portable atomic exchange function handy_,
> you answered "no, this isn't the best way" and then you say atomic
> exchange can't be portable (from the indirect link above). I'm afraid
> this is not an answer.

For one thing, you just set 'initialized' to one even though you didn't
do the initialization.

DS

David Schwartz

unread,

Sep 25, 2003, 9:46:54 PM9/25/03

to

"Hovik Melikyan" <ho...@melikyan.com> wrote in message
news:ad206972.03092...@posting.google.com...

> Ok, I surrender! :)

>
> But not completely:
>
> SomeClass* the_global;
> int initialized;
>
> while (!the_global)
> {
> if (atomic_exchange(&initialized, 1) == 0)
> {
> the_global = new SomeClass; // maybe atomic_set()
> }
> }

Nope, still doesn't work. There's no guarantee all of the effects of
'new SomeClass' are written back from registers to main memory before the
assignment to 'the_global' takes place. So if a second thread comes along
and accesses 'the_global', it may see it partially initialized. You can only
rely upon guarantees you actually have.

DS

Frank Cusack

unread,

Sep 26, 2003, 12:25:32 AM9/26/03

to

On Mon, 22 Sep 2003 07:26:23 -0400 David Butenhof <David.B...@hp.com> wrote:
> The only PORTABLE solution is to use proper synchronization around the whole
> block -- e.g., locking a mutex before you test the value of initialized,
> and unlocking it only after writing initialized.

So acquiring/releasing a lock implies a membar?

/fc

Attila Feher

unread,

Sep 26, 2003, 2:21:42 AM9/26/03

to

Alexander Terekhov wrote:
> Attila Feher wrote:
> [...]
>> Not this "__c_TSD_cleanup" in C++. Any name with a double
>> underscore in it is reserved.
>
> It's reserved for implementation, Attila. pthread_key_t is the
> implementation thing... just like:

[SNIP]

Double underscore names are reserved for the name mangling. That is a
practical fact. If you want to clash with the names the C++ compilers
generate, go ahead.

[__c_TSD_cleanup => char TSD::cleanup()]

--
Attila aka WW

Hovik Melikyan

unread,

Sep 26, 2003, 3:35:52 AM9/26/03

to

Alexander Terekhov <tere...@web.de> wrote in message news:<3F72B660...@web.de>...

>
> First off, strings object can be shared "just like ints". If you
> only call *const* stuff on them (and it's a good idea to make all
> such immutable instances const qualified), then you don't need to
> worry about any sync (except lifetime management, of course). If
> you call any mutators (non-const stuff), then any access to such
> (non-const; mutable) shared string object needs to by synchronized
> with respect to "writes" (non-const stuff), but you can still have
> concurrent "reads" (you'd need a read-write lock or something like
> that). And, of course, you'll have to "pray" that those mysterious
> "memory locations" won't stay in your way (breaking everything...
> false-sharing aside for a moment).
>

A question: how a string buffer can become shared between threads
(that is, shared between string objects owned by different threads)?
In other words, how safe is String::String(const String&) ? If you
guarantee safety then this constructor inevitably will be called at
some point. I think it needs some kind of a usage protocol, besides
trivial locking, of course. It may be obvious to you (and even to me
:) but you kind of hide the COW mechanism from the user so you should
describe how copy ctors and assignments should be used in an MT
environment.

--
Hovik Melikyan

Alexander Terekhov

unread,

Sep 26, 2003, 3:54:22 AM9/26/03

to

Acquiring a {write} lock on a {rw-}mutex (recursive case aside),
or on a semaphore implies "hoist load" + "hoist store" barrier(s).

Releasing a {write} lock on a {rw-}mutex (recursive case aside),
or on a semaphore implies "sink load" + "sink store" barrier(s).

The only barriers that are needed for read locking are "hoist
load" (for acquire) and "sink load" (for release), recursive
case aside, of course.

regards,
alexander.

P.S. No matter what the XBD says, signaling a condition variable
doesn't imply any barriers whatsoever (especially compiler-wise)
because no {strictly} conforming application can detect the lack
of barriers and/or rely upon their presence (I mean with respect
to pthread_cond_signal() and pthread_cond_broadcast()).

--
typedef std::aligned_storage<std::mutex> pthread_mutex_t; // POD
extern "C" int pthread_mutex_destroy(pthread_mutex_t * m) throw()
{
m->object().~mutex();

// "may fail" shall be caught in the std::unexpected() handler

return 0;
}

Alexander Terekhov

unread,

Sep 26, 2003, 4:28:37 AM9/26/03

to

Hovik Melikyan wrote:
[...]

> A question: how a string buffer can become shared between threads
> (that is, shared between string objects owned by different threads)?

You can have multiple copies all sharing the same representation...
or just have a shared string object ("just like int").

> In other words, how safe is String::String(const String&) ?

Same "basic" safety as with "struct Int { int i; }" objects (I mean
implicitly-declared/defined copy constructor Int::Int(const Int & )
that performs a memberwise copy of Int's subobjects -- "int i", in
this case). See the XBD 4.10*** rules to understand this "basic"
safety (ignore the NOT defined term "memory location" and, instead,
simply think of it as "distinct object": <http://tinyurl.com/2tua>).

regards,
alexander.

***) <http://tinyurl.com/2jb9>

Hovik Melikyan

unread,

Sep 26, 2003, 6:23:07 AM9/26/03

to

What do you think about this?

inline String::String( const String& other )
{
switch ( IntAtomicGet( other.data_->refs ) ) {
case 00: data_ = new StringBuf( *other.data_ ); break;
A: default: IntAtomicIncrement( (data_ = other.data_)->refs );
} ++nCopies;
}

inline char& String::operator[]( size_t n ) {

switch ( IntAtomicGet( data_->refs ) ) {

B: case 01: pthread_refcount_setvalue( &data_->refs, 0 ); //
'unshareable'
case 00: break;
default: data_ = new StringBuf( CharRefStringBufPtr( data_ ) );
} return data_->buf[n];
}

String::String() first sees that the object being copied has refs == 1
and it thinks it should increment it. At the same time (before refs is
incremented) refs is being set to 0 by operator[]. I think
`unshareability' is broken here.

--
Hovik Melikyan

Alexander Terekhov <tere...@web.de> wrote in message news:<3F72B660...@web.de>...

Alexander Terekhov

unread,

Sep 26, 2003, 6:50:02 AM9/26/03

to

Hovik Melikyan wrote:
[...]
> String::String()
^^^^^^^^^^^^^^^^

String::String(const String &)

> first sees that the object being copied has refs == 1
> and it thinks it should increment it. At the same time (before refs is
> incremented) refs is being set to 0 by operator[].

^^^^^^^^^^
char & String::operator[](size_t n)

> I think `unshareability' is broken here.

No. Application is broken here. "char & String::operator[](size_t n)"
isn't const; it's a "mutating" operation (just like "operator=()" or
something like that). You can't mutate an int and copy it concurrently.
The same rules apply to String copying and mutating. This is "basic"
thread-safety. It's thread-safe as an int. It's thread-safe as an int.
(But this applies only to *distinct* int objects, of course ;-) )

regards,
alexander.

Alexander Terekhov

unread,

Sep 26, 2003, 6:58:36 AM9/26/03

to

Hovik Melikyan wrote:
>
> What do you think about this?
>
> inline String::String( const String& other )
> {
> switch ( IntAtomicGet( other.data_->refs ) ) {
> case 00: data_ = new StringBuf( *other.data_ ); break;
> A: default: IntAtomicIncrement( (data_ = other.data_)->refs );
> } ++nCopies;
> }
>
> inline char& String::operator[]( size_t n ) {
> switch ( IntAtomicGet( data_->refs ) ) {
> B: case 01: pthread_refcount_setvalue( &data_->refs, 0 ); //

Note <http://www.terekhov.de/pthread_refcount_t/draft-edits.txt>:

"The results are undefined if pthread_refcount_setvalue() is
called while other thread is operating on the reference count."

regards,
alexander.

Hovik Melikyan

unread,

Sep 26, 2003, 7:26:08 AM9/26/03

to

(I'm sorry for responding to a wrong message, that's google's delays)

Alexander Terekhov <tere...@web.de> wrote in message news:<3F72B660...@web.de>...
>

> First off, strings object can be shared "just like ints". If you
> only call *const* stuff on them (and it's a good idea to make all
> such immutable instances const qualified), then you don't need to
> worry about any sync (except lifetime management, of course). If
> you call any mutators (non-const stuff), then any access to such
> (non-const; mutable) shared string object needs to by synchronized
> with respect to "writes" (non-const stuff), but you can still have
> concurrent "reads" (you'd need a read-write lock or something like
> that). And, of course, you'll have to "pray" that those mysterious
> "memory locations" won't stay in your way (breaking everything...
> false-sharing aside for a moment).
>

So if you have a string object S and its life-time is guaranteed to be
ok for some threads (say, S is simply static) there's no way you can
call any non-const method from _any_ thread! Because any other thread
can issue a copy operation on it at the same time. What can I do with
that string then?

That's why I was asking is there any kind of 'usage protocol' for your
String class. If that's the protocol it looks almost useless or, ok,
too limited. There are implementations that allow you to copy a string
without worrying about what kind of method (const/non-const) can be
called on that string concurrently, though such implementatios
(obviously) lack `unshareability'.

I'd simply give up the unshareability concept and just change the
return type of operator[] to const char&. From my experience you
rarely want to modify a character in the string, instead, you almost
always insert, delete or append something, or otherwise you transform
a string by copying it to another buffer. Trivial uppercase/lowercase
transformations can be done in a pretty optimal way without triggering
unshareability.

(I guess in the fully-featured implementation you will have to trigger
unshareability in some other methods too, not only in operator[] ?)

--
Hovik Melikyan

Alexander Terekhov

unread,

Sep 26, 2003, 7:49:51 AM9/26/03

to

Hovik Melikyan wrote:
>
> (I'm sorry for responding to a wrong message, that's google's delays)

http://news.individual.net

[...]

> So if you have a string object S and its life-time is guaranteed to be
> ok for some threads (say, S is simply static) there's no way you can
> call any non-const method from _any_ thread!

Sure there's a way to call any non-const method from _any_ thread.
Just use a {rw-}mutex (or whatever synchronization protocol).

> Because any other thread
> can issue a copy operation on it at the same time. What can I do with
> that string then?

All the same things as with "struct Int { int i; };".

>
> That's why I was asking is there any kind of 'usage protocol' for your
> String class. If that's the protocol it looks almost useless or, ok,
> too limited. There are implementations that allow you to copy a string
> without worrying about what kind of method (const/non-const) can be
> called on that string concurrently, though such implementatios
> (obviously) lack `unshareability'.

That's "strong" thread-safety. It's pretty much useless ("just a
few" exceptions aside ;-) ).

>
> I'd simply give up the unshareability concept and just change the

> return type of operator[] to const char&. ....

See the C++ standard. Here's a short quote

<quote>

// 21.3.4 element access:
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
const_reference at(size_type n) const;
reference at(size_type n);

</quote>

regards,
alexander.

Joe Seigh

unread,

Sep 26, 2003, 8:17:23 AM9/26/03

to

Hovik Melikyan wrote:
> >
>
> A question: how a string buffer can become shared between threads
> (that is, shared between string objects owned by different threads)?
> In other words, how safe is String::String(const String&) ? If you
> guarantee safety then this constructor inevitably will be called at
> some point. I think it needs some kind of a usage protocol, besides
> trivial locking, of course. It may be obvious to you (and even to me
> :) but you kind of hide the COW mechanism from the user so you should
> describe how copy ctors and assignments should be used in an MT
> environment.
>

The String implementations behave as if there was no buffer sharing,
ie. the buffer sharing is transparent. The only reason that thread-safety
bacame a concern was that the implementors came to realize that decrementing
reference counts really is asynchronous and they had to do synchronization
on the decrement.

If you look at reference counting in general, you realize that incrementing
the reference count (copying a pointer) has a race condition where the
reference count can go to zero (and the object deleted) before you get
a chance to increment the count (in which case you've incremented something
else you probably shouldn't have). But, if you own the pointer you are
copying, so you know it isn't going to be deleted, then you know that
at least one refcount exists and therefore the count cannot go to zero.
So race condition eliminated.

String objects are the same way. You own, or otherwise have exclusive
access by a lock perhaps, the String object that has a reference to
the internal COW buffer while you are performing string ops on it. Thus
the reference count cannot go to zero during those ops, and race
conditions due to refcount going to zero cannot happen.

Joe Seigh

unread,

Sep 26, 2003, 8:22:04 AM9/26/03

to

Alexander Terekhov wrote:
>
> Frank Cusack wrote:
> >
> > On Mon, 22 Sep 2003 07:26:23 -0400 David Butenhof <David.B...@hp.com> wrote:
> > > The only PORTABLE solution is to use proper synchronization around the whole
> > > block -- e.g., locking a mutex before you test the value of initialized,
> > > and unlocking it only after writing initialized.
> >
> > So acquiring/releasing a lock implies a membar?
>
> Acquiring a {write} lock on a {rw-}mutex (recursive case aside),
> or on a semaphore implies "hoist load" + "hoist store" barrier(s).
>
> Releasing a {write} lock on a {rw-}mutex (recursive case aside),
> or on a semaphore implies "sink load" + "sink store" barrier(s).
>
> The only barriers that are needed for read locking are "hoist
> load" (for acquire) and "sink load" (for release), recursive
> case aside, of course.
>

Alexander, you need to use "standard" membar mnemonics. You're
the only one who knows what the above terms mean.

Joe Seigh

Alexander Terekhov

unread,

Sep 26, 2003, 8:59:43 AM9/26/03

to

Joe Seigh wrote:
[...]

> Alexander, you need to use "standard" membar mnemonics. You're
> the only one who knows what the above terms mean.

Read/write mnemonics don't convey asymmetrical nature. Acq/rel
mnemonics don't convey the compound nature (constituents are
"hoist" and "sink" barriers) and are way too "heavy" for some
things (read locking, for example). "sink" and "hoist" (with or
without data-dependency "hints"... and acq/rel as shortcuts for
compound stuff) coupled with an atomic ops, is the way to go.

struct msync {
enum hlb_t { hlb }; // hoist-load barrier
enum ddhlb_t { ddhlb }; // hoist-load barrier with dd "hint"
enum hsb_t { hsb }; // hoist-store barrier
enum slb_t { slb }; // sink-load barrier
enum ddslb_t { ddslb }; // sink-load barrier with dd "hint"
enum ssb_t { ssb }; // sink-store barrier
enum acq_t { acq }; // hoist-load + hoist-store barrier
enum rel_t { rel }; // sink-load + sink-store barrier
enum none_t { none }; // naked
};

template<class T>
struct atomic {
T load(msync::none_t) const;
T load(msync::hlb_t) const;
T load(msync::ddhlb_t) const;
T load(msync::acq_t) const;
void store(T n, msync::none_t);
void store(T n, msync::ssb_t);
void store(T n, msync::acq_t);
void store(T n, msync::rel_t);
bool attempt_update(T o, T n, msync::none_t);
bool attempt_update(T o, T n, msync::ssb_t);
bool attempt_update(T o, T n, msync::acq_t);
bool attempt_update(T o, T n, msync::rel_t);
// ...
};

template<typename numeric>
class refcount<numeric, basic> {
public:

enum may_not_store_min_t { may_not_store_min };

private:

atomic<numeric> m_value;

template<typename min_msync, typename update_msync>
bool decrement(min_msync mms, update_msync ums) throw() {
numeric val;
do {
val = m_value.load(msync::none);
assert(min() < val);
if (min() + 1 == val) {
m_value.store(min(), mms);
return false;
}
} while (!m_value.attempt_update(val, val - 1, ums));
return true;
}

and so forth.

http://www.terekhov.de/pthread_refcount_t/experimental/refcount.cpp

regards,
alexander.

P.S. Actually, to my knowledge, the inventor of "hoist" and "sink"
mnemonics (but WITHOUT coupling to an atomic operation; which was
his mistake) is Tom Payne. Never mind that he was kinda "confused"
by David Butenhof and backed off.

http://google.com/groups?selm=9cp839%24ffg%244%40glue.ucr.edu
http://google.com/groups?selm=3AF142BF.A439C8DA%40compaq.com
http://google.com/groups?selm=9ctisg%24fbr%241%40glue.ucr.edu

Joe Seigh

unread,

Sep 26, 2003, 9:39:12 AM9/26/03

to

Alexander Terekhov wrote:
>
> Joe Seigh wrote:
> [...]
> > Alexander, you need to use "standard" membar mnemonics. You're
> > the only one who knows what the above terms mean.
>
> Read/write mnemonics don't convey asymmetrical nature. Acq/rel
> mnemonics don't convey the compound nature (constituents are
> "hoist" and "sink" barriers) and are way too "heavy" for some
> things (read locking, for example). "sink" and "hoist" (with or
> without data-dependency "hints"... and acq/rel as shortcuts for
> compound stuff) coupled with an atomic ops, is the way to go.
>

(snip)

You're trying to distinquish between membars that synchronize
memory state vs. ones that just enforce ordering of stores and
loads. I think. Nobody knows how to do formal semantics anyway,
so just stick to "load" and "store" membars and I can usually
figure out from context. "acquire" and "release" are also ok
since I can equate them to lock and unlock memory semantics
even though they are also undefined.

Also, "hoist" and "sink" are terrible mnemonics. They may be
ok for Talk Like a Pirate Day but otherwise, really.

Joe Seigh

Hovik Melikyan

unread,

Sep 26, 2003, 11:40:20 AM9/26/03

to

Alexander Terekhov <tere...@web.de> wrote in message news:<3F741BDC...@web.de>...

>
> Note <http://www.terekhov.de/pthread_refcount_t/draft-edits.txt>:
>
> "The results are undefined if pthread_refcount_setvalue() is
> called while other thread is operating on the reference count."
>

Tell me please, how is pthread_refcount_setvalue() different from a
plain int assignment then?

"Alexander Terekhov" <tere...@web.de> wrote in message news:3F7427DF...@web.de...

>
> Hovik Melikyan wrote:
> >
> > So if you have a string object S and its life-time is guaranteed to be
> > ok for some threads (say, S is simply static) there's no way you can
> > call any non-const method from _any_ thread!
>
> Sure there's a way to call any non-const method from _any_ thread.
> Just use a {rw-}mutex (or whatever synchronization protocol).
>

What? Back to the Stone Age? :)

From what I said (above) and from what you answered (above) follows
that your String is not thread safe, because it forces me to use
locking. Again, there are [in fact, lighter] implementations that
allow me to safely assign or construct a string object from another
shared string object. These implementations are _not making the string
object thread-safe_, they only guarantee safe (lock-free!) assignment,
because it is very practical.

That was the question: if the String class is thread-safe, then how a
string buffer can become shared between threads? Through locking? Nah,
thanks.

> > can issue a copy operation on it at the same time. What can I do with
> > that string then?
>
> All the same things as with "struct Int { int i; };".
>

This Int class is less safe than a good RC/COW string... Because a
good RC/COW string wraps actual data buffer and allows manipulation
only through a safe protocol (or interface, whatever you call it).

> > I'd simply give up the unshareability concept and just change the
> > return type of operator[] to const char&. ....
>
> See the C++ standard. Here's a short quote
>

I know that it is possible to overload const and non-const, but what I
was saying is that you can get rid of `unshareability' and thus
simplify things and finally make the String interface thread-safe. But
then you'll have to sacrifice the non-const char* String::operator[]
and provide only its const equivalent instead.

--
Hovik Melikyan

Alexander Terekhov

unread,

Sep 26, 2003, 12:31:12 PM9/26/03

to

Hovik Melikyan wrote:
>
> Alexander Terekhov <tere...@web.de> wrote in message news:<3F741BDC...@web.de>...
> >
> > Note <http://www.terekhov.de/pthread_refcount_t/draft-edits.txt>:
> >
> > "The results are undefined if pthread_refcount_setvalue() is
> > called while other thread is operating on the reference count."
> >
>
> Tell me please, how is pthread_refcount_setvalue() different from a
> plain int assignment then?

http://www.terekhov.de/pthread_refcount_t/poor-man/beta2/prefcnt.c

int pthread_refcount_setvalue(
pthread_refcount_t* refcount
, size_t value
)
{
int status = pthread_mutex_trylock(&refcount->mutex);
assert(!really_bad(status));
assert(PTHREAD_REFCOUNT_MAX >= value);
if (!status) {
refcount->value = value;
status = pthread_mutex_unlock(&refcount->mutex);
assert(!status);
}
return status;
}

;-)

[...]

> From what I said (above) and from what you answered (above) follows

> that your String is not thread safe, ....

http://groups.google.com/groups?selm=3D52CD8A.1070009%40tellabs.com
(Subject: Re: Thread-Safe Classes)

regards,
alexander.

Alexander Terekhov

unread,

Sep 26, 2003, 12:38:34 PM9/26/03

to

Joe Seigh wrote:
[...]

> Also, "hoist" and "sink" are terrible mnemonics. They may be
> ok for Talk Like a Pirate Day but otherwise, really.

For the record, I disagree. ;-)

regards,
alexander.

-- ("brilliant thoughts"...)

"The issue that keeps me, and our customers up at night isn't
whether we have a Linux strategy--it is whether we have a Java
and Web services strategy. Customers ask me all the time "where
should I be building applications and developing skills?" C or
Posix don't make any sense as a target, at least for enterprise
applications. Java and Web services do make sense--for obvious
reasons."

-- http://www.eweek.com/article2/0,4149,1300161,00.asp
(Schwartz Seeks to Clarify Sun's Linux Strategy)

SenderX

unread,

Sep 26, 2003, 7:53:33 PM9/26/03

to

> But not completely:
>
> SomeClass* the_global;
> int initialized;
>
> while (!the_global)
> {
> if (atomic_exchange(&initialized, 1) == 0)
> {
> the_global = new SomeClass; // maybe atomic_set()
> }
> }

Nope.

This will work on Win32, because the atomic-ops will have membars on non-TSO
systems:

volatile SomeClass* the_global = NULL;

volatile int initialized = 0;

if ( InterlockedCompareExchange
( &initialized,
1,
0 ) == 0 )
{
the_global = new SomeClass;

InterlockedIncrement( &initialized );
}

else
{
while( InterlockedExchangeAdd
( &initialized,
0 ) == 1 )
{
Sleep( 0 );