The Inventor of Portable DCI-aka-DCL (using TSD) is... ;-)

32 views
Skip to first unread message

Alexander Terekhov

unread,
May 9, 2003, 5:42:31 AM5/9/03
to

Joseph Seigh wrote:
>
> Alexander Terekhov wrote:
> >
> > witoldk wrote:
> > >
> > > Scott Meyers <Use...@aristeia.com> wrote in message
> > > news:<MPG.191efc0a3...@news.hevanet.com>...
> > >
> > > > Please remember that my contention here is that there is no
> > > > portable and reliable way to make double-checked locking work.
> > >
> > > Please note that double-checked locking works reliably, and is
> > > portable among POSIX-compliant OSs (environments). There is no
> > > way to make it work otherwise, other than using thread private
> > > storage (the idea is credited to A. Terekhov), as mentioned in
> > > the doc referred in this thread (and many other threads here and
> > > on other NGs :) about double-checked locking being broken.
> >
> > Well,
> >
> > http://google.com/groups?threadm=3AB949A3.45B2371A%40willden.org
> > (Subject: DCL -- is there a FAQ?)
>
> Found it here in a posting by John Hickin
> http://www.google.com/groups?threadm=6kuldj$4sk%40bmtlh10.bnr.ca
> Subject: Re: MP safe Singleton using double-checked locking
>
> Sorry for the redundant posting. There seems to be a law that
> no matter how long you spend looking for something, you will find
> it just after you post to a newsgroup about it.
>
> >
> > regards,
> > alexander.
>
> Ok, now you can invoke Godwin's Law.
> >
> > P.S. Hitler hitler hitler.

and {pthread_}create a new one... in some TSD dtor. ;-)

regards,
alexander.

Joseph Seigh

unread,
May 9, 2003, 8:33:40 AM5/9/03
to

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:
> >
> > Alexander Terekhov wrote:
> > >
> > > witoldk wrote:
> > > >
> > > > Scott Meyers <Use...@aristeia.com> wrote in message
> > > > news:<MPG.191efc0a3...@news.hevanet.com>...
> > > >
> > > > > Please remember that my contention here is that there is no
> > > > > portable and reliable way to make double-checked locking work.
> > > >
> > > > Please note that double-checked locking works reliably, and is
> > > > portable among POSIX-compliant OSs (environments). There is no
> > > > way to make it work otherwise, other than using thread private
> > > > storage (the idea is credited to A. Terekhov), as mentioned in
> > > > the doc referred in this thread (and many other threads here and
> > > > on other NGs :) about double-checked locking being broken.
> > >
> > > Well,
> > >
> > > http://google.com/groups?threadm=3AB949A3.45B2371A%40willden.org
> > > (Subject: DCL -- is there a FAQ?)
> >
> > Found it here in a posting by John Hickin
> > http://www.google.com/groups?threadm=6kuldj$4sk%40bmtlh10.bnr.ca
> > Subject: Re: MP safe Singleton using double-checked locking
> >

...


>
> and {pthread_}create a new one... in some TSD dtor. ;-)
>

comp.lang.c++.moderated a little too moderate, i.e. slow?

I think the response to Scott Myers' OP demonstrated how problematic DCL
is though most of the respondents don't realize it.

Anyway, David Butenhof has also mentioned TSD as a fix for DCL. Perhaps he can
recall where it first showed up.

Joe Seigh

Alexander Terekhov

unread,
May 9, 2003, 8:51:03 AM5/9/03
to

Joseph Seigh wrote:
[...]

> comp.lang.c++.moderated a little too moderate, i.e. slow?

Yeah, but it's a little bit faster than also-moderated comp.std.c++. ;-)

<Forward-Inline>

-------- Original Message --------
Newsgroups: comp.std.c++
Subject: Re: scoped static singleton question

Scott Meyers wrote:
>
> On Mon, 5 May 2003 18:33:49 +0000 (UTC), Alexander Terekhov wrote:
> > Scott Meyers wrote:
> > [...]
> > > I was under the impression that memory barriers applied only to
> > > multiprocessor machines.
> >
> > That's true of you're concerned with "ordinal" memory only and don't
> > care about memory-mapped I/O space and devices.
>
> Which suggests that memory barriers are not helpful for solving the general
> problem of double-checked locking in a threaded environment.

Let me put it this way: once you solve all problems with respect to
atomicity and compiler-induced reordering, you don't need to worry
about hardware-induced reordering on a uniprocessor... if you're
concerned with "ordinal" memory only and don't care about memory-
mapped I/O space and devices.

Well, I do believe that "memory barriers" (embedded into atomic<>
stuff) COULD be very, very helpful for solving all sorts of
synchronization problems. The only problem I see is that reaching
consensus wouldn't be an easy task, I'm afraid.

Consider the following "illustration"... also an initialization
pattern, but I wouldn't call it DCI (or DCL-if-you-so-like-it):

atomic<stuff*> instance_ptr = ATOMIC_INITIALIZER(0); // static

stuff & instance() {
stuff * ptr;
if (0 == (ptr = instance_ptr.load_ddrmb())) {
ptr = new stuff();
if (!instance_ptr.attempt_update_wmb(ptr, 0)) { // too late
delete ptr;
if (0 == (ptr = instance_ptr.load_ddrmb()))
abort();
}
else { // only one thread can reach here
static deleter<stuff> cleanup(ptr);
}
}
return *ptr;
}

Well,

http://google.com/groups?threadm=3E60CF71.9784884F%40web.de
(Subject: Re: Acquire/Release memory synchronization....)

regards,
alexander.

-------- Original Message --------
Newsgroups: comp.std.c++
Subject: Re: scoped static singleton question

Mirek Fidler wrote:
[...]
> Now the interesting topic could be how such tool might look like...

New keywords aside for a moment, such tool might look like
"__attribute__((thread-shared))" or something like that. For example:

int you_name_it() {
static int i __attribute__((thread-shared)) = calculate(/*...*/);
return i;
}

regards,
alexander.

--
http://groups.google.com/groups?threadm=3EB82EA0.F40E66C4%40web.de
(Subject: __attribute__((cleanup(function)) versus try/finally)

</Forward-Inline>

regards,
alexander.

Joseph Seigh

unread,
May 9, 2003, 1:02:05 PM5/9/03
to

attempt_update? You been reading Doug Lea's JSR 166 stuff
http://gee.cs.oswego.edu/dl/concurrent/dist/docs/index.html
see AtomicReference class.

That's roughly the logic I used here
http://groups.google.com/groups?selm=3E4CE47B.4EDBC7%40xemaps.com
in initFreeQueue

Also note this posting by Hans Boehm, the bit about lazy instantiation
http://groups.google.com/groups?as_umsgid=98m2b6%24nsp%241%40hplms2.hpl.hp.com

Similarly here
http://groups.google.com/groups?selm=1998May28.082712%40bose.com
except you apparently cannot count on the java compiler putting
the monitorexit's where you expect them.


>
> Well,
>
> http://google.com/groups?threadm=3E60CF71.9784884F%40web.de
> (Subject: Re: Acquire/Release memory synchronization....)
>


Acquire/Release should become more commonplace and familiar once
Microsoft starts distributing them. The semantics are will understood
since mutexes are using them already.

If you have a simple non refcounted atomic ptr with acquire/release
semantics, stores of the pointer have release semantics and loads
have acquire sematics. So

a = b;

would be equivalent to

lock();
a = b;
unlock();

for non-atomic pointers.
and dereferencing

... = a->data;

would be the equivalent of the non-atomic

lock();
local_temp = a;
unlock();
... = local_temp->data;


except that the atomic versions are faster and lock-free. If you understand
the locked versions then you should be able to understand atomic pointers
with acquire/release semantics.

Joe Seigh

Alexander Terekhov

unread,
May 9, 2003, 1:33:31 PM5/9/03
to

Joseph Seigh wrote:
[...]

> > atomic<stuff*> instance_ptr = ATOMIC_INITIALIZER(0); // static
> >
> > stuff & instance() {
> > stuff * ptr;
> > if (0 == (ptr = instance_ptr.load_ddrmb())) {
> > ptr = new stuff();
> > if (!instance_ptr.attempt_update_wmb(ptr, 0)) { // too late
> > delete ptr;
> > if (0 == (ptr = instance_ptr.load_ddrmb()))
> > abort();
> > }
> > else { // only one thread can reach here
> > static deleter<stuff> cleanup(ptr);
> > }
> > }
> > return *ptr;
> > }
>
> attempt_update? You been reading Doug Lea's JSR 166 stuff
> http://gee.cs.oswego.edu/dl/concurrent/dist/docs/index.html
> see AtomicReference class.

Yup.

>
> That's roughly the logic I used here
> http://groups.google.com/groups?selm=3E4CE47B.4EDBC7%40xemaps.com
> in initFreeQueue

Yeah, I know. Well, consider:

<Forward-Inline>

G'Day,

I wrote:

----
Betreff: [off-band] Re: [PATCH] PPC64 nptl fixes
Von: "Alexander Terekhov" <tere...@web.de>
An: sjmunroe@<snip>
Datum: 31.03.03 23:32:49

sjmunroe@<snip> schrieb am 31.03.03 21:48:26:
>
> Finally have a working nplt enabled kernel to test with. With nplt-0.32 found
> only two problems. ...

You might want to take a closer look at old_pthread_cond_*.c stuff. I don't
think that memory synchronization (WRT "cond->cond") is done correct there.
----

Here's some details (selective copy&paste from a recent discussion that
I had with <snip>):

----
> barriers. From the kernel source I know the they are strange.

#define atomic_full_barrier() __asm("sync" ::: "memory")
#ifdef __powerpc64__
#define atomic_read_barrier() __asm("lwsync" ::: "memory")
#else
#define atomic_read_barrier() __asm("sync" ::: "memory")
#endif
#define atomic_write_barrier() __asm("eieio" ::: "memory")

Note that the recent IBM paper (covering Power4 stuff) says:

<quote>

Although eieio orders stores to caching-enabled storage, lwsync
is the preferred instruction for this purpose. It's recommended
that eieio not be used for this purpose.

</quote>

I find this kinda funny because with respect to "system memory"
(IO space/"device memory" aside) eieio does *ONLY* "store/store"
barrier -- is less expensive [in terms of the amount of "system
memory" reordering constraints] than lwsync (which actually does
"load/load" and "load/store" and "store/store" barriers)... if
all you need is nothing but a lonely "write memory barrier" (aka
membar_producer()[1]).

Also, the PPC can kinda-"emulate" IA64's acquire and release CAS
operations using isync (for acquire) and sync/lwsync (for release)
in conjunction with load-locked/store-conditional.

> You have to synchronize before your do the cond->cond == NULL
> check

Yeah, sort of. (or as a part of cond->cond's load operation... IA64's
ld.acq, for example, would also work here... in conjunction with
"cas.rel")

> and in atomic_compare_and_exchange_acq_bool.

Well, pls see below.

> Whats needs to be added is a read memory barrier like the rmb()
> from the kernel source.

Yes, {dependent_}rmb() after the load would also work. In
terms of read and write barriers (aka consumer and producer
barriers) and CAS-with-a-write-memory-barrier we can have the
following: < note that "dependent_" means that on some archs it
can be optimized away due to implied load-load barrier as a
result of data dependency >

int
__pthread_cond_signal_2_0 (cond)
pthread_cond_2_0_t *cond;
{
pthread_cond_t *cond_ = cond->cond;

dependent_rmb();

if (cond_ == NULL)
{
/* ...Allocation/Initialization... */

if (atomic_compare_and_exchange_bool_WMB (&cond->cond, cond_, NULL))
{
/* Somebody else just initialized the condvar */

/* ...Cleanup/destroy "cond_" instance... */

cond_ = cond->cond; // reload
dependent_rmb();
if (cond_ == NULL)
abort();
}

}

return __pthread_cond_signal (cond_);
}

Here's a-sort-of-more-IA64-"friendly" version:

int
__pthread_cond_signal_2_0 (cond)
pthread_cond_2_0_t *cond;
{
pthread_cond_t *cond_ = ATOMIC_LOAD_ACQ(&cond->cond);

if (cond_ == NULL)
{
/* ...Allocation/Initialization... */

if (atomic_compare_and_exchange_bool_REL (&cond->cond, cond_, NULL))
{
/* Somebody else just initialized the condvar */

/* ... Cleanup/destroy "cond_" instance... */

cond_ = ATOMIC_LOAD_ACQ(&cond->cond); // reload
assert(cond_ != NULL);
}

}

return __pthread_cond_signal (cond_);
}

Note that in terms of "amount" of constraints, ACQ/REL doesn't
seem to be the optimal solution here; producer/consumer [rmb()/
wmb()] seems to be the optimal one; cond->cond reloading after
"failed-to-update" cas() can be avoided on some archs if cas()
can provide the newer/recent value... but we'd still need the
dependent_rmb() and the check (or something like that) in order
to ensure "load-load" reordering constraint, I guess.
----

</Forward-Inline>

Finally, the following might also be of interest to you, Joe. ;-)

(no reply yet)

<Forward-Inline>

> Alexander Terekhov writes
>
> > P.S. Do you really have some problems fixing broken NPTL's condvars? ;-)
>
> Still discussing this with the PPC64 kernel maintainer and other experts.
> The change you suggester impacts POWER3 systems negatively while
> potentially improving POWER4. It is not clear if the improvement in
> POWER4 is large enough to offset the negative impact on older
> platforms...

Interesting. Could you please shortly outline the negative impact w.r.t.
POWER3? I'm currently in kinda-"gathering material"-stage for a would-be-
proposal on a portable "<atomic>"-stuff... things ala kinda-upcoming-
<iohw.h> for plain C[1] and "<hardware>" for C++[2] but with "a slightly
different focus" -- threading, I mean. The reason I'm looking at this is
that "a lot" of feedback I've got with respect to [3] was along the lines
of "it would be nice to have a lower level interface providing atomic
operations and memory synchronization that would allow to code things
like nonblocking ref.counting (and other things as well) in 'a portable
fashion' and 'most efficiently'". Well, I'm aware of Linux-atomic-stuff
[and the upcoming-Java's-JSR-133-stuff as well] but I have the feeling
that probably, a 'better' solution is possible...

regards,
alexander.

[1] http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n972.pdf

[2] http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1430.pdf

[3] http://www.terekhov.de/pthread_refcount_t/draft-edits.txt
http://www.terekhov.de/pthread_refcount_t/poor-man/beta2/prefcnt.h
http://www.terekhov.de/pthread_refcount_t/poor-man/beta2/prefcnt.c

</Forward-Inline>

regards,
alexander.

Joseph Seigh

unread,
May 10, 2003, 1:01:17 PM5/10/03
to

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:
> >

> > attempt_update? You been reading Doug Lea's JSR 166 stuff
> > http://gee.cs.oswego.edu/dl/concurrent/dist/docs/index.html
> > see AtomicReference class.
>
> Yup.

I heard the JSR 166 stuff is going into JDK 1.5. Doug has been
hanging around architects too long. Architects don't like to
publish semantics (rationale), just the syntax. I can figure out
what most of the stuff is for, but...

AtomicStampedReference? What is that for? A way around the
ABA problem for lock-free stacks? I've been studying how to
deal with the ABA problem in conjunction with GC so I'm probably
aware of some issues that nobody else is aware of yet. It will
be fun to see how they react when they do discover them. That's
why I probably won't be providing an equivalent interface for that
in atomic_ptr, even though it's been discussed. BTW, I came up
with another way of solving the ABA problem besides the GC based and
the versioning/stamping solutions. It's more limited and a lot less
efficient but has some interesting properties. But I'll have
to study it some more to see if it's worthwhile. It's kind of nice
to have an whole area of applictation (atomic_ptr) to yourself.
I get to "discover" all the obvious implications of its usage.
It's a good thing for me Detlefs doesn't believe it can work. Which is
fair enough, he and others are off "discovering" applications
of an instruction, DCAS, which doesn't exist (as far as the rest
of us are concerned).

And while there's compare and swap capability, there's no memory barrier stuff.
Using that for lock-free is pretty much useless without that.
Unless JSR 133 is also going into JDK 1.5 which I haven't heard.


(snip)

As to the rest of your posting... The membar stuff should be part
of a low level primatives package which can be used to construct
the higher level api. So if you need a CAS which needs bot acquire
and release semantics, you can do it. I'd also just keep it
simple, just acquire and release types of barriers. At the platform
level, there is all kinds of stuff here, some actual barriers, some
just memory access ordering directives. Different enough so that
you couldn't use them as a basis for a portable interface. Acquire
(aka rmb) and release (aka wmb) could be constructed out of them though.

Joe Seigh

Alexander Terekhov

unread,
May 10, 2003, 1:37:33 PM5/10/03
to

Joseph Seigh wrote:
>
> Alexander Terekhov wrote:
> >
> > Joseph Seigh wrote:
> > >
> > > attempt_update? You been reading Doug Lea's JSR 166 stuff
> > > http://gee.cs.oswego.edu/dl/concurrent/dist/docs/index.html
> > > see AtomicReference class.
> >
> > Yup.
>
> I heard the JSR 166 stuff is going into JDK 1.5. Doug has been
> hanging around architects too long. Architects don't like to
> publish semantics (rationale), just the syntax. I can figure out
> what most of the stuff is for, but...
>
> AtomicStampedReference? What is that for?

I don't know. That's something "new" (to me).

[...]


> And while there's compare and swap capability, there's no memory barrier stuff.

Do you mean JSR-166? Well,

> Using that for lock-free is pretty much useless without that.

http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000023.html
http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000024.html
(Subject: AtomicX classes and VOLATILE-ACQUIRE/RELEASE memory synchronization semantics)

> Unless JSR 133 is also going into JDK 1.5 which I haven't heard.

They've recently discovered a {major} problem with respect to
"premature" GC finalizations... other than that, well, it's
pretty "done" stuff, AFAIK.

http://www.cs.umd.edu/~pugh/java/memoryModel/archive/1234.html
(Subject: JavaMemoryModel: Another Java threading issue (finalization))

[...]


> Acquire (aka rmb) and release (aka wmb) could be constructed out
> of them though.

Yeah, sort of.

http://www.cs.umd.edu/~pugh/java/memoryModel/archive/1220.html
http://www.cs.umd.edu/~pugh/java/memoryModel/archive/1222.html
(Subject: JavaMemoryModel: Cookbook: barriers)

regards,
alexander.

Joseph Seigh

unread,
May 10, 2003, 2:51:27 PM5/10/03
to

I don't believe you need to resort to a memory model to formally
define concurrency semantics and refuse to go there. It's
unnecessarily complicated and confusing. Not just from a useability
point of veiw but for discussing concurrency issues as well.

Joe Seigh

SenderX

unread,
May 10, 2003, 8:44:52 PM5/10/03
to
> AtomicStampedReference? What is that for? A way around the
> ABA problem for lock-free stacks?

I hope it is...

I always wanted a version stamped load-linked and store-conditional.

The store-conditional increments a version count, to avoid aba.


Think if you could chain load-linked and store-conditionals together?

typedef union S_LockFreeStack
{
unsigned __int64 Value64;

struct
{
struct S_Node *pFront;
unsigned __int32 iVersion;
};
};

Then do something like...

union S_LockFreeStack *global_stack;

<Atomic Chained Load-Linked>

load_linked ( global_stack->pFront );
load_linked_chain ( global_stack->iVerson );

</Atomic Chained Load-Linked>

<Non-Atomic>

modify the version number and push/pop the node locally.

</Non-Atomic>

<Atomic Chained Store-Conditional>

store_conditional_chain( global_stack->iVersion )
store_conditional( global_stack->pFront );

</Atomic Chained Store-Conditional>

That would be so nice if the hardware would do LL/SC chaining!

> I've been studying how to
> deal with the ABA problem in conjunction with GC so I'm probably
> aware of some issues that nobody else is aware of yet.

I belive your atomicptr with the 16-bit version stamp that i added should
work great for Lock-Free GC. Are there issues?

> BTW, I came up
> with another way of solving the ABA problem besides the GC based and
> the versioning/stamping solutions.

Really! Nice... Do tell ;)


By the way...

Look how the asm code for the powerpc that handles the following lock-free
stack:

http://cvs.grame.fr/cgi-bin/midishare-cvs/src/common/Headers/lflifo.h?rev=1.
5&co

It gets around the ABA problem by using LL/SC combined with instruction
sync's.

Could AtomicPtr be implemented with that same scheme?

It seems to emulate a CAS that works on double the pointer size using LL/SC
with the sync opcode.


Your thoughts?

=)

--
The designer of the SMP and HyperThread friendly, AppCore library.

http://AppCore.home.attbi.com


Joseph Seigh

unread,
May 11, 2003, 6:20:15 PM5/11/03
to

SenderX wrote:
>
>
> > I've been studying how to
> > deal with the ABA problem in conjunction with GC so I'm probably
> > aware of some issues that nobody else is aware of yet.
>
> I belive your atomicptr with the 16-bit version stamp that i added should
> work great for Lock-Free GC. Are there issues?

Well, the 16 bit counter is kind of small. When the version stamping technique
for avoiding the ABA problem was devised for IBM S370, it took more than a year
to overflow a 32 bit counter. On today's processors it's a lot quicker, probably
only in the order of a few hours. So, 16 bits is nothing. When you use version
counting, you are placing a time limit on the safety of the operation. If it
goes beyond that time limit, you can't guarantee the safety of the operation.
It might be interesting for experimentation, but I would hesitate to recommend
it for production code.

The other issue is you can't mix version stamping and reference counting/GC. The
two techniques are not aware of each other. Take a hypothetical but unlikely
example. Suppose you wanted to use version stamping to pop from a queue and
use refernce counting/GC to allow traversing the queue without a lock. That would
not work. The latter depends on the deleted queue elements not being reused until
all references complete, but the version stamping part will ignore this. Items
could be put back onto the queue while they are still being referenced from
a prior existence on the queue.

Given that they appear to be imcompatible, it's better to do the version stamping
the way you are doing it now, without reference counted pointers. At least until
we better understand the problem.

>
> > BTW, I came up
> > with another way of solving the ABA problem besides the GC based and
> > the versioning/stamping solutions.
>
> Really! Nice... Do tell ;)

I need to work on it some more to determine whether it is a limited as it
appears to be.

>
> By the way...
>
> Look how the asm code for the powerpc that handles the following lock-free
> stack:
>
> http://cvs.grame.fr/cgi-bin/midishare-cvs/src/common/Headers/lflifo.h?rev=1.
> 5&co
>
> It gets around the ABA problem by using LL/SC combined with instruction
> sync's.
>
> Could AtomicPtr be implemented with that same scheme?

It would have much more limited portability. I think we need to wait until
until the problem I mentioned earlier is better understood. It it turns out
that there is use for it then it merits serious consideration.

BTW, LL/SC is basically equivalent to versioning under the covers. So for systems
that LL/SC didn't exist on, versioning could be used to to implement api instead,
provided the other issues w/ versioning could be worked out.


Joe Seigh

SenderX

unread,
May 11, 2003, 10:14:37 PM5/11/03
to
> Well, the 16 bit counter is kind of small.

Yep, but it seems to be the only size compatible with your AtomicPtr.

> When the version stamping technique
> for avoiding the ABA problem was devised for IBM S370, it took more than a
year
> to overflow a 32 bit counter.

So the problem is the version stamp rolling over to zero? I really believe I
have a way around that.

You can detect the overflow of the 16-bit version counter inside the CAS
critical section. So, you can alter it before you increment the shared data!

If you were to change it to a unique value per-thread, it would cause
contention and not allow the stack or queue to break.


Now, lets say 6 threads were inside the following CAS critical section
pseudo-code:

unsigned __int16 Comp, New;

do
{
/* Read */
Comp = Dest;


/* DETECT OVERFLOW */


/* Check for a 16-bit overflow on increment */
if ( Comp == 65,535 )
{


/* FIX OVERFLOW */


/* We MUST set this value to a unique value for this stack */
New = GetCurrentThreadId();

/* Or we could use Thread Local Storage to store uniqueue 16-nit
numbers */
}

else
{
/* Modify version counter */
New = Comp + 1;
}

/* Validate */
_ASSERTE( New > 0 && Comp >= 0 );
}

/* Try to update the version count */
while( ! CAS( &Dest, Comp, New ) );


Now, the 16-bit counter would overflow into a unique value for each thread.

That should solve the 16-bit counter overflow issue.

What do you think about it?

> but I would hesitate to recommend
> it for production code.

Not in production code, it's in a test library. ;)

David Holmes

unread,
May 12, 2003, 1:17:34 AM5/12/03
to
"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3EBD315E...@xemaps.com...

> I heard the JSR 166 stuff is going into JDK 1.5. Doug has been
> hanging around architects too long. Architects don't like to
> publish semantics (rationale), just the syntax. I can figure out
> what most of the stuff is for, but...
>
> AtomicStampedReference? What is that for?

AtomicStampedReference is a wrapper class for defining a "pointer" and an
"int" that can be manipulated atomically using the wide-CAS instructions
that are available on most of the platforms of main interest to J2SE. The
intent is to provide a class that allows you to implement wait-free
algorithms that rely on wide-CAS.

AtomicMarkableReference is a wrapper class for defining a "pointer" and a
"mark-bit" that can manipulated atomically using CAS. (The intent being that
an optimising implementation will suitably align the "pointer" so that
one-bit can be stolen for the mark-bit. Again this is to facilitate
implementation of wait-free algorithms found in the literature.

JSR-166 is still under development and subject to change.

David Holmes


David Holmes

unread,
May 12, 2003, 1:43:32 AM5/12/03
to
"Alexander Terekhov" <tere...@web.de> wrote in message
news:3EBB7807...@web.de...

> > Found it here in a posting by John Hickin
> > http://www.google.com/groups?threadm=6kuldj$4sk%40bmtlh10.bnr.ca
> > Subject: Re: MP safe Singleton using double-checked locking

:-) Oh what a painful education I had in that thread.

David Holmes


Alexander Terekhov

unread,
May 12, 2003, 4:30:15 AM5/12/03
to

How about:

OP ("naked" atomic op)

OP+HOIST_LOAD_BARRIER (with "data dependency" option)

OP+HOIST_STORE_BARRIER (with "data dependency" option)

OP+SINK_LOAD_BARRIER (with "data dependency" option)

OP+SINK_STORE_BARRIER (with "data dependency" option)

OP+ACQURE_BARRIER (w/o "data dependency" option)

OP+RELEASE_BARRIER (w/o "data dependency" option)

STORE_FENCE (standalone)

LOAD_FENCE (standalone)

FENCE (standalone)

<?>

Well, here's an illustration ("hlb" stands for "hoist load
barrier", "ssb" stands for "sink store barrier" and "dd"
stands for "data dependancy"):

atomic<stuff*> instance_ptr = ATOMIC_INITIALIZER(0); // static

stuff & instance() {
stuff * ptr;

if (0 == (ptr = instance_ptr.load_ddhlb())) {
ptr = new stuff();
if (!instance_ptr.attempt_update_ssb(ptr, 0)) { // too late
delete ptr;
if (0 == (ptr = instance_ptr.load_ddhlb()))


abort();
}
else { // only one thread can reach here
static deleter<stuff> cleanup(ptr);
}
}
return *ptr;
}

This seems to be the most effective solution for this type
of non-blocking producer/consumer scenario (in terms of the
amount of reordering constraints imposed on both software/
compiler and "hardware").

regards,
alexander.

Joseph Seigh

unread,
May 12, 2003, 7:26:48 AM5/12/03
to

David Holmes wrote:
>
> "Joseph Seigh" <jsei...@xemaps.com> wrote in message
> news:3EBD315E...@xemaps.com...
> > I heard the JSR 166 stuff is going into JDK 1.5. Doug has been
> > hanging around architects too long. Architects don't like to
> > publish semantics (rationale), just the syntax. I can figure out
> > what most of the stuff is for, but...
> >
> > AtomicStampedReference? What is that for?
>
> AtomicStampedReference is a wrapper class for defining a "pointer" and an
> "int" that can be manipulated atomically using the wide-CAS instructions
> that are available on most of the platforms of main interest to J2SE. The
> intent is to provide a class that allows you to implement wait-free
> algorithms that rely on wide-CAS.

I'm aware of one algorithm to popping items from a stack or queue using this.
Are there others?

>
> AtomicMarkableReference is a wrapper class for defining a "pointer" and a
> "mark-bit" that can manipulated atomically using CAS. (The intent being that
> an optimising implementation will suitably align the "pointer" so that
> one-bit can be stolen for the mark-bit. Again this is to facilitate
> implementation of wait-free algorithms found in the literature.

I seem to recall seeing an algorithm that needed this but I forgot the title
so I haven't been able to find it again. A lock-free way of deleting from
a doubly linked list. Claimed it scaled well.

I'm only aware of this because I've scanned the literature. What do they
think the general public is supposed to make of these new capabilities?

Joe Seigh

Joseph Seigh

unread,
May 12, 2003, 7:47:39 AM5/12/03
to

SenderX wrote:
>
> > Well, the 16 bit counter is kind of small.
>
> Yep, but it seems to be the only size compatible with your AtomicPtr.
>
> > When the version stamping technique
> > for avoiding the ABA problem was devised for IBM S370, it took more than a
> year
> > to overflow a 32 bit counter.
>
> So the problem is the version stamp rolling over to zero? I really believe I
> have a way around that.
>

No, it's not the overflow that is the problem, it's the range that is the problem.
How long before you start reusing the version numbers.

...


>
> If you were to change it to a unique value per-thread, it would cause
> contention and not allow the stack or queue to break.
>
> Now, lets say 6 threads were inside the following CAS critical section
> pseudo-code:
>

(snip)

That wouldn't work, but you could use a thread unique id to implement a load
reserved/store conditional scheme that wouldn't have the problems that a 16
bit version number would have. Good idea. But we still need to decide whether
it makes sense to combine this with a reference counted pointer. It's (JSR 166)
is in Java because there is one and only one kind of pointer, GC pointers, not because
it makes any kind of sense.

Joe Seigh

SenderX

unread,
May 12, 2003, 9:08:23 PM5/12/03
to
> That wouldn't work, but you could use a thread unique id to implement a
load
> reserved/store conditional scheme that wouldn't have the problems that a
16
> bit version number would have. Good idea.

Mr. Seigh, thank you for AtomicPtr. Hardware makers should be forced to make
a CAS designed for AtomicPtr. ;)

Turns our that my version idea has been working with your AtomicPtr. You
don't need version counters at all!!!

You need a unique id per-thread. It forces contention, and simply makes ABA
a moot point.

You don't need to increment the version stamp. you set the version stamp to
the current threads id or TLS stored unique id.

It seems to work! I have been running a lock-free FIFO queue using the
Almighty AtomicPtr and my code to avoid ABA using TLS!!!

I will post source code that runs on Windows platforms on my site in a
couple of days!

I think I have solved ABA without version increment's!

When I post the code, please test the sh%T out of it like I have been doing
=)

David Holmes

unread,
May 12, 2003, 9:23:47 PM5/12/03
to
"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3EBF8607...@xemaps.com...

> I'm aware of one algorithm to popping items from a stack or queue using
this.
> Are there others?
<snip>

> I seem to recall seeing an algorithm that needed this but I forgot the
title
> so I haven't been able to find it again. A lock-free way of deleting from
> a doubly linked list. Claimed it scaled well.

I'm not familiar with all of the work but some related references can be
found here: http://www.research.ibm.com/people/m/michael/pubs.htm

Some of the algorithms are designed for GC'ed environments like Java.

> I'm only aware of this because I've scanned the literature. What do they
> think the general public is supposed to make of these new capabilities?

Well they are not for the "general public" that's why they are isolated in
the atomic subpackage. The intent is to provide enabling technology that
allows these state-of-the-art wait-free algorithms to be implemented in
Java.

David Holmes


SenderX

unread,
May 14, 2003, 4:30:24 AM5/14/03
to
> AtomicMarkableReference is a wrapper class for defining a "pointer" and a
> "mark-bit" that can manipulated atomically using CAS. (The intent being
that
> an optimising implementation will suitably align the "pointer" so that
> one-bit can be stolen for the mark-bit. Again this is to facilitate
> implementation of wait-free algorithms found in the literature.

Is this operation safe, and can it use normal CAS instead of a CAS that
works on double the pointer size?

Is it the high-order-bit of the pointer?

David Holmes

unread,
May 15, 2003, 12:45:54 AM5/15/03
to
"SenderX" <x...@xxx.com> wrote in message
news:A8nwa.318937$Si4.2...@rwcrnsc51.ops.asp.att.net...

> > AtomicMarkableReference is a wrapper class for defining a "pointer" and
a
> > "mark-bit" that can manipulated atomically using CAS. (The intent being
> that
> > an optimising implementation will suitably align the "pointer" so that
> > one-bit can be stolen for the mark-bit. Again this is to facilitate
> > implementation of wait-free algorithms found in the literature.
>
> Is this operation safe, and can it use normal CAS instead of a CAS that
> works on double the pointer size?

As far as I am aware this is "safe" - but what safety issue were you
concerned about? The VM has to know to mask the mark-bit of course.

Yes this can use normal CAS instead of wide-CAS.

> Is it the high-order-bit of the pointer?

I think more typically it is the low-order bit. (Hmmm - does that depend on
the endianness??)

But please follow the link to the papers. I'm only the messenger, not an an
expert on lock-free algorithms.

David Holmes


Joseph Seigh

unread,
May 15, 2003, 7:45:31 AM5/15/03
to

I think the paper I was thinking of is the first one in that list. I had
looked at the problem of lock-free deletion of nodes from doubly linked lists,
but it seemed to me that trying to find live nodes to relink to was too
problematic. I'll definitely have to read that paper to see how that problem
was solved.

Joe Seigh

Reply all
Reply to author
Forward
0 new messages