A question about atomic_ptr


Peter Dimov

Apr 10, 2005, 5:49:22 AM
atomic_ptr (atomic-ptr-plus.sf.net), as most of you know, is Joe Seigh's
strongly thread safe reference-counted smart pointer.

My understanding (based on atomic_ptr.txt) is that it is intended to be used
in the GC phase of the reader/writer problem as follows:

atomic_ptr<X> px; // shared object

atomic_ptr<X> read()
{
    return px; // competing
}

void write()
{
    lock();

    atomic_ptr<X> tmp( new X( *px ) );

    // update *tmp

    px = tmp; // competing

    unlock();
}

boost::shared_ptr can't be used in the above manner because of the two
competing accesses that violate its "basic thread safety" requirement.

However (again, if my understanding is correct), all atomic_ptr read
accesses need to have acquire semantics, while shared_ptr read accesses need
not. (local_ptr aside for a moment.)

My question is:

Can I get the best of both worlds by providing the following two methods in
shared_ptr:

template<class T> class shared_ptr
{
    // ...

    shared_ptr<T> copy() const;
    void replace( shared_ptr<T> const & pt );
};

with the guarantee that competing accesses to 'copy' and 'replace' work (but
other competing accesses are still disallowed in order to avoid an acquire
except in 'copy')?

--
Peter Dimov
http://www.pdimov.com


Joe Seigh

Apr 10, 2005, 7:38:40 AM
On Sun, 10 Apr 2005 12:49:22 +0300, Peter Dimov <pdi...@gmail.com> wrote:

> atomic_ptr (atomic-ptr-plus.sf.net), as most of you know, is Joe Seigh's
> strongly thread safe reference-counted smart pointer.
>
> My understanding (based on atomic_ptr.txt) is that it is intended to be used
> in the GC phase of the reader/writer problem as follows:
>
> atomic_ptr<X> px; // shared object
>
> atomic_ptr<X> read()
> {
> return px; // competing
> }
>
> void write()
> {
> lock();
>
> atomic_ptr<X> tmp( new X( *px ) );
>
> // update *tmp
>
> px = tmp; // competing
>
> unlock();
> }
>
> boost::shared_ptr can't be used in the above manner because of the two
> competing accesses that violate its "basic thread safety" requirement.
>
> However (again, if my understanding is correct), all atomic_ptr read
> accesses need to have acquire semantics, while shared_ptr read accesses need
> not. (local_ptr aside for a moment.)

atomic_ptr wouldn't be very useful or convenient without acquire and
release semantics. shared_ptr doesn't need them since acquire and
release semantics are provided by the mechanism used to ensure
"ownership" of shared_ptr's.

>
> My question is:
>
> Can I get the best of both worlds by providing the following two methods in
> shared_ptr:
>
> template<class T> class shared_ptr
> {
> // ...
>
> shared_ptr<T> copy() const;
> void replace( shared_ptr<T> const & pt );
> };
>
> with the guarantee that competing accesses to 'copy' and 'replace' work (but
> other competing accesses are still disallowed in order to avoid an acquire
> except in 'copy')?
>

You could, though you've in effect merged atomic_ptr and local_ptr together.
As a practical matter, I'd think you'd want some mechanism to distinguish
between shared global pointers and non-shared local pointers.

--
Joe Seigh

Chris Thomasson

Apr 10, 2005, 8:30:32 AM
> Can I get the best of both worlds by providing the following two methods
> in
> shared_ptr:
>
> template<class T> class shared_ptr
> {
> // ...
>
> shared_ptr<T> copy() const;
> void replace( shared_ptr<T> const & pt );
> };
>
> with the guarantee that competing accesses to 'copy' and 'replace' work
> (but
> other competing accesses are still disallowed in order to avoid an acquire
> except in 'copy')?

You can. How are you going to allow for "competing access"? Are you using
mutexes, or going for a lock-free method such as SMR?


I am currently experimenting with using SMR to protect the reference counts
of a quick lock-free atomic pointer API I did in C. Surprisingly, it seems
to be more lightweight than atomic_ptr "on x86", despite the fact that it
relies on the expensive SMR API in order to provide "strong" thread safety.
DWCAS can be expensive on x86. I have not compared it against Joe's PPC
reference counting algorithm that uses LL/SC directly; it's very efficient.


--
http://appcore.home.comcast.net/
(portable lock-free data-structures)


Chris Thomasson

Apr 10, 2005, 8:51:07 AM
> As a practical matter, I'd think you'd want some mechanism to distinguish
> between shared global pointers and non-shared local pointers.

Agreed. For instance, I had to separate my atomic pointer into two
distinguishable classes to address a highly critical performance issue:

shared_smr_ptr<T>
local_smr_ptr<T>


shared_smr_ptr<T> alone is "way too expensive" to be used for a general
purpose smart pointer. I needed the ability to access the reference count
without using the expensive SMR API all the time; this is where local_smr_ptr
saves the day. Knowing that you don't have to use SMR in order to access
the reference count on local_smr_ptr access actually makes SMR seem a bit
less expensive than it actually is... ;)


Peter Dimov

Apr 10, 2005, 9:23:39 AM
Chris Thomasson wrote:
>> As a practical matter, I'd think you'd want some mechanism to
>> distinguish between shared global pointers and non-shared local
>> pointers.
>
> Agreed. For instance, I had to separate my atomic pointer into two
> distinguishable classes to address a highly critical performance
> issue:
> shared_smr_ptr<T>
> local_smr_ptr<T>
>
>
> shared_smr_ptr<T> alone is "way to expensive" to be used for a general
> purpose smart pointer.

That's what people tell me all the time, that shared_ptr is too expensive
and they need an "unsynchronized" variant.

But it's not quite the same as Joe's local_ptr, I think; it still atomically
updates its count. I think I'll reserve the local_ptr name for a pointer
that uses a "local" count and non-atomic updates (but is convertible from
and to a shared_ptr) so that people are free to shoot themselves in the
foot, if they so insist.


Peter Dimov

Apr 10, 2005, 9:16:46 AM
Chris Thomasson wrote:
>> Can I get the best of both worlds by providing the following two
>> methods in
>> shared_ptr:
>>
>> template<class T> class shared_ptr
>> {
>> // ...
>>
>> shared_ptr<T> copy() const;
>> void replace( shared_ptr<T> const & pt );
>> };
>>
>> with the guarantee that competing accesses to 'copy' and 'replace'
>> work (but
>> other competing accesses are still disallowed in order to avoid an
>> acquire except in 'copy')?
>
> You can. How are you going to allow for "competing access"? Are you
> using mutexs, or going for a lock-free method such as SMR?

I was thinking DWCAS or mutex pool... I'm new to these lock-free things.
Need to read about SMR, I guess.


Chris Thomasson

Apr 10, 2005, 10:05:27 AM
"Peter Dimov" <pdi...@gmail.com> wrote in message
news:d3b9b5$qea$1...@domitilla.aioe.org...

> Chris Thomasson wrote:
>>> As a practical matter, I'd think you'd want some mechanism to
>>> distinguish between shared global pointers and non-shared local
>>> pointers.
>>
>> Agreed. For instance, I had to separate my atomic pointer into two
>> distinguishable classes to address a highly critical performance
>> issue:
>> shared_smr_ptr<T>
>> local_smr_ptr<T>
>>
>>
>> shared_smr_ptr<T> alone is "way to expensive" to be used for a general
>> purpose smart pointer.
>
> That's what people tell me all the time, that shared_ptr is too expensive
> and they need an "unsynchronized" variant.

The overhead generated by the strict memory barrier loop required to access
the reference count wrt shared_smr_ptr's really does require a "local access"
class of sorts. Just something that avoids the SMR API for every reference
count increment...

:O


> I think I'll reserve the local_ptr name for a pointer
> that uses a "local" count and non-atomic updates (but is convertible from
> and to a shared_ptr) so that people are free to shoot themselves in the
> foot, if they so insist.

That would be a pseudo-distributed count. Each local_ptr would contain a
"separate" count. When a shared_ptr gets copied into a local_ptr, an atomic
increment with acquire semantics would be applied to the "shared" count, and
the "local" count would be set to 1. The acquire semantics here would be
used to sync with the "local_ptr drop to zero transition". When the "local"
count drops to zero, an atomic decrement with release semantics would be
applied to the shared count to ensure that all the local effects on shared
memory are applied before the decrement of the reference count.


local_ptr<X> = shared_ptr<X>
-------------------------------
atomic_inc_acquire( &shared_ptr<X>::refs );
local_ptr<X>::refs = 1;


local_ptr<X> = 0
-------------------------------
--local_ptr<X>::refs
if ( ! local_ptr<X>::refs )
{
    if ( ! atomic_dec_acquire( &shared_ptr<X>::refs ) )
    {
        // whatever
    }
}


I could see where this would be useful wrt performance. It actually might be
worth the trouble and the extra documentation... ;)
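For concreteness, the two-level count sketched above can be rendered in modern C++. This is only an illustrative sketch, not Joe's or Peter's actual implementation; the names `shared_count` and `local_group` are hypothetical, and the memory orderings follow Chris's description (acquire on the shared increment, release on the final shared decrement):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical two-level count: one shared atomic count, plus a per-group
// local count that is only ever touched by its owning thread.
struct shared_count {
    std::atomic<long> refs{1};          // 1 == the original owner
};

struct local_group {
    shared_count* sc;
    long local_refs;                    // single-threaded, no atomics needed

    explicit local_group(shared_count* s) : sc(s), local_refs(1) {
        // one acquire-flavored increment on the shared count per group
        sc->refs.fetch_add(1, std::memory_order_acquire);
    }
    void add_local_ref() { ++local_refs; }      // plain, cheap increment
    bool release() {
        if (--local_refs != 0) return false;    // group still holds a ref
        // drop the group's single shared reference with release semantics;
        // returns true only on the final (zero) transition of the shared count
        return sc->refs.fetch_sub(1, std::memory_order_release) == 1;
    }
};
```

Copies within the group cost only a non-atomic increment; the shared count is touched once per group, which is the whole point of the scheme.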


Joe Seigh

Apr 10, 2005, 10:22:18 AM

local_ptr atomically updates the reference counts. It has less overhead
than atomic_ptr since it doesn't have to maintain pointer atomicity, just
reference count atomicity. The main difference is you don't have to use
an interlocked primitive to swap pointer values on a pointer assignment,
which atomic_ptr does. So local_ptr looks somewhat like a thread-safe
version of shared_ptr.

The main difference between local_ptr and atomic_ptr is the swap method
and the function used to safely increment the reference count.
atomic_ptr::swap() is atomically thread-safe; local_ptr::swap() is not.
If I get around to merging the compare-and-swap and LL/SC (lwarx/stwcx)
versions of atomic_ptr it will become a little clearer, since I'll have to
clean up the abstraction boundaries a little better.

--
Joe Seigh

Chris Thomasson

Apr 10, 2005, 6:48:31 PM
DOH!


> local_ptr<X> = 0
> -------------------------------
> --local_ptr<X>::refs
> if ( ! local_ptr<X>::refs )
> {
> if ( ! atomic_dec_acquire( &shared_ptr<X>::refs ) )

^^^^^^^^^^^^^^^^

That's:

if ( ! atomic_dec_release( &shared_ptr<X>::refs ) )


> {
> // whatever
> }
> }
>
>
>
>
> I could see where this would be useful wrt performance. It actually might
> be worth the trouble and the extra documentation... ;)


:)


Chris Thomasson

Apr 10, 2005, 10:52:06 PM
> I was thinking DWCAS or mutex pool... I'm new to these lock-free things.

I would go for a strict pthread solution "first" to quickly address the
portability factor. I guess you could simply hash the address of a
shared_ptr reference count into an index of an array of pthread_mutex_t's.
There is a simple solution for this:

http://msdn.microsoft.com/msdnmag/issues/01/08/Concur/

You can take advantage of the fact that the mutex table would never be
destroyed.
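The hashed mutex pool Chris describes can be sketched in a few lines. This is an assumed layout, not the code from the MSDN article; the table size, shift amount, and helper names are illustrative (the low-bit shift discards bits that are identical for all suitably aligned objects):

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A fixed pool of mutexes that is never destroyed; every reference count
// address hashes to one of them.
static const std::size_t kPoolSize = 41;   // small prime, assumed size
static pthread_mutex_t g_pool[kPoolSize];

static struct pool_init {
    pool_init() {
        for (std::size_t i = 0; i < kPoolSize; ++i)
            pthread_mutex_init(&g_pool[i], 0);
    }
} g_pool_init;                             // runs once before main()

inline pthread_mutex_t* mutex_for(const void* p) {
    // shift away low bits that alignment makes identical, then hash
    std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p) >> 4;
    return &g_pool[v % kPoolSize];
}
```

Two different counts may hash to the same mutex; that only serializes them occasionally, it never deadlocks a single lock/unlock pair.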

DWCAS solutions will work fine; just remember that once you use DWCAS with
pointers, you fall into the portability trap. Most 64-bit processors do not
have DWCAS.


> Need to read about SMR, I guess.

SMR needs a rather extensive underlying support system in order to be
useful. You need to heavily optimize the allocation scheme that provides the
nodes you use for your collections and/or reference counts. Some key SMR
APIs need to be coded in assembly language; they are too sensitive to an
overly aggressive optimizing compiler. Also, you need to examine the method
in which SMR utilizes thread-local storage. My AppCore library has an
efficient hazard pointer scheme that you might want to take a quick look at.
Any questions are welcome...


Peter Dimov

Apr 11, 2005, 5:53:59 AM
Chris Thomasson wrote:
> local_ptr<X> = shared_ptr<X>
> -------------------------------
> atomic_inc_acquire( &shared_ptr<X>::refs );
> local_ptr<X>::refs = 1;
>
>
> local_ptr<X> = 0
> -------------------------------
> --local_ptr<X>::refs
> if ( ! local_ptr<X>::refs )
> {
> if ( ! atomic_dec_acquire( &shared_ptr<X>::refs ) )
> {
> // whatever
> }
> }

Yes, exactly, but why acquire in the increment? Won't the usual release on
nonzero/acquire on zero decrement do?
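The "release on nonzero / acquire on zero" decrement Peter refers to is the standard shared_ptr-style pattern. A hedged C++11 rendering (the 2005 discussion predates these primitives, and `dec_ref` is an illustrative name, not anyone's actual API):

```cpp
#include <atomic>
#include <cassert>

// Decrement with release semantics; only the final decrement (the one that
// observes the count going 1 -> 0) pays for an acquire fence, which orders
// all prior releases before the destructor runs.
template<class T>
bool dec_ref(std::atomic<long>& refs, T* obj) {
    if (refs.fetch_sub(1, std::memory_order_release) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete obj;                     // last owner reclaims the object
        return true;
    }
    return false;                       // other owners remain
}
```

Non-final decrements thus avoid the acquire entirely, which is exactly why the plain increment doesn't need one either.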


Alexander Terekhov

Apr 11, 2005, 7:00:59 AM

Chris Thomasson wrote:
>
> DOH!
>
> > local_ptr<X> = 0
> > -------------------------------
> > --local_ptr<X>::refs
> > if ( ! local_ptr<X>::refs )
> > {
> > if ( ! atomic_dec_acquire( &shared_ptr<X>::refs ) )
> ^^^^^^^^^^^^^^^^
>
> That's:
>
> if ( ! atomic_dec_release( &shared_ptr<X>::refs ) )

Depending on value, you'll need either acquire or release here.

regards,
alexander.

Alexander Terekhov

Apr 11, 2005, 6:50:11 AM

Alexander Terekhov wrote:

>
> Peter Dimov wrote:
> >
> > Chris Thomasson wrote:
> > > local_ptr<X> = shared_ptr<X>
> > > -------------------------------
> > > atomic_inc_acquire( &shared_ptr<X>::refs );
> > > local_ptr<X>::refs = 1;
> > >
> > >
> > > local_ptr<X> = 0
> > > -------------------------------
> > > --local_ptr<X>::refs
> > > if ( ! local_ptr<X>::refs )
> > > {
> > > if ( ! atomic_dec_acquire( &shared_ptr<X>::refs ) )

That must be upcoming Apple's OSAtomicDecrementRelAcq32() or
OSAtomicDecrementSlbHsb32() (when X is immutable while managed
by shared_ptr<>). ;-)

> > > {
> > > // whatever
> > > }
> > > }
> >
> > Yes, exactly, but why acquire in the increment?
>

> In your vocabulary, he means copy(). Your copy() does need acquire
> semantics to ensure visibility of client object "published" by your
> replace().
>
> BTW, strong thread-safety aside for a moment, I think that you can
> simply provide a "move to/from" operation for shared_ptr<T, D,
> thread_safety::basic>, shared_ptr<T, D, thread_safety::unsafe>, and
> std::auto_ptr<T> (when your D deleter on the other side is
> empty/default). It can throw something if the source is "!empty()
> || !unique()". Or just make it a precondition. Frankly, it all
> boils down to release() call which you don't provide currently. ;-)

regards,
alexander.

Peter Dimov

Apr 11, 2005, 7:30:45 AM
Alexander Terekhov wrote:
>>>> if ( ! atomic_dec_acquire( &shared_ptr<X>::refs ) )
>
> That must be upcoming Apple's OSAtomicDecrementRelAcq32() or
> OSAtomicDecrementSlbHsb32() (when X is immutable while managed
> by shared_ptr<>). ;-)

Speaking of which, is it possible to implement the ...SlbHsb variant in a
more efficient manner on a PPC? On something else? (More efficient than the
...RelAcq, of course.)


Alexander Terekhov

Apr 11, 2005, 7:48:36 AM

Peter Dimov wrote:

[... local_shared_ptr ...]

> It's like intrusive_ptr_st< shared_ptr_mt<X> >.

I gather that it's like

shared_ptr<
T, idle_functor< shared_ptr< T, D, thread_safety::basic > >,
thread_safety::unsafe,
weak_ptr::no_thank_you
> local_shared_ptr(get_pointer(shared_ptr), shared_ptr);

right?

regards,
alexander.

Alexander Terekhov

Apr 11, 2005, 6:36:53 AM

Peter Dimov wrote:
>
> Chris Thomasson wrote:
> > local_ptr<X> = shared_ptr<X>
> > -------------------------------
> > atomic_inc_acquire( &shared_ptr<X>::refs );
> > local_ptr<X>::refs = 1;
> >
> >
> > local_ptr<X> = 0
> > -------------------------------
> > --local_ptr<X>::refs
> > if ( ! local_ptr<X>::refs )
> > {
> > if ( ! atomic_dec_acquire( &shared_ptr<X>::refs ) )
> > {
> > // whatever
> > }
> > }
>
> Yes, exactly, but why acquire in the increment?

In your vocabulary, he means copy(). Your copy() does need acquire

Peter Dimov

Apr 11, 2005, 7:26:44 AM
Alexander Terekhov wrote:
> Peter Dimov wrote:
>>
>> Chris Thomasson wrote:
>>> local_ptr<X> = shared_ptr<X>
>>> -------------------------------
>>> atomic_inc_acquire( &shared_ptr<X>::refs );
>>> local_ptr<X>::refs = 1;
>>>
>>>
>>> local_ptr<X> = 0
>>> -------------------------------
>>> --local_ptr<X>::refs
>>> if ( ! local_ptr<X>::refs )
>>> {
>>> if ( ! atomic_dec_acquire( &shared_ptr<X>::refs ) )
>>> {
>>> // whatever
>>> }
>>> }
>>
>> Yes, exactly, but why acquire in the increment?
>
> In your vocabulary, he means copy(). Your copy() does need acquire
> semantics to ensure visibility of client object "published" by your
> replace().

No, no.

This is about local_shared_ptr, not about copy/replace. The
local_shared_ptrs in an ownership group (which needs to be local to a
specific thread) use a local unsynchronized reference count to keep track of
their single shared_ptr-like reference into the shared_ptr group. ChrisX got
it right, but I still think that there's no need to acquire in the
increment; it's an ordinary 'add_ref_strong'.

> BTW, strong thread-safety aside for a moment, I think that you can
> simply provide a "move to/from" operation for shared_ptr<T, D,
> thread_safety::basic>, shared_ptr<T, D, thread_safety::unsafe>, and
> std::auto_ptr<T> (when your D deleter on the other side is
> empty/default). It can throw something if the source is "!empty()
> || !unique()". Or just make it a precondition. Frankly, it all
> boils down to release() call which you don't provide currently. ;-)

The above local_shared_ptr does not need to steal ownership away from the
source shared_ptr, nor does it need a unique source. And you can also get a
real shared_ptr back, again without the 'unique' requirement on the local
source.

Joe Seigh

Apr 11, 2005, 7:50:52 AM
On Mon, 11 Apr 2005 14:26:44 +0300, Peter Dimov <pdi...@gmail.com> wrote:

> Alexander Terekhov wrote:
>> Peter Dimov wrote:
>>>
>>> Yes, exactly, but why acquire in the increment?
>>
>> In your vocabulary, he means copy(). Your copy() does need acquire
>> semantics to ensure visibility of client object "published" by your
>> replace().
>
> No, no.
>
> This is about local_shared_ptr, not about copy/replace. The
> local_shared_ptrs in an ownership group (which needs to be local to a
> specific thread) use a local unsynchronized reference count to keep track of
> their single shared_ptr-like reference into the shared_ptr group. ChrisX got
> it right, but I still think that there's no need to acquire in the
> increment, it's an ordinary 'add_ref_strong'.

As a practical matter I don't think you're going to have more than
one local pointer copy of a global shared pointer often enough to justify
the extra overhead of maintaining a separate reference count and of the extra
indirection.


--
Joe Seigh

Chris Thomasson

Apr 11, 2005, 7:04:46 AM
>> if ( ! atomic_dec_release( &shared_ptr<X>::refs ) )
>
> Depending on value, you'll need either acquire or release here.

Ahh yes... Release on > 0 and acquire for < 1, correct?

;)


Peter Dimov

Apr 11, 2005, 8:04:48 AM

Right.


Alexander Terekhov

Apr 11, 2005, 7:22:06 AM

Well, given that in my (unwritten) book, mad std::string cows "eat"
("OS" comes from Apple) OSAtomicDecrementIfGreaterThanOneSlbHsb32()
with checking and no msync for zero ("unsharable" indicator), I'd
say "almost". Okay? ;-)

regards,
alexander.

Peter Dimov

Apr 11, 2005, 8:40:30 AM

You may well be right. It's just one idea for a possible extension of
shared_ptr that could pacify the "we want our unsynchronized pointers"
portion of the "C++ community". I'm not sure yet whether it's a _good_ idea.


Alexander Terekhov

Apr 11, 2005, 8:47:49 AM

Peter Dimov wrote:
>
> Alexander Terekhov wrote:
> >>>> if ( ! atomic_dec_acquire( &shared_ptr<X>::refs ) )
> >
> > That must be upcoming Apple's OSAtomicDecrementRelAcq32() or
> > OSAtomicDecrementSlbHsb32() (when X is immutable while managed
> > by shared_ptr<>). ;-)
>
> Speaking of which, is it possible to implement the ...SlbHsb variant in a
> more efficient manner on a PPC?

SlbHsb version doesn't seem to need trailing isync, and maybe can
achieve the effect of leading {lw}sync with respect to loads->store
ordering using "branch never taken" trick. Consider: (excerpt from
Book II)

----
B.2.3 Safe Fetch

If a load must be performed before a subsequent store (e.g., the
store that releases a lock protecting a shared data structure), a
technique similar to the following can be used.

In this example it is assumed that the address of the storage
operand to be loaded is in GPR 3, the contents of the storage
operand are returned in GPR 4, and the address of the storage
operand to be stored is in GPR 5.

lwz r4,0(r3) #load shared data
cmpw r4,r4 #set CR0 to "equal"
bne- $-8 #branch never taken
stw r7,0(r5) #store other shared data
----

In our case, we load and check the count and do take a conditional
branch which leads to stores which we want to order with respect to
load and check. That's trailing isync. As for leading {lw}sync, we
can add similar "branch never taken" trick to order it with respect
to preceding loads, but I'm not sure whether register dependency
(r4 for both load and compare above) can play some role here in
addition to "branch never taken".

regards,
alexander.

Joe Seigh

Apr 11, 2005, 1:10:24 PM
On Mon, 11 Apr 2005 15:40:30 +0300, Peter Dimov <pdi...@gmail.com> wrote:

> Joe Seigh wrote:
>>
> As a practical matter I don't think you're going to have more
> than one local pointer copy of a global shared pointer often enough to
>> justify the extra overhead of maintaining a separate reference count
>> and of the extra indirection.
>
> You may well be right. It's just one idea for a possible extension of
> shared_ptr that could pacify the "we want our unsynchronized pointers"
> portion of the "C++ community". I'm not sure yet whether it's a _good_ idea.
>
>


There's a bit of a learning curve here. I don't think the general
programming community has enough experience in multi-threaded
programming, compared to the experience we have here in c.p.t, to have
a good feel for all the issues yet.

For example, when you're doing reference counting in a single-threaded
program, you don't need to distinguish between read access and write
access. But when you start using multi-threading, the performance
implications of distinguishing between read and write access become
more important, which is why I've been emphasizing smart pointers
as a solution to the readers/writers problem rather than as a generic
resource management tool.

There seems to be some attraction in c.l.c++.m to smart pointers that
have almost no overhead, the deferred reference counting being
promoted by Andrei, but there's no such thing as a free lunch. Deferred
reference counting does Boehm-style GC against the thread stacks, and
that requires a "stop the world while we GC", which has huge performance
implications in multi-threading. We're well aware of the problem
of checking or monitoring threads' local references. That's what makes
RCU or SMR so tricky. If there were an easy way around it, we'd be
using it. There isn't. What you have is a bunch of different techniques
with different trade-offs. What you do is use the one that gives
you the most benefit depending on the situation. There won't be one
size fits all.

What you will want to do for Boost is allow different smart pointer
techniques for the time being, until multi-processor systems become
ubiquitous enough that everyone has a feel for the issues. Then
you can start winnowing the smart pointers down. But I don't think
you will have one grand unified smart pointer. I've been at this
for a while and I'm not willing to discount any of the current techniques
yet. Even the ones I didn't invent. :)


--
Joe Seigh

Peter Dimov

Apr 11, 2005, 6:45:10 PM
Joe Seigh wrote:
>
> You could, though you've in effect merged atomic_ptr and local_ptr
> together. As a practical matter, I'd think you'd want some mechanism
> to distinguish between shared global pointers and non-shared local
> pointers.

I'm not sure.

I can see how "thread safety is part of the type" can be useful.

But on the other hand "int" offers "basic" thread safety by default and
"strong" thread safety if I only touch it with atomic_load_acq and
atomic_store_rel.
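The atomic_load_acq / atomic_store_rel pair Peter assumes can be written today on top of std::atomic; in 2005 they would have been platform-specific. A sketch of the intended semantics (the function names are the thread's, the implementation is an assumed modern rendering):

```cpp
#include <atomic>
#include <cassert>

// "Strong" access to an int: acquire on every competing load,
// release on every competing store.
inline int atomic_load_acq(const std::atomic<int>* p) {
    return p->load(std::memory_order_acquire);
}

inline void atomic_store_rel(std::atomic<int>* p, int v) {
    p->store(v, std::memory_order_release);
}
```

Non-competing accesses can still use plain loads and stores; the "strong" guarantee exists only for accesses that go through this pair, which is exactly the "non-Java int" model Peter describes.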

Of course in Java "int" is basic and "volatile int" is strong.

For some reason I think that the "non-Java int" (and raw pointer) semantics
make more sense for shared_ptr; it seems sufficiently primitive to me. I may
be wrong, of course.

This is also why I'm not particularly fond of Alexander's atomic<> dedicated
wrapper type. Providing atomic access via ordinary functions seems more
natural.


Alexander Terekhov

Apr 11, 2005, 7:07:08 PM

Peter Dimov wrote:
[...]

> But on the other hand "int" offers "basic" thread safety by default and
> "strong" thread safety if I only touch it with atomic_load_acq and
> atomic_store_rel.

Only if you properly isolate (align and pad if necessary) it.

That's one reason for atomic<> wrapper.
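The isolation argument can be made concrete with alignment. A minimal sketch of a wrapper that forces a variable onto its own cache line (the name `padded_atomic` and the 64-byte line size are assumptions, not Alexander's actual atomic<>):

```cpp
#include <atomic>
#include <cassert>

// Forcing cache-line alignment means the wrapped value never shares a
// line with unrelated data (no false sharing) and is naturally isolated;
// a bare int gives you no such guarantee.
template<class T>
struct alignas(64) padded_atomic {      // 64: assumed cache-line size
    std::atomic<T> value;
};
```

alignas on the struct also rounds its size up to a multiple of the alignment, so arrays of these wrappers stay one-per-line as well.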

[...]


> This is also why I'm not particularly fond of Alexander's atomic<> dedicated
> wrapper type. Providing atomic access via ordinary functions seems more
> natural.

Another reason is to "enforce" proper labeling for all accesses by
making the type opaque. An exclusive access on atomic<int> doesn't
need to be isolated/atomic at all; it can be combined with other
accesses. How are you going to do all that with "extended-for-threads"
C/C++ volatiles (also mentioned by you in another message, not here),
for example?

What else?

regards,
alexander.

Joe Seigh

Apr 11, 2005, 7:15:37 PM

I meant it wouldn't be clear from the code context what was safe or not.
You'd have to know that a particular instance of shared_ptr is used
in a certain way to know whether the methods were safe or not. If sp1 and
sp2 are shared pointers, how do you know which of the following to use?

sp1 = sp2;
sp1 = sp2.copy();
sp1.replace(sp2);
sp1.replace(sp2.copy());


Whereas if you keep them as two separate types you can keep the methods
consistent for each type. So for example, given atomic_ptr ap and
local_ptr lp, which by the rules for local_ptr is always local or "owned":

ap = lp;
lp = ap;

or any combination of atomic_ptr and local_ptr is safe, always. You
don't have to know anything about ap or lp.

atomic<T> is the same idea applied to dumb (raw) pointers or other native
types.

--
Joe Seigh

Alexander Terekhov

Apr 11, 2005, 7:15:51 PM

Alexander Terekhov wrote:
>
> Peter Dimov wrote:
> [...]
> > But on the other hand "int" offers "basic" thread safety by default and
> > "strong" thread safety if I only touch it with atomic_load_acq and
> > atomic_store_rel.
>
> Only if you properly isolate (align and pad if nessesary) it.

And even a properly isolated built-in "int" can be accessed on a
per-bit basis (or per-byte/whatnot, non-atomically) by conforming
implementations. Just to be anal, so to speak.

Peter Dimov

Apr 11, 2005, 9:03:45 PM
Joe Seigh wrote:
> I meant it wouldn't be clear from the code context what was safe or
> not. You'd have to know that a particular instance of shared_ptr is
> used is a certain way to know that methods were safe or not.

Yes, I understood that.

> If sp1 and
> sp2 are shared pointers how do you know which of the following to use?
> sp1 = sp2;
> sp1 = sp2.copy();
> sp1.replace(sp2);
> sp1.replace(sp2.copy());
>
>
> Whereas if you keep them as two separate types you can keep the
> methods consistent for each type. So for example, given atomic_ptr
> ap and local_ptr lp which by the rules for local_ptr is always local
> or "owned".
>
> ap = lp;
> lp = ap;
>
> or any combination of atomic_ptr and local_ptr is safe, always. You
> don't have to know anything about ap or lp.

No; both lp = ap and ap = lp are unsafe if I don't know anything about lp
(if another thread can read or write lp), if I understand their semantics
correctly.

sp1.replace( sp2 );
sp2 = sp1.copy();

are equally safe (and more explicit) if I have the same information about
sp2 as I do for lp (i.e. it's not visible to other threads.) I admit that
sometimes encoding the intent that lp is not shared into its type may help
or appeal to some programmers.

But regardless...

My point was that (low level) concurrent accesses without proper
synchronization are always "unsafe", so it's a waste of time to try to make
them safe. "Unsafe" in the sense that it's very, very hard for a non-expert
to get a lock-free algorithm right. And the expert is unlikely to need his
pointers color-coded.

But I may be wrong. :-)


Peter Dimov

Apr 11, 2005, 9:11:50 PM
Alexander Terekhov wrote:
> Peter Dimov wrote:
> [...]
>> But on the other hand "int" offers "basic" thread safety by default
>> and "strong" thread safety if I only touch it with atomic_load_acq
>> and atomic_store_rel.
>
> Only if you properly isolate (align and pad if nessesary) it.
>
> That's one reason for atomic<> wrapper.

Do you mean that on some systems a 32 bit int aligned on a 32 bit boundary
may not allow atomic accesses? Or in more general terms, a type T's natural
alignment may not be sufficient to ensure atomicity?

>> This is also why I'm not particularly fond of Alexander's atomic<>
>> dedicated wrapper type. Providing atomic access via ordinary
>> functions seems more natural.
>
> Another reason is to "enforce" proper labeling for all accesses by
> making the type opaque. An exclusive access on atomic<int> doesn't
> need to be isolated/atomic at all, it can be combined with other
> accesses. How are you going to do it all with "extended-for-threads"
> C/C++ volatiles (also mentioned by you in another message not here),
> for example?

volatiles? Who said anything about volatiles?

T atomic_load_acq( T * pt );
void atomic_store_rel( T * pt, T v );
// et al

and ordinary non-volatile loads and stores for non-competing accesses...
what am I missing? "naked_competing" loads? Add atomic_load_nsync, then.


Chris Thomasson

Apr 11, 2005, 10:07:43 PM
> This is about local_shared_ptr, not about copy/replace. The
> local_shared_ptrs in an ownership group (which needs to be local to a
> specific thread) use a local unsynchronized reference count to keep track
> of
> their single shared_ptr-like reference into the shared_ptr group. ChrisX
> got
> it right, but I still think that there's no need to acquire in the
> increment, it's an ordinary 'add_ref_strong'.

If add_ref_strong already provides the necessary memory barriers, you would
be ok. However, explicit acquire/release semantics would be necessary wrt
copy/replace...


Chris Thomasson

Apr 11, 2005, 10:25:02 PM
> There's a bit of a learning curve here. I don't think the general
> programming community has enough experience in multi-threaded
> programming, compared to the experience we have here in c.p.t, to have
> a good feel for all the issues yet.

That's for sure.


> For example, when you're doing reference counting in a single threaded
> program, you don't need to distinguish between read access and write
> access. But when you start using multi-threading, the performance
> implications of distinquishing between read and write access become
> more important, which is why I've been emphasizing smart pointers
> as a solution to the readers/writers problem rather than a generic
> resource management tool.

I guess the smart pointers themselves could be considered "generic resource
management tools" ;)


> There seems to be some attraction in c.l.c++.m to smart pointers that
> have almost no overhead, the deferred reference counting being
> promoted by Andrei but there's no such thing a a free lunch. Deferred
> reference counting does Boehm style GC against the thread stacks and
> that requires a "stop the world while we GC" which has huge performance
> implications in multi-threading.

Actually, you mentioned a scheme that would allow for "deferred reference
counting" using per-thread lock-free single-producer/consumer queues.
Pushing a node ( a reference adjustment request ) onto a "local" per-thread
queue is certainly less overhead than performing an interlocked operation on
a "shared" variable. I don't think you would need to use DWCAS either. The
logic could be isolated in the "collection phase"...

There is just one problem: if the collection cycles cannot keep up with
user allocs/frees, the application's memory usage could climb out of control.


> We're well aware of the problem
> of checking or monitoring threads local references. That's what makes
> RCU or SMR so tricky.

Indeed. The "collection phase" of basically all distributed garbage
collection algorithms can be very tricky.


> If there was an easy way around it, we'd be
> using it. There isn't. What you have is a bunch of different techniques
> with different trade-offs. What you do is use the one that gives
> you the most benefit depending on the situation. There won't be one
> size fits all.

;)


Alexander Terekhov

unread,
Apr 12, 2005, 4:45:46 AM4/12/05
to

Chris Thomasson wrote:
>
> > This is about local_shared_ptr, not about copy/replace. The
> > local_shared_ptrs in an ownership group (which needs to be local to a
> > specific thread) use a local unsynchronized reference count to keep track
> > of their single shared_ptr-like reference into the shared_ptr group. ChrisX
> > got it right, but I still think that there's no need to acquire in the
> > increment, it's an ordinary 'add_ref_strong'.
>
> If add_ref_strong already provides the necessary memory barriers

What for? add_ref_strong() is naked atomic.

regards,
alexander.

Alexander Terekhov

unread,
Apr 12, 2005, 4:37:42 AM4/12/05
to

Peter Dimov wrote:
>
> Alexander Terekhov wrote:
> > Peter Dimov wrote:
> > [...]
> >> But on the other hand "int" offers "basic" thread safety by default
> >> and "strong" thread safety if I only touch it with atomic_load_acq
> >> and atomic_store_rel.
> >
> > Only if you properly isolate (align and pad if nessesary) it.
> >
> > That's one reason for atomic<> wrapper.
>
> Do you mean that on some systems a 32 bit int aligned on a 32 bit boundary
> may not allow atomic accesses? Or in more general terms, a type T's natural
> alignment may not be sufficient to ensure atomicity?

Show me chapter and verse for "natural alignment" in C/C++ or
even POSIX, please.

>
> >> This is also why I'm not particularly fond of Alexander's atomic<>
> >> dedicated wrapper type. Providing atomic access via ordinary
> >> functions seems more natural.
> >
> > Another reason is to "enforce" proper labeling for all accesses by
> > making the type opaque. An exclusive access on atomic<int> doesn't
> > need to be isolated/atomic at all, it can be combined with other
> > accesses. How are you going to do it all with "extended-for-threads"
> > C/C++ volatiles (also mentioned by you in another message not here),
> > for example?
>
> volatiles? Who said anything about volatiles?

Well, you said: "Why wrap the thing in its own atomic<> type (I'm
reminded of volatiles here) instead of just providing labeled
access functions?"

>
> T atomic_load_acq( T * pt );
> void atomic_store_rel( T * pt, T v );
> // et al
>
> and ordinary non-volatile loads and stores for non-competing accesses...
> what am I missing? "naked_competing" loads? Add atomic_load_nsync, then.

First off, an access can be either atomic or not atomic. Non-competing
load accesses need not be atomic. Non-competing store accesses must be
atomic (because it is non-competing with respect to stores, but it is
"competing" with respect to loads), exclusive store or load accesses
need not be atomic. Competing accesses may need msync to impose proper
ordering.

Now, as for your functions, consider:

char ab[2];

Thread A:

atomic_store_rel( &ab[0], 1 );
atomic_load_acq( &ab[1] );

Thread B:

atomic_load_acq( &ab[0] );
atomic_store_rel( &ab[1], 1 );

How is this supposed to work on a processor that doesn't have atomic
byte access instructions? Even if it has, consider

char ab[2];

Thread A:

ab[0] = 1;
.
.
.
atomic_load_acq( &ab[1] );

Thread B:

atomic_store_rel( &ab[1], 1 );

regards,
alexander.

Chris Thomasson

unread,
Apr 12, 2005, 4:57:23 AM4/12/05
to

>> If add_ref_strong already provides the necessary memory barriers
>
> What for? add_ref_strong() is naked atomic.

I need to study the boost::shared_ptr implementation a bit more. I only
skimmed through it a while ago.


Peter Dimov

unread,
Apr 12, 2005, 5:54:45 AM4/12/05
to
Alexander Terekhov wrote:
> Peter Dimov wrote:
>> Do you mean that on some systems a 32 bit int aligned on a 32 bit
>> boundary may not allow atomic accesses? Or in more general terms, a
>> type T's natural alignment may not be sufficient to ensure atomicity?
>
> Show me chapter and verse for "natural alignment" in C/C++ or
> even POSIX, please.

Search for "alignment" in C++03. It doesn't specifically say "natural"
because for the standard non-natural alignments do not exist and unaligned
accesses are undefined.

>> volatiles? Who said anything about volatiles?
>
> Well, you said: "Why wrap the thing in its own atomic<> type (I'm
> reminded of volatiles here) instead of just providing labeled
> access functions?"

Reminded of Java volatiles... where I need to change the type of the
variable in order to do atomic operations on it.

>> T atomic_load_acq( T * pt );
>> void atomic_store_rel( T * pt, T v );
>> // et al
>>
>> and ordinary non-volatile loads and stores for non-competing
>> accesses... what am I missing? "naked_competing" loads? Add
>> atomic_load_nsync, then.
>
> First off, an access can be either atomic or not atomic. Non-competing
> load accesses need not be atomic. Non-competing store accesses must be
> atomic (because it is non-competing with respect to stores, but it is
> "competing" with respect to loads), exclusive store or load accesses
> need not be atomic.

Hmm, I must be missing something. When I say "non-competing" I mean
"exclusive" in your terminology. The PLn models use the same definition for
"competing" - if there is a race.

> Competing accesses may need msync to impose proper ordering.
>
> Now, as for your functions, consider:
>
> char ab[2];
>
> Thread A:
>
> atomic_store_rel( &ab[0] );
> atomic_load_acq( &ab[1] );
>
> Thread B:
>
> atomic_load_acq( &ab[0] );
> atomic_store_rel( &ab[1] );
>
> How is this supposed to work on a processor that doesn't have atomic
> byte access instructions?

It isn't. It will fail to compile. The atomic_* family will work on int,
long, T* and PODs with sizeof(int), sizeof(long), sizeof(T*) and maybe with
2*sizeof(int) etc.

Why is atomic<char> useful? And how is atomic<p2> (where p2 is a struct of
two pointers) supposed to work without DWCAS? Contain a spinlock?


Peter Dimov

unread,
Apr 12, 2005, 8:00:36 AM4/12/05
to
Chris Thomasson wrote:
>> There seems to be some attraction in c.l.c++.m to smart pointers that
>> have almost no overhead, the deferred reference counting being
>> promoted by Andrei but there's no such thing as a free lunch. Deferred
>> reference counting does Boehm style GC against the thread
>> stacks and that requires a "stop the world while we GC" which has huge
>> performance implications in multi-threading.
>
> Actually, you mentioned a scheme that would allow for "deferred
> reference counting" using per-thread lock-free
> single-producer/consumer queues. Pushing a node ( reference
> adjustment request ) into a "local" per-thread queue is certainly
> less overhead than performing an interlocked-operation on a "shared"
> variable. I don't think you would need to use DWCAS either. The logic
> could be isolated in the "collection phase"...

Thread-local count update queues only work if you are willing to postpone
the destruction for an unspecified amount of time, IIUC.

p1 = 0; // thread 1

p2 = 0; // thread 2

Now if these were the only two references to *p1 and if thread 1 and thread
2 don't do anything smart pointer related in the next few hours, *p1 will be
kept alive.

And if you drop the deterministic destruction requirement, reference
counting is usually less efficient than other garbage collection
algorithms... not all of them need to stop the world (if my understanding is
correct).

I tried to come up with a deferred counting scheme where only non-essential
updates would be queued while the final transition from non-zero refs to zero
was always done "on time", but I failed. It might be possible; I haven't
studied the refcounting research much. But even if it is, it would probably
be less efficient on today's hardware for the typical shared_ptr uses in
C++, where most non-essential count updates are manually eliminated (or just
not introduced) by the programmer.

(Now that I think about it, it may be possible to use such a scheme for
local_ptr, which is supposed to never be shared between threads, but not for
shared_ptr, which can be shared if accesses are properly synchronized.)


Alexander Terekhov

unread,
Apr 12, 2005, 12:54:25 PM4/12/05
to

Peter Dimov wrote:
>
> Alexander Terekhov wrote:
> > Peter Dimov wrote:
> >> Do you mean that on some systems a 32 bit int aligned on a 32 bit
> >> boundary may not allow atomic accesses? Or in more general terms, a
> >> type T's natural alignment may not be sufficient to ensure atomicity?
> >
> > Show me chapter and verse for "natural alignment" in C/C++ or
> > even POSIX, please.
>
> Search for "alignment" in C++03. It doesn't specifically say "natural"
> because for the standard non-natural alignments do not exist and unaligned
> accesses are undefined.

Would you please quote the relevant passage(s)?

>
> >> volatiles? Who said anything about volatiles?
> >
> > Well, you said: "Why wrap the thing in its own atomic<> type (I'm
> > reminded of volatiles here) instead of just providing labeled
> > access functions?"
>
> Reminded of Java volatiles... where I need to change the type of the
> variable in order to do atomic operations on it.

Please elaborate.

>
> >> T atomic_load_acq( T * pt );
> >> void atomic_store_rel( T * pt, T v );
> >> // et al
> >>
> >> and ordinary non-volatile loads and stores for non-competing
> >> accesses... what am I missing? "naked_competing" loads? Add
> >> atomic_load_nsync, then.
> >
> > First off, an access can be either atomic or not atomic. Non-competing
> > load accesses need not be atomic. Non-competing store accesses must be
> > atomic (because it is non-competing with respect to stores, but it is
> > "competing" with respect to loads), exclusive store or load accesses
> > need not be atomic.
>
> Hmm, I must be missing something. When I say "non-competing" I mean
> "exclusive" in your terminology. The PLn models use the same definition for

PLn models say that "two operations conflict if they are to the same
location and at least one of them is a write" and that "given a pair
of conflicting operations u and v in an execution, operation u
competes with operation v if and only if there is no ordering chain
(as defined above) between u and v (either from u to v or vice versa).
An operation u is a competing operation if and only if there exists
at least one conflicting operation v in this execution that competes
with u. Otherwise, the operation is a non-competing operation."

> "competing" - if there is a race.

And what I meant was that for loads, "competing" means that it can
collide with other store. For stores, "competing" means that it can
collide with other store, "non-competing" means that it can collide
(and survive ;-) ) with other load (but not store), and "exclusive"
means no collision at all.

>
> > Competing accesses may need msync to impose proper ordering.
> >
> > Now, as for your functions, consider:
> >
> > char ab[2];
> >
> > Thread A:
> >
> > atomic_store_rel( &ab[0] );
> > atomic_load_acq( &ab[1] );
> >
> > Thread B:
> >
> > atomic_load_acq( &ab[0] );
> > atomic_store_rel( &ab[1] );
> >
> > How is this supposed to work on a processor that doesn't have atomic
> > byte access instructions?
>
> It isn't. It will fail to compile. The atomic_* family will work on int,
> long, T* and PODs with sizeof(int), sizeof(long), sizeof(T*) and maybe with
> 2*sizeof(int) etc.
>
> Why is atomic<char> useful?

Well, neighboring ints can also be manipulated by one "wide"
instruction (see the other case). Without atomic<>, that is. ;-)

> And how is atomic<p2> (where p2 is a struct of
> two pointers) supposed to work without DWCAS? Contain a spinlock?

It will fail to compile. atomic_pair<T> might work, though.

regards,
alexander.

Joe Seigh

unread,
Apr 12, 2005, 1:18:07 PM4/12/05
to
On Mon, 11 Apr 2005 19:25:02 -0700, Chris Thomasson <_no_damn_spam_cristom@_no_damn_comcast.net_spam> wrote:

>
>
>
>
>> There seems to be some attraction in c.l.c++.m to smart pointers that
>> have almost no overhead, the deferred reference counting being
>> promoted by Andrei but there's no such thing a a free lunch. Deferred
>> reference counting does Boehm style GC against the thread stacks and
>> that requires a "stop the world while we GC" which has huge performance
>> implications in multi-threading.
>
> Actually, you mentioned a scheme that would allow for "deferred reference
> counting" using per-thread lock-free single-producer/consumer queues.
> Pushing a node ( reference adjustment request ) into a "local" per-thread
> queue is certainly less overhead than performing an interlocked-operation on
> a "shared" variable. I don't think you would need to use DWCAS either. The
> logic could be isolated in the "collection phase"...
>
> There is just one problem, if the collection cycles could not keep up with
> user allocs/frees the applications memory usage could climb out of control.
>
>

The deferred reference counting mentioned by Andrei seems to be a different
scheme. I think there's a way to solve most of the problems here. I'll
just add it to my list of things I don't have time to do. Actually, I
have time, just not time to do pointless things. I mean, just how
many more GC schemes do *I* need?

--
Joe Seigh

David Hopwood

unread,
Apr 12, 2005, 2:15:58 PM4/12/05
to
Alexander Terekhov wrote:
> Peter Dimov wrote:
>>Alexander Terekhov wrote:
>>
>>Do you mean that on some systems a 32 bit int aligned on a 32 bit boundary
>>may not allow atomic accesses? Or in more general terms, a type T's natural
>>alignment may not be sufficient to ensure atomicity?
>
> Show me chapter and verse for "natural alignment" in C/C++ or
> even POSIX, please.

"Natural alignment" is defined by the ABI of each platform, not by the
language or by POSIX. But it *is* defined.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Peter Dimov

unread,
Apr 12, 2005, 3:36:44 PM4/12/05
to
Alexander Terekhov wrote:
> Peter Dimov wrote:
>> Search for "alignment" in C++03. It doesn't specifically say
>> "natural" because for the standard non-natural alignments do not
>> exist and unaligned accesses are undefined.
>
> Would you please quote the relevant passage(s)?

3.8/1

"The lifetime of an object of type T begins when:
- storage with the proper alignment and size for type T is obtained, and
- if T is a class type with a non-trivial constructor (12.1), the
constructor call has completed."

3.9/5

"Object types have alignment requirements (3.9.1, 3.9.2). The alignment of a
complete object type is an implementation-defined integer value representing
a number of bytes; an object is allocated at an address that meets the
alignment requirements of its object type."

>>>> volatiles? Who said anything about volatiles?
>>>
>>> Well, you said: "Why wrap the thing in its own atomic<> type (I'm
>>> reminded of volatiles here) instead of just providing labeled
>>> access functions?"
>>
>> Reminded of Java volatiles... where I need to change the type of the
>> variable in order to do atomic operations on it.
>
> Please elaborate.

I meant that on a specific platform, some types can be read/written/updated
atomically, and some cannot. But the requirement to "tag" objects that will
participate in such operations, whether with "volatile", or with atomic<>,
doesn't seem natural or necessary to me.

>> Hmm, I must be missing something. When I say "non-competing" I mean
>> "exclusive" in your terminology. The PLn models use the same
>> definition for
>
> PLn models say that "two operations conflict if they are to the same
> location and at least one of them is a write" and that "given a pair
> of conflicting operations u and v in an execution, operation u
> competes with operation v if and only if there is no ordering chain
> (as defined above) between u and v (either from u to v or vice versa).
> An operation u is a competing operation if and only if there exists
> at least one conflicting operation v in this execution that competes
> with u. Otherwise, the operation is a non-competing operation."
>
>> "competing" - if there is a race.
>
> And what I meant was that for loads, "competing" means that it can
> collide with other store. For stores, "competing" means that it can
> collide with other store, "non-competing" means that it can collide
> (and survive ;-) ) with other load (but not store), and "exclusive"
> means no collision at all.

Right. I was using PLn terminology, where stores that compete with loads are
"competing".

>> Why is atomic<char> useful?
>
> Well, neighboring ints can also be manipulated by one "wide"
> instruction (see the other case). Without atomic<>, that is. ;-)

I don't think that such platforms are in widespread use (unless IA64 or
x86-64 Windows compilers do 64 bit stores for ints... I doubt it), but if
they do, atomics for ints won't be defined there. Such an int runs counter
to the spirit of C, though. Stores on ints are not supposed to be carried
out in "double precision", so to speak. Chars may do that; this is well
known.

Anyway, I see your point. atomic<> can be useful.


Joe Seigh

unread,
Apr 13, 2005, 7:51:14 AM4/13/05
to
On Tue, 12 Apr 2005 13:18:07 -0400, Joe Seigh <jsei...@xemaps.com> wrote:

>>
> The deferred reference counting mentioned by Andrei seems to be a different
> scheme. I think there's a way to solve most of the problems here. I'll
> just add it to my list of things I don't have time to do. Actually, I
> have time, just not time to do pointless things. I mean, just how
> many more GC schemes do *I* need?
>

On further note, I think the major issue here is expectations. That is
where the learning curve comes in. Unless you know what the trade offs
are, it's difficult to keep expectations reasonable. There's probably
an optimal solution for all this, but there's no point in implementing
it if everyone believes there is some magical solution for all this.

--
Joe Seigh

Peter Dimov

unread,
Apr 13, 2005, 8:41:01 AM4/13/05
to
Joe Seigh wrote:

> The deferred reference counting mentioned by Andrei seems to be a
> different scheme. I think there's a way to solve most of the problems
> here. I'll just add it to my list of things I don't have time to do.

Deferred reference counting, as most of the research, is a "Java-centric"
scheme to deal with non-essential count updates.

"Java-centric" in the sense that its primary target is a fully collected
language.

The problem with counting collectors is that these languages generate lots
of count updates each time a reference is copied or "destroyed".

This is much, much less of a problem in C++, where most local references to
counted objects are ordinary C++ references or pointers and incur no
overhead.

I don't believe that C++ code can benefit from a traditional deferred
counting scheme (which avoids counting the stack and the registers and only
counts heap-allocated references.)

Your local_ptr can possibly benefit from deferring the counting somewhat,
but I'm not sure whether it will be worth doing.

Weighted reference counting is another interesting algorithm (on paper). I
think that it is also not worth doing. The idea there is to keep a local
count (weight) in each pointer. The pointer starts with 65536 (say) and on
each copy the weight is shared between the source and the copy. When a
pointer is destroyed its weight is subtracted from the reference count (its
initial value is also 65536).

The reason I believe that this scheme is not a gain is that it eliminates
count increments at the expense of an additional integer in every pointer.
But count increments are "naked" and in the uncontended case are almost as
fast as ordinary increments; and the additional size increase in every
pointer will likely decrease the overall performance of the application.
(The problem with the copy mutating the source aside for a moment.)

> Actually, I have time, just not time to do pointless things. I mean, just
> how many more GC schemes do *I* need?

From a C++ viewpoint, I don't believe that you need anything else beyond
(atomic|shared)_ptr-style counting and hazard pointers. But of course I may
be wrong. ;-)


Alexander Terekhov

unread,
Apr 13, 2005, 10:40:08 AM4/13/05
to

Peter Dimov wrote:

[... weighted reference counting ...]

> The reason I believe that this scheme is not a gain is that it eliminates
> count increments at the expense of an additional integer in every pointer.

It does bring contention down a bit (at expense of extra space),
but you still need (naked) atomic operation. Consider: (basic
thread-safety)

weighted_reference_counting_ptr w;

Thread A:

weighted_reference_counting_ptr a(w);

Thread B:

weighted_reference_counting_ptr b(w);

Both threads need to update w's weight, right?

regards,
alexander.

David Hopwood

unread,
Apr 13, 2005, 10:47:15 AM4/13/05
to
Peter Dimov wrote:
> Joe Seigh wrote:
>
>>The deferred reference counting mentioned by Andrei seems to be a
>>different scheme. I think there's a way to solve most of the problems
>>here. I'll just add it to my list of things I don't have time to do.
>
> Deferred reference counting, as most of the research, is a "Java-centric"
> scheme to deal with non-essential count updates.
>
> "Java-centric" in the sense that its primary target is a fully collected
> language.

ITYM "anything except C/C++".

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Joe Seigh

unread,
Apr 13, 2005, 11:32:25 AM4/13/05
to
On Wed, 13 Apr 2005 15:41:01 +0300, Peter Dimov <pdi...@gmail.com> wrote:

> Joe Seigh wrote:
...


> I don't believe that C++ code can benefit from a traditional deferred
> counting scheme (which avoids counting the stack and the registers and only
> counts heap-allocated references.)
>
> Your local_ptr can possibly benefit from a deferring the counting somewhat,
> but I'm not sure whether it will be worth doing.
>
> Weighted reference counting is another interesting algorithm (on paper). I
> think that it is also not worth doing. The idea there is to keep a local
> count (weight) in each pointer. The pointer starts with 65536 (say) and on
> each copy the weight is shared between the source and the copy. When a
> pointer is destroyed its weight is subtracted from the reference count (its
> initial value is also 65536).

To make pointers copyable you have to divide up the weight somehow. If you
divide by two, after as few as 16 copies your resulting pointer becomes
uncopyable. There's another weighted reference counting scheme that uses
infinite precision fractions for the weight. It keeps extending the precision
at the expense of space boundedness.

atomic_ptr/local_ptr is a modified weighted reference counting scheme. local_ptr
looks a lot like pure weighted reference counting.

>
>> Actually, I have time, just not time to do pointless things. I mean, just
>> how many more GC schemes do *I* need?
>
> From a C++ viewpoint, I don't believe that you need anything else beyond
> (atomic|shared)_ptr-style counting and hazard pointers. But of course I may
> be wrong. ;-)
>

And RCU of course. :)

And it depends on what trade-offs you want to make. I have other unimplemented
schemes I can use if I get into a situation where I need a different set of
trade-offs.

--
Joe Seigh

Peter Dimov

unread,
Apr 13, 2005, 11:32:24 AM4/13/05
to

Yes (I posted something along these lines in clc++m).


Peter Dimov

unread,
Apr 13, 2005, 11:53:50 AM4/13/05
to
Joe Seigh wrote:
> On Wed, 13 Apr 2005 15:41:01 +0300, Peter Dimov <pdi...@gmail.com>
> wrote:
>> Joe Seigh wrote:
> ...
>> I don't believe that C++ code can benefit from a traditional deferred
>> counting scheme (which avoids counting the stack and the registers
>> and only counts heap-allocated references.)
>>
>> Your local_ptr can possibly benefit from a deferring the counting
>> somewhat, but I'm not sure whether it will be worth doing.
>>
>> Weighted reference counting is another interesting algorithm (on
>> paper). I think that it is also not worth doing. The idea there is
>> to keep a local count (weight) in each pointer. The pointer starts
>> with 65536 (say) and on each copy the weight is shared between the
>> source and the copy. When a pointer is destroyed its weight is
>> subtracted from the reference count (its initial value is also
>> 65536).
>
> To make pointers copyable you have to divide up the weight somehow. If you
> divide by two, after as little as 16 copies, you resulting
> pointer becomes uncopyable.

I think that when you need to copy a source with weight 1, you can just
increment the total weight by 65536 and create a weighted_ptr with a weight
of 65536, without modifying the source. Or you could also increase the source's
weight to 65536 and increment the total weight by 65536+65535.

> There's another weighted reference counting scheme that uses infinite
> precision fractions for the weight. It keeps extending the precision at
> the expense of space boundedness.

Seems like overkill to me. ;-)

> atomic_ptr/local_ptr is a modified weighted reference counting
> scheme. local_ptr looks a lot like pure weighted reference counting.

From looking at the source, local_ptr doesn't seem weighted to me. There is
no weight member and copies just increment a shared reference count (the
ephemeral count). But maybe I'm not looking at the latest version (the one
on atomic-ptr-plus.sf.net is 0.0.2).


Joe Seigh

unread,
Apr 13, 2005, 1:38:23 PM4/13/05
to

Copying an atomic_ptr to a local_ptr is a transfer of 1 weighted reference
count without updating the central reference count. local_ptrs originally
weren't copyable since they only have a weight of 1, but I added some logic
from atomic_ptr-to-atomic_ptr copying; it's just not as efficient. The
extra logic obfuscates the weighted-ptr logic somewhat, so it's hard to
see.

--
Joe Seigh

Peter Dimov

unread,
Apr 13, 2005, 2:33:52 PM4/13/05
to

I see it now, thank you; there is an ecount in atomic_ptr. But even though I
understand what the code does, I'm still puzzled as to what this scheme buys
you.


Joe Seigh

unread,
Apr 13, 2005, 3:20:43 PM4/13/05
to

Atomicity, i.e. strong thread safety.

--
Joe Seigh

Joe Seigh

unread,
Apr 13, 2005, 9:33:54 PM4/13/05
to

I was going to say it is probably the only working atomic ptr out there.
If you google for "Lock-free Reference Counting" you'll find a paper on
an implementation that requires an instruction no longer in existence.
However google dragged up an article in the Dec 2004 CUJ titled "Atomic
Reference Counting Pointers" by William K. Reinholtz which going by the
code examples appears to be the same as the PPC version of atomic_ptr.
I don't have access to the article itself so I don't know where he got
the algorithm. Anyone know?

--
Joe Seigh

Peter Dimov

unread,
Apr 14, 2005, 7:01:08 AM4/14/05
to
Joe Seigh wrote:
> I was going to say it is probably the only working atomic ptr out
> there. If you google for "Lock-free Reference Counting" you'll find a
> paper on an implementation that requires an instruction no longer in
> existence. However google dragged up an article in the Dec 2004 CUJ
> titled "Atomic Reference Counting Pointers" by William K. Reinholtz
> which going by the code examples appears to be the same as the PPC
> version of atomic_ptr. I don't have access to the article itself so I
> don't know where he got
> the algorithm. Anyone know?

Since it's in CUJ, you should probably ask in clc++m. William K. (Kirk)
Reinholtz has several RT-related papers, so if he doesn't give you credit in
the article, he probably invented it.

By looking at

ftp://ftp.cuj.com/pub/2004/cujdec2004.zip

I'd say that it doesn't have much in common with your atomic_ptr. Macros vs
templates and the definition of CAS aside, atomic_ptr is non-intrusive,
Reinholtz's 'rcp' is intrusive; it relies on a cntr_ member in the pointee
to keep the count. It's like boost::intrusive_ptr with an atomic exchange in
the 'swap' member, except with most of the code inlined and some
over-atomicity in the constructors and the destructor. No match for your
atomic_ptr, IMO.


Joe Seigh

unread,
Apr 14, 2005, 7:21:11 AM4/14/05
to

The ppc version of atomic_ptr uses different interlocked logic since ppc
doesn't have double wide CAS. The template vs. macros and intrusive
vs. non-intrusive are just details. It's the underlying algorithms that
matter. But I'll ask in clc++m.

--
Joe Seigh

Peter Dimov

unread,
Apr 14, 2005, 8:08:55 AM4/14/05
to
Joe Seigh wrote:
> The ppc version of atomic_ptr uses different interlocked logic since
> ppc doesn't have double wide CAS.

Is your PPC version available?

> The template vs. macros and intrusive vs. non-intrusive are just details.

I'm not sure. It's the details that make your atomic_ptr unique. A
straightforward smart pointer with an atomic swap() seems pretty trivial;
even I was able to "invent" it once (with the same lack of memory barriers
as in Reinholtz's code).


Joe Seigh

unread,
Apr 14, 2005, 8:48:32 AM4/14/05
to
On Thu, 14 Apr 2005 15:08:55 +0300, Peter Dimov <pdi...@gmail.com> wrote:

> Joe Seigh wrote:
>> The ppc version of atomic_ptr uses different interlocked logic since
>> ppc doesn't have double wide CAS.
>
> Is your PPC version available?

On http://sourceforge.net/projects/atomic-ptr-plus/ in the atomic_ptr
package there's a ppc32 version available. I didn't get a Mac until
recently, so I wasn't able to publish a working implementation of the
algorithm until now.

>
>> The template vs. macros and intrusive vs. non-intrusive are just details.
>
> I'm not sure. It's the details that make your atomic_ptr unique. A
> straightforward smart pointer with an atomic swap() seems pretty trivial,
> even I was able to "invent" it once (with the same lack of memory barriers
> as in Reinholz's code).
>

The atomic swap is one part. The other part is safely incrementing the
reference count without "owning" it first. There's the DWCAS solution,
the PPC solution, and various GC based solutions using RCU, SMR, etc...
And of course the 2CAS solution which only works on MC68020 and MC68030
processors.


--
Joe Seigh

Peter Dimov

unread,
Apr 14, 2005, 9:37:56 AM4/14/05
to
Joe Seigh wrote:

> The ppc version of atomic_ptr uses different interlocked logic since
> ppc doesn't have double wide CAS.

I thought it had ldarx/stdcx.?


Chris Thomasson

unread,
Apr 14, 2005, 9:45:25 AM4/14/05
to
>> The ppc version of atomic_ptr uses different interlocked logic since
>> ppc doesn't have double wide CAS.
>
> I thought it had ldarx/stdcx.?

You use ll/sc to create atomic primitives.


Peter Dimov

unread,
Apr 14, 2005, 9:53:13 AM4/14/05
to
Joe Seigh wrote:
> The atomic swap is one part. The other part is safely incrementing
> the reference count without "owning" it first. There's the DWCAS
> solution, the PPC solution, and various GC based solutions using RCU,
> SMR, etc... And of course the 2CAS solution which only works on
> MC68020 and MC68030 processors.

You mean the race between

atomic_ptr<T> p2( p1 );

and

p1 = 0;

?

I think that you have a bug in your PPC implementation, but I may be missing
something. In inc_ref, you reload the pointer and check for change; however,
the pointer is a local function argument (int* refcnt) and will never
change.

And you are right that Reinholtz uses the same algorithm, except he passes an
int** (the offset hack aside).


Chris Thomasson

unread,
Apr 14, 2005, 9:56:32 AM4/14/05
to

Arggrhrgh. I read lwarx/stwcx, did not notice the doubleword version. Sorry!


Anyway, I think that lwarx/stwcx and ldarx/stdcx "both" work on a 64-bit CPU,
on 64-bit values... I am not totally sure.


Joe Seigh

unread,
Apr 14, 2005, 10:13:47 AM4/14/05
to

In 64 bit mode they're not double wide.


--
Joe Seigh