threadsafe reference counting

Jochen Wilhelmy

Nov 17, 2002, 7:04:53 PM
hi!

when writing a c/c++ program using the sun-specific functions
atomic_add_32 and atomic_add_32_nv
from "sys/atomic.h" for thread safe reference counting
i get a linker error. does anybody know which library
to include?

thanks,
jochen

Sean P. Burke

Nov 17, 2002, 11:31:48 PM

Jochen Wilhelmy <j.wil...@arcor.de> writes:

This subject came up previously sometime in the last year,
so you might be able to get a better answer through goodle.

Basically, these calls are in the kernel, and not in any
of the userland libraries. You can extract the code from
the kernel, or code them up yourself if you know a little
SPARC assembly. Google for details.

-Sean

--
"A man is never so innocently employed
as when he is making money." - Dr. Johnson

Alexander Terekhov

Nov 17, 2002, 11:54:52 PM

"Sean P. Burke" wrote:
[...]

> so you might be able to get a better answer through goodle.
^^^^^^

Yeah, looks not bad (well, "making life better", etc.)

http://www.goodle.com

;-)

regards,
alexander.

--
http://groups.google.com/groups?q=atomic_add_32

Mike Mowbray

Nov 18, 2002, 1:49:25 AM
Jochen Wilhelmy wrote:

As Sean Burke has already noted, I posted an answer to
this some time ago. If you can't find it on google,
feel free to email me.


- MikeM.

Joseph Seigh

Nov 18, 2002, 7:25:30 AM

Alexander Terekhov wrote:
...

You're not warning them off anymore? Well, at least people
realize that they need to do something to make reference
counting thread-safe even though what they get is only
mostly thread-safe as Miracle Max would say.

Speaking of atomic, I actually found an intel cmpxchg8b inline
function in asm/system.h for Linux. Except it isn't used to
do a compare and swap.

Joe Seigh

Alexander Terekhov

Nov 18, 2002, 7:49:36 AM

Joseph Seigh wrote:
>
> Alexander Terekhov wrote:
> ...
>
> You're not warning them off anymore? ...

Well, I have to admit that sometimes I'm getting tired. ;-)

<warning>

http://groups.google.com/groups?selm=3DAE728A.12F929A4%40web.de
http://groups.google.com/groups?selm=3DAFD6B9.EB49FF66%40web.de
(Subject: Re: portable library for atomic_inc/dec)

http://groups.google.com/groups?selm=3DD6B018.1138BFB4%40web.de
http://groups.google.com/groups?selm=3DD85DB4.E303F98E%40web.de
(Subject: Re: std::string, reference counting and multithreading)

</warning>

regards,
alexander.

Jochen Wilhelmy

Nov 18, 2002, 10:53:24 AM
>>You're not warning them off anymore? ...
>
>
> Well, I have to admit that sometimes I'm getting tired. ;-)


if i have some object obj with a referenceCount,
then what's wrong with

if (atomic_dec_and_test(&obj->referenceCount))
    delete obj;

if the reference count is 2 and two threads do this
simultaneously then only one should delete the object
because atomic_dec_and_test returns true only in one thread.


jochen

Alexander Terekhov

Nov 18, 2002, 11:11:14 AM

Okay, no links this time.

---
Naked "atomic counts" (w/o *memory synchronization* protocol
ala POSIX mutexes) don't work for *MUTABLE* objects, unless
you acquire the right lock in the destructor as well, which
is rather silly.
---

regards,
alexander.

Jochen Wilhelmy

Nov 18, 2002, 11:54:26 AM
i start to guess what's wrong with

if (atomic_dec_and_test(&obj->referenceCount))
    delete obj;

the code that is in the destructor may not see the
correct data if we have a multiprocessor system and
another thread on another processor has just modified
the data.

interesting thing: does this apply to copy-on-write
shared strings? a string can't be written to while
two threads hold a reference to it; a thread that
writes to it copies it first.

another thing: i have implemented some kind of event loop
that each thread enters, and you can call methods in
other threads (involving storing the parameters in
a class and some template stuff). this mostly removes
the need to access shared data with mutexes, and also
the smartpointer problem: you just create an object
and send it to another thread. the mutex in the message
system should do the synchronisation.
(windows knows this as the apartment model, but my thing
is platform-independent and typesafe.)

i will release this stuff soon; let's see the comments
from you, because i didn't find another thread library
with this event loop approach yet.

jochen

Joseph Seigh

Nov 18, 2002, 5:18:28 PM

Jochen Wilhelmy wrote:
>
>
> if i have some object obj with a referenceCount,
> then whats wrong about
>
> if (atomic_dec_and_test(&obj->referenceCount))
> delete obj;
>
> if the reference count is 2 and two threads do this
> simultaneously then only one should delete the object
> because atomic_dec_and_test returns true only in one thread.
>

Wrong place. It's the increment that's not safe. Take the
following scenario (cut and pasted from c.l.c++.m):

Thread A creates an object with count == 1
Thread A stores address of object in X
Thread B fetches address of object from X
Thread A deletes (stores 0 in) X
Thread A decrements count on object from 1 to 0
Thread A deletes object
Thread B increments count in no longer existing object
(value of count is undefined, so B cannot necessarily determine
that something has gone awry before it is too late)


In order to get away with just atomic_inc/atomic_dec you have to
guarantee that the reference count will never go to zero. This
adds additional constraints to reference counting which to me
obviates much of its usefulness.

Joe Seigh

Joseph Seigh

Nov 18, 2002, 6:05:31 PM

Jochen Wilhelmy wrote:
>
> i start to guess what's wrong with
>
> if (atomic_dec_and_test(&obj->referenceCount))
> delete obj;
>
> the code that is in the destructor may not see the
> correct data if we have a multiprocessor system and
> an other thread on another processor just has modified
> the data.
>

I don't believe that is necessarily true. If it's just
memory being released, the heap manager should have its
own memory barriers if it needs them. If other resources
are involved, then I would presume some form of locking
is being used to coordinate things and that the destructor
would also be using the locking which takes care of that
problem.

Joe Seigh

Jochen Wilhelmy

Nov 18, 2002, 6:20:19 PM
hi!

> Wrong place. It's the increment that's not safe. Take the
> following scenario (cut and pasted from c.l.c++.m)
>
> Thread A creates an object with count == 1
> Thread A stores address of object in X
> Thread B fetches address of object from X
> Thread A deletes (stores 0 in) X
> Thread A decrements count on object from 1 to 0
> Thread A deletes object
> Thread B increments count in no longer existing object
> (value of count is undefined, so B cannot necessarily determine
> that something has gone awry before it is too late)


i saw this one but thought that it is a higher-level problem,
a kind of ownership problem.
what if thread A executes until the object is deleted
and only after that B fetches the address of the object from X?
this is an error in any case and crashes the app (or
throws a null pointer exception).
so if the ownership of an object is assigned to a thread
and this thread takes care that it holds a valid pointer
as long as other threads may access it, then there should
be no problem.
as mentioned in my other mail i have implemented a thread
library that supports what windows calls apartment model
threading (to be released soon on sourceforge); i have this
kind of ownership. but tell me if i am wrong about this.

jochen

David Schwartz

Nov 18, 2002, 6:45:46 PM
Jochen Wilhelmy wrote:

> > Wrong place. It's the increment that's not safe. Take the
> > following scenario (cut and pasted from c.l.c++.m)

This scenario is impossible.

> > Thread A creates an object with count == 1
> > Thread A stores address of object in X

> > Thread B fetches address of object from X

The fetching of an object's address should increment its reference
count. If it doesn't, the fetch itself is broken.

> > Thread A deletes (stores 0 in) X
> > Thread A decrements count on object from 1 to 0

You mean from 2 to 1. How can the reference count ever be zero while
another thread holds a pointer to the object?

> > Thread A deletes object
> > Thread B increments count in no longer existing object
> > (value of count is undefined so B cannot necessarily determine that something has gone awry
> > before it is too late)

The rest of this would never happen.



> i saw this one but thought that is a higher level problem,
> a kind of ownership problem.

Any attempt to get access to an object must, in the process of finding
the object, increment its reference count.

> so if the ownership of an object is assigned to a thread
> and this thread takes care that it holds a valid pointer
> as long as other threads may access it, then there should
> be no problem.

I don't suggest doing this. If a thread has access to an object through
a mechanism that is not inherently safe (such as a pointer), it should
have its own reference.

DS

Joseph Seigh

Nov 18, 2002, 7:29:51 PM

If owning a pointer means no other thread can change it, then
yes, as long as your thread owns at least one pointer and thus
guarantees the count will not go to zero, incrementing the
count will be safe.

So for example if you had a copy on write object you'd fetch the
pointer to it using the same mutex that is used to modify/create
new copies meaning you own the pointer also.

pthread_mutex_lock(&mutex);
ptr = global_ptr; // get reference counted pointer to object
pthread_mutex_unlock(&mutex);
... // access object

With a thread-safe reference counted pointer w/ memory barriers the
same thing could be done by

ptr = global_ptr; // get reference counted pointer to object
... // access object

There's other really cool things you can do with other data structures
if you have reference counted pointers that really work such as iterating
through a linked list without a lock. This is basically what
AWTEventMulticaster in Java does (memory barriers issues aside).

Joe Seigh

Hillel Y. Sims

Nov 18, 2002, 11:43:58 PM

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3DD9688D...@xemaps.com...


>
> Thread A creates an object with count == 1
> Thread A stores address of object in X
> Thread B fetches address of object from X
> Thread A deletes (stores 0 in) X
> Thread A decrements count on object from 1 to 0
> Thread A deletes object
> Thread B increments count in no longer existing object
> (value of count is undefined, so B cannot necessarily determine
> that something has gone awry before it is too late)
>
>

This problem as stated is not directly related to (internal) reference
counting issues. This is a problem of unsynchronized access of shared
_external_ references to an object. Here is your scenario, restated using
ints:

Thread A: int* i = new int(69); // "count == 1"
Thread A: x = i; // x is global int*
Thread B: int* p = x;
Thread A: x = 0; delete i; // "count == 0"
Thread B: int j = *p; // format hard drive..

>
> In order to get away with just atomic_inc/atomic_dec you have to
> guarantee that the reference count will never go to zero.

It can never go to zero while more than one thread is accessing the
reference counted object, provided correct shared object synchronization
protocols are followed.

hys

--
(c) 2002 Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com


Casper H.S. Dik

Nov 19, 2002, 3:37:17 AM
Joseph Seigh <jsei...@xemaps.com> writes:

>Wrong place. It's the increment that's not safe. Take the
>following scenario (cut and pasted from c.l.c++.m)

>Thread A creates an object with count == 1
>Thread A stores address of object in X
>Thread B fetches address of object from X
>Thread A deletes (stores 0 in) X

At this point, we're talking about an error in the algorithm.

In a proper refcounted algorithm, only threads that already hold
a reference can increment the reference.

The issue in this particular scenario is not so much the
reference counting as the fact that the "X" object
is not properly protected; it must be accessed and modified
under a lock.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Jochen Wilhelmy

Nov 19, 2002, 5:14:23 AM
> If owning a pointer means no other thread can change it, then
> yes, as long as your thread owns at least one pointer and thus
> guarantees the count will not go to zero, incrementing the
> count will be safe.
>
> So for example if you had a copy on write object you'd fetch the
> pointer to it using the same mutex that is used to modify/create
> new copies meaning you own the pointer also.
>
> pthread_mutex_lock(&mutex);
> ptr = global_ptr; // get reference counted pointer to object
> pthread_mutex_unlock(&mutex);
> ... // access object
>
> With a thread-safe reference counted pointer w/ memory barriers the
> same thing could be done by
>
> ptr = global_ptr; // get reference counted pointer to object
> ... // access object

thanks for this clarification. but i thought it is bad
to assume any type is thread safe, be it an int or
a smartpointer. so if i had a global int i would use
a mutex to access it, since the access may not be atomic.


> There's other really cool things you can do with other data structures
> if you have reference counted pointers that really work such as iterating
> through a linked list without a lock. This is basically what
> AWTEventMulticaster in Java does (memory barriers issues aside).

why does iterating (that is, reading) through a linked list only
work with proper refcounted pointers? is the iterator a global
variable?

jochen

Alexander Terekhov

Nov 19, 2002, 5:18:35 AM

Joseph Seigh wrote:
>
> Jochen Wilhelmy wrote:
> >
> > i start to guess what's wrong with

Lack of memory barriers is wrong [or .acq/.rel atomic op + mbar,
or "full mbar" atomic op] with your version. Plus: optimization
restrictions with respect to reordering/code motions imposed on
compiler, of course.

> >
> > if (atomic_dec_and_test(&obj->referenceCount))
     {
         __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> >      delete obj;
     }
     else
     {
         __INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
     }

> >
> > the code that is in the destructor may not see the
> > correct data if we have a multiprocessor system and
> > an other thread on another processor just has modified
> > the data.
> >
> I don't believe that is necessarily true.

But it's really true.

> If it's just memory being released, the heap manager
> should have its own memory barriers if it needs them.

Yes, but "heap manager" != "non-trivial" destructor.

< ISO/IEC 14882:1998(E) >

"A destructor is trivial if it is an implicitly-declared
destructor and if:

— all of the direct base classes of its class have trivial
destructors and

— for all of the non-static data members of its class that
are of class type (or array thereof), each such class has
a trivial destructor.

Otherwise, the destructor is non-trivial."

regards,
alexander.

Joseph Seigh

Nov 19, 2002, 6:17:49 AM

Jochen Wilhelmy wrote:
...


>
> > There's other really cool things you can do with other data structures
> > if you have reference counted pointers that really work such as iterating
> > through a linked list without a lock. This is basically what
> > AWTEventMulticaster in Java does (memory barriers issues aside).
>
> why does iterating (that is reading) through a linked list only
> works with proper refcounted pointers? is the iterator a global
> variable?

The iterator is local to the thread using it. Iterating through
a list in such a way is a standard wait-free technique in systems with
some form of GC available. Reference counting is a form of GC.

Joe Seigh

Jochen Wilhelmy

Nov 19, 2002, 7:00:19 AM
> The iterator is local to the thread using it. Iterating through
> a list in such a way is a standard wait-free technique in systems with
> some form of GC available. Reference counting is a form of GC.
can you give me an example of what i can do with correct pointers
that i can't do with unsafe pointers?

if i have a list

list<int> l;
// fill with data
for (list<int>::iterator it = l.begin(); it != l.end(); ++it)
{
    // read only
}

this should work in any case if all threads only read.
writing or even inserting should be locked in any case?

or can i insert into the list with one thread while reading
with the other thread safely with correct pointers?

jochen

Joseph Seigh

Nov 19, 2002, 7:04:07 AM

"Casper H.S. Dik" wrote:
>
> Joseph Seigh <jsei...@xemaps.com> writes:
>
> >Wrong place. It's the increment that's not safe. Take the
> >following scenario (cut and pasted from c.l.c++.m)
>
> >Thread A creates a object with count == 1
> >Thread A stores address of object in X
> >Thread B fetches address of ojbect from X
> >Thread A deletes (stores 0 in) X
>
> At this point, we're talking about an error in the algorithm.
>
> In proper refcounted algorithm, only threads that already hold
> a reference can increment the reference.
>
> The issue in this particular scenario is not so much the
> reference counting as well as the fact the the "X" object
> is not properly protected; it must be accessed and modified
> under a lock.
>

The "proper" way is a compromise that is used to deal with
what most believe is an intractable problem.

It probably got that way somewhat along these lines. First
you had naive reference counting in a single thread which
worked fine. When that was used in a multi-threaded environment
the increment/decrement of a shared counter problem quickly became
apparent. That was fixed with atomic increment/decrement. But
it turns out there was still the problem of count going to
zero before increment. The solution to that appeared to be
intractable, so a compromise was made and the restriction
of having to own a reference to copy it was widely adopted.

Also, a lot of people have rather limited experience with
the different concurrent programming techniques that are
out there but don't realize it. Anything outside that
experience is obviously wrong and invalid to their view.

You see some of that thinking in places here, the presumption
that a lock is implicitly required for everything. They're not
aware of lock-free or wait-free programming it seems.

Joe Seigh

Joseph Seigh

Nov 19, 2002, 7:47:53 AM

Usually the thread doing list modification has a lock. Deletion
is accomplished by unlinking the list item and letting GC take
care of the rest.

Reading or traversing the list doesn't require a lock.

If list item state is an issue you deal with it at the
list item level rather than at the list level. This can reduce
your locking granularity and allow better concurrency.

Semantics are slightly different and some people have difficulty
in dealing with it. In their case they are better off sticking
with the more conventional methods they are used to.

Joe Seigh

ig...@notformail.paco.net

Nov 19, 2002, 2:41:30 PM

When used with smart pointers is it possible to keep information about
higher level locking into the object and pointers to this object?
Like this

template<class R>
class Ptr {
public:
    Ptr() {
        rep  = NULL;
        Lock = NULL;
    };
    ~Ptr() {
        if ( rep ) {
            if ( Lock ) Lock->Lock();
            rep->Ref_Decrement();
            if ( rep->Ref_Value() == 0 )
                delete rep;
            if ( Lock ) Lock->UnLock();
        }
    };
    Ptr(const Ptr& P) {
        if ( P.rep ) P.rep->Ref_Increment();
        rep  = P.rep;
        Lock = P.Lock;
    };
    void SetLocker(Mutex *L) {
        Lock = L;
    }

    ...

private:
    R*     rep;
    Mutex* Lock;
};

> ---
>
> regards,
> alexander.

--
Igor Khasilev |
PACO Links, igor at paco dot net |

Torsten Robitzki

Nov 19, 2002, 3:18:08 PM
Joseph Seigh wrote:
>
> Jochen Wilhelmy wrote:
> >
> >
> > if i have some object obj with a referenceCount,
> > then whats wrong about
> >
> > if (atomic_dec_and_test(&obj->referenceCount))
> > delete obj;
> >
> > if the reference count is 2 and two threads do this
> > simultaneously then only one should delete the object
> > because atomic_dec_and_test returns true only in one thread.
> >
>
> Wrong place. It's the increment that's not safe. Take the
> following scenario (cut and pasted from c.l.c++.m)
>
> Thread A creates an object with count == 1
> Thread A stores address of object in X
> Thread B fetches address of object from X
> Thread A deletes (stores 0 in) X
> Thread A decrements count on object from 1 to 0
> Thread A deletes object
> Thread B increments count in no longer existing object
> (value of count is undefined, so B cannot necessarily determine
> that something has gone awry before it is too late)

First of all, I hope we can agree on the purpose of a reference
counting smart pointer in C++: it is just to keep track of the
number of references to a dynamically created object, and to
delete this object if the number of references drops to zero.

There is no special thread guarantee by a smart pointer that goes
further than that of a normal pointer.

You have to synchronize the access to a smart pointer even if
you want to make a copy of the pointer.

The atomic increment/decrement/read operation on the reference
counter is just needed to safely decrement the reference count
in the destructor of the smart pointer, to decide if the object
can be deleted.

> In order to get away with just atomic_inc/atomic_dec you have to
> guarantee that the reference count will never go to zero. This
> adds additional constraints to reference counting which to me
> obviates much of its usefulness.

When the reference count drops to zero, you can safely destroy the
object because there is no other reference to that object.

regards Torsten

Alexander Terekhov

Nov 19, 2002, 4:34:32 PM

ig...@notformail.paco.net wrote:
[...]

> > Naked "atomic counts" (w/o *memory synchronization* protocol
> > ala POSIX mutexes) don't work for *MUTABLE* objects, unless
> > you acquire the right lock in the destructor as well, which
> > is rather silly.
>
> When used with smart pointers is it possible to keep information
> about higher level locking into the object and pointers to this
> object?

Yes, I think so [if you really, really want].

> Like this
>
> template<class R>
> class Ptr {
> public:
> Ptr() {
> rep = NULL;
> Lock = NULL;
> };
> ~Ptr() {
> if ( rep ) {

/*

> if ( Lock ) Lock->Lock();

*/

Not here.

> rep->Ref_Decrement();
> if ( rep->Ref_Value() == 0 )

I'd make Ref_Decrement() [atomic but w/o mem.sync.] indicate
"dropped to zero" condition.

{
if ( Lock ) { Lock->Lock(); Lock->UnLock(); }

Looks somewhat silly, oder? ;-)

> delete rep;
}

/*


> if ( Lock ) Lock->UnLock();

*/

Not needed and really "dangerous" -- deadlocks aside [unless you
really want recursive locking], d-tor might easily evaporate the
lock as well [e.g. if it's "owned" by the referenced object].

> }
> };

regards,
alexander.

Alexander Terekhov

Nov 19, 2002, 4:38:34 PM

Torsten Robitzki wrote:
[...]

> When the reference count drops to zero, you can safely destroy the
> object because there is no other reference to that object.

Unless that object is meant to have a "zombie" state [think of registry/
cache] and CAN be "resurrected". Do you want a link, Torsten? ;-) ;-)

regards,
alexander.

--
"If someone asks you the time, they're trying to set you up to MUG you."

....

"If someone asks you the time, say you have no idea.

If someone asks you for money, pretend you don't speak English.

If someone says hand over that camera or die, give it up.

If someone says run, it's probably a good idea.

If you want to live, stay in lower Manhattan. "

Jochen Wilhelmy

Nov 19, 2002, 5:21:57 PM
> I'd make Ref_Decrement() [atomic but w/o mem.sync.] indicate
> "dropped to zero" condition.
>
> {
> if ( Lock ) { Lock->Lock(); Lock->UnLock(); }
>
> Looks somewhat silly, oder? ;-)
>
>
>> delete rep;


why, by the way, is it not possible to create a new mutex before delete,
lock it, unlock it and delete it? i have not yet written an os so
far :)

jochen

Torsten Robitzki

Nov 19, 2002, 5:43:17 PM
Alexander Terekhov wrote:
>
> Torsten Robitzki wrote:
> [...]
> > When the reference count drops to zero, you can safely destroy the
> > object because there is no other reference to that object.
>
> Unless that object meant to have a "zombie" state [think of registry/
> cache] and CAN be "resurrected".

Can you elaborate on this? Do you think of a state in the object that
is needed by the destructor to perform proper destruction? If so, the
state of the object might only be changed by another thread
while holding a mutex.

>
> Do you want a link, Torsten? ;-) ;-)

Is this a threat? ;-)

Alexander Terekhov

Nov 19, 2002, 6:23:06 PM

Jochen Wilhelmy wrote:
>
> > I'd make Ref_Decrement() [atomic but w/o mem.sync.] indicate
> > "dropped to zero" condition.
> >
> > {
> > if ( Lock ) { Lock->Lock(); Lock->UnLock(); }
> >
> > Looks somewhat silly, oder? ;-)
> >
> >
> >> delete rep;
>
> why by the way is it not possible to create a new mutex before delete,
> lock it, unlock it and delete it.

Good question. Uhmm. Unless I'm just missing something [and it's
really time for me to go sleep [condvar, timedwait()] ;-) ]...

Yes, as long as threads DON'T unref objects that they mutate INSIDE
"modify" critical regions [and compiler doesn't move unrefs in], it
doesn't really matter which lock will be locked{/unlocked} prior to
delete. So, locking some >>thread specific<< [oh, oh, yeah] "mutex"
should work "just fine", I believe. ;-)

IF this assumption [no unrefs inside "modify" critical regions] can
NOT be guaranteed, then one would need recursive locks anyway, to
begin with. Well, we really need pthread_refcount_t, I'd say. ...

regards,
alexander.

Alexander Terekhov

Nov 19, 2002, 6:24:20 PM

Torsten Robitzki wrote:
>
> Alexander Terekhov wrote:
> >
> > Torsten Robitzki wrote:
> > [...]
> > > When the reference count drops to zero, you can safely destroy the
> > > object because there is no other reference to that object.
> >
> > Unless that object meant to have a "zombie" state [think of registry/
> > cache] and CAN be "resurrected".
>
> Can you elaborate on this? Do you think of a state in the object that
> is needed by the destructor to perform proper destruction?

No. I mean the following stuff:

http://groups.google.com/groups?selm=3D021EA4.E2217C09%40web.de
(Subject: Re: Objects in container: creation and deletion details)

with Zombies, resurrection [God rules], etc. ;-)

regards,
alexander. < offline till tomorrow >

Mike Mowbray

Nov 19, 2002, 7:01:43 PM
Alexander Terekhov wrote:

> [...Talking about mutable refcounted objects...]


> > if (atomic_dec_and_test(&obj->referenceCount))
> > {
> > __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> > delete obj;
> > }
> > else
> > {
> > __INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
> > }

Hmmm. Is this sequence correct? Suppose the refcount is 2.
Thread-A performs a decrement (that results in 1), but before
it executes the "release" membar in the "else" clause, thread-B
performs a decrement, finds a result of 0, and proceeds to the
"acquire" membar and deletion before thread-A has completed its
release membar.

I would have thought something like the following is better:

__INJECT_RELEASE_MEMORY_BARRIER(); // release semantics

if (atomic_dec_and_test(&obj->referenceCount))
{
    __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
    delete obj;
}

(?)


- MikeM.


Alexander Terekhov

Nov 20, 2002, 6:09:26 AM

Mike Mowbray wrote:
>
> Alexander Terekhov wrote:
>
> > [...Talking about mutable refcounted objects...]
>
> > > if (atomic_dec_and_test(&obj->referenceCount))
> > > {
> > > __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> > > delete obj;
> > > }
> > > else
> > > {
> > > __INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
> > > }
>
> Hmmm. Is this sequence correct?

No, of course. I've just copied it from this <http://tinyurl.com/2us7>
message of mine. I really wish you had jumped in and corrected me
back then. What I was thinking of at that time (and now too) is the
following.

Basically, in sort of "portable" terms, the decrement (and sub)
should do the following:

if (ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test(&obj->referenceCount))
{
    LEAVE_CRITICAL_REGION_AND__INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
    delete obj;
}
else
{
    LEAVE_CRITICAL_REGION__INJECTING_RELEASE_MEMORY_BARRIER(); // release semantics
}

where neither ENTER_ nor atomic_dec inject barriers.

Is this clear enough now? ;-)

Well, that's why my "sketch" versions of pthread_refcount_t are
"specified" in a way that would allow to drop in *normal mutex*
to synchronize decrements/subs [in addition to fully atomic ops].

[...]


> I would have thought something like the following is better:
>
> __INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
>
> if (atomic_dec_and_test(&obj->referenceCount))
> {
> __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> delete obj;
> }

Yes, this MIGHT work (depending on the semantics of atomic op and
the preceding RELEASE barrier). Probably.

regards,
alexander.

Mike Mowbray

Nov 20, 2002, 8:14:33 PM
Alexander Terekhov wrote:

> > Mike Mowbray wrote:
> >>
> >> Alexander Terekhov wrote:
> >>
> >> > [...Talking about mutable refcounted objects...]
> >>
> >>> > if (atomic_dec_and_test(&obj->referenceCount))
> >>> > {
> >>> > __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> >>> > delete obj;
> >>> > }
> >>> > else
> >>> > {
> >>> > __INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
> >>> > }
> >>
> >> Hmmm. Is this sequence correct?
>
> > No, of course. I've just copied it from this <http://tinyurl.com/2us7>
> > message of mine. I really wish that you would jump in and correct me
> > back then.

?? OK. Now let's see... Where's that time machine they were
talking about recently on sci.physics? :-)

Seriously though, I've only just recently started thinking more
closely about correct placement of membars in conjunction with
atomic ops. I've had a handle-and-rep class that uses atomic
incr/decr for several years, and I was lucky because it only
needed to run on sparcv8plus (which uses a TSO memory model)
and alpha/Tru64 (which thoughtfully provides membars in its
various load-locked/store-conditional builtin intrinsics).
But now I need to get my act together for sparcv9, so I'm
trying to follow this discussion more closely.

> > Basically, in sort of "portable" terms, the decrement
> > (and sub) should do the following:
> >
> > if (ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test(&obj->referenceCount))
> > {
> > LEAVE_CRITICAL_REGION_AND__INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
>
> > delete obj;
> > }
> > else
> > {
> > LEAVE_CRITICAL_REGION__INJECTING_RELEASE_MEMORY_BARRIER(); // release semantics
> > }
> >
> > where neither ENTER_ nor atomic_dec inject barriers.
> >
> > Is this clear enough now? ;-)

Depends on exactly what you mean by "ENTER/LEAVE CRITICAL REGION". I normally
associate these words with locking/unlocking a mutex, but I suspect you
don't mean that. You say that ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test
does not involve a membar. In that case, I still think your code sketch above
is wrong, for the reasons explained in my previous post. (I hope it's still
on your server so I don't need to copy it again here. :-)

So..., er..., no, I guess it's either still not clear enough, or it's wrong.


- MikeM.
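In concrete terms, the decrement-with-barriers pattern being debated here (release semantics on the count decrement, acquire semantics before the delete) can be sketched with C++11 `<atomic>` primitives, which did not exist at the time of this thread. The names below are invented for illustration; this is a sketch of the idea, not anyone's actual implementation:

```cpp
#include <atomic>
#include <cassert>

// Sketch: the decrement uses release semantics so this thread's prior
// writes to the object become visible before the count drops; the
// deleting thread issues an acquire fence so it observes all those
// writes before running the destructor. (Hypothetical names.)
struct Counted {
    std::atomic<int> refs{1};
    int payload = 0;
};

void add_ref(Counted* obj) {
    // The increment needs no ordering: it only keeps the object alive.
    obj->refs.fetch_add(1, std::memory_order_relaxed);
}

bool release(Counted* obj) {  // returns true if obj was deleted
    if (obj->refs.fetch_sub(1, std::memory_order_release) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete obj;
        return true;
    }
    return false;
}
```

Only the thread that takes the count to zero pays for the acquire fence; every other releasing thread pays only for the release decrement.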


Alexander Terekhov

Nov 21, 2002, 8:17:09 AM

Mike Mowbray wrote:
[...]

> > > Basically, in sort of "portable" terms, the decrement
> > > (and sub) should do the following:
> > >
> > > if (ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test(&obj->referenceCount))
> > > {
> > > LEAVE_CRITICAL_REGION_AND__INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> >
> > > delete obj;
> > > }
> > > else
> > > {
> > > LEAVE_CRITICAL_REGION__INJECTING_RELEASE_MEMORY_BARRIER(); // release semantics
> > > }
> > >
> > > where neither ENTER_ nor atomic_dec inject barriers.
> > >
> > > Is this clear enough now? ;-)
>
> Depends on exactly what you mean by "ENTER/LEAVE CRITICAL REGION". I normally
> associate these words with locking/unlocking a mutex, but I suspect you
> don't mean that.

I DO mean sort of "mutex" which...

> You say that ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test
> does not involve a membar.

^^^^^^^^^^^^^^^^^^^^^^^^^

on lock() because it "protects" nothing but *atomic* operation/dec. ;-)

Well, the problem is that the count's decrements should not "overtake"
memory transaction(s) resulted from object updates/mutation. Adding
"release" barrier prior to decrement and "acquire" barrier prior to
delete may NOT necessarily ensure this -- depends on the semantics of
atomic op [that is used to decrement count] and/w.r.t. barrier(s),
AFAICS.

regards,
alexander.

Mike Mowbray

Nov 21, 2002, 8:13:50 PM
Alexander Terekhov wrote:

> [...] Well, the problem is that the count's decrements


> should not "overtake" memory transaction(s) resulted from
> object updates/mutation.

Right.


> Adding "release" barrier prior to decrement and "acquire"
> barrier prior to delete may NOT necessarily ensure this
> -- depends on the semantics of atomic op [that is used to
> decrement count] and/w.r.t. barrier(s), AFAICS.

Hmmm. Admittedly, I tend to think about this stuff mostly in
terms of sparc cas which doesn't need a membar to implement
atomic incr/decr, and DEC alpha/Tru64's builtin intrinsics
which supply their own membars, but I'm having trouble
imagining any reasonable definition of "atomic incr/decr"
that would interfere with membars in the way you suggest
above. Can you elaborate, pls?


- MikeM.


Hillel Y. Sims

Nov 21, 2002, 10:37:22 PM

"Mike Mowbray" <mi...@despammed.com> wrote in message
news:3DDD84B8...@despammed.com...


> Alexander Terekhov wrote:
>
> > Adding "release" barrier prior to decrement and "acquire"
> > barrier prior to delete may NOT necessarily ensure this
> > -- depends on the semantics of atomic op [that is used to
> > decrement count] and/w.r.t. barrier(s), AFAICS.
>
> Hmmm. Admittedly, I tend to think about this stuff mostly in
> terms of sparc cas which doesn't need a membar to implement
> atomic incr/decr, and DEC alpha/Tru64's builtin intrinsics
> which supply their own membars,

According to my system documentation on VMS 7.3 (Alpha), it is true that
older builtin intrinsics automatically generated membars, but newer versions
of some functions (such as atomic incr / decr) do not:

<builtins.h>
** The following operations do not in themselves generate Memory
Barriers.
** The user is expected to code explicitly any Memory Barriers where
** needed using inline assembly code, e.g. asm("mb"), or asm("wmb").
[..]
** Older versions generated memory barriers both before and after the
** update. Newer versions allow the user to control memory barrier
** placement for the low-level primitives.
[..]
int __ATOMIC_INCREMENT_LONG
(volatile void *__addr);
int __ATOMIC_DECREMENT_LONG
(volatile void *__addr);
[..]

(well actually there is also an intrinsic __MB() provided to do asm("mb"),
but no __WMB()...)

>but I'm having trouble
> imagining any reasonable definition of "atomic incr/decr"
> that would interfere with membars in the way you suggest
> above.

I hope you're right... if not, I think it will totally throw off any of the
little that I think I manage to understand so far about how this stuff
works.. :-(

Mike Mowbray

Nov 21, 2002, 11:37:11 PM
"Hillel Y. Sims" wrote:


> > "Mike Mowbray" <mi...@despammed.com> wrote in message
> > news:3DDD84B8...@despammed.com...
> >>

> >> [...] I tend to think about this stuff mostly in


> >> terms of sparc cas which doesn't need a membar to implement
> >> atomic incr/decr, and DEC alpha/Tru64's builtin intrinsics
> >> which supply their own membars,

> According to my system documentation on VMS 7.3 (Alpha), it
> is true that older builtin intrinsics automatically generated
> membars, but newer versions of some functions (such as atomic

> incr/decr) do not: [...]

Right you are! Coincidentally, I went back and re-read the header
files shortly after posting my previous message - and was dismayed
to read exactly what you say above. My recollection was obviously
from an older version.

> >> but I'm having trouble imagining any reasonable definition
> >> of "atomic incr/decr" that would interfere with membars

> >> in the way (Alexander) suggests [...]

> I hope you're right... if not, I think it will totally throw
> off any of the little that I think I manage to understand so
> far about how this stuff works.. :-(

I know how you feel! I hope that Alexander and Joe will explain
it more clearly/thoroughly than what I've read so far. :-)

In a previous article, Joe Seigh wrote:

> How this works. Simple, a reference count can be expressed
> as the difference between a count of references started
> and references ended.
>
> Rc = Rs - Re
>
> Furthermore this can be multiple counts.
>
> Rc = Sum(Rs) - Sum(Re)
>
> Since the count is now split up, you can keep the start
> reference count with the pointer and both fetch both the
> pointer value and increment the start reference count
> atomically which solves the race condition that regular
> reference counting has.
>
> However since the process of summing up the reference counts
> may be asynchronous and distributed, you need an additional
> mechanism to determine whether an apparent sum of interest
> (usually zero) is a partial sum or not. Various mechanisms
> can be used.
>
> These reference count pointers cannot be copied however, more
> precisely, you cannot make a pointer that is itself copyable
> so it is a pointer of a different type.
>
> So to solve the asynchronous summing problem and the copy
> problem, you just combine these types of reference counts,
> called ephemeral reference counts, with regular reference
> counts.
>
> A reference count object becomes deletable when the regular
> reference count becomes zero and the ephemeral count becomes
> zero which usually means no copies are in progress.

I'm sorry, Joe. Maybe I'm a bit dense, but there's just not
enough information in the above for me to understand in detail
how your mechanism operates. I'd like to understand it better,
and I could probably help you implement/test the atomic parts
on my sparc+alpha machines here if I had more of a clue about
what you're doing. Is there any chance you can take a bit of
time to write a more pedestrian and complete exposition, with
step-by-step examples for the dummies?


- MikeM.
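For what it's worth, the arithmetic behind Joe's Rc = Sum(Rs) - Sum(Re) idea can be illustrated with a toy split counter (my own invented sketch, not Joe's code, and deliberately omitting the hard part he describes — the protocol for telling a genuine zero from a transient partial sum):

```cpp
#include <atomic>
#include <cassert>

// Toy sketch of a "split" reference count: the true count is the
// difference between references started and references ended, each
// bumped independently. (Invented names; illustration only.)
struct SplitCount {
    std::atomic<long> started{0};
    std::atomic<long> ended{0};

    void start() { started.fetch_add(1, std::memory_order_relaxed); }
    void end()   { ended.fetch_add(1, std::memory_order_release); }

    // NOTE: a real implementation needs an extra mechanism to
    // distinguish a genuine zero from a partial sum seen while other
    // threads are still calling start(); this naive check is only
    // meaningful once no further start() calls can occur.
    bool zero() const {
        long e = ended.load(std::memory_order_acquire);
        long s = started.load(std::memory_order_relaxed);
        return s - e == 0;
    }
};
```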

Konrad Schwarz

Nov 22, 2002, 3:16:25 AM
Mike Mowbray schrieb:

> I'm sorry, Joe. Maybe I'm a bit dense, but there's just not
> enough information in the above for me to understand in detail
> how your mechanism operates. I'd like to understand it better,
> and I could probably help you implement/test the atomic parts
> on my sparc+alpha machines here if I had more of a clue about
> what you're doing. Is there any chance you can take a bit of
> time to write a more pedestrian and complete exposition, with
> step-by-step examples for the dummies?

I'd be interested too, FWIW.

Alexander Terekhov

Nov 22, 2002, 5:40:06 AM

I like this:

http://www.primenet.com/~jakubik/mpsafe/MultiprocessorSafe.pdf

<quote>

To implement mutex locking, the system software uses a higher-level
lock called a spin lock. On systems with relaxed memory models, this
requires help from the hardware. Most modern CPUs provide atomic
instructions to help with this problem. One common atomic instruction
that many different CPUs provide is an atomic swap. An atomic swap
allows a single instruction to swap data in the CPU with data in
memory. On CPUs with relaxed memory models, like the model we have
been examining, the atomic instruction will do whatever it takes to
make this instruction atomic despite the memory model. Doing so may
involve bypassing the queue of memory operations, or causing some
degree of synchronization with the queue.

The details are beyond the scope of this article."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

</quote>

;-)

Well, I guess that it's time for David Butenhof to jump in and
shed some light on these issues, so to speak.

regards,
alexander.

Joseph Seigh

Nov 22, 2002, 7:41:10 AM

Alexander Terekhov wrote:
>
> "Hillel Y. Sims" wrote:
> >
> > "Mike Mowbray" <mi...@despammed.com> wrote in message
> > news:3DDD84B8...@despammed.com...
> > > Alexander Terekhov wrote:
> > >
> > > > Adding "release" barrier prior to decrement and "acquire"
> > > > barrier prior to delete may NOT necessarily ensure this
> > > > -- depends on the semantics of atomic op [that is used to
> > > > decrement count] and/w.r.t. barrier(s), AFAICS.

This is probably closer to reality. Atomic ops don't need memory
barriers to work correctly. They work directly on memory. But
if you are going to use them in conjunction with some other scheme
then you may need memory barriers to synchronize memory state with
the atomic ops. And that all depends on what those other schemes
are.

As far as smart pointers are concerned, there is no intrinsic reason
they need memory barriers. But without memory barriers they would
be about as useless as Java volatile pointers. Actually less so.
Java at least guarantees you see the object monitor in a valid
initial state.

So basically I have to figure out what uses you could put smart pointers
to, e.g. DCL, copy on write, wait-free data structures, etc..., and then
figure out what memory barriers would be needed in order for these
schemes to work. And then document those semantics. Something that
doesn't seem to be industry practice. Incidentally, I posted formal
semantics for mutual exclusion to c.p.t back around Nov. 2000. While
not "official" ("official" and formal aren't necessarily the same)
they are probably the only formal semantics that exist for mutual
exclusion. What Java has is a meta-implementation, their memory
model, which they use as a "specification". It's pretty much useless
for program proving for any Java program over 3 lines of code.

Joe Seigh

Joseph Seigh

Nov 22, 2002, 8:24:19 AM

Mike Mowbray wrote:
...


>
> In a previous article, Joe Seigh wrote:

<snip>


>
> I'm sorry, Joe. Maybe I'm a bit dense, but there's just not
> enough information in the above for me to understand in detail
> how your mechanism operates. I'd like to understand it better,
> and I could probably help you implement/test the atomic parts
> on my sparc+alpha machines here if I had more of a clue about
> what you're doing. Is there any chance you can take a bit of
> time to write a more pedestrian and complete exposition, with
> step-by-step examples for the dummies?
>

It's on my list I guess. It's not that easy. One advantage
researchers and academics especially have is face time where
they get instant feedback on what parts people are having
difficulty with and what approaches work.

Also, part of the problem is I have been working with this stuff
so long it's second nature. I can think in multi-threading. But
I do go the formal route occasionally. It's a good exercise. You
may think you know something but then you'll find out that's not
really the case.

I already did a native implementation on sparc which works pretty
well. Actually, it will work pretty well once I go back and fix
one thing I messed up changing the interface to the native code.
Things got pretty ugly in the heap manager. :-)

I had a sparc programmers manual to help with that. I want to
do an intel implementation next. I suspect I will have to spend
some outrageous sum for an Intel programmers manual; I don't think
there are any online. This is not good given my current lack
of income.

The interface I'm using is

extern int cas64(void *target, void *cmp, void *value);

return code 0 failed, 1 success
cmp is updated with current value of target.

operands are 64 bit anythings hence void to avoid unnecessary casting.

value should probably be const.

and no membars probably.

Joe Seigh
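Joe's cas64() interface can be sketched on top of C++11 atomics instead of hand-written SPARC or x86 assembly (his signature and return convention, my body; relaxed ordering to match his "no membars probably"):

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Sketch of the cas64() interface described above: returns 1 on
// success, 0 on failure; on failure *cmp is updated with the current
// value of *target. target must actually point at a
// std::atomic<std::uint64_t> in this sketch.
int cas64(void* target, void* cmp, void* value) {
    auto* t = static_cast<std::atomic<std::uint64_t>*>(target);
    std::uint64_t expected, desired;
    std::memcpy(&expected, cmp, sizeof expected);
    std::memcpy(&desired, value, sizeof desired);
    bool ok = t->compare_exchange_strong(expected, desired,
                                         std::memory_order_relaxed,
                                         std::memory_order_relaxed);
    std::memcpy(cmp, &expected, sizeof expected);  // current value on failure
    return ok ? 1 : 0;
}
```

The void* operands work as Joe intends: any 64-bit value can be passed without casting at the call site.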

David Butenhof

Nov 22, 2002, 9:26:17 AM
Alexander Terekhov wrote:

While that's true OF THE ATOMIC SEQUENCE itself, this is of no help to
surrounding code. Note that Alpha has no "atomic swap instruction", but
rather an extremely general and flexible pair of "locked" instructions,
"load-locked" and "store-conditional" that can be used to build atomic
queues, atomic bit set/clears, increment/decrements, and and lots more.
(While there are restrictions on the nature and number of instructions
between the pair, I've coded some really complicated conditional assembly
sequences before eventually coming to the conclusion I'd've been better off
just using a compare-and-swap. ;-) )

In the simplest cases, the memory guarantees you get "for free" from the
atomic sequence are just fine. If I want an atomic counter that doesn't
"mean" anything beyond its value, that's exactly what I want. It's
accurate, and it has no added overhead. (This is also why the Alpha
intrinsics don't include MB instructions... they're relatively expensive
and unnecessary in the simple cases that you want to run fast.)

HOWEVER, this all changes when the value you're adjusting atomically has
some external meaning to the program.

If I atomically insert a structure onto a queue, the atomic sequence itself
is enough to ensure that an atomic dequeue will see the pointer I wrote.
But it's not enough to ensure that it will see the CONTENTS of the
structure to which it points. That requires memory barriers on both sides
of the transaction. The result might be something like this:

(writer:)

<initialize structure>
__MB();
__ATOMIC_EXCH_QUAD(); /*insert*/

(reader:)

__ATOMIC_EXCH_QUAD(); /*remove*/
__MB();
<use structure>

Or you could imagine instead using atomic increment/decrement to designate
the "valid" (written) slice of a vector; you'd write the vector's element,
issue the MB, and then do the atomic increment. (And similarly in the
reader.)

(Oh, and note another good reason for omitting the mb from the stock
intrinsics... to be general purpose it'd need one on EACH side, but one
would almost always be wasted. The alternative would be to proliferate
variations that combined the operation with one mb or the other... and
sometimes you really do want both. It's much neater this way.)

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/
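Transcribed into C++11 atomics (my transcription, with invented names; the original uses Alpha intrinsics), the writer/reader membar placement above becomes a release store paired with an acquire load:

```cpp
#include <atomic>
#include <thread>
#include <cassert>

// Sketch of the pattern above: the release store plays the role of
// "mb; atomic insert", and the acquire load plays "atomic remove; mb".
struct Node { int contents = 0; };

std::atomic<Node*> slot{nullptr};

void writer(Node* n) {
    n->contents = 42;                          // <initialize structure>
    slot.store(n, std::memory_order_release);  // insert, release semantics
}

int reader() {
    Node* n;
    while ((n = slot.load(std::memory_order_acquire)) == nullptr) {}
    return n->contents;  // guaranteed to see the initialized contents
}
```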

Alexander Terekhov

Nov 22, 2002, 11:28:57 AM

David Butenhof wrote:
[...]

> (writer:)
>
> <initialize structure>
> __MB();
> __ATOMIC_EXCH_QUAD(); /*insert*/
>
> (reader:)
>
> __ATOMIC_EXCH_QUAD(); /*remove*/
> __MB();
> <use structure>

I'd prefer:

(writer:)

<initialize structure>
// __MB();
__ATOMIC_EXCH_QUAD_REL(); /*insert -- RELEASE semantics*/

(reader:)

__ATOMIC_EXCH_QUAD_ACQ(); /*remove -- ACQUIRE semantics*/
// __MB();
<use structure>

AFAICS, the problem with "__MB()"-plus-naked-atomics version is that in
the case of "writer:", the __ATOMIC_EXCH_QUAD() implementation should
synchronize with respect to write queue(s) -- should inspect the presence
of membar(s) there and "wait" till preceding writes "complete". This [i.e.
sync with respect to *writes*] is unneeded in the "reader:" case, I think.
Or am I just missing and/or misunderstanding something?

regards,
alexander.

David Butenhof

Nov 22, 2002, 12:58:33 PM
Alexander Terekhov wrote:

>
> David Butenhof wrote:
> [...]
>> (writer:)
>>
>> <initialize structure>
>> __MB();
>> __ATOMIC_EXCH_QUAD(); /*insert*/
>>
>> (reader:)
>>
>> __ATOMIC_EXCH_QUAD(); /*remove*/
>> __MB();
>> <use structure>
>
> I'd prefer:
>
> (writer:)
>
> <initialize structure>
> // __MB();
> __ATOMIC_EXCH_QUAD_REL(); /*insert -- RELEASE semantics*/
>
> (reader:)
>
> __ATOMIC_EXCH_QUAD_ACQ(); /*remove -- ACQUIRE semantics*/
> // __MB();
> <use structure>

More interfaces to worry about. Leaving out the barriers gives the coder
total flexibility via more direct access to the hardware. The extra forms
provide little real value to someone who's legitimately programming at that
level; either they know the hardware and can probably work better with
clean primitives... or they're pretty much doomed to mess it up anyway. The
REL and ACQ versions map perfectly into the IPF instruction set, by the
way; is that deliberate or accidental?

> AFAICS, the problem with "__MB()"-plus-naked-atomics version is that in
> the case of "writer:", the __ATOMIC_EXCH_QUAD() implementation should
> synchronize with respect to write queue(s) -- should inspect the presence
> of membar(s) there and "wait" till preceding writes "complete". This [i.e.
> sync with respect to *writes*] is unneeded in the "reader:" case, I think.
> Or am I just missing and/or misunderstanding something?

You can't (in general) use a WMB (writer barrier) on the enqueue side,
unless you know that there are no independent reads that could be
destructively reordered across the WMB. Often, this is reasonable since
you're only concerned about the structure and the atomic queue's pointer;
but it could never be put into a general purpose mechanism like
pthread_mutex_lock(), or __ATOMIC_EXCH_QUAD().

On the other side, there's no such thing as an RMB, even if you're sure
there are no writes that might be destructively reordered -- so there's no
option on that side.

Hillel Y. Sims

Nov 22, 2002, 11:11:54 PM

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3DDE3167...@xemaps.com...

>
> I had a sparc programmers manual to help with that. I want to
> do an intel implementation next. I suspect I will have to spend
> some outrageous sum for an Intel prorammers manual, I don't think
> there are any online. This is not good given my current lack
> of income.

Is this helpful?
http://developer.intel.com/design/pentium4/manuals/

Joseph Seigh

Nov 23, 2002, 1:14:59 PM

"Hillel Y. Sims" wrote:
>
> "Joseph Seigh" <jsei...@xemaps.com> wrote in message
> news:3DDE3167...@xemaps.com...
> >
> > I had a sparc programmers manual to help with that. I want to
> > do an intel implementation next. I suspect I will have to spend
> > some outrageous sum for an Intel prorammers manual, I don't think
> > there are any online. This is not good given my current lack
> > of income.
>
> Is this helpful?
> http://developer.intel.com/design/pentium4/manuals/
>

Actually, I downloaded them yesterday. A couple of days earlier those
web pages were down for maintenance with a note saying that you could
get the publications from elsewhere (for $$). So I didn't find out
they were free until yesterday. Last time I did assembly programming on
intel was on an 8088, actually bit poking in basic for a while until the
assembler became available. I always thought the register layout and usage was
kind of stupid. Every other instruction ended up being a mov. They
seem to have kept their compatibility with respect to this very well.

Also, I was helping out annotating a disassembly of DOS 1.0 until we
were told to stop. I had the command handler and program loader figured out.

Joe Seigh

Alexander Terekhov

Nov 23, 2002, 2:24:00 PM

David Butenhof wrote:
[...]

> The REL and ACQ versions map perfectly into the IPF instruction set,
> by the way; is that deliberate or accidental?

Deliberate. Java's revised memory model with its ACQ/REL volatiles
aside for a moment, it basically comes from:

["Intel(R) Itanium(R) Architecture Software Developer's Manual"
http://developer.intel.com/design/itanium/manuals ]

"Memory accesses follow one of four memory ordering semantics:
unordered, release, acquire or fence. Unordered data accesses
may become visible in any order. Release data accesses
guarantee that all previous data accesses are made visible
prior to being made visible themselves. Acquire data accesses
guarantee that they are made visible prior to all subsequent
data accesses. Fence operations combine the release and
acquire semantics into a bi-directional fence, i.e., they
guarantee that all previous data accesses are made visible
prior to any subsequent data accesses being made visible.

Explicit memory ordering takes the form of a set of
instructions: ordered load and ordered check load (ld.acq,
ld.c.clr.acq), ordered store (st.rel), semaphores (cmpxchg,
xchg, fetchadd), and memory fence (mf). The ld.acq and
ld.c.clr.acq instructions follow acquire semantics. The
st.rel follows release semantics. The mf instruction is a
fence operation. The xchg, fetchadd.acq, and cmpxchg.acq
instructions have acquire semantics. The cmpxchg.rel, and
fetchadd.rel instructions have release semantics. The
semaphore instructions also have implicit ordering. If
there is a write, it will always follow the read. In
addition, the read and write will be performed atomically
with no intervening accesses to the same memory region.

....

The Itanium architecture provides a set of three semaphore
instructions: exchange (xchg), compare and exchange (cmpxchg),
and fetch and add (fetchadd). Both cmpxchg and fetchadd may
have either acquire or release semantics depending on the
specific opcode chosen. The xchg instruction always has
acquire semantics. These instructions read a value from
memory, modify this value using an instruction-specific
operation, and then write the modified value back to memory.
The read-modify-write sequence is atomic by definition."

Well, what I don't quite follow is why they sort of
"restrict-way-too-much" IA-32's stuff [running on IA-64]:

<quote>

IA-32 instructions are mapped into the Itanium memory
ordering model as follows:

- All IA-32 stores have release semantics
- All IA-32 loads have acquire semantics

</quote>

given that IA-32's SSE/SSE2 extensions DID introduce "out-of-
order" memory ordering and barriers [load-fence/store-fence/
both-load-and-store-fence] to IA-32 as well. Any ideas?

Well, back to ref.counting [for mutable stuff; WITH memory
sync]. "Itanic" version should probably use this:

cmpxchg.rel for decrements/subs PLUS memory fence (mf)
when the count drops to zero [well, ld.acq to load the
object pointer and invoke "delete" would work too, but
would somewhat "complicate" the *pthread_refcount_t*
interface, I guess ;-) ].

Correct?

>
> > AFAICS, the problem with "__MB()"-plus-naked-atomics version is that in
> > the case of "writer:", the __ATOMIC_EXCH_QUAD() implementation should
> > synchronize with respect to write queue(s) -- should inspect the presence
> > of membar(s) there and "wait" till preceding writes "complete". This [i.e.
> > sync with respect to *writes*] is unneeded in the "reader:" case, I think.
> > Or am I just missing and/or misunderstanding something?
>
> You can't (in general) use a WMB (writer barrier) on the enqueue side,
> unless you know that there are no independent reads that could be
> destructively reordered across the WMB. Often, this is reasonable since
> you're only concerned about the structure and the atomic queue's pointer;

Yep.

> but it could never be put into a general purpose mechanism like
> pthread_mutex_lock(),

Yup. pthread_mutex_lock() should have "acquire" semantics and
pthread_mutex_unlock() should have "release".


> or __ATOMIC_EXCH_QUAD().
>
> On the other side, there's no such thing as an RMB,

Except that *IA-32* does have "LFENCE" thing. ;-)

> even if you're sure
> there are no writes that might be destructively reordered -- so there's no
> option on that side.

regards,
alexander.

Hillel Y. Sims

Nov 24, 2002, 1:31:59 PM
Apologies in advance for being really dense...

"David Butenhof" <David.B...@compaq.com> wrote in message
news:d8rD9.1$K85....@news.cpqcorp.net...


>
> In the simplest cases, the memory guarantees you get "for free" from the
> atomic sequence are just fine. If I want an atomic counter that doesn't
> "mean" anything beyond its value, that's exactly what I want. It's
> accurate, and it has no added overhead.

If I understand correctly, the following is safe on an Alpha SMP system
(with respect to _only_ this object):

class RawCounter {
int count;
public:
RawCounter() : count(0) {}

void incr() { __ATOMIC_INCREMENT_LONG(&count); }
void decr() { __ATOMIC_DECREMENT_LONG(&count); }
int get_count() const { return count; }
};

No need for mb here before reading 'count', the reader will always see the
up-to-date version (in any of the three relevant functions). Is this
correct?

The thing I am a bit confused about is how this relates to the description
of Alpha cache coherency wrt reordering issues at
http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html :
Initial state: p = &x, x = 1, y = 0
T1: y = 1; mb; p = &y
T2: i = *p
Can result in: i == 0


<quote>
Here is what has to happen for this behavior to show up. Assume T1 runs on
P1 and T2 on P2. P2 has to be caching location y with value 0. P1 does y=1
which causes an "invalidate y" to be sent to P2. This invalidate goes into
the incoming "probe queue" of P2; as you will see, the problem arises
because this invalidate could theoretically sit in the probe queue without
doing an MB on P2. The invalidate is acknowledged right away at this point
(i.e., you don't wait for it to actually invalidate the copy in P2's cache
before sending the acknowledgment). Therefore, P1 can go through its MB. And
it proceeds to do the write to p. Now P2 proceeds to read p. The reply for
read p is allowed to bypass the probe queue on P2 on its incoming path (this
allows replies/data to get back to the 21264 quickly without needing to wait
for previous incoming probes to be serviced). Now, P2 can derefence P to
read the old value of y that is sitting in its cache (the inval y in P2's
probe queue is still sitting there).

How does an MB on P2 fix this? The 21264 flushes its incoming probe queue
(i.e., services any pending messages in there) at every MB. Hence, after the
read of P, you do an MB which pulls in the inval to y for sure. And you can
no longer see the old cached value for y.

</quote>

If I understand correctly, this means that T2 should really look like: "q =
p; mb; i = *q" and then it would not be possible to see i == 0. But... what
if P2 was also already caching location p before T2 runs? Could not the
scenario described about incompletely processed probe queue entries apply in
the same way, such that without a prior 'read/acquire' mb by T2 to force p
to be updated, the read "q = p" might see the _old_ value of p (&x), having
not yet processed its incoming probe queue invalidations (and then even
after the intermediate mb, the "i = *q" read may pull in the value of x,
instead of y)? So T2 _really_ needs to be: "mb; q = p; mb; i = *q"? If not,
why is it guaranteed for the read of p to be correct while the read of y was
not?


thanks!

David Butenhof

Nov 25, 2002, 11:48:15 AM
Hillel Y. Sims wrote:

> Apologies in advance for being really dense...
>
> "David Butenhof" <David.B...@compaq.com> wrote in message
> news:d8rD9.1$K85....@news.cpqcorp.net...
>>
>> In the simplest cases, the memory guarantees you get "for free" from the
>> atomic sequence are just fine. If I want an atomic counter that doesn't
>> "mean" anything beyond its value, that's exactly what I want. It's
>> accurate, and it has no added overhead.
>
> If I understand correctly, the following is safe on an Alpha SMP system
> (with respect to _only_ this object):
>
> class RawCounter {
> int count;
> public:
> RawCounter() : count(0) {}
>
> void incr() { __ATOMIC_INCREMENT_LONG(&count); }
> void decr() { __ATOMIC_DECREMENT_LONG(&count); }
> int get_count() const { return count; }
> };
>
> No need for mb here before reading 'count', the reader will always see the
> up-to-date version (in any of the three relevant functions). Is this
> correct?

Define "correct". Although the UPDATES are synchronized through to main
memory, there's no such guarantee for the read. Although atomic writes from
other processors must have already notified the reading processor's cache,
I don't think there's any rule that it must have processed all such
notifications before retiring the read. In practice, it doesn't matter;
even if all previous updates HAD been handled, another processor could
write immediately after the fetch (long before the issuing instruction
stream could possibly do anything with the read's result).

In any case, an 'mb' wouldn't help the fetch, because there's no ORDERING
involved. And again, restating: the fact that the read may be "out of date"
(at least by some definition) has nothing to do with out of sequence reads
or writes, but only with the fact that it's a multiprocessor. (And if it's
not a multiprocessor, then all reads and writes, "atomic" or not, are
sequential.)

I wish I had time to parse all that and constrast against architectural
documentation, but I really can't spare it now.

The real answer, in any case, goes back to my opening point on the first
paragraph: "Define correct." Memory barriers are NOT intended to be global
data flushes. They're intended to preserve ORDERING dependencies between
processors.

"y = 0; mb; x = 1;" means that the write of 1 to x shall not be made visible
to other processors until the write of 0 to y is visible. Similarly, "t0 =
x; mb; t1 = y;" means that the read of y shall not occur before the read of
x. That is, the value read for y shall be "at least as up to date" as the
value of x. This is critical if x is a pointer to y, for example, as in an
atomic queue. But this isn't intended to be a guarantee that no later
writes have occurred to either on different processors that haven't been
flushed through the system.

After all, there's no way to consistently define "now" across a
multiprocessor without suspending all other processors, making the memory
subsystem (including all caches) consistent, and then performing the
operation. (And even then, "now" can be uniquely defined only until you
resume operation of the other processors.)

Or to put it another way, what difference does it make if I can say I read
the "current value of y", if some other processor can change y between my
read and my first opportunity to do anything with the value I read?

Alexander Terekhov

Nov 25, 2002, 12:48:07 PM

David Butenhof wrote:
[...]

> After all, there's no way to consistently define "now" across a
> multiprocessor without suspending all other processors, making the memory
> subsystem (including all caches) consistent, and then performing the
> operation. (And even then, "now" can be uniquely defined only until you
> resume operation of the other processors.)

Uhmm...

>
> Or to put it another way, what difference does it make if I can say I read
> the "current value of y", if some other processor can change y between my
> read and my first opportunity to do anything with the value I read?

Yes, but the "current value of y" MAY indicate [presuming that it IS the
'current' one] that no other processor can ever "change y between my read
and my first opportunity to do anything with the value I read". THAT might
be really useful [or, actually even required; see 'unshareable'/0 state
below]; e.g. to avoid subsequent 'atomic'/CAS/LL-SC-like read-modify op
[it is probably much more expensive than 'a simple write']. I mean:

< IntAtomicGet() is used to read the *current* value >

http://groups.google.com/groups?selm=3D81E7B3.D462E6DE%40web.de
http://groups.google.com/groups?selm=3D858C7A.2975BA4A%40web.de
(Subject: Re: Ref-counted strings not thread-safe -- looking for clarification)

inline StringBuf::StringBuf() : len(0), used(0), refs(1), buf(0) { }

inline StringBuf::StringBuf( AppendStringBufPtr other )
    : len(0), used(0), refs(1), buf(0)
{
    Reserve( max( other->len, other->used+1 ) );
    memcpy( buf, other->buf, used = other->used );
    if ( 0 == IntAtomicDecrement( other->refs ) )
        other.destruct();
}

inline StringBuf::StringBuf( CharRefStringBufPtr other )
    : len(other->len), used(other->used), refs(0), buf(new char[len])
{
    ++String::nAllocs;
    memcpy( buf, other->buf, used );
    if ( 0 == IntAtomicDecrement( other->refs ) )
        other.destruct();
}

inline void String::Clear() {
    if ( 2 > IntAtomicGet( data_->refs ) ||
         1 > IntAtomicDecrement( data_->refs ) ) {
        data_->Clear();
        data_->refs = 1; // mutation -> 'shareable'
    }
    else {
        data_ = new StringBuf;
    }
}

inline void String::Append( char c ) {
    switch ( IntAtomicGet( data_->refs ) ) {
        case 00: data_->refs = 1; // mutation -> 'shareable'
        case 01: data_->Reserve( data_->used+1 ); break;
        default: data_ = new StringBuf( AppendStringBufPtr( data_ ) );
    }
    data_->buf[data_->used++] = c;
}

inline char& String::operator[]( size_t n ) {
    switch ( IntAtomicGet( data_->refs ) ) {
        case 01: data_->refs = 0; // 'unshareable'
        case 00: break;
        default: data_ = new StringBuf( CharRefStringBufPtr( data_ ) );
    }
    return data_->buf[n];
}

regards,
alexander.

Hillel Y. Sims
Nov 25, 2002, 11:58:42 PM

"David Butenhof" <David.B...@compaq.com> wrote in message
news:jvsE9.9$Wn2.5...@news.cpqcorp.net...

>
> Similarly, "t0 =
> x; mb; t1 = y;" means that the read of y shall not occur before the read
of
> x. That is, the value read for y shall be "at least as up to date" as the
> value of x.
[..]

> After all, there's no way to consistently define "now" across a
> multiprocessor without suspending all other processors, making the memory
> subsystem (including all caches) consistent, and then performing the
> operation. (And even then, "now" can be uniquely defined only until you
> resume operation of the other processors.)
>
> Or to put it another way, what difference does it make if I can say I read
> the "current value of y", if some other processor can change y between my
> read and my first opportunity to do anything with the value I read?
>

Wow. I am trying to let all this sink in. This is really starting to sound
like quantum physics... (the "Butenhof Uncertainty Principle"??)

Joseph Seigh
Nov 26, 2002, 7:05:52 AM

Mike Mowbray wrote:
> what you're doing. Is there any chance you can take a bit of
> time to write a more pedestrian and complete exposition, with
> step-by-step examples for the dummies?


We'll try a by-the-numbers example of a smart pointer and making a copy of it.
There are two kinds of references; one is called ephemeral to distinguish it
from the other.

The smart pointer has an ephemeral count and a pointer to the reference object, which
has an ephemeral count, a regular reference count, and a pointer to the actual object.
The ephemeral count is a differential count: the smart pointer part is the "references
started" part and the ref obj part is the "references ended" part (stored as a negative number).


After construction of the smart pointer you have both ephemeral count parts set to
zero, and the regular count set to 1.

sptr ref A
0) 0, ref -> 0, 1, A -> ...

For a thread to copy the pointer, it atomically increments the ephemeral count by 1 as
it fetches the ref value, thus

1) 1, ref -> 0, 1, A -> ...

Then it updates the ref object by decrementing its ephemeral count by 1 and incrementing
the regular count by 1. The new copy is initialized to "0, ref".

2) 1, ref -> -1, 2, A -> ...
0, ref ->

Now suppose some thread deletes the first smart pointer

3) 0, null -1, 2, A -> ...
0, ref ->

and updates the shared ref object, *adding* its ephemeral count to the ref obj's
ephemeral count and decrementing the regular ref count. They are not both zero,
so the object is not deleted.

4) 0, null 0, 1, A -> ...
0, ref ->

--


Suppose instead that the pointer being copied is deleted before the shared reference count is
updated. In regular reference counting schemes this is where the race condition occurs.
So picking up at step 1 above

1) 1, ref -> 0, 1, A -> ...

Another thread deletes pointer

2) 0, null 0, 1, A -> ...

and updates the shared ref object, *adding* its ephemeral count to the ref obj's
ephemeral count and decrementing the regular ref count. They are not both zero,
so the object is not deleted.

3) 0, null 1, 0, A -> ...

The first thread finishes copying the smart pointer (step 2 of the first example),
i.e. decrements the ephemeral count by 1 and increments the regular count by 1.

4) 0, null 0, 1, A -> ...
0, ref ->

--

Joe Seigh

David Butenhof
Nov 26, 2002, 8:43:09 AM
Hillel Y. Sims wrote:

Asynchronous programming IS a little like quantum physics. You have to
design your code to accept, embrace, and exploit uncertainty instead of
fearing it. The only way to eliminate uncertainty is to make everything
sequential, which would be a pretty silly way to apply a multiprocessor!

That's exactly why mutexes exist, which is why I've often joked that they
should be called "bottlenecks". You sacrifice parallelism for serialization
when you really need it -- but you must strive to minimize the need.

Uncertainty ("quantum threading?") is your friend.

David Butenhof
Nov 26, 2002, 9:59:24 AM
Alexander Terekhov wrote:

>> Or to put it another way, what difference does it make if I can say I
>> read the "current value of y", if some other processor can change y
>> between my read and my first opportunity to do anything with the value I
>> read?
>
> Yes, but the "current value of y" MAY indicate [presuming that it IS the
> 'current' one] that no other processor can ever "change y between my read
> and my first opportunity to do anything with the value I read". THAT might
> be really useful [or, actually even required; see 'unshareable'/0 state
> below]; e.g. to avoid subsequent 'atomic'/CAS/LL-SC-like read-modify op
> [it is probably much more expensive than 'a simple write']. I mean:

If 'y' is written ONLY ONCE, and with proper ordering (memory barriers),
then the reader can be sure that a changed value will not change. That's
'ye olde' double-checked initialization pattern. If you can't see that
initialization has occurred, you need to synchronize and try again. If you
CAN see that initialization has occurred, you need only ensure memory
ordering.

If the sequence 'if y then x' is SYNCHRONIZED, then you know no other thread
can change 'y' until you're done with your critical region.

In fully mutable data, without synchronization, you can never be sure that
'y' won't change out from under you at any instant, no matter what value
you've just read. Even if "application logic" states that the value is a
marker of some significance and "shall not change", you can't be sure the
value is up to date.

> < IntAtomicGet() is used to read the *current* value >

There's no such thing as "atomic get". On Alpha or IPF, the actual
implementation would be something along the lines of a compare-and-swap
implementing 'if the current value is what I think it is, make it what it
is'. Otherwise it's just a read, with no ordering or coherency
restrictions, and you can't know that the value is "up to date". While the
compare-and-swap may do what you want, it's "cheating" in that it's really
just memory synchronization. It's no different (to the machine) than any
other form of synchronization, including a spinlock.

If you "atomically set" a state that you know must remain constant until you
release it, then indeed you know nobody else can change it "between the
read and your use of the value", and you can make safe decisions based on
that state. In this case, you're "setting" the state that's already
there... but that really makes no difference. You've just used an atomic
operation to implement your own customized form of lock. Nothing wrong with
that at all, but it's nothing special, either.

So, yes, one may sometimes avoid an atomic sequence by verifying its safety
with an atomic sequence. Just as when one locks a mutex. Granted, on
uniprocessors or multiprocessors with globally ordered memory operations,
IntAtomicGet() may be just a read, and possibly cheaper than a
compare-and-swap. That doesn't generalize, though, so it's just another
possible machine-specific optimization.

Alexander Terekhov
Nov 26, 2002, 11:28:50 AM

David Butenhof wrote:
[...]

> > < IntAtomicGet() is used to read the *current* value >
>
> There's no such thing as "atomic get". ....

Okay, let me put it this way... Below you can find a sort of "updated"
COW string example using the hypothetical pthread_refcount_t interface.
Do you see any problems with it? What are they? I hope that you'll
"deduce" the meaning of pthread_refcount_t calls from it with the COW
example used as an illustration. I'm sorry, but it's still in C++. ;-)

pthread_refcount_t interface(*) used [without error checking and I've
not yet had a chance to compile it, sorry]:

pthread_refcount_init()
pthread_refcount_destroy()
pthread_refcount_getvalue()
pthread_refcount_setvalue()
pthread_refcount_increment()
pthread_refcount_decrement_no_msync()

COW string illustration based on Herb Sutter's "gotw-045-test-harness":
<http://www.gotw.ca/gotw/045.htm> <http://www.gotw.ca/gotw/045code.zip>

namespace COW_AtomicInt3 {

#undef NAME
#undef BAGGAGE
#define NAME "COW_AtomicInt3"
#define BAGGAGE

size_t IntAtomicGet( pthread_refcount_t& refs ) {
    size_t result;
    pthread_refcount_getvalue( &refs, &result );
    return result;
}

void IntAtomicIncrement( pthread_refcount_t& refs ) {
    pthread_refcount_increment( &refs );
}

size_t IntAtomicDecrement( pthread_refcount_t& refs ) {
    int status = pthread_refcount_decrement_no_msync( &refs );
    if ( PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status )
        return 0;
    // error checking ...
    return 1; // whatever != 0
}

class String;
struct StringBuf;

class AppendStringBufPtr {
    StringBuf* ptr;
public:
    AppendStringBufPtr( StringBuf* p ) : ptr( p ) {}
    StringBuf* operator->() const { return ptr; }
    void destruct() { delete ptr; }
};

class CharRefStringBufPtr {
    StringBuf* ptr;
public:
    CharRefStringBufPtr( StringBuf* p ) : ptr( p ) {}
    StringBuf* operator->() const { return ptr; }
    void destruct() { delete ptr; }
};

struct StringBuf {
    StringBuf();
    ~StringBuf();
    StringBuf( const StringBuf& other );
    StringBuf( AppendStringBufPtr other );
    StringBuf( CharRefStringBufPtr other );

    void Clear();
    void Reserve( size_t n );

    size_t len;
    size_t used;
    pthread_refcount_t refs;
    char* buf;
    BAGGAGE;

    void* operator new( size_t n );
    void operator delete( void* p );
};

class String {
public:
    String();
    ~String();
    String( const String& );

    void Clear();
    void Append( char );
    size_t Length() const;
    char& operator[]( size_t );

    static int nCopies;
    static int nAllocs;
private:
    StringBuf* data_;
};

int String::nCopies;
int String::nAllocs;

static FastArena fa( NAME, sizeof(StringBuf) );
void* StringBuf::operator new( size_t n ) { return fa.Allocate( n ); }
void StringBuf::operator delete( void* p ) { fa.Deallocate( p ); }

inline StringBuf::StringBuf() : len(0), used(0), buf(0) {
    pthread_refcount_init( &refs, 1 );
}

inline StringBuf::~StringBuf() {
    pthread_refcount_destroy( &refs );
    delete[] buf;
}

inline StringBuf::StringBuf( const StringBuf& other )
    : len(other.len), used(other.used), buf(new char[len])
{
    ++String::nAllocs;
    pthread_refcount_init( &refs, 1 );
    memcpy( buf, other.buf, used );
}

inline StringBuf::StringBuf( AppendStringBufPtr other )
    : len(0), used(0), buf(0)
{
    pthread_refcount_init( &refs, 1 );
    Reserve( max( other->len, other->used+1 ) );
    memcpy( buf, other->buf, used = other->used );
    if ( 0 == IntAtomicDecrement( other->refs ) )
        other.destruct();
}

inline StringBuf::StringBuf( CharRefStringBufPtr other )
    : len(other->len), used(other->used), buf(new char[len])
{
    ++String::nAllocs;
    pthread_refcount_init( &refs, 0 );
    memcpy( buf, other->buf, used );
    if ( 0 == IntAtomicDecrement( other->refs ) )
        other.destruct();
}

inline void StringBuf::Clear() {
    delete[] buf;
    buf = 0;
    len = 0;
    used = 0;
}

inline void StringBuf::Reserve( size_t n ) {
    <snip>
}

inline String::String() : data_(new StringBuf) { }

inline String::~String() {
    if ( 2 > IntAtomicGet( data_->refs ) ||
         1 > IntAtomicDecrement( data_->refs ) )
        delete data_;
}

inline String::String( const String& other )
{
    switch ( IntAtomicGet( other.data_->refs ) ) {
        case 00: data_ = new StringBuf( *other.data_ ); break;
        default: IntAtomicIncrement( (data_ = other.data_)->refs );
    }
    ++nCopies;
}

inline void String::Append( char c ) {
    switch ( IntAtomicGet( data_->refs ) ) {
        case 00: pthread_refcount_setvalue( &data_->refs, 1 ); // mutation -> 'shareable'
        case 01: data_->Reserve( data_->used+1 ); break;
        default: data_ = new StringBuf( AppendStringBufPtr( data_ ) );
    }
    data_->buf[data_->used++] = c;
}

inline char& String::operator[]( size_t n ) {
    switch ( IntAtomicGet( data_->refs ) ) {
        case 01: pthread_refcount_setvalue( &data_->refs, 0 ); // 'unshareable'
        case 00: break;
        default: data_ = new StringBuf( CharRefStringBufPtr( data_ ) );
    }
    return data_->buf[n];
}

inline size_t String::Length() const {
    return data_->used;
}

inline void String::Clear() {
    if ( 2 > IntAtomicGet( data_->refs ) ||
         1 > IntAtomicDecrement( data_->refs ) ) {
        data_->Clear();
        pthread_refcount_setvalue( &data_->refs, 1 ); // mutation -> 'shareable'
    }
    else {
        data_ = new StringBuf;
    }
}

} // namespace COW_AtomicInt3

regards,
alexander.

(*) <http://tinyurl.com/2wai> (SKETCHv2, that's "current" one).
<http://tinyurl.com/2wal> (SKETCHv1, that's an older one).

Joseph Seigh
Nov 26, 2002, 12:07:53 PM

Alexander Terekhov wrote:
>
> David Butenhof wrote:
> [...]
> > > < IntAtomicGet() is used to read the *current* value >
> >
> > There's no such thing as "atomic get". ....
>
> Okay, let me put it this way... Below you can find a sort of "updated"
> COW string example using the hypothetical pthread_refcount_t interface.
> Do you see any problems with it? What are they? I hope that you'll
> "deduce" the meaning of pthread_refcount_t calls from it with the COW
> example used as an illustration. I'm sorry, but it's still in C++. ;-)
>
> pthread_refcount_t interface(*) used [without error checking and I've
> not yet had a chance to compile it, sorry]:
>
> pthread_refcount_init()
> pthread_refcount_destroy()
> pthread_refcount_getvalue()
> pthread_refcount_setvalue()
> pthread_refcount_increment()
> pthread_refcount_decrement_no_msync()
>
> COW string illustration based on Herb Sutter's "gotw-045-test-harness":
> <http://www.gotw.ca/gotw/045.htm> <http://www.gotw.ca/gotw/045code.zip>
>

Are you trying to combine a reference count with a rwlock? Seems more
problematic than it's worth. Pthread_refcount would need rwlock
functionality and semantics.

A thread-safe internal smart pointer seems like an easier abstraction to work
with. All your read methods just access the string buffer via the smart
pointer without any locks. All your write/modify methods would get a mutex
lock, copy the string buffer, modify the copy, and point the smart pointer to
it. That's it. And don't be tempted to do your string append ops in place;
it's more trouble than it's worth. The appended part wouldn't necessarily
become visible in a consistent manner, even if you pre-initialized the
spare buffer to zeros (for end of string) -- unless your string class
semantics permitted partial visibility of appends.

Joe Seigh

Alexander Terekhov
Nov 26, 2002, 12:13:35 PM

Joseph Seigh wrote:
[...]

> Are you trying to combine a reference count with a rwlock?

No. What makes you think so, Joe?

regards,
alexander.

Joseph Seigh
Nov 26, 2002, 1:05:01 PM

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:
> [...]
> > Are you trying to combine a reference count with a rwlock?
>
> No. What makes you think so, Joe?
>

You appear to be trying to distinguish the usage, that a refcount
of one means that no other process is currently accessing the data
and therefore it is safe to try something that requires exclusive
access. Which, BTW, would only work if no other processes attempt to
access the object before that something is done.

Joe Seigh

Alexander Terekhov
Nov 26, 2002, 1:17:32 PM

Joseph Seigh wrote:
>
> Alexander Terekhov wrote:
> >
> > Joseph Seigh wrote:
> > [...]
> > > Are you trying to combine a reference count with a rwlock?
> >
> > No. What makes you think so, Joe?
> >
> You appear to be trying to distinguish the usage, that a refcount
> of one means that no other process is currently accessing the data
> and therefore it is safe to try something that requires exclusive
> access.

Yes. Because all mutations need to be protected/synchronized by the
callers/string owner(s).

> Which BTW would only work if no other processes attempt to
> access the object before that something is done.

Right. That's the case for all mutations [non-const methods]. "1" is
NOT used for non-const string objects. I've made that mistake [in the
copy c-tor] and was corrected by Mr. Hans Boehm:

http://groups.google.com/groups?selm=3D858C7A.2975BA4A%40web.de
(Subject: Re: Ref-counted strings not thread-safe -- looking for clarification)

What's the problem?

regards,
alexander.