
threadsafe reference counting


Jochen Wilhelmy

unread,
Nov 17, 2002, 7:04:53 PM11/17/02
to
hi!

when writing a c/c++ program using the sun-specific functions
atomic_add_32 and atomic_add_32_nv
from "sys/atomic.h" for thread-safe reference counting
i get a linker error. does anybody know which library
to link against?

thanks,
jochen

Sean P. Burke

unread,
Nov 17, 2002, 11:31:48 PM11/17/02
to

Jochen Wilhelmy <j.wil...@arcor.de> writes:

This subject came up previously sometime in the last year,
so you might be able to get a better answer through goodle.

Basically, these calls are in the kernel, and not in any
of the userland libraries. You can extract the code from
the kernel, or code them up yourself if you know a little
Sparc assembly. google for details.

-Sean

--
"A man is never so innocently employed
as when he is making money." - Dr. Johnson

Alexander Terekhov

unread,
Nov 17, 2002, 11:54:52 PM11/17/02
to

"Sean P. Burke" wrote:
[...]

> so you might be able to get a better answer through goodle.
^^^^^^

Yeah, looks not bad (well, "making life better", etc.)

http://www.goodle.com

;-)

regards,
alexander.

--
http://groups.google.com/groups?q=atomic_add_32

Mike Mowbray

unread,
Nov 18, 2002, 1:49:25 AM11/18/02
to
Jochen Wilhelmy wrote:

As Sean Burke has already noted, I posted an answer to
this some time ago. If you can't find it on google,
feel free to email me.


- MikeM.

Joseph Seigh

unread,
Nov 18, 2002, 7:25:30 AM11/18/02
to

Alexander Terekhov wrote:
...

You're not warning them off anymore? Well, at least people
realize that they need to do something to make reference
counting thread-safe even though what they get is only
mostly thread-safe as Miracle Max would say.

Speaking of atomic, I actually found an intel cmpxchg8b inline
function in asm/system.h for Linux. Except it isn't used to
do a compare and swap.

Joe Seigh

Alexander Terekhov

unread,
Nov 18, 2002, 7:49:36 AM11/18/02
to

Joseph Seigh wrote:
>
> Alexander Terekhov wrote:
> ...
>
> You're not warning them off anymore? ...

Well, I have to admit that sometimes I'm getting tired. ;-)

<warning>

http://groups.google.com/groups?selm=3DAE728A.12F929A4%40web.de
http://groups.google.com/groups?selm=3DAFD6B9.EB49FF66%40web.de
(Subject: Re: portable library for atomic_inc/dec)

http://groups.google.com/groups?selm=3DD6B018.1138BFB4%40web.de
http://groups.google.com/groups?selm=3DD85DB4.E303F98E%40web.de
(Subject: Re: std::string, reference counting and multithreading)

</warning>

regards,
alexander.

Jochen Wilhelmy

unread,
Nov 18, 2002, 10:53:24 AM11/18/02
to
>>You're not warning them off anymore? ...
>
>
> Well, I have to admit that sometimes I'm getting tired. ;-)


if i have some object obj with a referenceCount,
then what's wrong with

if (atomic_dec_and_test(&obj->referenceCount))
    delete obj;

if the reference count is 2 and two threads do this
simultaneously then only one should delete the object
because atomic_dec_and_test returns true only in one thread.


jochen

Alexander Terekhov

unread,
Nov 18, 2002, 11:11:14 AM11/18/02
to

Okay, no links this time.

---
Naked "atomic counts" (w/o *memory synchronization* protocol
ala POSIX mutexes) don't work for *MUTABLE* objects, unless
you acquire the right lock in the destructor as well, which
is rather silly.
---

regards,
alexander.

Jochen Wilhelmy

unread,
Nov 18, 2002, 11:54:26 AM11/18/02
to
i start to guess what's wrong with

if (atomic_dec_and_test(&obj->referenceCount))
    delete obj;

the code that is in the destructor may not see the
correct data if we have a multiprocessor system and
another thread on another processor has just modified
the data.

interesting thing: does this apply to copy-on-write
shared strings? a string can't be written to while
two threads hold a reference to it; a thread that
writes to it copies it first.

another thing: i have implemented some kind of event loop
that each thread enters, and you can call methods in
other threads (involving storing the parameters in
a class and some template stuff), which mostly removes
the need to access shared data with mutexes, and also
the smartpointer problem. you just create an object
and send it to another thread. the mutex in the message
system should do the synchronisation.
(windows knows this as the apartment model, but my thing
is platform-independent and typesafe)

i will release this stuff soon, and let's see the comments
from you, because i didn't find another thread library
with this event loop approach yet.

jochen

Joseph Seigh

unread,
Nov 18, 2002, 5:18:28 PM11/18/02
to

Jochen Wilhelmy wrote:
>
>
> if i have some object obj with a referenceCount,
> then whats wrong about
>
> if (atomic_dec_and_test(&obj->referenceCount))
> delete obj;
>
> if the reference count is 2 and two threads do this
> simultaneously then only one should delete the object
> because atomic_dec_and_test returns true only in one thread.
>

Wrong place. It's the increment that's not safe. Take the
following scenario (cut and pasted from c.l.c++.m)

Thread A creates an object with count == 1
Thread A stores address of object in X
Thread B fetches address of object from X
Thread A deletes (stores 0 in) X
Thread A decrements count on object from 1 to 0
Thread A deletes object
Thread B increments count in no longer existing object
(value of count is undefined so B cannot necessarily determine that something has gone awry
before it is too late)


In order to get away with just atomic_inc/atomic_dec you have to
guarantee that the reference count will never go to zero. This
adds additional constraints to reference counting which to me
obviates much of its usefulness.

Joe Seigh

Joseph Seigh

unread,
Nov 18, 2002, 6:05:31 PM11/18/02
to

Jochen Wilhelmy wrote:
>
> i start to guess what's wrong with
>
> if (atomic_dec_and_test(&obj->referenceCount))
> delete obj;
>
> the code that is in the destructor may not see the
> correct data if we have a multiprocessor system and
> an other thread on another processor just has modified
> the data.
>

I don't believe that is necessarily true. If it's just
memory being released, the heap manager should have its
own memory barriers if it needs them. If other resources
are involved, then I would presume some form of locking
is being used to coordinate things and that the destructor
would also be using the locking which takes care of that
problem.

Joe Seigh

Jochen Wilhelmy

unread,
Nov 18, 2002, 6:20:19 PM11/18/02
to
hi!

> Wrong place. It's the increment that's not safe. Take the
> following scenario (cut and pasted from c.l.c++.m)
>
> Thread A creates an object with count == 1
> Thread A stores address of object in X
> Thread B fetches address of object from X
> Thread A deletes (stores 0 in) X
> Thread A decrements count on object from 1 to 0
> Thread A deletes object
> Thread B increments count in no longer existing object
> (value of count is undefined so B cannot necessarily determine that something has gone awry
> before it is too late)


i saw this one but thought that it is a higher level problem,
a kind of ownership problem.
what if thread A executes until the object is deleted,
and after that B fetches the address of the object from X?
this is an error in any case and crashes the app (or
throws a null pointer exception).
so if the ownership of an object is assigned to a thread,
and this thread takes care that it holds a valid pointer
as long as other threads may access it, then there should
be no problem.
as mentioned in my other mail, i have implemented a thread
library that supports what windows calls apartment model
threading (to be released soon on sourceforge), and i have this
kind of ownership. but tell me if i am wrong with this.

jochen

David Schwartz

unread,
Nov 18, 2002, 6:45:46 PM11/18/02
to
Jochen Wilhelmy wrote:

> > Wrong place. It's the increment that's not safe. Take the
> > following scenario (cut and pasted from c.l.c++.m)

This scenario is impossible.

> > Thread A creates an object with count == 1
> > Thread A stores address of object in X

> > Thread B fetches address of object from X

The fetching of an object's address should increment its reference
count. If it doesn't, the fetch itself is broken.

> > Thread A deletes (stores 0 in) X
> > Thread A decrements count on object from 1 to 0

You mean from 2 to 1. How can the reference count ever be zero while
another thread holds a pointer to the object?

> > Thread A deletes object
> > Thread B increments count in no longer existing object
> > (value of count is undefined so B cannot necessarily determine that something has gone awry
> > before it is too late)

The rest of this would never happen.



> i saw this one but thought that is a higher level problem,
> a kind of ownership problem.

Any attempt to get access to an object must, in the process of finding
the object, increment its reference count.

> so if the ownership of an object is assigned to a thread
> and this thread takes care that it holds a valid pointer
> as long as other threads may access it, then there should
> be no problem.

I don't suggest doing this. If a thread has access to an object through
a mechanism that is not inherently safe (such as a pointer), it should
have its own reference.

DS

Joseph Seigh

unread,
Nov 18, 2002, 7:29:51 PM11/18/02
to

If owning a pointer means no other thread can change it, then
yes, as long as your thread owns at least one pointer and thus
guarantees the count will not go to zero, incrementing the
count will be safe.

So for example if you had a copy on write object you'd fetch the
pointer to it using the same mutex that is used to modify/create
new copies meaning you own the pointer also.

pthread_mutex_lock(&mutex);
ptr = global_ptr; // get reference counted pointer to object
pthread_mutex_unlock(&mutex);
... // access object

With a thread-safe reference counted pointer w/ memory barriers the
same thing could be done by

ptr = global_ptr; // get reference counted pointer to object
... // access object

There's other really cool things you can do with other data structures
if you have reference counted pointers that really work such as iterating
through a linked list without a lock. This is basically what
AWTEventMulticaster in Java does (memory barriers issues aside).

Joe Seigh

Hillel Y. Sims

unread,
Nov 18, 2002, 11:43:58 PM11/18/02
to

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3DD9688D...@xemaps.com...


>
> Thread A creates an object with count == 1
> Thread A stores address of object in X
> Thread B fetches address of object from X
> Thread A deletes (stores 0 in) X
> Thread A decrements count on object from 1 to 0
> Thread A deletes object
> Thread B increments count in no longer existing object
> (value of count is undefined so B cannot necessarily determine
> that something has gone awry before it is too late)
>

This problem as stated is not directly related to (internal) reference
counting issues. This is a problem of unsynchronized access of shared
_external_ references to an object. Here is your scenario, restated using
ints:

Thread A: int* i = new int(69); // "count == 1"
Thread A: x = i; // x is global int*
Thread B: int* p = x;
Thread A: x = 0; delete i; // "count == 0"
Thread B: int j = *p; // format hard drive..

>
> In order to get away with just atomic_inc/atomic_dec you have to
> guarantee that the reference count will never go to zero.

It can never go to zero until only a single thread is accessing the
reference counted object, if correct shared object synchronization protocols
are followed.

hys

--
(c) 2002 Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com


Casper H.S. Dik

unread,
Nov 19, 2002, 3:37:17 AM11/19/02
to
Joseph Seigh <jsei...@xemaps.com> writes:

>Wrong place. It's the increment that's not safe. Take the
>following scenario (cut and pasted from c.l.c++.m)

>Thread A creates an object with count == 1
>Thread A stores address of object in X
>Thread B fetches address of object from X
>Thread A deletes (stores 0 in) X

At this point, we're talking about an error in the algorithm.

In proper refcounted algorithm, only threads that already hold
a reference can increment the reference.

The issue in this particular scenario is not so much the
reference counting as the fact that the "X" object
is not properly protected; it must be accessed and modified
under a lock.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Jochen Wilhelmy

unread,
Nov 19, 2002, 5:14:23 AM11/19/02
to
> If owning a pointer means no other thread can change it, then
> yes, as long as your thread owns at least one pointer and thus
> guarantees the count will not go to zero, incrementing the
> count will be safe.
>
> So for example if you had a copy on write object you'd fetch the
> pointer to it using the same mutex that is used to modify/create
> new copies meaning you own the pointer also.
>
> pthread_mutex_lock(&mutex);
> ptr = global_ptr; // get reference counted pointer to object
> pthread_mutex_unlock(&mutex);
> ... // access object
>
> With a thread-safe reference counted pointer w/ memory barriers the
> same thing could be done by
>
> ptr = global_ptr; // get reference counted pointer to object
> ... // access object

thanks for this clarification. but i thought it is bad
to assume any type is thread-safe, be it an int or
a smartpointer. so if you have a global int, i would use
a mutex to access it since the access may not be atomic.


> There's other really cool things you can do with other data structures
> if you have reference counted pointers that really work such as iterating
> through a linked list without a lock. This is basically what
> AWTEventMulticaster in Java does (memory barriers issues aside).

why does iterating (that is, reading) through a linked list only
work with proper refcounted pointers? is the iterator a global
variable?

jochen

Alexander Terekhov

unread,
Nov 19, 2002, 5:18:35 AM11/19/02
to

Joseph Seigh wrote:
>
> Jochen Wilhelmy wrote:
> >
> > i start to guess what's wrong with

What's wrong with your version is the lack of memory barriers
[or of an .acq/.rel atomic op + mbar, or of a "full mbar" atomic
op]. Plus: restrictions on reordering/code motion imposed on the
compiler, of course.

> >
> > if (atomic_dec_and_test(&obj->referenceCount))
{
__INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> > delete obj;
}
else
{
__INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
}

> >
> > the code that is in the destructor may not see the
> > correct data if we have a multiprocessor system and
> > an other thread on another processor just has modified
> > the data.
> >
> I don't believe that is necessarily true.

But it's really true.

> If it's just memory being released, the heap manager
> should have its own memory barriers if it needs them.

Yes, but "heap manager" != "non-trivial" destructor.

< ISO/IEC 14882:1998(E) >

"A destructor is trivial if it is an implicitly-declared
destructor and if:

— all of the direct base classes of its class have trivial
destructors and

— for all of the non-static data members of its class that
are of class type (or array thereof), each such class has
a trivial destructor.

Otherwise, the destructor is non-trivial."

regards,
alexander.

Joseph Seigh

unread,
Nov 19, 2002, 6:17:49 AM11/19/02
to

Jochen Wilhelmy wrote:
...


>
> > There's other really cool things you can do with other data structures
> > if you have reference counted pointers that really work such as iterating
> > through a linked list without a lock. This is basically what
> > AWTEventMulticaster in Java does (memory barriers issues aside).
>
> why does iterating (that is, reading) through a linked list only
> work with proper refcounted pointers? is the iterator a global
> variable?

The iterator is local to the thread using it. Iterating through
a list in such a way is a standard wait-free technique in systems with
some form of GC available. Reference counting is a form of GC.

Joe Seigh

Jochen Wilhelmy

unread,
Nov 19, 2002, 7:00:19 AM11/19/02
to
> The iterator is local to the thread using it. Iterating through
> a list in such a way is a standard wait-free technique in systems with
> some form of GC available. Reference counting is a form of GC.
can you give me an example what i can do with correct pointers
that i can't do with unsafe pointers?

if i have a list

list<int> l;
// fill with data
for (list<int>::iterator it = l.begin(); it != l.end(); ++it)
{
    // read only
}

this should work in any case if all threads only read.
writing or even inserting should be locked in any case?

or can i insert into the list with one thread while reading
with the other thread safely with correct pointers?

jochen

Joseph Seigh

unread,
Nov 19, 2002, 7:04:07 AM11/19/02
to

"Casper H.S. Dik" wrote:
>
> Joseph Seigh <jsei...@xemaps.com> writes:
>
> >Wrong place. It's the increment that's not safe. Take the
> >following scenario (cut and pasted from c.l.c++.m)
>
> >Thread A creates an object with count == 1
> >Thread A stores address of object in X
> >Thread B fetches address of object from X
> >Thread A deletes (stores 0 in) X
>
> At this point, we're talking about an error in the algorithm.
>
> In proper refcounted algorithm, only threads that already hold
> a reference can increment the reference.
>
> The issue in this particular scenario is not so much the
> reference counting as the fact that the "X" object
> is not properly protected; it must be accessed and modified
> under a lock.
>

The "proper" way is a compromise that is used to deal with
what most believe is an intractable problem.

It probably got that way somewhat along these lines. First
you had naive reference counting in a single thread which
worked fine. When that was used in a multi-threaded environment
the increment/decrement of a shared counter problem quickly became
apparent. That was fixed with atomic increment/decrement. But
it turns out there was still the problem of count going to
zero before increment. The solution to that appeared to be
intractable, so a compromise was made and the restriction
of having to own a reference to copy it was widely adopted.

Also, a lot of people have rather limited experience with
the different concurrent programming techniques that are
out there but don't realize it. Anything outside that
experience is obviously wrong and invalid to their view.

You see some of that thinking in places here, the presumption
that a lock is implicitly required for everything. They're not
aware of lock-free or wait-free programming it seems.

Joe Seigh

Joseph Seigh

unread,
Nov 19, 2002, 7:47:53 AM11/19/02
to

Usually the thread doing list modification has a lock. Deletion
is accomplished by unlinking the list item and letting GC take
care of the rest.

Reading or traversing the list doesn't require a lock.

If list item state is an issue you deal with it at the
list item level rather than at the list level. This can reduce
your locking granularity and allow better concurrency.

Semantics are slightly different and some people have difficulty
in dealing with it. In their case they are better off sticking
with the more conventional methods they are used to.

Joe Seigh

ig...@notformail.paco.net

unread,
Nov 19, 2002, 2:41:30 PM11/19/02
to

When used with smart pointers is it possible to keep information about
higher level locking into the object and pointers to this object?
Like this

template<class R>
class Ptr {
public:
    Ptr() {
        rep  = NULL;
        Lock = NULL;
    };
    ~Ptr() {
        if ( rep ) {
            if ( Lock ) Lock->Lock();
            rep->Ref_Decrement();
            if ( rep->Ref_Value() == 0 )
                delete rep;
            if ( Lock ) Lock->UnLock();
        }
    };
    Ptr(const Ptr& P) {
        if ( P.rep ) P.rep->Ref_Increment();
        rep  = P.rep;
        Lock = P.Lock;
    };
    void SetLocker(Mutex *L) {
        Lock = L;
    }

    ...

private:
    R*     rep;
    Mutex* Lock;
};

> ---
>
> regards,
> alexander.

--
Igor Khasilev |
PACO Links, igor at paco dot net |

Torsten Robitzki

unread,
Nov 19, 2002, 3:18:08 PM11/19/02
to
Joseph Seigh wrote:
>
> Jochen Wilhelmy wrote:
> >
> >
> > if i have some object obj with a referenceCount,
> > then whats wrong about
> >
> > if (atomic_dec_and_test(&obj->referenceCount))
> > delete obj;
> >
> > if the reference count is 2 and two threads do this
> > simultaneously then only one should delete the object
> > because atomic_dec_and_test returns true only in one thread.
> >
>
> Wrong place. It's the increment that's not safe. Take the
> following scenario (cut and pasted from c.l.c++.m)
>
> > Thread A creates an object with count == 1
> > Thread A stores address of object in X
> > Thread B fetches address of object from X
> > Thread A deletes (stores 0 in) X
> > Thread A decrements count on object from 1 to 0
> > Thread A deletes object
> > Thread B increments count in no longer existing object
> > (value of count is undefined so B cannot necessarily determine that something has gone awry
> > before it is too late)

First of all, I hope we can agree on the purpose of a reference
counting smart pointer in c++. It's just to keep track of the
number of references to a dynamically created object, and to
delete this object if the number of references drops to zero.

There is no special thread guarantee by a smart pointer that goes
further than that of a normal pointer.

You have to synchronize the access to a smartpointer even if
you want to make a copy of the pointer.

The atomic increment/decrement-and-read operation on the reference
counter is just needed to safely decrement the reference count
in the destructor of the smart pointer, to decide if the object
can be deleted.

> In order to get away with just atomic_inc/atomic_dec you have to
> guarantee that the reference count will never go to zero. This
> adds additional constraints to reference counting which to me
> obviates much of its usefulness.

When the reference count drops to zero, you can safely destroy the
object because there is no other reference to that object.

regards Torsten

Alexander Terekhov

unread,
Nov 19, 2002, 4:34:32 PM11/19/02
to

ig...@notformail.paco.net wrote:
[...]

> > Naked "atomic counts" (w/o *memory synchronization* protocol
> > ala POSIX mutexes) don't work for *MUTABLE* objects, unless
> > you acquire the right lock in the destructor as well, which
> > is rather silly.
>
> When used with smart pointers is it possible to keep information
> about higher level locking into the object and pointers to this
> object?

Yes, I think so [if you really, really want].

> Like this
>
> template<class R>
> class Ptr {
> public:
> Ptr() {
> rep = NULL;
> Lock = NULL;
> };
> ~Ptr() {
> if ( rep ) {

/*

> if ( Lock ) Lock->Lock();

*/

Not here.

> rep->Ref_Decrement();
> if ( rep->Ref_Value() == 0 )

I'd make Ref_Decrement() [atomic but w/o mem.sync.] indicate
"dropped to zero" condition.

{
if ( Lock ) { Lock->Lock(); Lock->UnLock(); }

Looks somewhat silly, oder? ;-)

> delete rep;
}

/*


> if ( Lock ) Lock->UnLock();

*/

Not needed and really "dangerous" -- deadlocks aside [unless you
really want recursive locking], d-tor might easily evaporate the
lock as well [e.g. if it's "owned" by the referenced object].

> }
> };

regards,
alexander.

Alexander Terekhov

unread,
Nov 19, 2002, 4:38:34 PM11/19/02
to

Torsten Robitzki wrote:
[...]

> When the reference count drops to zero, you can safely destroy the
> object because there is no other reference to that object.

Unless that object meant to have a "zombie" state [think of registry/
cache] and CAN be "resurrected". Do you want a link, Torsten? ;-) ;-)

regards,
alexander.

--
"If someone asks you the time, they're trying to set you up to MUG you."

....

"If someone asks you the time, say you have no idea.

If someone asks you for money, pretend you don't speak English.

If someone says hand over that camera or die, give it up.

If someone says run, it's probably a good idea.

If you want to live, stay in lower Manhattan. "

Jochen Wilhelmy

unread,
Nov 19, 2002, 5:21:57 PM11/19/02
to
> I'd make Ref_Decrement() [atomic but w/o mem.sync.] indicate
> "dropped to zero" condition.
>
> {
> if ( Lock ) { Lock->Lock(); Lock->UnLock(); }
>
> Looks somewhat silly, oder? ;-)
>
>
>> delete rep;


why, by the way, is it not possible to create a new mutex before delete,
lock it, unlock it and delete it? i have not yet written an os so
far :)

jochen

Torsten Robitzki

unread,
Nov 19, 2002, 5:43:17 PM11/19/02
to
Alexander Terekhov wrote:
>
> Torsten Robitzki wrote:
> [...]
> > When the reference count drops to zero, you can safely destroy the
> > object because there is no other reference to that object.
>
> Unless that object meant to have a "zombie" state [think of registry/
> cache] and CAN be "resurrected".

Can you elaborate on this? Do you think of a state in the object that
is needed by the destructor to perform proper destruction? If so, the
state of the object might only be changed by another thread
while holding a mutex.

>
> Do you want a link, Torsten? ;-) ;-)

Is this a threat? ;-)

Alexander Terekhov

unread,
Nov 19, 2002, 6:23:06 PM11/19/02
to

Jochen Wilhelmy wrote:
>
> > I'd make Ref_Decrement() [atomic but w/o mem.sync.] indicate
> > "dropped to zero" condition.
> >
> > {
> > if ( Lock ) { Lock->Lock(); Lock->UnLock(); }
> >
> > Looks somewhat silly, oder? ;-)
> >
> >
> >> delete rep;
>
> why by the way is it not possible to create a new mutex before delete,
> lock it, unlock it and delete it.

Good question. Uhmm. Unless I'm just missing something [and it's
really time for me to go sleep [condvar, timedwait()] ;-) ]...

Yes, as long as threads DON'T unref objects that they mutate INSIDE
"modify" critical regions [and compiler doesn't move unrefs in], it
doesn't really matter which lock will be locked{/unlocked} prior to
delete. So, locking some >>thread specific<< [oh, oh, yeah] "mutex"
should work "just fine", I believe. ;-)

IF this assumption [no unrefs inside "modify" critical regions] can
NOT be guaranteed, then one would need recursive locks anyway, to
begin with. Well, we really need pthread_refcount_t, I'd say. ...

regards,
alexander.

Alexander Terekhov

unread,
Nov 19, 2002, 6:24:20 PM11/19/02
to

Torsten Robitzki wrote:
>
> Alexander Terekhov wrote:
> >
> > Torsten Robitzki wrote:
> > [...]
> > > When the reference count drops to zero, you can safely destroy the
> > > object because there is no other reference to that object.
> >
> > Unless that object meant to have a "zombie" state [think of registry/
> > cache] and CAN be "resurrected".
>
> Can you elaborate on this? Do you think of a state in the object that
> is needed by the destructor to perform proper destruction?

No. I mean the following stuff:

http://groups.google.com/groups?selm=3D021EA4.E2217C09%40web.de
(Subject: Re: Objects in container: creation and deletion details)

with Zombies, resurrection [God rules], etc. ;-)

regards,
alexander. < offline till tomorrow >

Mike Mowbray

unread,
Nov 19, 2002, 7:01:43 PM11/19/02
to
Alexander Terekhov wrote:

> [...Talking about mutable refcounted objects...]


> > if (atomic_dec_and_test(&obj->referenceCount))
> > {
> > __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> > delete obj;
> > }
> > else
> > {
> > __INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
> > }

Hmmm. Is this sequence correct? Suppose the refcount is 2.
Thread-A performs a decrement (that results in 1), but before
it executes the "release" membar in the "else" clause, thread-B
performs a decrement, finds a result of 0, and proceeds to the
"acquire" membar and deletion before thread-A has completed its
release membar.

I would have thought something like the following is better:

__INJECT_RELEASE_MEMORY_BARRIER(); // release semantics

if (atomic_dec_and_test(&obj->referenceCount))
{
__INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
delete obj;
}

(?)


- MikeM.


Alexander Terekhov

unread,
Nov 20, 2002, 6:09:26 AM11/20/02
to

Mike Mowbray wrote:
>
> Alexander Terekhov wrote:
>
> > [...Talking about mutable refcounted objects...]
>
> > > if (atomic_dec_and_test(&obj->referenceCount))
> > > {
> > > __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> > > delete obj;
> > > }
> > > else
> > > {
> > > __INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
> > > }
>
> Hmmm. Is this sequence correct?

No, of course. I've just copied it from this <http://tinyurl.com/2us7>
message of mine. I really wish that you had jumped in and corrected me
back then. What I was thinking of at that time (and now too) is the
following.

Basically, in sort of "portable" terms, the decrement (and sub)
should do the following:

if (ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test(&obj->referenceCount))
{
LEAVE_CRITICAL_REGION_AND__INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
delete obj;
}
else
{
LEAVE_CRITICAL_REGION__INJECTING_RELEASE_MEMORY_BARRIER(); // release semantics
}

where neither ENTER_ nor atomic_dec inject barriers.

Is this clear enough now? ;-)

Well, that's why my "sketch" versions of pthread_refcount_t are
"specified" in a way that would allow to drop in *normal mutex*
to synchronize decrements/subs [in addition to fully atomic ops].

[...]


> I would have thought something like the following is better:
>
> __INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
>
> if (atomic_dec_and_test(&obj->referenceCount))
> {
> __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> delete obj;
> }

Yes, this MIGHT work (depending on the semantics of atomic op and
the preceding RELEASE barrier). Probably.

regards,
alexander.

Mike Mowbray

unread,
Nov 20, 2002, 8:14:33 PM11/20/02
to
Alexander Terekhov wrote:

> > Mike Mowbray wrote:
> >>
> >> Alexander Terekhov wrote:
> >>
> >> > [...Talking about mutable refcounted objects...]
> >>
> >>> > if (atomic_dec_and_test(&obj->referenceCount))
> >>> > {
> >>> > __INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> >>> > delete obj;
> >>> > }
> >>> > else
> >>> > {
> >>> > __INJECT_RELEASE_MEMORY_BARRIER(); // release semantics
> >>> > }
> >>
> >> Hmmm. Is this sequence correct?
>
> > No, of course. I've just copied it from this <http://tinyurl.com/2us7>
> > message of mine. I really wish that you would jump in and correct me
> > back then.

?? OK. Now let's see... Where's that time machine they were
talking about recently on sci.physics? :-)

Seriously though, I've only just recently started thinking more
closely about correct placement of membars in conjunction with
atomic ops. I've had a handle-and-rep class that uses atomic
incr/decr for several years, and I was lucky because it only
needed to run on sparcv8plus (which uses a TSO memory model)
and alpha/Tru64 (which thoughtfully provides membars in its
various load-locked/store-conditional builtin intrinsics).
But now I need to get my act together for sparcv9, so I'm
trying to follow this discussion more closely.

> > Basically, in sort of "portable" terms, the decrement
> > (and sub) should do the following:
> >
> > if (ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test(&obj->referenceCount))
> > {
> > LEAVE_CRITICAL_REGION_AND__INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
>
> > delete obj;
> > }
> > else
> > {
> > LEAVE_CRITICAL_REGION__INJECTING_RELEASE_MEMORY_BARRIER(); // release semantics
> > }
> >
> > where neither ENTER_ nor atomic_dec inject barriers.
> >
> > Is this clear enough now? ;-)

Depends on exactly what you mean by "ENTER/LEAVE CRITICAL REGION". I normally
associate these words with locking/unlocking a mutex, but I suspect you
don't mean that. You say that ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test
does not involve a membar. In that case, I still think your code sketch above
is wrong, for the reasons explained in my previous post. (I hope it's still
on your server so I don't need to copy it again here. :-)

So..., er..., no, I guess it's either still not clear enough, or it's wrong.


- MikeM.
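[In portable modern terms (C++11 `std::atomic`, which of course postdates this thread), the decrement-with-barriers pattern being debated can be sketched as below; `Obj` and `release_ref` are illustrative names, not anyone's actual API:]

```cpp
#include <atomic>

// Hypothetical refcounted object, for illustration only.
struct Obj {
    std::atomic<long> referenceCount{1};
};

// Drop one reference. The decrement carries release semantics so the
// count update cannot "overtake" earlier writes to *obj; the acquire
// fence before delete makes those writes visible to the deleting
// thread. Returns true if this call destroyed the object.
bool release_ref(Obj* obj) {
    if (obj->referenceCount.fetch_sub(1, std::memory_order_release) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete obj;
        return true;
    }
    return false;
}
```

This has the same shape as the ACQUIRE/RELEASE sketch quoted above, with the "critical region" folded into the atomic read-modify-write itself.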


Alexander Terekhov
Nov 21, 2002, 8:17:09 AM

Mike Mowbray wrote:
[...]

> > > Basically, in sort of "portable" terms, the decrement
> > > (and sub) should do the following:
> > >
> > > if (ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test(&obj->referenceCount))
> > > {
> > > LEAVE_CRITICAL_REGION_AND__INJECT_ACQUIRE_MEMORY_BARRIER(); // acquire semantics
> >
> > > delete obj;
> > > }
> > > else
> > > {
> > > LEAVE_CRITICAL_REGION__INJECTING_RELEASE_MEMORY_BARRIER(); // release semantics
> > > }
> > >
> > > where neither ENTER_ nor atomic_dec inject barriers.
> > >
> > > Is this clear enough now? ;-)
>
> Depends on exactly what you mean by "ENTER/LEAVE CRITICAL REGION". I normally
> associate these words with locking/unlocking a mutex, but I suspect you
> don't mean that.

I DO mean sort of "mutex" which...

> You say that ENTER_CRITICAL_REGION_AND__DO_atomic_dec_and_test
> does not involve a membar.

^^^^^^^^^^^^^^^^^^^^^^^^^

on lock() because it "protects" nothing but *atomic* operation/dec. ;-)

Well, the problem is that the count's decrements should not "overtake"
memory transaction(s) resulted from object updates/mutation. Adding
"release" barrier prior to decrement and "acquire" barrier prior to
delete may NOT necessarily ensure this -- depends on the semantics of
atomic op [that is used to decrement count] and/w.r.t. barrier(s),
AFAICS.

regards,
alexander.

Mike Mowbray
Nov 21, 2002, 8:13:50 PM
Alexander Terekhov wrote:

> [...] Well, the problem is that the count's decrements


> should not "overtake" memory transaction(s) resulted from
> object updates/mutation.

Right.


> Adding "release" barrier prior to decrement and "acquire"
> barrier prior to delete may NOT necessarily ensure this
> -- depends on the semantics of atomic op [that is used to
> decrement count] and/w.r.t. barrier(s), AFAICS.

Hmmm. Admittedly, I tend to think about this stuff mostly in
terms of sparc cas which doesn't need a membar to implement
atomic incr/decr, and DEC alpha/Tru64's builtin intrinsics
which supply their own membars, but I'm having trouble
imagining any reasonable definition of "atomic incr/decr"
that would interfere with membars in the way you suggest
above. Can you elaborate, pls?


- MikeM.


Hillel Y. Sims
Nov 21, 2002, 10:37:22 PM

"Mike Mowbray" <mi...@despammed.com> wrote in message
news:3DDD84B8...@despammed.com...


> Alexander Terekhov wrote:
>
> > Adding "release" barrier prior to decrement and "acquire"
> > barrier prior to delete may NOT necessarily ensure this
> > -- depends on the semantics of atomic op [that is used to
> > decrement count] and/w.r.t. barrier(s), AFAICS.
>
> Hmmm. Admittedly, I tend to think about this stuff mostly in
> terms of sparc cas which doesn't need a membar to implement
> atomic incr/decr, and DEC alpha/Tru64's builtin intrinsics
> which supply their own membars,

According to my system documentation on VMS 7.3 (Alpha), it is true that
older builtin intrinsics automatically generated membars, but newer versions
of some functions (such as atomic incr / decr) do not:

<builtins.h>
** The following operations do not in themselves generate Memory
Barriers.
** The user is expected to code explicitly any Memory Barriers where
** needed using inline assembly code, e.g. asm("mb"), or asm("wmb").
[..]
** Older versions generated memory barriers both before and after the
** update. Newer versions allow the user to control memory barrier
** placement for the low-level primitives.
[..]
int __ATOMIC_INCREMENT_LONG
(volatile void *__addr);
int __ATOMIC_DECREMENT_LONG
(volatile void *__addr);
[..]

(well actually there is also an intrinsic __MB() provided to do asm("mb"),
but no __WMB()...)

>but I'm having trouble
> imagining any reasonable definition of "atomic incr/decr"
> that would interfere with membars in the way you suggest
> above.

I hope you're right... if not, I think it will totally throw off any of the
little that I think I manage to understand so far about how this stuff
works.. :-(

Mike Mowbray
Nov 21, 2002, 11:37:11 PM
"Hillel Y. Sims" wrote:


> > "Mike Mowbray" <mi...@despammed.com> wrote in message
> > news:3DDD84B8...@despammed.com...
> >>

> >> [...] I tend to think about this stuff mostly in


> >> terms of sparc cas which doesn't need a membar to implement
> >> atomic incr/decr, and DEC alpha/Tru64's builtin intrinsics
> >> which supply their own membars,

> According to my system documentation on VMS 7.3 (Alpha), it
> is true that older builtin intrinsics automatically generated
> membars, but newer versions of some functions (such as atomic

> incr/decr) do not: [...]

Right you are! Coincidentally, I went back and re-read the header
files shortly after posting my previous message - and was dismayed
to read exactly what you say above. My recollection was obviously
from an older version.

> >> but I'm having trouble imagining any reasonable definition
> >> of "atomic incr/decr" that would interfere with membars

> >> in the way (Alexander) suggests [...]

> I hope you're right... if not, I think it will totally throw
> off any of the little that I think I manage to understand so
> far about how this stuff works.. :-(

I know how you feel! I hope that Alexander and Joe will explain
it more clearly/thoroughly than what I've read so far. :-)

In a previous article, Joe Seigh wrote:

> How this works. Simple, a reference count can be expressed
> as the difference between a count of references started
> and references ended.
>
> Rc = Rs - Re
>
> Furthermore this can be multiple counts.
>
> Rc = Sum(Rs) - Sum(Re)
>
> Since the count is now split up, you can keep the start
> reference count with the pointer and both fetch both the
> pointer value and increment the start reference count
> atomically which solves the race condition that regular
> reference counting has.
>
> However since the process of summing up the reference counts
> may be asynchronous and distributed, you need an additional
> mechanism to determine whether an apparent sum of interest
> (usually zero) is a partial sum or not. Various mechanisms
> can be used.
>
> These reference count pointers cannot be copied however, more
> precisely, you cannot make a pointer that is itself copyable
> so it is a pointer of a different type.
>
> So to solve the asynchronous summing problem and the copy
> problem, you just combine these types of reference counts,
> called ephemeral reference counts, with regular reference
> counts.
>
> A reference count object becomes deletable when the regular
> reference count becomes zero and the ephemeral count becomes
> zero which usually means no copies are in progress.

I'm sorry, Joe. Maybe I'm a bit dense, but there's just not
enough information in the above for me to understand in detail
how your mechanism operates. I'd like to understand it better,
and I could probably help you implement/test the atomic parts
on my sparc+alpha machines here if I had more of a clue about
what you're doing. Is there any chance you can take a bit of
time to write a more pedestrian and complete exposition, with
step-by-step examples for the dummies?


- MikeM.

Konrad Schwarz
Nov 22, 2002, 3:16:25 AM
Mike Mowbray schrieb:

> I'm sorry, Joe. Maybe I'm a bit dense, but there's just not
> enough information in the above for me to understand in detail
> how your mechanism operates. I'd like to understand it better,
> and I could probably help you implement/test the atomic parts
> on my sparc+alpha machines here if I had more of a clue about
> what you're doing. Is there any chance you can take a bit of
> time to write a more pedestrian and complete exposition, with
> step-by-step examples for the dummies?

I'd be interested too, FWIW.

Alexander Terekhov
Nov 22, 2002, 5:40:06 AM

I like this:

http://www.primenet.com/~jakubik/mpsafe/MultiprocessorSafe.pdf

<quote>

To implement mutex locking, the system software uses a higher-level
lock called a spin lock. On systems with relaxed memory models, this
requires help from the hardware. Most modern CPUs provide atomic
instructions to help with this problem. One common atomic instruction
that many different CPUs provide is an atomic swap. An atomic swap
allows a single instruction to swap data in the CPU with data in
memory. On CPUs with relaxed memory models, like the model we have
been examining, the atomic instruction will do whatever it takes to
make this instruction atomic despite the memory model. Doing so may
involve bypassing the queue of memory operations, or causing some
degree of synchronization with the queue.

The details are beyond the scope of this article.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

</quote>

;-)

Well, I guess that it's time for David Butenhof to jump in and
shed some light on these issues, so to speak.

regards,
alexander.

Joseph Seigh
Nov 22, 2002, 7:41:10 AM

Alexander Terekhov wrote:
>
> "Hillel Y. Sims" wrote:
> >
> > "Mike Mowbray" <mi...@despammed.com> wrote in message
> > news:3DDD84B8...@despammed.com...
> > > Alexander Terekhov wrote:
> > >
> > > > Adding "release" barrier prior to decrement and "acquire"
> > > > barrier prior to delete may NOT necessarily ensure this
> > > > -- depends on the semantics of atomic op [that is used to
> > > > decrement count] and/w.r.t. barrier(s), AFAICS.

This is probably closer to reality. Atomic ops don't need memory
barriers to work correctly. They work directly on memory. But
if you are going to use them in conjunction with some other scheme
then you may need memory barriers to synchronize memory state with
the atomic ops. And that all depends on what those other schemes
are.

As far as smart pointers are concerned, there is no intrinsic reason
they need memory barriers. But without memory barriers they would
be about as useless as Java volatile pointers. Actually less so.
Java at least guarantees you see the object monitor in a valid
initial state.

So basically I have to figure out what uses you could put smart pointers
to, e.g. DCL, copy on write, wait-free data structures, etc..., and then
figure out what memory barriers would be needed in order for these
schemes to work. And then document those semantics. Something that
doesn't seem to be industry practice. Incidentally, I posted formal
semantics for mutual exclusion to c.p.t back around Nov. 2000. While
not "official" ("official" and formal aren't necessarilly the same)
they are probably the only formal semantics that exist for mutual
exclusion. What Java has is a meta-implementation, their memory
model, which they use as a "specification". It's pretty much useless
for program proving for any Java program over 3 lines of code.

Joe Seigh

Joseph Seigh
Nov 22, 2002, 8:24:19 AM

Mike Mowbray wrote:
...


>
> In a previous article, Joe Seigh wrote:

<snip>


>
> I'm sorry, Joe. Maybe I'm a bit dense, but there's just not
> enough information in the above for me to understand in detail
> how your mechanism operates. I'd like to understand it better,
> and I could probably help you implement/test the atomic parts
> on my sparc+alpha machines here if I had more of a clue about
> what you're doing. Is there any chance you can take a bit of
> time to write a more pedestrian and complete exposition, with
> step-by-step examples for the dummies?
>

It's on my list I guess. It's not that easy. One advantage
researchers and academics especially have is face time where
they get instant feedback on what parts people are having
difficulty with and what approaches work.

Also, part of the problem is I have been working with this stuff
so long it's second nature. I can think in multi-threading. But
I do go the formal route occasionally. It's a good exercise. You
may think you know something but then you'll find out that's not
really the case.

I already did a native implementation on sparc which works pretty
well. Actually it will work pretty well once I go back and fix
one thing I messed up changing the interface to the native code.
Things got pretty ugly in the heap manager. :-)

I had a sparc programmers manual to help with that. I want to
do an intel implementation next. I suspect I will have to spend
some outrageous sum for an Intel programmers manual; I don't think
there are any online. This is not good given my current lack
of income.

The interface I'm using is

extern int cas64(void *target, void *cmp, void *value);

return code 0 failed, 1 success
cmp is updated with current value of target.

operands are 64 bit anythings, hence void, to avoid unnecessary casting.

value should probably be const.

and no membars probably.

Joe Seigh
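[A portable sketch of that contract on top of C++11 `compare_exchange_strong` -- illustrative only, since the real implementation is native sparc/intel code; this version additionally assumes the caller's target is actually a `std::atomic<std::uint64_t>`:]

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// cas64: returns 0 on failure, 1 on success; on failure *cmp is
// updated with the current value of *target, per the posted contract.
int cas64(void* target, void* cmp, const void* value) {
    auto* t = static_cast<std::atomic<std::uint64_t>*>(target);
    std::uint64_t expected, desired;
    std::memcpy(&expected, cmp, sizeof expected);
    std::memcpy(&desired, value, sizeof desired);
    bool ok = t->compare_exchange_strong(expected, desired);
    std::memcpy(cmp, &expected, sizeof expected);  // current value on failure
    return ok ? 1 : 0;
}
```

Note that, as discussed in the thread, `compare_exchange_strong` with default (sequentially consistent) ordering is stronger than the "no membars" variant Joe describes.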

David Butenhof
Nov 22, 2002, 9:26:17 AM
Alexander Terekhov wrote:

While that's true OF THE ATOMIC SEQUENCE itself, this is of no help to
surrounding code. Note that Alpha has no "atomic swap instruction", but
rather an extremely general and flexible pair of "locked" instructions,
"load-locked" and "store-conditional" that can be used to build atomic
queues, atomic bit set/clears, increment/decrements, and lots more.
(While there are restrictions on the nature and number of instructions
between the pair, I've coded some really complicated conditional assembly
sequences before eventually coming to the conclusion I'd've been better off
just using a compare-and-swap. ;-) )

In the simplest cases, the memory guarantees you get "for free" from the
atomic sequence are just fine. If I want an atomic counter that doesn't
"mean" anything beyond its value, that's exactly what I want. It's
accurate, and it has no added overhead. (This is also why the Alpha
intrinsics don't include MB instructions... they're relatively expensive
and unnecessary in the simple cases that you want to run fast.)

HOWEVER, this all changes when the value you're adjusting atomically has
some external meaning to the program.

If I atomically insert a structure onto a queue, the atomic sequence itself
is enough to ensure that an atomic dequeue will see the pointer I wrote.
But it's not enough to ensure that it will see the CONTENTS of the
structure to which it points. That requires memory barriers on both sides
of the transaction. The result might be something like this:

(writer:)

<initialize structure>
__MB();
__ATOMIC_EXCH_QUAD(); /*insert*/

(reader:)

__ATOMIC_EXCH_QUAD(); /*remove*/
__MB();
<use structure>

Or you could imagine instead using atomic increment/decrement to designate
the "valid" (written) slice of a vector; you'd write the vector's element,
issue the MB, and then do the atomic increment. (And similarly in the
reader.)

(Oh, and note another good reason for omitting the mb from the stock
intrinsics... to be general purpose it'd need one on EACH side, but one
would almost always be wasted. The alternative would be to proliferate
variations that combined the operation with one mb or the other... and
sometimes you really do want both. It's much neater this way.)
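[The same handoff in portable C++11 terms, with the barriers folded into the atomic operations as acquire/release ordering -- a sketch; `Node` and `slot` are made-up names:]

```cpp
#include <atomic>

struct Node { int payload = 0; };   // the "structure" being handed off

std::atomic<Node*> slot{nullptr};   // a one-element "queue"

void writer(Node* n) {
    n->payload = 42;                            // <initialize structure>
    slot.store(n, std::memory_order_release);   // __MB(); insert
}

Node* reader() {
    Node* n;
    // remove -- acquire ordering plays the role of the __MB()
    // that followed the atomic exchange in the Alpha sketch.
    while ((n = slot.exchange(nullptr, std::memory_order_acquire)) == nullptr)
        ;  // spin until the writer publishes
    return n;  // n->payload is now safe to read
}
```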

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

Alexander Terekhov
Nov 22, 2002, 11:28:57 AM

David Butenhof wrote:
[...]

> (writer:)
>
> <initialize structure>
> __MB();
> __ATOMIC_EXCH_QUAD(); /*insert*/
>
> (reader:)
>
> __ATOMIC_EXCH_QUAD(); /*remove*/
> __MB();
> <use structure>

I'd prefer:

(writer:)

<initialize structure>
// __MB();
__ATOMIC_EXCH_QUAD_REL(); /*insert -- RELEASE semantics*/

(reader:)

__ATOMIC_EXCH_QUAD_ACQ(); /*remove -- ACQUIRE semantics*/
// __MB();
<use structure>

AFAICS, the problem with "__MB()"-plus-naked-atomics version is that in
the case of "writer:", the __ATOMIC_EXCH_QUAD() implementation should
synchronize with respect to write queue(s) -- should inspect the presence
of membar(s) there and "wait" till preceding writes "complete". This [i.e.
sync with respect to *writes*] is unneeded in the "reader:" case, I think.
Or am I just missing and/or misunderstanding something?

regards,
alexander.

David Butenhof
Nov 22, 2002, 12:58:33 PM
Alexander Terekhov wrote:

>
> David Butenhof wrote:
> [...]
>> (writer:)
>>
>> <initialize structure>
>> __MB();
>> __ATOMIC_EXCH_QUAD(); /*insert*/
>>
>> (reader:)
>>
>> __ATOMIC_EXCH_QUAD(); /*remove*/
>> __MB();
>> <use structure>
>
> I'd prefer:
>
> (writer:)
>
> <initialize structure>
> // __MB();
> __ATOMIC_EXCH_QUAD_REL(); /*insert -- RELEASE semantics*/
>
> (reader:)
>
> __ATOMIC_EXCH_QUAD_ACQ(); /*remove -- ACQUIRE semantics*/
> // __MB();
> <use structure>

More interfaces to worry about. Leaving out the barriers gives the coder
total flexibility via more direct access to the hardware. The extra forms
provide little real value to someone who's legitimately programming at that
level; either they know the hardware and can probably work better with
clean primitives... or they're pretty much doomed to mess it up anyway. The
REL and ACQ versions map perfectly into the IPF instruction set, by the
way; is that deliberate or accidental?

> AFAICS, the problem with "__MB()"-plus-naked-atomics version is that in
> the case of "writer:", the __ATOMIC_EXCH_QUAD() implementation should
> synchronize with respect to write queue(s) -- should inspect the presence
> of membar(s) there and "wait" till preceding writes "complete". This [i.e.
> sync with respect to *writes*] is unneeded in the "reader:" case, I think.
> Or am I just missing and/or misunderstanding something?

You can't (in general) use a WMB (writer barrier) on the enqueue side,
unless you know that there are no independent reads that could be
destructively reordered across the WMB. Often, this is reasonable since
you're only concerned about the structure and the atomic queue's pointer;
but it could never be put into a general purpose mechanism like
pthread_mutex_lock(), or __ATOMIC_EXCH_QUAD().

On the other side, there's no such thing as an RMB, even if you're sure
there are no writes that might be destructively reordered -- so there's no
option on that side.

Hillel Y. Sims
Nov 22, 2002, 11:11:54 PM

"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3DDE3167...@xemaps.com...

>
> I had a sparc programmers manual to help with that. I want to
> do an intel implementation next. I suspect I will have to spend
> some outrageous sum for an Intel programmers manual, I don't think
> there are any online. This is not good given my current lack
> of income.

Is this helpful?
http://developer.intel.com/design/pentium4/manuals/

Joseph Seigh
Nov 23, 2002, 1:14:59 PM

"Hillel Y. Sims" wrote:
>
> "Joseph Seigh" <jsei...@xemaps.com> wrote in message
> news:3DDE3167...@xemaps.com...
> >
> > I had a sparc programmers manual to help with that. I want to
> > do an intel implementation next. I suspect I will have to spend
> > some outrageous sum for an Intel programmers manual, I don't think
> > there are any online. This is not good given my current lack
> > of income.
>
> Is this helpful?
> http://developer.intel.com/design/pentium4/manuals/
>

Actually, I downloaded them yesterday. A couple of days earlier those
web pages were down for maintenance with a note saying that you could
get the publications from elsewhere (for $$). So I didn't find out
they were free until yesterday. Last time I did assembly programming on
intel was on an 8088, actually bit poking in basic for a while until the
assembler became available. I always thought the register layout and usage was
kind of stupid. Every other instruction ended up being a mov. They
seem to have kept their compatibility with respect to this very well.

Also, I was helping out annotating a disassembly of DOS 1.0 until we
were told to stop. I had the command handler and program loader figured out.

Joe Seigh

Alexander Terekhov
Nov 23, 2002, 2:24:00 PM

David Butenhof wrote:
[...]

> The REL and ACQ versions map perfectly into the IPF instruction set,
> by the way; is that deliberate or accidental?

Deliberate. Java's revised memory model with its ACQ/REL volatiles
aside for a moment, it basically comes from:

["Intel(R) Itanium(R) Architecture Software Developer's Manual"
http://developer.intel.com/design/itanium/manuals ]

"Memory accesses follow one of four memory ordering semantics:
unordered, release, acquire or fence. Unordered data accesses
may become visible in any order. Release data accesses
guarantee that all previous data accesses are made visible
prior to being made visible themselves. Acquire data accesses
guarantee that they are made visible prior to all subsequent
data accesses. Fence operations combine the release and
acquire semantics into a bi-directional fence, i.e., they
guarantee that all previous data accesses are made visible
prior to any subsequent data accesses being made visible.

Explicit memory ordering takes the form of a set of
instructions: ordered load and ordered check load (ld.acq,
ld.c.clr.acq), ordered store (st.rel), semaphores (cmpxchg,
xchg, fetchadd), and memory fence (mf). The ld.acq and
ld.c.clr.acq instructions follow acquire semantics. The
st.rel follows release semantics. The mf instruction is a
fence operation. The xchg, fetchadd.acq, and cmpxchg.acq
instructions have acquire semantics. The cmpxchg.rel, and
fetchadd.rel instructions have release semantics. The
semaphore instructions also have implicit ordering. If
there is a write, it will always follow the read. In
addition, the read and write will be performed atomically
with no intervening accesses to the same memory region.

....

The Itanium architecture provides a set of three semaphore
instructions: exchange (xchg), compare and exchange (cmpxchg),
and fetch and add (fetchadd). Both cmpxchg and fetchadd may
have either acquire or release semantics depending on the
specific opcode chosen. The xchg instruction always has
acquire semantics. These instructions read a value from
memory, modify this value using an instruction-specific
operation, and then write the modified value back to memory.
The read-modify-write sequence is atomic by definition."

Well, what I don't quite follow is why do they sort of
"restrict-way-too-much" IA-32's stuff [running on IA-64]:

<quote>

IA-32 instructions are mapped into the Itanium memory
ordering model as follows:

- All IA-32 stores have release semantics
- All IA-32 loads have acquire semantics

</quote>

given that IA-32's SSE/SSE2 extensions DID introduce "out-of-
order" memory ordering and barriers [load-fence/store-fence/
both-load-and-store-fence] to IA-32 as well. Any ideas?

Well, back to ref.counting [for mutable stuff; WITH memory
sync]. "Itanic" version should probably use this:

cmpxchg.rel for decrements/subs PLUS memory fence (mf)
when the count drops to zero [well, ld.acq to load the
object pointer and invoke "delete" would work too, but
would somewhat "complicate" the *pthread_refcount_t*
interface, I guess ;-) ].

Correct?

>
> > AFAICS, the problem with "__MB()"-plus-naked-atomics version is that in
> > the case of "writer:", the __ATOMIC_EXCH_QUAD() implementation should
> > synchronize with respect to write queue(s) -- should inspect the presence
> > of membar(s) there and "wait" till preceding writes "complete". This [i.e.
> > sync with respect to *writes*] is unneeded in the "reader:" case, I think.
> > Or am I just missing and/or misunderstanding something?
>
> You can't (in general) use a WMB (writer barrier) on the enqueue side,
> unless you know that there are no independent reads that could be
> destructively reordered across the WMB. Often, this is reasonable since
> you're only concerned about the structure and the atomic queue's pointer;

Yep.

> but it could never be put into a general purpose mechanism like
> pthread_mutex_lock(),

Yup. pthread_mutex_lock() should have "acquire" semantics and
pthread_mutex_unlock() should have "release".


> or __ATOMIC_EXCH_QUAD().
>
> On the other side, there's no such thing as an RMB,

Except that *IA-32* does have "LFENCE" thing. ;-)

> even if you're sure
> there are no writes that might be destructively reordered -- so there's no
> option on that side.

regards,
alexander.

Hillel Y. Sims
Nov 24, 2002, 1:31:59 PM
Apologies in advance for being really dense...

"David Butenhof" <David.B...@compaq.com> wrote in message
news:d8rD9.1$K85....@news.cpqcorp.net...


>
> In the simplest cases, the memory guarantees you get "for free" from the
> atomic sequence are just fine. If I want an atomic counter that doesn't
> "mean" anything beyond its value, that's exactly what I want. It's
> accurate, and it has no added overhead.

If I understand correctly, the following is safe on an Alpha SMP system
(with respect to _only_ this object):

class RawCounter {
    int count;
public:
    RawCounter() : count(0) {}

    void incr() { __ATOMIC_INCREMENT_LONG(&count); }
    void decr() { __ATOMIC_DECREMENT_LONG(&count); }
    int get_count() const { return count; }
};

No need for mb here before reading 'count', the reader will always see the
up-to-date version (in any of the three relevant functions). Is this
correct?
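[For readers following along today: a portable C++11 restatement of the class above (a sketch; std::atomic did not exist when this was posted) would use relaxed ordering, precisely because the count's value carries no meaning beyond the number itself:]

```cpp
#include <atomic>

// Portable restatement of RawCounter; relaxed ordering suffices for a
// counter that is accurate but implies nothing about other memory.
class RawCounter {
    std::atomic<int> count{0};
public:
    void incr() { count.fetch_add(1, std::memory_order_relaxed); }
    void decr() { count.fetch_sub(1, std::memory_order_relaxed); }
    int get_count() const { return count.load(std::memory_order_relaxed); }
};
```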

The thing I am a bit confused about is how this relates to the description
of Alpha cache coherency wrt reordering issues at
http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html :
Initial state: p = &x, x = 1, y = 0
T1: y = 1; mb; p = &y
T2: i = *p
Can result in: i == 0


<quote>
Here is what has to happen for this behavior to show up. Assume T1 runs on
P1 and T2 on P2. P2 has to be caching location y with value 0. P1 does y=1
which causes an "invalidate y" to be sent to P2. This invalidate goes into
the incoming "probe queue" of P2; as you will see, the problem arises
because this invalidate could theoretically sit in the probe queue without
doing an MB on P2. The invalidate is acknowledged right away at this point
(i.e., you don't wait for it to actually invalidate the copy in P2's cache
before sending the acknowledgment). Therefore, P1 can go through its MB. And
it proceeds to do the write to p. Now P2 proceeds to read p. The reply for
read p is allowed to bypass the probe queue on P2 on its incoming path (this
allows replies/data to get back to the 21264 quickly without needing to wait
for previous incoming probes to be serviced). Now, P2 can dereference P to
read the old value of y that is sitting in its cache (the inval y in P2's
probe queue is still sitting there).

How does an MB on P2 fix this? The 21264 flushes its incoming probe queue
(i.e., services any pending messages in there) at every MB. Hence, after the
read of P, you do an MB which pulls in the inval to y for sure. And you can
no longer see the old cached value for y.

</quote>

If I understand correctly, this means that T2 should really look like: "q =
p; mb; i = *q" and then it would not be possible to see i == 0. But... what
if P2 was also already caching location p before T2 runs? Could not the
scenario described about incompletely processed probe queue entries apply in
the same way, such that without a prior 'read/acquire' mb by T2 to force p
to be updated, the read "q = p" might see the _old_ value of p (&x), having
not yet processed its incoming probe queue invalidations (and then even
after the intermediate mb, the "i = *q" read may pull in the value of x,
instead of y)? So T2 _really_ needs to be: "mb; q = p; mb; i = *q"? If not,
why is it guaranteed for the read of p to be correct while the read of y was
not?


thanks!

David Butenhof
Nov 25, 2002, 11:48:15 AM
Hillel Y. Sims wrote:

> Apologies in advance for being really dense...
>
> "David Butenhof" <David.B...@compaq.com> wrote in message
> news:d8rD9.1$K85....@news.cpqcorp.net...
>>
>> In the simplest cases, the memory guarantees you get "for free" from the
>> atomic sequence are just fine. If I want an atomic counter that doesn't
>> "mean" anything beyond its value, that's exactly what I want. It's
>> accurate, and it has no added overhead.
>
> If I understand correctly, the following is safe on an Alpha SMP system
> (with respect to _only_ this object):
>
> class RawCounter {
> int count;
> public:
> RawCounter() : count(0) {}
>
> void incr() { __ATOMIC_INCREMENT_LONG(&count); }
> void decr() { __ATOMIC_DECREMENT_LONG(&count); }
> int get_count() const { return count; }
> };
>
> No need for mb here before reading 'count', the reader will always see the
> up-to-date version (in any of the three relevant functions). Is this
> correct?

Define "correct". Although the UPDATES are synchronized through to main
memory, there's no such guarantee for the read. Although atomic writes from
other processors must have already notified the reading processor's cache,
I don't think there's any rule that it must have processed all such
notifications before retiring the read. In practice, it doesn't matter;
even if all previous updates HAD been handled, another processor could
write immediately after the fetch (long before the issuing instruction
stream could possibly do anything with the read's result).

In any case, an 'mb' wouldn't help the fetch, because there's no ORDERING
involved. And again, restating: the fact that the read may be "out of date"
(at least by some definition) has nothing to do with out of sequence reads
or writes, but only with the fact that it's a multiprocessor. (And if it's
not a multiprocessor, then all reads and writes, "atomic" or not, are
sequential.)

I wish I had time to parse all that and constrast against architectural
documentation, but I really can't spare it now.

The real answer, in any case, goes back to my opening point on the first
paragraph: "Define correct." Memory barriers are NOT intended to be global
data flushes. They're intended to preserve ORDERING dependencies between
processors.

"y = 0; mb; x = 1;" means that the write of 1 to x shall not be made visible
to other processors until the write of 0 to y is visible. Similarly, "t0 =
x; mb; t1 = y;" means that the read of y shall not occur before the read of
x. That is, the value read for y shall be "at least as up to date" as the
value of x. This is critical if x is a pointer to y, for example, as in an
atomic queue. But this isn't intended to be a guarantee that no later
writes have occurred to either on different processors that haven't been
flushed through the system.
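[A portable rendering of exactly this pair of ordering guarantees, with C++11 fences standing in for the 'mb' instructions; the names are illustrative:]

```cpp
#include <atomic>

std::atomic<int> x{0};
int y = 0;  // ordinary data, published via x

void producer() {
    y = 1;                                                // y = 1
    std::atomic_thread_fence(std::memory_order_release);  // "mb"
    x.store(1, std::memory_order_relaxed);                // x = 1
}

int consumer() {
    while (x.load(std::memory_order_relaxed) == 0)        // t0 = x
        ;
    std::atomic_thread_fence(std::memory_order_acquire);  // "mb"
    return y;  // at least as up to date as the x just read: must be 1
}
```

Note the fences express only the ordering dependency; they are not a "flush to now", matching the point above.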

After all, there's no way to consistently define "now" across a
multiprocessor without suspending all other processors, making the memory
subsystem (including all caches) consistent, and then performing the
operation. (And even then, "now" can be uniquely defined only until you
resume operation of the other processors.)

Or to put it another way, what difference does it make if I can say I read
the "current value of y", if some other processor can change y between my
read and my first opportunity to do anything with the value I read?

Alexander Terekhov
Nov 25, 2002, 12:48:07 PM

David Butenhof wrote:
[...]

> After all, there's no way to consistently define "now" across a
> multiprocessor without suspending all other processors, making the memory
> subsystem (including all caches) consistent, and then performing the
> operation. (And even then, "now" can be uniquely defined only until you
> resume operation of the other processors.)

Uhmm...

>
> Or to put it another way, what difference does it make if I can say I read
> the "current value of y", if some other processor can change y between my
> read and my first opportunity to do anything with the value I read?

Yes, but the "current value of y" MAY indicate [presuming that it IS the
'current' one] that no other processor can ever "change y between my read
and my first opportunity to do anything with the value I read". THAT might
be really useful [or, actually even required; see 'unshareable'/0 state
below]; e.g. to avoid subsequent 'atomic'/CAS/LL-SC-like read-modify op
[it is probably much more expensive than 'a simple write']. I mean:

< IntAtomicGet() is used to read the *current* value >

http://groups.google.com/groups?selm=3D81E7B3.D462E6DE%40web.de
http://groups.google.com/groups?selm=3D858C7A.2975BA4A%40web.de
(Subject: Re: Ref-counted strings not thread-safe -- looking for clarification)

inline StringBuf::StringBuf() : len(0), used(0), refs(1), buf(0) { }

inline StringBuf::StringBuf( AppendStringBufPtr other )
: len(0), used(0), refs(1), buf(0)
{
Reserve( max( other->len, other->used+1 ) );
memcpy( buf, other->buf, used = other->used );
if ( 0 == IntAtomicDecrement( other->refs ) )
other.destruct();
}

inline StringBuf::StringBuf( CharRefStringBufPtr other )
: len(other->len), used(other->used), refs(0), buf(new char[len])
{
++String::nAllocs;
memcpy( buf, other->buf, used );
if ( 0 == IntAtomicDecrement( other->refs ) )
other.destruct();
}

inline void String::Clear() {
if ( 2 > IntAtomicGet( data_->refs ) ||
1 > IntAtomicDecrement( data_->refs ) ) {
data_->Clear();
data_->refs = 1; // mutation -> 'shareable'
}
else {
data_ = new StringBuf;
}
}

inline void String::Append( char c ) {
switch ( IntAtomicGet( data_->refs ) ) {
case 00: data_->refs = 1; // mutation -> 'shareable'
case 01: data_->Reserve( data_->used+1 ); break;
default: data_ = new StringBuf( AppendStringBufPtr( data_ ) );
} data_->buf[data_->used++] = c;
}

inline char& String::operator[]( size_t n ) {
switch ( IntAtomicGet( data_->refs ) ) {
case 01: data_->refs = 0; // 'unshareable'
case 00: break;
default: data_ = new StringBuf( CharRefStringBufPtr( data_ ) );
}
return data_->buf[n];
}

regards,
alexander.

Hillel Y. Sims
Nov 25, 2002, 11:58:42 PM

"David Butenhof" <David.B...@compaq.com> wrote in message
news:jvsE9.9$Wn2.5...@news.cpqcorp.net...

>
> Similarly, "t0 =
> x; mb; t1 = y;" means that the read of y shall not occur before the read of
> x. That is, the value read for y shall be "at least as up to date" as the
> value of x.
[..]

> After all, there's no way to consistently define "now" across a
> multiprocessor without suspending all other processors, making the memory
> subsystem (including all caches) consistent, and then performing the
> operation. (And even then, "now" can be uniquely defined only until you
> resume operation of the other processors.)
>
> Or to put it another way, what difference does it make if I can say I read
> the "current value of y", if some other processor can change y between my
> read and my first opportunity to do anything with the value I read?
>

Wow. I am trying to let all this sink in. This is really starting to sound
like quantum physics... (the "Butenhof Uncertainty Principle"??)

Joseph Seigh
Nov 26, 2002, 7:05:52 AM

Mike Mowbray wrote:
> what you're doing. Is there any chance you can take a bit of
> time to write a more pedestrian and complete exposition, with
> step-by-step examples for the dummies?
>


We'll try a by-the-numbers example of a smart pointer and making a copy of it.
There are two kinds of references. One is called ephemeral to distinguish it
from the other.

The smart pointer has an ephemeral count and a pointer to the reference object,
which has an ephemeral count, a regular reference count, and a pointer to the
actual object. The ephemeral count is a differential count: the smart pointer
part is the "references started" part and the ref obj part is the "references
ended" part (stored as a negative number).


After ctor of the smart pointer you have both ephemeral count parts set to
zero, and the regular count set to 1.

sptr ref A
0) 0, ref -> 0, 1, A -> ...

For a thread to copy the pointer, it atomically increments the ephemeral count by 1 as
it fetches the ref value, thus

1) 1, ref -> 0, 1, A -> ...

Then it updates the ref object by decrementing its ephemeral count by 1 and incrementing the
regular count by 1. New copy is init to "0, ref".

2) 1, ref -> -1, 2, A -> ...
0, ref ->

Now suppose some thread deletes the first smart pointer

3) 0, null -1, 2, A -> ...
0, ref ->

and updates the shared ref object by *adding* its ephemeral count to the ref obj
ephemeral count and decrementing the regular ref count. They are not both zero,
so the object is not deleted.

4) 0, null 0, 1, A -> ...
0, ref ->

--


Suppose instead that the pointer being copied is deleted before the shared reference count is
updated. In regular reference counting schemes this is where the race condition occurs.
So picking up at step 1 above

1) 1, ref -> 0, 1, A -> ...

Another thread deletes pointer

2) 0, null 0, 1, A -> ...

and updates the shared ref object by *adding* its ephemeral count to the ref obj
ephemeral count and decrementing the regular ref count. They are not both zero,
so the object is not deleted.

3) 0, null 1, 0, A -> ...

First thread finishes copying of smart pointer, step 2 of first example, i.e.
decrement ephemeral count by one and increment regular count by 1.

4) 0, null 0, 1, A -> ...
0, ref ->

--

Joe Seigh

David Butenhof
Nov 26, 2002, 8:43:09 AM
Hillel Y. Sims wrote:

Asynchronous programming IS a little like quantum physics. You have to
design your code to accept, embrace, and exploit uncertainty instead of
fearing it. The only way to eliminate uncertainty is to make everything
sequential, which would be a pretty silly way to apply a multiprocessor!

That's exactly why mutexes exist, which is why I've often joked that they
should be called "bottlenecks". You sacrifice parallelism for serialization
when you really need it -- but you must strive to minimize the need.

Uncertainty ("quantum threading?") is your friend.

David Butenhof
Nov 26, 2002, 9:59:24 AM
Alexander Terekhov wrote:

>> Or to put it another way, what difference does it make if I can say I
>> read the "current value of y", if some other processor can change y
>> between my read and my first opportunity to do anything with the value I
>> read?
>
> Yes, but the "current value of y" MAY indicate [presuming that it IS the
> 'current' one] that no other processor can ever "change y between my read
> and my first opportunity to do anything with the value I read". THAT might
> be really useful [or, actually even required; see 'unshareble'/0 state
> below]; e.g. to avoid subsequent 'atomic'/CAS/LL-SC-like read-modify op
> [it is probably much more expensive than 'a simple write']. I mean:

If 'y' is written ONLY ONCE, and with proper ordering (memory barriers),
then the reader can be sure that a changed value will not change. That's
'ye olde' double-checked initialization pattern. If you can't see that
initialization has occurred, you need to synchronize and try again. If you
CAN see that initialization has occurred, you need only ensure memory
ordering.

If the sequence 'if y then x' is SYNCHRONIZED, then you know no other thread
can change 'y' until you're done with your critical region.

In fully mutable data, without synchronization, you can never be sure that
'y' won't change out from under you at any instant, no matter what value
you've just read. Even if "application logic" states that the value is a
marker of some significance and "shall not change", you can't be sure the
value is up to date.

> < IntAtomicGet() is used to read the *current* value >

There's no such thing as "atomic get". On Alpha or IPF, the actual
implementation would be something along the lines of a compare-and-swap
implementing 'if the current value is what I think it is, make it what it
is'. Otherwise it's just a read, with no ordering or coherency
restrictions, and you can't know that the value is "up to date". While the
compare-and-swap may do what you want, it's "cheating" in that it's really
just memory synchronization. It's no different (to the machine) than any
other form of synchronization, including a spinlock.

If you "atomically set" a state that you know must remain constant until you
release it, then indeed you know nobody else can change it "between the
read and your use of the value", and you can make safe decisions based on
that state. In this case, you're "setting" the state that's already
there... but that really makes no difference. You've just used an atomic
operation to implement your own customized form of lock. Nothing wrong with
that at all, but it's nothing special, either.

So, yes, one may sometimes avoid an atomic sequence by verifying its safety
with an atomic sequence. Just as when one locks a mutex. Granted, on
uniprocessors or multiprocessors with globally ordered memory operations,
IntAtomicGet() may be just a read, and possibly cheaper than a
compare-and-swap. That doesn't generalize, though, so it's just another
possible machine-specific optimization.

Alexander Terekhov
Nov 26, 2002, 11:28:50 AM

David Butenhof wrote:
[...]

> > < IntAtomicGet() is used to read the *current* value >
>
> There's no such thing as "atomic get". ....

Okay, let me put it this way... Below you can find a sort of "updated"
COW string example using the hypothetical pthread_refcount_t interface.
Do you see any problems with it? What are they? I hope that you'll
"deduce" the meaning of pthread_refcount_t calls from it with the COW
example used as an illustration. I'm sorry, but it's still in C++. ;-)

pthread_refcount_t interface(*) used [without error checking and I've
not yet had a chance to compile it, sorry]:

pthread_refcount_init()
pthread_refcount_destroy()
pthread_refcount_getvalue()
pthread_refcount_setvalue()
pthread_refcount_increment()
pthread_refcount_decrement_no_msync()

COW string illustration based on Herb Sutter's "gotw-045-test-harness":
<http://www.gotw.ca/gotw/045.htm> <http://www.gotw.ca/gotw/045code.zip>

namespace COW_AtomicInt3 {

#undef NAME
#undef BAGGAGE
#define NAME "COW_AtomicInt3"
#define BAGGAGE

size_t IntAtomicGet( pthread_refcount_t& refs ) {
size_t result;
pthread_refcount_getvalue( &refs, &result );
return result;
}

void IntAtomicIncrement( pthread_refcount_t& refs ) {
pthread_refcount_increment( &refs );
}

size_t IntAtomicDecrement( pthread_refcount_t& refs ) {
int status = pthread_refcount_decrement_no_msync( &refs );
if (PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status )
return 0;
// error checking ...
return 1; // Whatever !0
}

class String;
struct StringBuf;

class AppendStringBufPtr {
StringBuf* ptr;
public:
AppendStringBufPtr( StringBuf* p ) : ptr( p ) {}
StringBuf* operator->() const { return ptr; }
void destruct() { delete ptr; }
};

class CharRefStringBufPtr {
StringBuf* ptr;
public:
CharRefStringBufPtr( StringBuf* p ) : ptr( p ) {}
StringBuf* operator->() const { return ptr; }
void destruct() { delete ptr; }
};

struct StringBuf {
StringBuf();
~StringBuf();
StringBuf( const StringBuf& other );
StringBuf( AppendStringBufPtr other );
StringBuf( CharRefStringBufPtr other );

void Clear();
void Reserve( size_t n );

size_t len;
size_t used;
pthread_refcount_t refs;
char* buf;
BAGGAGE;

void* operator new( size_t n );
void operator delete( void* p );
};

class String {
public:
String();
~String();
String( const String& );

void Clear();
void Append( char );
size_t Length() const;
char& operator[](size_t);

static int nCopies;
static int nAllocs;
private:
StringBuf* data_;
};

int String::nCopies;
int String::nAllocs;

static FastArena fa( NAME, sizeof(StringBuf) );
void* StringBuf::operator new( size_t n ) { return fa.Allocate( n ); }
void StringBuf::operator delete( void* p ) { fa.Deallocate( p ); }

inline StringBuf::StringBuf() : len(0), used(0), buf(0) {
pthread_refcount_init( &refs, 1 );
}

inline StringBuf::~StringBuf() {
pthread_refcount_destroy( &refs );
delete[] buf;
}

inline StringBuf::StringBuf( const StringBuf& other )
: len(other.len), used(other.used), buf(new char[len])
{
++String::nAllocs;
pthread_refcount_init( &refs, 1 );
memcpy( buf, other.buf, used );
}

inline StringBuf::StringBuf( AppendStringBufPtr other )

: len(0), used(0), buf(0)
{
pthread_refcount_init( &refs, 1 );


Reserve( max( other->len, other->used+1 ) );
memcpy( buf, other->buf, used = other->used );
if ( 0 == IntAtomicDecrement( other->refs ) )
other.destruct();
}

inline StringBuf::StringBuf( CharRefStringBufPtr other )

: len(other->len), used(other->used), buf(new char[len])
{
++String::nAllocs;
pthread_refcount_init( &refs, 0 );


memcpy( buf, other->buf, used );
if ( 0 == IntAtomicDecrement( other->refs ) )
other.destruct();
}

inline void StringBuf::Clear() {
delete[] buf;
buf = 0;
len = 0;
used = 0;
}

inline void StringBuf::Reserve( size_t n ) {
<snip>
}

inline String::String() : data_(new StringBuf) { }

inline String::~String() {


if ( 2 > IntAtomicGet( data_->refs ) ||
1 > IntAtomicDecrement( data_->refs ) )

delete data_;
}

inline String::String( const String& other )
{
switch ( IntAtomicGet( other.data_->refs ) ) {
case 00: data_ = new StringBuf( *other.data_ ); break;
default: IntAtomicIncrement( (data_ = other.data_)->refs );
}
++nCopies;
}

inline void String::Append( char c ) {
switch ( IntAtomicGet( data_->refs ) ) {

case 00: pthread_refcount_setvalue( &data_->refs, 1 ); // mutation -> 'shareable'


case 01: data_->Reserve( data_->used+1 ); break;
default: data_ = new StringBuf( AppendStringBufPtr( data_ ) );
} data_->buf[data_->used++] = c;
}

inline char& String::operator[]( size_t n ) {
switch ( IntAtomicGet( data_->refs ) ) {

case 01: pthread_refcount_setvalue( &data_->refs, 0 ); // 'unshareable'


case 00: break;
default: data_ = new StringBuf( CharRefStringBufPtr( data_ ) );
} return data_->buf[n];
}

inline size_t String::Length() const {
return data_->used;
}

inline void String::Clear() {
if ( 2 > IntAtomicGet( data_->refs ) ||
1 > IntAtomicDecrement( data_->refs ) ) {
data_->Clear();

pthread_refcount_setvalue( &data_->refs, 1 ); // mutation -> 'shareable'


}
else {
data_ = new StringBuf;
}
}

} // namespace COW_AtomicInt3

regards,
alexander.

(*) <http://tinyurl.com/2wai> (SKETCHv2, that's "current" one).
<http://tinyurl.com/2wal> (SKETCHv1, that's an older one).

Joseph Seigh
Nov 26, 2002, 12:07:53 PM

Alexander Terekhov wrote:
>
> David Butenhof wrote:
> [...]
> > > < IntAtomicGet() is used to read the *current* value >
> >
> > There's no such thing as "atomic get". ....
>
> Okay, let me put it this way... Below you can find a sort of "updated"
> COW string example using the hypothetical pthread_refcount_t interface.
> Do you see any problems with it? What are they? I hope that you'll
> "deduce" the meaning of pthread_refcount_t calls from it with the COW
> example used as an illustration. I'm sorry, but it's still in C++. ;-)
>
> pthread_refcount_t interface(*) used [without error checking and I've
> not yet had a chance to compile it, sorry]:
>
> pthread_refcount_init()
> pthread_refcount_destroy()
> pthread_refcount_getvalue()
> pthread_refcount_setvalue()
> pthread_refcount_increment()
> pthread_refcount_decrement_no_msync()
>
> COW string illustration based on Herb Sutter's "gotw-045-test-harness":
> <http://www.gotw.ca/gotw/045.htm> <http://www.gotw.ca/gotw/045code.zip>
>

Are you trying to combine a reference count with a rwlock? Seems more
problematic than it's worth. Pthread_refcount would need rwlock
functionality and semantics.

A thread-safe internal smart pointer seems like an easier abstraction to work
with. All your read methods just access the string buffer via the smart
pointer without any locks. All your write/modify methods would get a mutex
lock, copy the string buffer, modify the copy, and point the smart pointer to
it. That's it. And don't be tempted to do your string append ops in place.
It's more trouble than it's worth. The appended part wouldn't necessarily
become visible in a consistent manner, even if you pre-initialized the spare
buffer to zeros (for end of string), unless your string class semantics
permitted partial visibility of appends.

Joe Seigh

Alexander Terekhov
Nov 26, 2002, 12:13:35 PM

Joseph Seigh wrote:
[...]

> Are you trying to combine a reference count with a rwlock?

No. What makes you think so, Joe?

regards,
alexander.

Joseph Seigh
Nov 26, 2002, 1:05:01 PM

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:
> [...]
> > Are you trying to combine a reference count with a rwlock?
>
> No. What makes you think so, Joe?
>

You appear to be trying to distinguish the usage: that a refcount
of one means that no other process is currently accessing the data,
and therefore it is safe to try something that requires exclusive
access. Which BTW would only work if no other processes attempt to
access the object before that something is done.

Joe Seigh

Alexander Terekhov
Nov 26, 2002, 1:17:32 PM

Joseph Seigh wrote:
>
> Alexander Terekhov wrote:
> >
> > Joseph Seigh wrote:
> > [...]
> > > Are you trying to combine a reference count with a rwlock?
> >
> > No. What makes you think so, Joe?
> >
> You appear to be trying to distinguish the usage: that a refcount
> of one means that no other process is currently accessing the data,
> and therefore it is safe to try something that requires exclusive
> access.

Yes. Because all mutations need to be protected/synchronized by the
callers/string owner(s).

> Which BTW would only work if no other processes attempt to
> access the object before that something is done.

Right. That's the case for all mutations [non-const methods]. "1" is
NOT used for non-const string objects. I've made that mistake [in the
copy c-tor] and was corrected by Mr. Hans Boehm:

http://groups.google.com/groups?selm=3D858C7A.2975BA4A%40web.de
(Subject: Re: Ref-counted strings not thread-safe -- looking for clarification)

What's the problem?

regards,
alexander.

Alexander Terekhov
Nov 26, 2002, 1:23:23 PM
< correction >

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:
> >
> > Alexander Terekhov wrote:
> > >
> > > Joseph Seigh wrote:
> > > [...]
> > > > Are you trying to combine a reference count with a rwlock?
> > >
> > > No. What makes you think so, Joe?
> > >
> > You appear to be trying to distinguish the usage: that a refcount
> > of one means that no other process is currently accessing the data,
> > and therefore it is safe to try something that requires exclusive
> > access.
>
> Yes. Because all mutations need to be protected/synchronized by the
> callers/string owner(s).
>
> > Which BTW would only work if no other processes attempt to
> > access the object before that something is done.
>
> Right. That's the case for all mutations [non-const methods]. "1" is
> NOT used for

>>CONST<<

Joseph Seigh
Nov 26, 2002, 2:32:18 PM

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:
> >
> > Alexander Terekhov wrote:
> > >
> > > Joseph Seigh wrote:

..


> > You appear to be trying to distinguish the usage: that a refcount
> > of one means that no other process is currently accessing the data,
> > and therefore it is safe to try something that requires exclusive
> > access.
>
> Yes. Because all mutations need to be protected/synchronized by the
> callers/string owner(s).
>
> > Which BTW would only work if no other processes attempt to
> > access the object before that something is done.
>
> Right. That's the case for all mutations [non-const methods]. "1" is
> NOT used for non-const string objects. I've made that mistake [in the
> copy c-tor] and was corrected by Mr. Hans Boehm:
>
> http://groups.google.com/groups?selm=3D858C7A.2975BA4A%40web.de
> (Subject: Re: Ref-counted strings not thread-safe -- looking for clarification)
>
> What's the problem?
>

I suppose in theory it could be done. But this is so problematic. You are trying
to coordinate with some external construct, ownership, that is not that well defined to
begin with. You are not going to have a robust solution by definition because there
is too much beyond your scope of control. But this is a C++ thing I suppose.
At least Java warns you not to build safety critical systems with it, and it's
a lot more thread-safe.

Joe Seigh

Hillel Y. Sims
Nov 26, 2002, 11:24:41 PM

"David Butenhof" <David.B...@compaq.com> wrote in message
news:g%LE9.13$a_3.2...@news.cpqcorp.net...

>
> > < IntAtomicGet() is used to read the *current* value >
>
> There's no such thing as "atomic get". On Alpha or IPF, the actual
> implementation would be something along the lines of a compare-and-swap
> implementing 'if the current value is what I think it is, make it what it
> is'. Otherwise it's just a read, with no ordering or coherency
> restrictions, and you can't know that the value is "up to date". While the
> compare-and-swap may do what you want, it's "cheating" in that it's really
> just memory synchronization. It's no different (to the machine) than any
> other form of synchronization, including a spinlock.
>

ahh... this is the point I was really trying to clarify. so in the
following:

init: X = 1, Y = 0, P = &X

T1: Y = 2; WMB; P = &Y;

T2: Q = P; MB; i = *Q

...it is never guaranteed that T2 will read Q = &Y, but either way, 'i' can
never be 0, only 1 or 2. Is this "correct"?

And in the following:

init: X = 1, Y = 0, P = &X

T1: P = &Y

T2: Q = P; i = *Q

...even if T1 has processed the write temporally before T2 processes the
read, 'i' can still be either 1 or 0. To truly guarantee that the read
generates Q == &Y after T1 has "gone" == "P = &Y write has fully processed
to main memory", it is necessary to do atomic_get() such as (eg on Alphas)
"do { Q = P; } while (!__CMP_STORE_LONG(&Q, P, P, &Q));"??? Attempt to use
"MB; Q = P" instead will NOT really help, because it is possible that P can
still return old value from external memory even after T1 has already "gone"
== "next instruction after P = &Y is being processed", as there is NO
guarantee that the write of P = &Y has ever reached external memory within
any given (potentially INFINITE) timeframe because of _NO formal
ordering/synchronization semantics_???

Hillel Y. Sims
Nov 26, 2002, 11:53:29 PM

"Joseph Seigh" <jsei...@xemaps.com> wrote in message

news:3DE3CDB0...@xemaps.com...


>
> I suppose in theory it could be done. But this is so problematic. You are trying
> to coordinate with some external construct, ownership, that is not that well defined to
> begin with. You are not going to have a robust solution by definition because there
> is too much beyond your scope of control. But this is a C++ thing I suppose.
> At least Java warns you not to build safety critical systems with it, and it's
> a lot more thread-safe.
>

This whole scheme is based on the common pthreads idiom/strategy that
_initial_ sharing of thread-local objects between threads must be explicitly
synchronized _by the programmer_. This is not an unreasonable requirement to
place on the programmer; it is not the same as isolating the implicit
sharing that can take place during any other copies. This logic guarantees
that the first thread cannot drop its reference until the second thread is
done copying during the first copy between threads, so the reference count
cannot ever possibly fall 1->0 for any copy operation (if it does, it
indicates an explicit application programming error, failure to synchronize
access to shared data, which is outside the concern of the string object
itself).

String objects may be copied among many threads without any protection for
all other accesses (eg, no protection ever necessary when copying global
string objects). Only the _initial_ act of sharing a thread-local object
across threads (via reference/pointer) must be externally synchronized (eg,
"while (!copyDone) wait(cv);"), such that the first thread cannot drop the
string object out of scope until the second thread has completed copying
(which means incrementing the refcount to >= 2). Given these guidelines, the
reference count cannot ever drop from 1 to 0 during copy as a race
condition, because after all threads but one drop their final references to
the object, they can never again access the same object without again
explicitly synchronizing with the single thread who has the only reference.

(But as far as rwlocks, if anyone wants to help me implement cluster-wide
"counted semaphores" using rwlocks (VMS Lock Mgr), that'd be great!)

David Butenhof
Nov 27, 2002, 10:44:08 AM
Hillel Y. Sims wrote:

> "David Butenhof" <David.B...@compaq.com> wrote in message
> news:g%LE9.13$a_3.2...@news.cpqcorp.net...
>>
>> > < IntAtomicGet() is used to read the *current* value >
>>
>> There's no such thing as "atomic get". On Alpha or IPF, the actual
>> implementation would be something along the lines of a compare-and-swap
>> implementing 'if the current value is what I think it is, make it what it
>> is'. Otherwise it's just a read, with no ordering or coherency
>> restrictions, and you can't know that the value is "up to date". While
>> the compare-and-swap may do what you want, it's "cheating" in that it's
>> really just memory synchronization. It's no different (to the machine)
>> than any other form of synchronization, including a spinlock.
>>
>
> ahh... this is the point I was really trying to clarify. so in the
> following:
>
> init: X = 1, Y = 0, P = &X
>
> T1: Y = 2; WMB; P = &Y;
>
> T2: Q = P; MB; i = *Q
>
> ...it is never guaranteed that T2 will read Q = &Y, but either way, 'i'
> can never be 0, only 1 or 2. Is this "correct"?

You know, I've done a lot of thinking and programming with this sort of
"temporality" issue for a long time, and it still makes my head hurt. ;-)

Yes, your conclusion is correct. IF T2 can see the assignment of &Y to P
(the only way it could "possibly" see *Q == 0) then it must also see the
write of 2 to Y.

To stress once again, this is not a symptom of global memory serialization,
but a specific communication protocol between T1 and T2 that enforces the
LOCAL ordering of specific critical memory operations.

Joseph Seigh
Nov 27, 2002, 12:27:25 PM

"Hillel Y. Sims" wrote:
>
> This whole scheme is based on the common pthreads idiom/strategy that
> _initial_ sharing of thread-local objects between threads must be explicitly
> synchronized _by the programmer_. This is not an unreasonable requirement to
> place on the programmer; it is not the same as isolating the implicit
> sharing that can take place during any other copies. This logic guarantees
> that the first thread cannot drop its reference until the second thread is
> done copying during the first copy between threads, so the reference count
> cannot ever possibly fall 1->0 for any copy operation (if it does, it
> indicates an explicit application programming error, failure to synchronize
> access to shared data, which is outside the concern of the string object
> itself).
>
> String objects may be copied among many threads without any protection for
> all other accesses (eg, no protection ever necessary when copying global
> string objects). Only the _initial_ act of sharing a thread-local object
> across threads (via reference/pointer) must be externally synchronized (eg,
> "while (!copyDone) wait(cv);"), such that the first thread cannot drop the
> string object out of scope until the second thread has completed copying
> (which means incrementing the refcount to >= 2). Given these guidelines, the
> reference count cannot ever drop from 1 to 0 during copy as a race
> condition, because after all threads but one drop their final references to
> the object, they can never again access the same object without again
> explicitly synchronizing with the single thread who has the only reference.
>

So dropping a reference requires synchronization and signaling so the owner
can safely wait for the refcount to go to 1?

Joe Seigh

Hillel Y. Sims
Nov 27, 2002, 10:02:09 PM
"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3DE501F0...@xemaps.com...

>
> So dropping a reference requires synchronization and signaling so the owner
> can safely wait for the refcount to go to 1?
>

Synchronization is maintained via the atomic_decrement(). If #2 and #1 drop
their references at the same time, only one of them will actually see the
"1->0" status and delete the reference. The final act of #2 was to decrement
the count, so he is done, and #1 does not need to wait for him.

Using A. Terekhov's scheme of portable atomic-counter "pthread_refcount_t"
with encapsulated memory synchronization semantics, I think the general
scheme looks something like (no-msync optimization ignored for clarity):

String::~String()
{
ref->release(); // StringRef* ref
}

StringRef::release()
{
if (pthread_refcount_decrement(&count) == PTHREAD_REFCOUNT_DEAD)
delete *this;
}

hys
(...rvalues on the RIGHT! @#$@#$$@#)

Alexander Terekhov
Nov 28, 2002, 4:50:56 AM

"Hillel Y. Sims" wrote:
[...]

> StringRef::release()
> {
> if (pthread_refcount_decrement(&count) == PTHREAD_REFCOUNT_DEAD)
> delete *this;
> }
>
> hys
> (...rvalues on the RIGHT! @#$@#$$@#)

But NOT in IFs! @#$@#$$@#

(it helps to read/check it twice! ;-) ;-) )

regards,
alexander.

Joseph Seigh
Nov 29, 2002, 6:08:47 PM
Ok, I see what you're trying to do. The objects aren't threadsafe but you are
using shallow copying of the object data with deep copying only when needed.

Something like

inline T * getWritableData() {
ref<T> *temp;

if (ptr->refcount != 1) { // atomic read
temp = new ref<T>; // new reference counted data object (with refcount init to 1)
copy(temp->data, ptr->data); // copy data
if (atomicDecrement(ptr->refcount) == 0)
delete ptr;
ptr = temp;
}
return ptr->data;
}

in any of your methods that modify the object. Which seems to be what is
already being done, more or less.

All read methods simply just access the data. No refcount checking required.

Some comments to previous questions.

1) You will never see a zero reference count.
2) If you see 1, it will always be 1.
3) The atomic read is only needed for data consistency (e.g. if int is not atomic),
not for getting the current value which can't be gotten, only past values. If you
need a relatively recent value (say you have a really aggressive memory model) then
a memory barrier may help somewhat.
4) You don't need a cv to copy the data safely. The refcount is adequate protection.
5) You don't need any memory barriers for data visibility.

Strings have that reserved buffer thing, so you could change the above logic to something like

if (ptr->refcount != 1 || bufferNotBigEnough) { // atomic read


Joe Seigh

Alexander Terekhov
Nov 30, 2002, 10:25:14 AM

Joseph Seigh wrote:
[...]

> Some comments to previous questions.
>
> 1) You will never see a zero reference count.
> 2) If you see 1, it will always be 1.

Uhmm. Joe, please read this:

http://www.gotw.ca/gotw/043.htm
(Reference Counting - Part I)

http://www.gotw.ca/gotw/044.htm
(Reference Counting - Part II)

http://www.gotw.ca/gotw/045.htm
(Reference Counting - Part III)

Wrong conclusions aside(*), it does kinda explain "how it works"
[COW for strings], I think.

Now, perhaps you may also want to "revisit":

http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
(Subject: Re: threadsafe reference counting)

;-)

regards,
alexander.

(*) http://groups.google.com/groups?selm=3D6FAE63.676B1B69%40web.de

Joseph Seigh
Nov 30, 2002, 12:39:32 PM

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:
> [...]
> > Some comments to previous questions.
> >
> > 1) You will never see a zero reference count.
> > 2) If you see 1, it will always be 1.
>
> Uhmm. Joe, please read this:
>
> http://www.gotw.ca/gotw/043.htm
> (Reference Counting - Part I)
>
> http://www.gotw.ca/gotw/044.htm
> (Reference Counting - Part II)
>
> http://www.gotw.ca/gotw/045.htm
> (Reference Counting - Part III)
>
> Wrong conclusions aside(*), it does kinda explain "how it works"
> [COW for strings], I think.
>
> Now, perhaps you may also want to "revisit":
>
> http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
> (Subject: Re: threadsafe reference counting)
>

I already know what copy on write is. Part of the problem may be
that the problem is not well defined. Are we talking about atomic
strings or non-atomic strings? COW is just an implementation.
But if you don't have the externals defined, there is no way
you can even begin to discuss implementation issues.

Joe Seigh

Alexander Terekhov

Nov 30, 2002, 1:56:15 PM

Joseph Seigh wrote:
>
> Alexander Terekhov wrote:
> >
> > Joseph Seigh wrote:
> > [...]
> > > Some comments to previous questions.
> > >
> > > 1) You will never see a zero reference count.
> > > 2) If you see 1, it will always be 1.
[...]

> > Now, perhaps you may also want to "revisit":
> >
> > http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de
> > (Subject: Re: threadsafe reference counting)
> >
>
> I already know what copy on write is. Part of the problem may be
> the problem is not well defined. Are we talking about atomic
> strings or non-atomic strings?

"Non-atomic" ones, I'm sure. Basically, it's std::string as defined
by the ISO/IEC 14882:1998(E) [C++98 Std.] with respect to externals,
plus POSIX's memory synchronization rules 4.10 -- w.r.t. threading;
"atomic" reference count objects [internal pthread_refcount_t ;-) ]
aside.

> COW is just an implementation.

Yes, I just wanted to point out that your N1 and N2 comments aren't
"correct" (are actually WRONG)... if you look at the implementation
[well, a sort of "implementation"] referenced above. ;-) ;-)

regards,
alexander.

Joseph Seigh

Nov 30, 2002, 3:12:50 PM

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:
...
> >

> > I already know what copy on write is. Part of the problem may be
> > the problem is not well defined. Are we talking about atomic
> > strings or non-atomic strings?
>
> "Non-atomic" ones, I'm sure. Basically, it's std::string as defined
> by the ISO/IEC 14882:1998(E) [C++98 Std.] with respect to externals,
> plus POSIX's memory synchronization rules 4.10 -- w.r.t. threading;
> "atomic" reference count objects [internal pthread_refcount_t ;-) ]
> aside.
>
> > COW is just an implementation.
>
> Yes, I just wanted to point out that your N1 and N2 comments aren't
> "correct" (are actually WRONG)... if you look at the implementation
> [well, a sort of "implementation"] referenced above. ;-) ;-)
>

If it's non-atomic then you have to have some sort of synchronized or
controlled access to the string object which should mean you will
have at least one valid link to the string data buffer. If this is
not the case, you are doing something wrong, possibly a race condition
due to the order you are doing things in.

Furthermore, for conventional reference counting, zero is not even
a valid value to see (unless you decremented it to zero). It means
you are seeing memory in an undefined state since it is likely to
have been freed already.

Joe Seigh

Alexander Terekhov

Nov 30, 2002, 3:29:59 PM

Joseph Seigh wrote:
[...]

> > > COW is just an implementation.
> >
> > Yes, I just wanted to point out that your N1 and N2 comments aren't
> > "correct" (are actually WRONG)... if you look at the implementation
> > [well, a sort of "implementation"] referenced above. ;-) ;-)
> >
> If it's non-atomic then you have to have some sort of synchronized

Yep -- on "user/client" level.

> or controlled access to the string object which should mean you will
> have at least one valid link to the string data buffer. If this is
> not the case, you are doing something wrong, possibly a race condition
> due to the order you are doing things in.
>
> Furthermore, for conventional reference counting, zero is not even
> a valid value to see (unless you decremented it to zero).

Yup -- that's why I use it as an "unshareable" state indicator to support
"outstanding" refs/iterators and "safe"/non-shared copying.

regards,
alexander.

Joseph Seigh

Nov 30, 2002, 5:18:17 PM

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:

> [...]


> > Furthermore, for conventional reference counting, zero is not even
> > a valid value to see (unless you decremented it to zero).
>
> Yup -- that's why I use it as an "unshareable" state indicator to support
> "outstanding" refs/iterators and "safe"/non-shared copying.
>

Why would you need to mark it unshareable? You control the string. It's
perfectly safe to iterate thru it as long as you don't modify it out from
underneath the iterators. What kind of iterator semantics are you trying
to accomplish? The semantics should be identical to deep-copied strings.

Joe Seigh

Joseph Seigh

Dec 1, 2002, 11:00:27 AM

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:

> Yup -- that's why I use it as an "unshareable" state indicator to support
> "outstanding" refs/iterators and "safe"/non-shared copying.
>

Ok, I see what you're trying to do here. As a note to everybody, what is
going on is they're trying to optimize the string class by doing shallow
copying (copying by reference) whenever possible. But iterators are quite
performance-sensitive, more so than copying. You could have each iterator
do a getWritableData() but even that one if statement might be considered
too expensive. So iterators have higher optimization priority than copying.

I can see why some say that trying to optimize this at all is more trouble than
it's worth.

You *are* trying to do a reader/writer solution. The copy operations are readers
and the iterators are writers. Since you need to keep track of the iterators
to maintain "unsharable" status anyway, try this (abuse of the reference count).
When you create an iterator, subtract 2 from the reference count of the buffer returned
by getWriteableData(), which will always be <= 1. When you destroy an iterator, add
2 to the reference count and if the result is zero, delete the reference-counted buffer.

For copy operators, do shallow copy only if refcount >= 1.

So, for example, if the reference count is 1 and you create an iterator, you subtract 2
and the reference count goes to -1. If the iterator finishes first, the reference count
goes back to 1. If the string is destroyed before the iterator finishes, the reference
count goes from -1 to -2 and then when the iterator is destroyed 2 is added making it go
to 0 indicating the buffer should be deleted.
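The scheme above can be sketched in modern C++ terms (an anachronism for this thread; std::atomic stands in for the platform's atomic primitives, and the names RefBuf, iterator_attach, etc. are illustrative):

```cpp
#include <atomic>

struct RefBuf {
    std::atomic<int> refcount{1};   // 1 == sole owner, >= 2 == shared
};

// Creating an iterator: subtract 2, driving the count non-positive
// and thereby marking the buffer unsharable.
void iterator_attach(RefBuf* b) {
    b->refcount.fetch_sub(2);
}

// Destroying an iterator: add 2 back; a result of exactly 0 means the
// owning string died first, so the caller must reclaim the buffer.
bool iterator_detach(RefBuf* b) {
    return b->refcount.fetch_add(2) + 2 == 0;
}

// Copy operation: shallow-copy (share) only while refcount >= 1.
bool try_share(RefBuf* b) {
    int v = b->refcount.load();
    return v >= 1 && b->refcount.compare_exchange_strong(v, v + 1);
}

bool demo() {
    RefBuf b;                        // refcount == 1
    iterator_attach(&b);             // 1 -> -1: unsharable
    if (try_share(&b)) return false; // sharing must fail now
    b.refcount.fetch_sub(1);         // owning string destroyed: -1 -> -2
    return iterator_detach(&b);      // -2 + 2 == 0: iterator reclaims buffer
}
```

The demo walks exactly the example in the post: count 1, attach an iterator (-1), destroy the string (-2), then the iterator's detach sees 0 and knows it owns the cleanup.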

Joe Seigh

Hillel Y. Sims

Dec 1, 2002, 9:37:05 PM
"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3DE93AA7...@xemaps.com...

>
> Why would you need to mark it unshareable? You control the string. It's
> perfectly safe to iterate thru it as long as you don't modify it out from
> underneath the iterators. What kind of iterator semantics are you trying
> to accomplish? The semantics should be identical to deep-copied strings.
>

{
string s = "hello";
char* pc = &s[0];
string s2 = s; // shared?
*pc = ' ';
cout << s2 << '\n';
}

hys

Alexander Terekhov

Dec 2, 2002, 4:48:02 AM

Joseph Seigh wrote:
[...]
> If the iterator finishes first, ....

Joe, the problem is that the C++ standard doesn't allow "smart" [aka
"proxy"] iterators/refs for std::string stuff. They are "dumb". So,
as soon as one has published a non-const "dumb" reference to the string
content/buffer, s/he had better mark/make it "unshareable"... until
some *explicit* mutation takes place. This is needed to prevent
internal sharing [which would be incorrect] "in between", so to speak.
See the Sutter's articles for details...

regards,
alexander.

Joseph Seigh

Dec 2, 2002, 5:09:07 AM

"Hillel Y. Sims" wrote:
>
> "Joseph Seigh" <jsei...@xemaps.com> wrote in message
> news:3DE93AA7...@xemaps.com...
> >
> > Why would you need to mark it unshareable? You control the string. It's
> > perfectly safe to iterate thru it as long as you don't modify it out from
> > underneath the iterators. What kind of iterator semantics are you trying
> > to accomplish? The semantics should be identical to deep-copied strings.
> >
>
> {
> string s = "hello";
> char* pc = &s[0];
> string s2 = s; // shared?
> *pc = ' ';
> cout << s2 << '\n';
> }
>

Raw pointers would break any data type that has a dynamic internal
structure.

string s = "hello";
char* pc = &s[0];

s = "a string large enough to cause reallocation";
*pc = ' ';
cout << s << '\n';

So I don't think there is any point that can be made with them.

Joe Seigh

Joseph Seigh

Dec 2, 2002, 6:00:55 AM

It doesn't? I can see I wasted $57.99 then.

Joe Seigh

Hillel Y. Sims

Dec 2, 2002, 6:04:53 PM
"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3DEB32C5...@xemaps.com...

>
> "Hillel Y. Sims" wrote:
> >
> >
> > {
> > string s = "hello";
> > char* pc = &s[0];
> > string s2 = s; // shared?
> > *pc = ' ';
> > cout << s2 << '\n';
> > }
> >
> Raw pointers would break any data type that has a dynamic internal
> structure.
>
> string s = "hello";
> char* pc = &s[0];
> s = "a string large enough to cause reallocation";
> *pc = ' ';
> cout << s << '\n';
>
> So I don't think there is any point that can be made with them.
>
> Joe Seigh

Well philosophically you are correct, but then you must meet std::string
(which I guess is the 'most' desired interface to emulate for COW strings).
Please see Section 21.3.4 of the C++ Standard.
http://www.inf.uni-konstanz.de/~kuehl/cd2/lib-strings.html#lib.string.access

<quote>
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

Effects:
The reference returned is invalid after any subsequent call to
c_str(), data(), or any non-const member function for the object.

</quote>

In fact, for the example you gave, the reference is indeed no longer valid
after the modification to s, and the code as written is illegal (though not
detectable by the compiler ;-). In my example above, no "non-const" access
is actually performed to s between taking the reference and copying it to
s2. Thus that code should be legal, and of course, we would expect it to
only result in s being modified in-place, not s2. But since there is no way
to track lifespan of pure references/pointers, the string must be marked
'permanently' unshareable once that access is given out. Now of course, if
you give up on pure std::string and write your own that is mostly
compatible, but only as necessary, and leave out some of the cruft, you may
be able to do more interesting things...

Hillel Y. Sims

Dec 2, 2002, 6:21:03 PM
"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3DEB3EEA...@xemaps.com...

$57.99? The ISO Standard document can be purchased online as a .pdf file for
only $18. ;-)

hys

Joseph Seigh

Dec 2, 2002, 6:31:53 PM

"Hillel Y. Sims" wrote
...

> Please see Section 21.3.4 of the C++ Standard.
> http://www.inf.uni-konstanz.de/~kuehl/cd2/lib-strings.html#lib.string.access
>
> <quote>
> const_reference operator[](size_type pos) const;
> reference operator[](size_type pos);
>
> Effects:
> The reference returned is invalid after any subsequent call to
> c_str(), data(), or any non-const member function for the object.
>
> </quote>
>

const_reference and reference are required to be raw C pointers? I.e.
char * for a char string.

Joe Seigh

Joseph Seigh

Dec 2, 2002, 7:36:40 PM

"Hillel Y. Sims" wrote:
> ...

> Well philosophically you are correct, but then you must meet std::string
> (which I guess is the 'most' desired interface to emulate for COW strings).
> Please see Section 21.3.4 of the C++ Standard.
> http://www.inf.uni-konstanz.de/~kuehl/cd2/lib-strings.html#lib.string.access
>
> <quote>
> const_reference operator[](size_type pos) const;
> reference operator[](size_type pos);
>
> Effects:
> The reference returned is invalid after any subsequent call to
> c_str(), data(), or any non-const member function for the object.
>
> </quote>
>
> In fact, for the example you gave, the reference is indeed no longer valid
> after the modification to s, and the code as written is illegal (though not
> detectable by the compiler ;-). In my example above, no "non-const" access
> is actually performed to s between taking the reference and copying it to
> s2. Thus that code should be legal, and of course, we would expect it to
> only result in s being modified in-place, not s2. But since there is no way
> to track lifespan of pure references/pointers, the string must be marked
> 'permanently' unshareable once that access is given out. Now of course, if
> you give up on pure std::string and write your own that is mostly
> compatible, but only as necessary, and leave out some of the cruft, you may
> be able to do more interesting things...
>

Well, even if the standard doesn't actually require that behavior, I could see
that enough people could be depending on the side effects of common implementations
to make it a de facto standard. Which brings up the question why anyone would
want to make a better imperfect implementation. Not me. I'm a minimalist.
If someone paid me I'd rewrite Java's AWT to be even smaller and simpler, an
anti-Swing as it were. Maybe we need an anti-STL for C++.

Joe Seigh

Hillel Y. Sims

Dec 2, 2002, 11:29:01 PM
"Joseph Seigh" <jsei...@xemaps.com> wrote in message
news:3DEBFE17...@xemaps.com...

> Which brings up the question why anyone would
> want to make a better imperfect implementation.

Define "better" vs. "imperfect"? Assuming the only problem is uncontrolled
access to the internal buffer, what if you could write your own version that
protected that better? (this can be done..) Then you could apply the
refcounting trick you suggested to avoid 'permanent' unshareable status
(actually I'm going to look into that...). Then you can write a templated
"generic-string" package that provides functions that operate on "string"
data regardless of the actual string type, and "string_cast<>" which can
cast between different string types (only) as needed (so you can interact
with 3rd party code that relies on std::string, but locally use your
higher-performance string class, with minimal difficulty) (google for
"generic_string.h" or "string_cast"... well my string_cast is better than
boost's... ;-)

;-)

Joseph Seigh

Dec 3, 2002, 11:43:03 AM

In general I would restrict raw access to an object's internal data to
within an expression. There, if you have to, you have some control thru
the use of temps, which is what I did with atomic_ptr which is about as
sensitive as you can get to raw access.

For stuff like string classes, you could make your reference types a class like

class reference {
private:
    char * ptr;
public:
    reference();
    ~reference();
    operator char&() { return *ptr; }  // a char proxy can't use ->; expose the char itself
    // etc...
};

In theory, dereferencing it should be as efficient as dereferencing a raw ptr, assuming
decent optimization and the formal C++ grammar doesn't screw up the semantics with
can't get there from here type situations. Then you would get additional things that
raw pointers can't give you like destructors to maintain reference counting.

Kind of hard to compete with std::string. They allowed a lot of undefined behavior
so they didn't have to do any safety checking. You could defer some of the safety
checking to the destructor where there would be less performance impact. If you
used the reference counting trick I proposed, and a non-const reference destructor
saw the reference count update return an even number, meaning the original string
buffer was deleted, you could throw an exception because you had potentially updated a
buffer no longer being used.
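A rough sketch of such a proxy reference combined with the +/-2 trick. All names are illustrative; note that this sketch reports the "even result" condition via cleanup rather than throwing, since throwing from a destructor is risky:

```cpp
#include <atomic>
#include <cstddef>

struct Buf {
    std::atomic<int> refcount{1};
    char data[16] = {};
};

class char_reference {
    Buf* buf_;
    std::size_t pos_;
public:
    char_reference(Buf* b, std::size_t pos) : buf_(b), pos_(pos) {
        buf_->refcount.fetch_sub(2);          // mark the buffer unsharable
    }
    ~char_reference() {
        // An even (zero) result means the string died first: the buffer
        // we may have written to was abandoned, and we own its cleanup.
        if (buf_->refcount.fetch_add(2) + 2 == 0)
            delete buf_;
    }
    operator char() const { return buf_->data[pos_]; }
    char_reference& operator=(char c) { buf_->data[pos_] = c; return *this; }
};

bool demo() {
    Buf* b = new Buf;
    {
        char_reference r(b, 0);   // refcount 1 -> -1 (unsharable)
        r = 'x';
    }                             // dtor: -1 -> 1, buffer still owned by b
    bool ok = (b->data[0] == 'x' && b->refcount.load() == 1);
    delete b;
    return ok;
}
```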

Joe Seigh

Charles Bryant

Dec 10, 2002, 8:42:54 PM
In article <6cDE9.13865$Go1....@news4.srv.hcvlny.cv.net>,
Hillel Y. Sims <use...@phatbasset.com> wrote:
...
>Wow. I am trying to let all this sink in. This is really starting to sound
>like quantum physics... (the "Butenhof Uncertainty Principle"??)

Actually, it's more like Relativity. If two events happen, we
normally consider that one happens before the other, and that an
observer which sees them in the wrong order is the victim of some
sort of illusion. However Relativity shows that sometimes there is no
single 'right' order.

This is like code where one CPU performs "x = 1; y = 2;" and believes
that the assignment to x happens before the assignment to y, while
another CPU sees the assignments in the opposite order. It's a lot
easier to understand if you stop thinking that one CPU is 'right' and
the other is 'wrong'. It then follows that there can be four types
of memory barrier: load/load, load/store, store/load, and
store/store. Then you can see that:

CPU1: x = 1; membar store/store; y = 1;

enforces the ordering on the assignments to x and y. It doesn't
necessarily have anything to do with memory or caches or suchlike. It
merely means that for CPU1, these operations have a definite order.

Also, CPU2: a = x; membar load/load; b = y;

enforces the ordering of the reading of x and y for CPU2 so that, for
CPU2, x must be read before y.

Of course Relativity shows that sometimes you can say that one event
definitely came before another, and the criterion depends on the
speed of light. This can be seen as the equivalent of the time taken for
a value to be sent from one CPU to another. Just as events in
Relativity can be ordered if they are subject to constraints, memory
operations can be ordered if they are subject to constraints, and
memory barriers supply the necessary constraints. All memory
operations cannot have these constraints as that would either force
CPUs to operate slowly or force the communication between them to be
sufficiently fast. Since we normally want both to be fast, this
leaves us with a requirement for explicit memory barriers.
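In C++ terms (an anachronism for this thread; std::atomic fences are used as stand-ins for the membar instructions), the CPU1/CPU2 pairing looks like the sketch below. It is written in the usual message-passing direction, with the consumer reading y first and then x:

```cpp
#include <atomic>
#include <thread>

int x = 0;                 // plain data, protected by the fences below
std::atomic<int> y{0};     // the "flag"

void cpu1() {              // x = 1; membar store/store; y = 1;
    x = 1;
    std::atomic_thread_fence(std::memory_order_release);
    y.store(1, std::memory_order_relaxed);
}

int cpu2() {               // read y; membar load/load; read x;
    while (y.load(std::memory_order_relaxed) == 0) { /* spin */ }
    std::atomic_thread_fence(std::memory_order_acquire);
    return x;              // the fence pair orders the reads after the writes
}

int demo() {
    std::thread t1(cpu1);
    int b = cpu2();
    t1.join();
    return b;              // must be 1: the release/acquire fences pair up
}
```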

--
Eppur si muove

Alexander Terekhov

Feb 10, 2003, 5:00:14 PM

< kinda pthread_refcount_t-"SKETCHv3" >

Alexander Terekhov wrote:
>
> David Butenhof wrote:
> [...]


> > > < IntAtomicGet() is used to read the *current* value >
> >

> > There's no such thing as "atomic get". ....
>
> Okay, let me put it this way... Below you can find a sort of "updated"
> COW string example using the hypothetical pthread_refcount_t interface.
> Do you see any problems with it? What are they? I hope that you'll
> "deduce" the meaning of pthread_refcount_t calls from it with the COW
> example used as an illustration. I'm sorry, but it's still in C++. ;-)
>
> pthread_refcount_t interface(*) used [without error checking and I've
> not yet had a chance to compile it, sorry]:
>
> pthread_refcount_init()
> pthread_refcount_destroy()
> pthread_refcount_getvalue()
> pthread_refcount_setvalue()
> pthread_refcount_increment()
> pthread_refcount_decrement_no_msync()
>
> COW string illustration based on Herb Sutter's "gotw-045-test-harness":
> <http://www.gotw.ca/gotw/045.htm> <http://www.gotw.ca/gotw/045code.zip>
>
> namespace COW_AtomicInt3 {
[...]
> (*) <http://tinyurl.com/2wai> (SKETCHv2, that's "current" one).
> <http://tinyurl.com/2wal> (SKETCHv1, that's an older one).

< 2 x "Forward Inline" >

-------- Original Message --------
Newsgroups: gmane.comp.lib.boost.devel
Subject: Re: Weak ref. via atomicity (RE: Smart pointers:...)

Pavel Vasiliev wrote:
>
> Alexander Terekhov wrote:
>
> [...]
> >> Pavel Vasiliev wrote:
> >> > The true locking/unlocking, not InterlockedIncrement/Decrement()
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> > Nah, pthread_refcount_t. ;-)
>
> >> > even if availabe, is necessary to support weak references.
> >> > [...]
> >>
>
> > It's probably a bit more exciting to take care of all possible races
> > without "a true lock" protecting both counters. I'm not sure that the
> > true locking is *necessary* to support weak references. Do you have
> > an illustration, Pavel?
>
> May be the following code answers the challenge? :-). ...

Not bad. ;-) Well,

//*** ES and error checking aside...

class refs {
public:

refs() {
pthread_refcount_init(&strong_count, 1);
pthread_refcount_init(&weak_count, 1);
}

~refs() {
pthread_refcount_destroy(&weak_count);
pthread_refcount_destroy(&strong_count);
}

//*** Called by existing "strong_ptr".

void acquire_strong() {
pthread_refcount_increment(&strong_count);
}

void release_strong() {
int status = pthread_refcount_decrement(&strong_count);
if (PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status) {
destruct_object();
release_weak();
}
}

void acquire_weak_from_strong() {
acquire_weak();
}

//*** Called by existing "weak_ref".

void acquire_weak() {
pthread_refcount_increment( &weak_count );
}

void release_weak() {
int status = pthread_refcount_decrement_no_msync(&weak_count);
if (PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status)
destruct_self();
}

bool acquire_strong_from_weak() {
int status = pthread_refcount_enroll_one(&strong_count); // _many( &refcount, 1);
if (PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status) // Ouch, did not work [too late].
return false;
return true;
}

private:

void destruct_object(); // "delete p_object".
void destruct_self(); // "delete this".

pthread_refcount_t strong_count;
pthread_refcount_t weak_count;

T* p_object;

}; //*** class refs

Or am I just missing and/or misunderstanding something?

regards,
alexander.

-------- Original Message --------
Newsgroups: gmane.comp.lib.boost.devel
Subject: Re: Weak ref. via atomicity (RE: Smart pointers:...)

Peter Dimov wrote:
>
> Alexander Terekhov wrote:
> [...]
> > bool acquire_strong_from_weak() {
> > int status = pthread_refcount_enroll_one(&strong_count); //
> [...]
> > Or am I just missing and/or misunderstanding something?
>
> You are missing the fact that nobody (even Google) has a clue as to what
> pthread_refcount_enroll_one is/does. ;-)

Ah. Sorry. Basically it's a rather simple "CAS"- ["compare-and-swap"]
or "LL/SC"- [load-locked/store-conditional] based operation; in terms
of proposed Java-atomics(*): <behaviorally, error checking aside>

extern "C" int pthread_refcount_enroll_one(pthread_refcount_t* rc) {
size_t value;
do {
if ( 0 == (value = rc->__get()) )
return PTHREAD_REFCOUNT_DROPPED_TO_ZERO;
} while (!rc->__attemptUpdate(value, value + 1));
return 0;
}
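The same loop can be written against a C++ CAS instead of the hypothetical Java-style __get/__attemptUpdate (DROPPED_TO_ZERO here is an illustrative status code, not the real PTHREAD_REFCOUNT_DROPPED_TO_ZERO constant):

```cpp
#include <atomic>
#include <cstddef>

constexpr int DROPPED_TO_ZERO = 1;   // illustrative status code

// "Increment only if non-zero": the weak->strong enroll operation,
// without any memory synchronization, as described above.
int refcount_enroll_one(std::atomic<std::size_t>& rc) {
    std::size_t value = rc.load(std::memory_order_relaxed);
    do {
        if (value == 0)
            return DROPPED_TO_ZERO;  // too late: object already being destroyed
    } while (!rc.compare_exchange_weak(value, value + 1,
                                       std::memory_order_relaxed));
    return 0;
}

bool demo() {
    std::atomic<std::size_t> rc{2};
    if (refcount_enroll_one(rc) != 0 || rc.load() != 3) return false;
    rc.store(0);
    return refcount_enroll_one(rc) == DROPPED_TO_ZERO;
}
```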

regards,
alexander.

(*) [but WITHOUT any memory synchronization/barriers for _enroll_*()]
http://groups.google.com/groups?threadm=3D8B257C.4D8D58F%40web.de
(Subject: Re: rwlock using pthread_cond (or Win32 events) [last
link retired; stuff is available here: <http://tinyurl.com/5mmj>])

Alexander Terekhov

Feb 12, 2003, 9:40:58 AM

Alexander Terekhov wrote:
>
> < kinda pthread_refcount_t-"SKETCHv3" > ....

Here's the "updated" SKETCH:

PTHREAD_REFCOUNT_MAX // upper bound
PTHREAD_REFCOUNT_INITIALIZER(N) // macro for statics INIT
pthread_refcount_init() // mutex like but with initial value
pthread_refcount_destroy() // also mutex like
pthread_refcount_getvalue() // see "COW_AtomicInt3" string example
pthread_refcount_setvalue() // see "COW_AtomicInt3" string example
pthread_refcount_increment() // without any msync whatsoever
pthread_refcount_add() // without any msync but with r.checking
pthread_refcount_decrement() // with msync for proper mut.obj-cleanup
pthread_refcount_subtract() // with msync and with r.checking
pthread_refcount_decrement_nomsync() // without any msync whatsoever
pthread_refcount_subtract_nomsync() // without any msync but with r.checking
pthread_refcount_enroll_one() // without any msync whatsoever
pthread_refcount_enroll_many() // without any msync but with r.checking
pthread_refcount_attr_*() // PROCESS_SHARED and etc. attr-stuff
std::size_t as "value type" // for get/set/add/... "value" param
PTHREAD_REFCOUNT_DROPPED_TO_ZERO // for pthread_refcount_decrement*()
// and pthread_refcount_sub() to
// indicate "dropped to zero" condition
// that MAY be used to safely destroy
// associated ref.counted object or
// simply continue -- increment/setval
// also used for weak->strong "enrolls"
// by pthread_refcount_enroll_one() and
// pthread_refcount_enroll_many()

Uhmm, as WW-aka-Attila says... Kapis?

Comments/suggestions/objections/whatever? (an essay from David Butenhof
would be greatly appreciated as well ;-) )

regards,
alexander.

--
http://www.usenix.org/publications/library/proceedings/c++94/full_papers/ellis.a

Joseph Seigh

Feb 12, 2003, 2:06:41 PM

Alexander Terekhov wrote:
>
> Alexander Terekhov wrote:
> >
> > < kinda pthread_refcount_t-"SKETCHv3" > ....
>

...


>
> Uhmm, as WW-aka-Attila says... Kapis?
>
> Comments/suggestions/objections/whatever? (an essay from David Butenhof
> would be greatly appreciated as well ;-) )
>

I think compare and swap would be sufficient and generally more useful
overall. Probably some kind of memory barrier as well.

There is the argument that low-level primitives should not be provided because
some idiot might misuse them. The trend is to go in the other direction with
more restrictive constructs like the synchronized blocks in Java. No unmatched
lock/unlock pairs there unless you "cheat" like I did with a condvar class for
Java.

Joe Seigh

Alexander Terekhov

Feb 12, 2003, 3:06:19 PM

Joseph Seigh wrote:
[...]

> I think compare and swap would be sufficient and generally more useful
> overall. Probably some kind of memory barrier as well.

That's way too low level, in my view. Also, consider that even something
along the lines of proposed Java's-JSR-166-atomic-classes would inject
way too many barriers if it would be used to implement reference counting
for something like COW strings and shared_ptr<> together with its "weak
brother". I really see >>NO<< reason [unless I'm just missing and/or
misunderstanding something] to NOT standardize ``pthread_refcount_t.''

< Forward Inline >

-------- Original Message --------
Newsgroups: gmane.comp.lib.boost.devel
Subject: Re: Weak ref. via atomicity (RE: Smart pointers:...)

Pavel Vasiliev wrote:
[...]
> pthread_refcount_enroll_one() (== "increment_if_non_zero()") may be
> emulated in Win32 (using InterlockedCompareExchange()).

Yeah, archaic Win95(/Intel386) aside. AFAIK, InterlockedCompareExchange
(/cmpxchg) "Requires Windows 98 or later" (/486+).

>
> The problem is what to do on e.g. current Unix'es + GCC.
> Alternatives are either wait for pthread_refcount_t in many pthreads
> implementations or to explicitly use inline assembler(s).
>
> Other alternatives?
>
> The problem is so obvious that the question may be thought rhetorical.
> But I really ask whether you have some suggestions.

http://lists.boost.org/MailArchives/boost/msg30364.php
("> Well, I guess...")

regards,
alexander.

Joseph Seigh

Feb 12, 2003, 3:44:34 PM

Alexander Terekhov wrote:
>
> Joseph Seigh wrote:
> [...]
> > I think compare and swap would be sufficient and generally more useful
> > overall. Probably some kind of memory barrier as well.
>
> That's way too low level, in my view. Also, consider that even something
> along the lines of proposed Java's-JSR-166-atomic-classes would inject
> way too many barriers if it would be used to implement reference counting
> for something like COW strings and shared_ptr<> together with its "weak
> brother". I really see >>NO<< reason [unless I'm just missing and/or
> misunderstanding something] to NOT standardize ``pthread_refcount_t.''
>

If compare and swap is too low level then pthread_refcount_t is too low level.
Careful if you are using that for an argument.

Joe Seigh

ig...@notformail.paco.net

Feb 13, 2003, 3:55:58 AM
Alexander Terekhov <tere...@web.de> wrote:
>
> < kinda pthread_refcount_t-"SKETCHv3" >
>
> Alexander Terekhov wrote:
>>
>> David Butenhof wrote:
>> [...]
>> > > < IntAtomicGet() is used to read the *current* value >
>> >
>> > There's no such thing as "atomic get". ....
>>
>> Okay, let me put it this way... Below you can find a sort of "updated"
>> COW string example using the hypothetical pthread_refcount_t interface.
>> Do you see any problems with it? What are they? I hope that you'll
>> "deduce" the meaning of pthread_refcount_t calls from it with the COW
>> example used as an illustration. I'm sorry, but it's still in C++. ;-)
>>
>> pthread_refcount_t interface(*) used [without error checking and I've
>> not yet had a chance to compile it, sorry]:
>>
>> pthread_refcount_init()
>> pthread_refcount_destroy()
>> pthread_refcount_getvalue()

Sorry if I missed something, but how can the result of getvalue be safely used?
Does it have to be protected by something?

--
Igor Khasilev |
PACO Links, igor at paco dot net |

Alexander Terekhov

Feb 13, 2003, 4:15:19 AM

Pls see below.

> Does it have to be protected by something?

No. http://groups.google.com/groups?selm=3DE3A142.7DAACB3B%40web.de

size_t IntAtomicGet( pthread_refcount_t& refs ) {
size_t result;
pthread_refcount_getvalue( &refs, &result );
return result;
}

.
.
.

inline String::~String() {
if ( 2 > IntAtomicGet( data_->refs ) ||
1 > IntAtomicDecrement( data_->refs ) )
delete data_;
}

inline String::String( const String& other )
{
switch ( IntAtomicGet( other.data_->refs ) ) {
case 00: data_ = new StringBuf( *other.data_ ); break;
default: IntAtomicIncrement( (data_ = other.data_)->refs );
}
++nCopies;
}

inline void String::Append( char c ) {
switch ( IntAtomicGet( data_->refs ) ) {
case 00: pthread_refcount_setvalue( &data_->refs, 1 ); // mutation -> 'shareable'
case 01: data_->Reserve( data_->used+1 ); break;
default: data_ = new StringBuf( AppendStringBufPtr( data_ ) );
} data_->buf[data_->used++] = c;
}

inline char& String::operator[]( size_t n ) {
switch ( IntAtomicGet( data_->refs ) ) {
case 01: pthread_refcount_setvalue( &data_->refs, 0 ); // 'unshareable'
case 00: break;
default: data_ = new StringBuf( CharRefStringBufPtr( data_ ) );
} return data_->buf[n];
}

.
.
.

inline void String::Clear() {
if ( 2 > IntAtomicGet( data_->refs ) ||
1 > IntAtomicDecrement( data_->refs ) ) {
data_->Clear();
pthread_refcount_setvalue( &data_->refs, 1 ); // mutation -> 'shareable'
}
else {
data_ = new StringBuf;
}
}

regards,
alexander.

Alexander Terekhov

Feb 13, 2003, 12:23:02 PM

Alexander Terekhov wrote:
[...]
> class refs { ....

< Forward Inline >

-------- Original Message --------
Newsgroups: gmane.comp.lib.boost.devel
Subject: Re: Weak ref. via atomicity (RE: Smart pointers:...)

Pavel Vasiliev wrote:
[...]


> > pthread_refcount_decrement() // with msync for proper mut.obj-cleanup

Basically [in terms of proposed/revised Java MM], it's VOLATILE-RELEASE
when count > 1 'prior to decrement' and VOLATILE-ACQUIRE when the count
drops to 0. I think that one might even try to optimize-away VOLATILE-
RELEASE msync [and interlocked instruction(s)] for the'count == 1'-case;
on some platforms, perhaps.

> > pthread_refcount_decrement_nomsync() // without any msync whatsoever
>

> "nomsync" here stands for "no memory view update even when counter drops
> to 0"?

Yes, it's a variant without [VOLATILE-RELEASE/VOLATILE-ACQUIRE] msync.

> If yes, then you probably should not use it in


>
> > void release_weak() {
> > int status = pthread_refcount_decrement_no_msync(&weak_count);
> > if (PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status)
> > destruct_self();
> > }
>

> Consider the following:
> > Given: two threads -- A and B,
> > thread A has "strong" one,
> > thread B has "weak" one,
> > strong_count == 1, weak_count == 2.
>
> Thread A releases the last "strong" reference.
> destruct_object() deletes p_object AND modifies the refs instance
> (e.g. sets p_object to NULL for debug reasons).

Yeah... "cleanup" for intrusively counted objects [with refs embedded
within the object] aside for a moment...

>
> strong_count == 0, weak_count == 1
>
> Thread B releases the last "weak" reference.
> release_weak() calls destruct_self() on expired memory view.
>
> "Or am I just missing and/or misunderstanding something?" :-)

No, you're right. Well, let me put it this way: < ;-) >

Surely you have missed something! See the "// with msync for..."
comment above. It says "for proper mut.obj-cleanup". "mut." stands
for mutable. Obviously, the fact that _no_msync() variant is used
here indicates that destruct_object() [and the rest too] is meant
to be >>'const'<< qualified method[s] that shall not modify any
>>'mutable'<< members (things like "pthread_refcount_t" counts,
mutexes, if any). And all this was crystal clear, I mean.

Good catch. Thanks.
^^^^^(*) Apropos...

regards,
alexander.

(*) http://groups.google.com/groups?selm=3E4B57AB.8BBF5A32%40web.de
(Subject: Re: exceptions)

Alexander Terekhov

unread,
Feb 14, 2003, 8:34:33 AM2/14/03
to

Alexander Terekhov wrote:
[...]

> Pavel Vasiliev wrote:
> [...]
> > > pthread_refcount_decrement() // with msync for proper mut.obj-cleanup
>
> Basically [in terms of proposed/revised Java MM], it's VOLATILE-RELEASE
> when count > 1 'prior to decrement' and VOLATILE-ACQUIRE when the count
> drops to 0. I think that one might even try to optimize-away VOLATILE-
> RELEASE msync [and interlocked instruction(s)] for the 'count == 1' case;
> on some platforms, perhaps.
>
> > > pthread_refcount_decrement_nomsync() // without any msync whatsoever
> >
> > "nomsync" here stands for "no memory view update even when counter drops
> > to 0"?
>
> Yes, it's a variant without [VOLATILE-RELEASE/VOLATILE-ACQUIRE] msync.

[ ..."release_weak() calls destruct_self() on expired memory view"... ]

Here's "SKETCHv4" together with "class refs" illustration.

PTHREAD_REFCOUNT_MAX // upper bound
PTHREAD_REFCOUNT_INITIALIZER(N) // macro for statics INIT
pthread_refcount_init() // mutex like but with initial value
pthread_refcount_destroy() // also mutex like
pthread_refcount_getvalue() // see "COW_AtomicInt3" string example
pthread_refcount_setvalue() // see "COW_AtomicInt3" string example
pthread_refcount_increment() // without any msync whatsoever
pthread_refcount_add() // without any msync but with r.checking

pthread_refcount_increment_positive() // without any msync but with 0-checking
pthread_refcount_add_to_positive() // without any msync but with 0&r.checking


pthread_refcount_decrement() // with msync for proper mut.obj-cleanup

pthread_refcount_subtract() // with msync and with r.checking

pthread_refcount_decrement_nomsync() // without any msync whatsoever

pthread_refcount_subtract_nomsync() // without any msync but with r.checking

pthread_refcount_decrement_rel() // with RELEASE msync-if-'count > 1'
pthread_refcount_subtract_rel() // with RELEASE msync-... and r.checking
pthread_refcount_decrement_acq() // with ACQUIRE msync-if-dropped-to-zero
pthread_refcount_subtract_acq() // with ACQUIRE msync-... and r.checking


pthread_refcount_attr_*() // PROCESS_SHARED and etc. attr-stuff
std::size_t as "value type" // for get/set/add/... "value" param
PTHREAD_REFCOUNT_DROPPED_TO_ZERO // for pthread_refcount_decrement*()
// and pthread_refcount_subtract() to
// indicate "dropped to zero" condition
// that MAY be used to safely destroy
// associated ref.counted object or
// simply continue -- increment/setval

// it's also used for "strong_from_weak"
// by pthread_refcount_increment_positive
// and pthread_refcount_add_to_positive


//*** ES and error checking aside...

class refs {
public:

refs() {
pthread_refcount_init(&strong_count, 1);
pthread_refcount_init(&weak_count, 1);
}

~refs() {
pthread_refcount_destroy(&weak_count);
pthread_refcount_destroy(&strong_count);
}

//*** Called by existing "strong_ptr".

void acquire_strong() {
pthread_refcount_increment(&strong_count);
}

void release_strong() {
int status = pthread_refcount_decrement(&strong_count);
if (PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status) {
destruct_object();

status = pthread_refcount_decrement_rel(&weak_count);


if (PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status)
destruct_self();
}
}

void acquire_weak_from_strong() {
acquire_weak();
}

//*** Called by existing "weak_ref".

void acquire_weak() {
pthread_refcount_increment(&weak_count);
}

void release_weak() {
int status = pthread_refcount_decrement_acq(&weak_count);


if (PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status)
destruct_self();
}

bool acquire_strong_from_weak() {
int status = pthread_refcount_increment_positive(&strong_count); // _add_to_positive(&refcount, 1);


if (PTHREAD_REFCOUNT_DROPPED_TO_ZERO == status) // Ouch, did not work [too late].
return false;
return true;
}

private:

void destruct_object(); // "delete p_object".
void destruct_self(); // "delete this".

pthread_refcount_t strong_count;
pthread_refcount_t weak_count;

T* p_object;

}; //*** class refs

regards,
alexander.

Attila Feher

unread,
Feb 15, 2003, 6:09:47 AM2/15/03
to
Alexander Terekhov wrote:
> [ ..."release_weak() calls destruct_self() on expired memory
> view"... ]
>
> Here's "SKETCHv4" together with "class refs" illustration.
>
> PTHREAD_REFCOUNT_MAX // upper bound
> PTHREAD_REFCOUNT_INITIALIZER(N) // macro for statics INIT
[SNIP]

Cool! When will it ship?

Attila


Alexander Terekhov

unread,
Feb 15, 2003, 11:24:26 AM2/15/03
to

Planned GA Dates:

March 1, 2003
poor man's version [POSIX mutex based], electronic only.

March 1, 2003
crazy man's version [Wintel32, "Requires Windows 98 or later"(*)], electronic only.

regards,
alexander.

(*) 32bit only; won't work under SSE/SSE2 relaxed memory model "extension(s)".

Alexander Terekhov

unread,
Feb 18, 2003, 12:07:18 PM2/18/03
to

Alexander Terekhov wrote:
>
> Attila Feher wrote:
> >
> > Alexander Terekhov wrote:
> > > [ ..."release_weak() calls destruct_self() on expired memory view"... ]
> > >
> > > Here's "SKETCHv4" together with "class refs" illustration.
> > >
> > > PTHREAD_REFCOUNT_MAX // upper bound
> > > PTHREAD_REFCOUNT_INITIALIZER(N) // macro for statics INIT
> > [SNIP]
> >
> > Cool! When will it ship?
>
> Planned GA Dates:
>
> March 1, 2003
> poor man's version [POSIX mutex based], electronic only.

Here's beta1:

http://www.terekhov.de/pthread_refcount_t/poor-man/beta1/prefcnt.h
http://www.terekhov.de/pthread_refcount_t/poor-man/beta1/prefcnt.c

regards,
alexander.

Pavel Vasiliev

unread,
Feb 21, 2003, 9:25:44 AM2/21/03
to
Alexander Terekhov <tere...@web.de> wrote:

> Here's "SKETCHv4" together with "class refs" illustration.
[...]

> pthread_refcount_attr_*() // PROCESS_SHARED and etc. attr-stuff

Which other attributes do you expect? Is there a real need for attributes?
IMO it would be nice to make pthread_refcount_t as lightweight as possible
(mutex-based implementations aside).

---
Pavel

Alexander Terekhov

unread,
Feb 21, 2003, 10:37:00 AM2/21/03
to

Pavel Vasiliev wrote:
>
> Alexander Terekhov <tere...@web.de> wrote:
>
> > Here's "SKETCHv4" together with "class refs" illustration.
> [...]
> > pthread_refcount_attr_*() // PROCESS_SHARED and etc. attr-stuff
>
> Which other attributes do you expect?

Well, at the moment, I think that the following attribute calls shall
be made "mandatory" for any implementation of "RC" ("Reference counts"
with _POSIX_REFCOUNTS/_SC_REFCOUNTS symbols) threading option.

pthread_refcountattr_init()
pthread_refcountattr_destroy()

If "TSH" (_POSIX_THREAD_PROCESS_SHARED) option is supported, then the
implementations would have to provide

pthread_refcountattr_setpshared()
pthread_refcountattr_getpshared()

calls as well... just like it's done for the "BAR" (barriers) stuff,
for example.

Now, another idea is to have a non-blocking *suboption* "RCNB"
("Reference counts nonblocking" with _POSIX_REFCOUNTS_NONBLOCKING/
_SC_REFCOUNTS_NONBLOCKING symbols).

If that suboption is NOT supported then the implementation would
basically use mutexes for pthread_refcount_t objects. In this case,
implementation shall probably provide the following OPTIONAL
attribute calls:

[for "TPP"(_POSIX_THREAD_PRIO_PROTECT)]

pthread_refcountattr_getprioceiling()
pthread_refcountattr_setprioceiling()

[for "TPP"|"TPI"(_POSIX_THREAD_PRIO_INHERIT)]

pthread_refcountattr_getprotocol()
pthread_refcountattr_setprotocol()

just like it would do for the mutexes.

> Is there real need for attributes?

Yes, I think so. Uhmm,

pthread_refcountattr_settype()/PTHREAD_REFCOUNT_HUGE (vs NORMAL)

would be just one way to exploit the attribute in the future, I guess ;-)

regards,
alexander.
