
thread-safe reference counting


Stephan Schaefer

Jun 19, 1998

Hi,

Has anyone tried to implement that? I tried it and got problems on a
two-processor architecture. The problem seems to be that a decref() call can
delete an object while another instance (on the 2nd processor!) starts to make
a copy of it. Of course, decref and incref are encapsulated by a
CriticalSection object (Win32) in my implementation.

Stephan


George V. Reilly

Jun 19, 1998

Stephan Schaefer <scha...@graphics.cs.uni-bonn.de> wrote:
) Has anyone tried to implement that? I tried it and got problems on a
) two-processor architecture. The problem seems to be that a decref() call can
) delete an object while another instance (on the 2nd processor!) starts to make
) a copy of it. Of course, decref and incref are encapsulated by a
) CriticalSection object (Win32) in my implementation.

Actually, using InterlockedIncrement and InterlockedDecrement is
the standard Win32 technique for thread-safe reference counting.
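For instance, here's a minimal sketch of an intrusive count updated with
those primitives (the class and member names are illustrative, not from any
particular library):

#include <windows.h>

// Minimal intrusive reference count using the Win32 interlocked primitives.
class RefCounted {
public:
    RefCounted() : refs(1) {}               // creator holds the first reference
    void AddRef() { InterlockedIncrement(&refs); }
    void Release()
    {
        if (InterlockedDecrement(&refs) == 0)
            delete this;                    // last reference just went away
    }
protected:
    virtual ~RefCounted() {}                // destroy only via Release()
private:
    LONG refs;
};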
--
/George V. Reilly mailto:g...@halcyon.com
ATL, Vim (Vi IMproved), Active Server Pages: http://www.halcyon.com/gvr/
Co-author, "Beginning ATL COM Programming", Wrox Press, 1998

Joe Seigh

Jun 22, 1998

In article <6mes1s$6c0$1...@halcyon.com>, g...@halcyon.com (George V. Reilly) writes:
|> Stephan Schaefer <scha...@graphics.cs.uni-bonn.de> wrote:
|> ) Has anyone tried to implement that? I tried it and got problems on a
|> ) two-processor architecture. The problem seems to be that a decref() call can
|> ) delete an object while another instance (on the 2nd processor!) starts to make
|> ) a copy of it. Of course, decref and incref are encapsulated by a
|> ) CriticalSection object (Win32) in my implementation.
|>
|> Actually, using InterlockedIncrement and InterlockedDecrement is
|> the standard Win32 technique for thread-safe reference counting.

InterlockedIncrement and InterlockedDecrement aren't safe if the
reference count is kept in the data object proper. It's possible,
between the time that a pointer to an object is fetched and the
increment of the reference count within that object, for that object
to have been freed and its storage reallocated from the heap, in which
case you've just corrupted another data object.

You need to have an additional mechanism to ensure that the
increment operation can be performed safely. This might be a
global lock or some other proprietary solution.
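
For example, here's a sketch of the global-lock variant (Win32 flavor;
every name is illustrative):

#include <windows.h>

struct Object {
    LONG refs;
    /* ... payload ... */
};

CRITICAL_SECTION g_ptrLock;   // assume InitializeCriticalSection() ran at startup
Object* g_shared = 0;         // the shared global pointer; owns one reference

// Fetch the shared pointer and take a reference. The fetch and the
// increment happen under the same lock, so the object can't be freed
// in between them.
Object* AcquireShared()
{
    EnterCriticalSection(&g_ptrLock);
    Object* p = g_shared;
    if (p)
        InterlockedIncrement(&p->refs);
    LeaveCriticalSection(&g_ptrLock);
    return p;
}

// Drop the global reference: clear the pointer under the same lock,
// then give up the count outside it.
void ClearShared()
{
    EnterCriticalSection(&g_ptrLock);
    Object* p = g_shared;
    g_shared = 0;
    LeaveCriticalSection(&g_ptrLock);
    if (p && InterlockedDecrement(&p->refs) == 0)
        delete p;
}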

--
Joe Seigh

Bil Lewis

Jun 22, 1998

Let's think about this one *carefully*...

If the ref count is 1, and you drop that pointer... you'll
decrement the count to 0 and free the object. Is it possible
for another thread to be adding a pointer to that object at
exactly that time?

No.

No other thread has a pointer to it, because the ref count is
1! (The pointers are mutable, shared data, hence they do need
to be protected by a mutex. Thus only one thread at a time can
use a given pointer.)

(Am I missing anything here? Don't think so....)

Personally, I'm not going to be using ref counting anytime soon.
I'm a garbage collector kinda guy. For all but a small set of cases
(ie the kind of stuff I work on) GCs are faster, simpler, better.

Geodesic systems has an MT GC for C, C++ even. By replacing all that
complicated goo for keeping track of active objects, X11R6 runs ~25%
faster with a GC.

GC good! RC bad! GC good! RC bad! 4 legs good! 2 legs bad!


-Bil

Joe Seigh wrote:

> InterlockedIncrement and InterlockedDecrement aren't safe if the
> reference count is kept in the data object proper. It's possible,
> between the time that a pointer to an object is fetched and the
> increment of the reference count within that object, for that object
> to have been freed and its storage reallocated from the heap, in which
> case you've just corrupted another data object.
>
> You need to have an additional mechanism to ensure that the
> increment operation can be performed safely. This might be a
> global lock or some other proprietary solution.
>
> --
> Joe Seigh

--
================
B...@LambdaCS.com

http://www.LambdaCS.com
Lambda Computer Science
555 Bryant St. #194
Palo Alto, CA,
94301

Phone/FAX: (650) 328-8952

Wolfram Gloger

Jun 23, 1998

Bil Lewis <B...@LambdaCS.com> writes:

> Personally, I'm not going to be using ref counting anytime soon.
> I'm a garbage collector kinda guy. For all but a small set of cases
> (ie the kind of stuff I work on) GCs are faster, simpler, better.

OK, I'll bite :-). Personally, I'm not going to be using garbage
collectors anytime soon. I'm an explicit deallocation kinda guy. For
all cases and implementations I have tested so far, explicit
deallocation is faster and considerably more memory-efficient than
garbage collection. See:

http://www.dent.med.uni-muenchen.de/~wmglo/malloc-slides.html

[Note: I'm not saying it's always better, and I certainly admit it
isn't simpler. I'm also not advocating reference counting.]

> Geodesic systems has an MT GC for C, C++ even. By replacing all that
> complicated goo for keeping track of active objects, X11R6 runs ~25%
> faster with a GC.

This I would like to understand in more detail: For all X servers and
X applications (meaning: lots of interaction) I've ever examined, the
time spent in allocation functions (malloc, free, realloc, etc.) was
much less than 3% (usually even less than 0.5%) of the application run
time.

So how could I expect to gain 25% in speed just by switching to a
garbage collector? Or do you mean there is a version of X11R6 that
is specially tuned for a GC and runs 25% faster than code derived from
the sample implementation?

> GC good! RC bad! GC good! RC bad! 4 legs good! 2 legs bad!

GC may be a good thing, but (for me, at least) it comes with a much
too heavy price tag.

Regards,
Wolfram.

Joe Seigh

Jun 23, 1998

In article <6mnca8$r...@espresso.cafe.net>, k...@cafe.net (Kaz Kylheku) writes:
...
|>
|> Anyway, the key idea seems to be that when you are using object reference
|> counts, a good paradigm is to have some way of referring to an object using an
|> abstract, globally unique identifier, which can be tested for existence
|> independently of the existence of an object. Even after the object is
|> disposed of, you can use its identifier as a key in a lookup operation to
|> safely determine that the object no longer exists.

Using abstract pointers still has the same problem. It just adds an
extra step of indirection. Once the actual pointer is fetched you still
have the possible race condition. Unless of course you put the reference
count in the abstract pointer itself and that abstract pointer is never
deallocated.

--
Joe Seigh

Joe Seigh

Jun 23, 1998

In article <358EFE...@LambdaCS.com>, Bil Lewis <B...@LambdaCS.com> writes:
|> Let's think about this one *carefully*...
|>
|> If the ref count is 1, and you drop that pointer... you'll
|> decrement the count to 0 and free the object. Is it possible
|> for another thread to be adding a pointer to that object at
|> exactly that time?
|>
|> No.
|>
|> No other thread has a pointer to it, because the ref count is
|> 1! (The pointers are mutable, shared data, hence they do need
|> to be protected by a mutex. Thus only one thread at a time can
|> use a given pointer.)
|>
|> (Am I missing anything here? Don't think so....)
|>
|> Personally, I'm not going to be using ref counting anytime soon.
|> I'm a garbage collector kinda guy. For all but a small set of cases
|> (ie the kind of stuff I work on) GCs are faster, simpler, better.
|>
|> Geodesic systems has an MT GC for C, C++ even. By replacing all that
|> complicated goo for keeping track of active objects, X11R6 runs ~25%
|> faster with a GC.
|>
|> GC good! RC bad! GC good! RC bad! 4 legs good! 2 legs bad!
|>
|>
|>
|>
|> -Bil

Ok, here's a picture of the possible race condition. Thread 1 wants
to reference an object and thread 2 wants to dereference the same object.
Thread x is any thread beside thread 1 that is allocating a new object.


Thread 1: fetch the contents of pointer p into local storage, p'

Thread 2: atomically decrement the reference count
Thread 2: set p to null
Thread 2: reference count is zero; deallocate the object

Thread x: allocate the old object's storage from the heap and,
          incidentally, set what was the old reference count to some
          non-zero value

Thread 1: atomically increment the "reference count" pointed to by p'
Thread 1: sees that the old value of the "reference count" was
          non-zero. Must be ok, right?


Wrong. Not only has thread 1 misinterpreted the alleged reference count, it's
corrupted some other object entirely.

Note: we've not made it clear whether we are distinguishing between thread
references and references by shared global pointers. The original posting
didn't make that distinction and we've kept its ambiguity. But the basic
principle holds nonetheless.

At this point, I think we need to restate the problem to clarify the
issues. There are really two levels of synchronization involved. One is
the possibly simultaneous updating of the reference counts. The interlocked
updates take care of that problem, but they don't take care of the other
problem, the possible race condition.

That other problem is better known as the readers/writers problem. Here the
reader threads are those threads adjusting the reference counts to account
for thread references, and the writer threads are those threads adjusting the
reference counts to account for global shared references.

Using that scheme, a "reader" thread can always safely reference an object,
because that object's reference count will always be at least 1 for the
shared global reference to it. And that global reference can't be deleted
by a "writer" thread while "reader" threads hold the readers/writers "lock".
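
In rough code, that scheme might look like the sketch below (assuming a
POSIX readers/writer lock, which isn't named above; all identifiers are
invented):

#include <pthread.h>

struct Object {
    int refs;
    pthread_mutex_t cntlock;   // protects refs; an interlocked op would also do
    /* ... payload ... */
};

pthread_rwlock_t g_rw = PTHREAD_RWLOCK_INITIALIZER;
Object* g_shared = 0;          // holds one "global" reference while non-null

static void incref(Object* p)
{
    pthread_mutex_lock(&p->cntlock);
    p->refs++;
    pthread_mutex_unlock(&p->cntlock);
}

static void decref(Object* p)
{
    pthread_mutex_lock(&p->cntlock);
    int n = --p->refs;
    pthread_mutex_unlock(&p->cntlock);
    if (n == 0)
        delete p;              // assumes the object came from new
}

// "Reader": a thread with no reference of its own gains one safely,
// because the read lock pins the global reference in place.
Object* reader_acquire()
{
    pthread_rwlock_rdlock(&g_rw);
    Object* p = g_shared;
    if (p)
        incref(p);
    pthread_rwlock_unlock(&g_rw);
    return p;
}

// "Writer": removing the global reference takes the write lock, which
// waits until no reader is inside the window above.
void writer_clear()
{
    pthread_rwlock_wrlock(&g_rw);
    Object* p = g_shared;
    g_shared = 0;
    pthread_rwlock_unlock(&g_rw);
    if (p)
        decref(p);
}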

Unfortunately, having to acquire a read lock every time a thread wants to
reference an object through a shared global pointer can be quite expensive.
So other schemes have been developed to deal with this.

One such scheme was used in what was then VM/XA. The VM/XA kernel is
non-preemptive. Its kernel threads run until they explicitly give up
control by exiting to the dispatcher, in effect a yield.

A scheme was developed so that a "writer" thread that wanted to delete
an object first deleted all global references to that object, and
then delayed deallocating the object until all processors (running
threads) had exited to the dispatcher at least once. "Reader" threads
wanting to safely reference an object under this scheme had to do it
without a loss of control, i.e. without exiting to the dispatcher. Since
the object wouldn't be deallocated and possibly reused as long as they
didn't lose control, access to that object was safe.

One of the typical uses of this scheme was to put a spin lock
within an object. The spin locks acted like reference counts in this
case, since they prevented the object from being deallocated while it
was being used. The delay scheme allowed the spin lock acquire to run
safely even if the object was about to be deallocated. To deallocate the
object, a thread got the spin lock, deleted the pointer to the object,
set the lock state to "destroyed", and scheduled the deallocation of
the object. Subsequent attempts to get that lock by other threads
still referencing that object failed (lock destroyed) and re-fetched
the global pointer to get a new object or null.
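
A toy rendering of just that lock-state idea, using today's Win32
interlocked signatures (the quiescent-interval delay, the patented part,
is elided here, and all names are invented):

#include <windows.h>

// Lock word states: 0 = free, 1 = held, -1 = destroyed.
struct Node {
    LONG lockword;
    /* ... payload ... */
};

// Try to pin the object. Returns FALSE if the lock has been marked
// destroyed, in which case the caller re-fetches the global pointer.
BOOL TryPin(Node* n)
{
    for (;;) {
        LONG old = InterlockedCompareExchange(&n->lockword, 1, 0);
        if (old == 0)
            return TRUE;    // got the lock
        if (old == -1)
            return FALSE;   // destroyed: object is on its way out
        Sleep(0);           // held by someone else: spin politely
    }
}

void Unpin(Node* n)
{
    InterlockedExchange(&n->lockword, 0);
}

// Tear-down path: after deleting the global pointer while holding the
// lock, mark the lock destroyed and schedule the delayed deallocation.
void MarkDestroyed(Node* n)
{
    InterlockedExchange(&n->lockword, -1);
}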

This explanation is probably a bit brief and sketchy, but it serves to
illustrate the fact that the race condition being discussed here is real,
was recognized as a problem, had significant performance implications,
and was dealt with in a real-life commercial multiprocessing operating
system.

I should point out also that this particular scheme is patented by IBM,
and if you want to use it you have to license it from IBM. The patent
only covers the particular delay mechanism, not the idea of delaying
for a safe interval in general. There are other ways of doing a delay,
but none as good, IMO.

And of course there are other schemes for dealing with this problem.
And garbage collection obviates the problem entirely. And I think
my explanation has just made gc look a whole lot more attractive. :-)

--
Joe Seigh

Bil Lewis

Jun 23, 1998

Joe,


> Ok, here's a picture of the possible race condition. Thread 1 wants
> to reference an object and thread 2 wants to dereference the same object.
> Thread x is any thread beside thread 1 that is allocating a new object.
>
> Thread 1: fetch the contents of pointer p into local storage, p'
>
> Thread 2: atomically decrement the reference count
> Thread 2: set p to null
> Thread 2: reference count is zero; deallocate the object
>
> Thread x: allocate the old object's storage from the heap and,
>           incidentally, set what was the old reference count to some
>           non-zero value
>
> Thread 1: atomically increment the "reference count" pointed to by p'
> Thread 1: sees that the old value of the "reference count" was
>           non-zero. Must be ok, right?
>
> Wrong. Not only has thread 1 misinterpreted the alleged reference count, it's
> corrupted some other object entirely.

You're correct. This would be wrong. But that's why you can't do this.
It's not even RC specific. You need to hold that mutex. Shared data requires mutexes.
There's no choice on this part. You have to hold that mutex.


> Unfortunately, having to acquire a read lock every time a thread wants to
> reference an object through a share global pointer can be quite expensive.

Yup.


> One such scheme was used in what was then VM/XA. The VM/XA kernel is
> non-preemptive. Its kernel threads run until they explicitly preempt by
> exiting to the dispatcher, a yield in effect.

Most interesting! But I don't think I want to maintain that code.



> And of course there are other schemes for dealing with this problem.
> And garbage collection obviates the problem entirely. And I think
> my explanation has just made gc look a whole lot more attractive. :-)

D'accord! (Agreed!)

-Bil


Joe Seigh

Jun 24, 1998

In article <35904C9D...@netscape.com>, Wan-Teh Chang <w...@netscape.com> writes:
|>
|>
|> Bil Lewis wrote:
|>
|> > Joe,

|> >
|> > > Ok, here's a picture of the possible race condition. Thread 1 wants
|> > > to reference an object and thread 2 wants to dereference the same object.
|> > > Thread x is any thread beside thread 1 that is allocating a new object.
|> > >
|> > > Thread 1: fetch the contents of pointer p into local storage, p'
|> > >
|> > > Thread 2: atomically decrement the reference count
|> > > Thread 2: set p to null
|> > > Thread 2: reference count is zero; deallocate the object
|> > >
|> > > Thread x: allocate the old object's storage from the heap and,
|> > >           incidentally, set what was the old reference count to some
|> > >           non-zero value
|> > >
|> > > Thread 1: atomically increment the "reference count" pointed to by p'
|> > > Thread 1: sees that the old value of the "reference count" was
|> > >           non-zero. Must be ok, right?
|> > >
|> > > Wrong. Not only has thread 1 misinterpreted the alleged reference count, it's
|> > > corrupted some other object entirely.
|>
|> At the beginning of this example, both Thread 1 and Thread 2 are using
|> the object, so the reference count is at least 2.
|>
|> After Thread 2 dereferences the object, the reference
|> count is at least 1 (accounting for Thread 1). The reference
|> count cannot be zero.
|>
|> So I think the example is wrong. Did I miss something?
|>
|> Wan-Teh
|>
The case we're concerned about is when thread 1 initially has no
references to the object. How does it safely increment the
reference count in that case?

--
Joe Seigh

Joe Seigh

Jun 24, 1998

In article <6mou47$v...@espresso.cafe.net>, k...@cafe.net (Kaz Kylheku) writes:

|> In article <1998Jun2...@bose.com>, Joe Seigh <se...@bose.com> wrote:
|> >Using abstract pointers still has the same problem. It just adds an
|> >extra step of indirection. Once the actual pointer is fetched you still
|> >have the possible race condition. Unless of course you put the reference
|> >count in the abstract pointer itself and that abstract pointer is never
|> >deallocated.
|>
|> The abstract pointer is never deallocated (really, we shouldn't be talking
|> about deallocation of pointers, but deallocation of objects which are
|> pointed at).
|>
|> Conventional pointers retain their value after what they point to is
|> freed; however, that value becomes meaningless. In the C language,
|> *any* use of a pointer value that previously referred to storage
|> that has been freed leads to undefined behavior.
|>
|> In contrast, the ``abstract pointer'' (better referred to as a name) retains
|> its meaning even after the object it refers to has disappeared. Using that
|> name as an argument to a search operation does not create a hazard; the
|> operation safely reports that the object which used to have that name is gone.
|>
|> One potential problem that arises is related to the reuse of names.
|>
|> But enough about that, I would like to hear about other interesting
|> ideas!

Ok. If you had a PowerPC you could safely increment the reference count
using something like the following.

loop1:  lwz     rx, ptr        # load global shared pointer value
        # (test for null not shown here)
loop2:  lwarx   ry, 0, rx      # load-reserved on the reference count
        sync                   # memory barrier
        addi    ry, ry, 1      # increment the reference count
        lwz     rz, ptr        # re-fetch the shared global pointer (rz is a scratch register)
        cmpw    rx, rz         # check whether the shared global pointer has changed
        bne-    loop1          # if changed, retry from the top
        stwcx.  ry, 0, rx      # store-conditional of the reference count
        bne-    loop2          # retry if the reference count update failed

Any writer thread deleting or changing the shared global pointer would
first change the pointer and then decrement the reference count.

For those not familiar with load reserved and store conditional, load reserved
puts a "reserve" for the current processor on the storage location the load
occurred from. Store conditional fails if the "reserve" was lost, i.e. among
other things, if some other processor stored into that storage location in the
interval since the "reserve" was set.

The recheck of the global shared pointer guarantees that your reserve occurred
before the global shared pointer was changed and thus before the decrement of
the reference count. If the object was deallocated the increment of the
reference would either fail on the global pointer recheck or on the store
conditional.

--
Joe Seigh

Stephan Schaefer

Jun 25, 1998

>Has anyone tried to implement that? I tried it and got problems on a
>two-processor architecture. The problem seems to be that a decref() call can
>delete an object while another instance (on the 2nd processor!) starts to make
>a copy of it. Of course, decref and incref are encapsulated by a
>CriticalSection object (Win32) in my implementation.
>


One week ago I posted this message and received lots of comments, so it
seems to be an interesting topic!

After all, I think that the problem, in terms of thread-safe 'automatic'
memory management, is probably best solved with a garbage collector. I don't
want to discuss garbage collectors vs. reference counting here, but once you
have a garbage collector it seems to be simpler to make it thread-safe than
to make a reference-counting scheme thread-safe (I stick to C++ here!).

Back to the topic of this newsgroup, however: the core problem for me is
that in C++ there is no way for an object to lock itself completely. How can
an object make sure that only one thread has access to it? Even if each
member function starts with a call to lock the object and ends with a call
to unlock the object, the access to the object could be interrupted by
another thread just before calling the lock function, right? So what's
really missing here is something like Java's 'synchronized' keyword.
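
A scoped lock object at least makes the lock/unlock pairing automatic
within each member function; a minimal sketch, where Mutex and Guard are
hypothetical stand-ins:

#include <pthread.h>

class Mutex {                       // thin wrapper, invented for this sketch
public:
    Mutex() { pthread_mutex_init(&m, 0); }
    ~Mutex() { pthread_mutex_destroy(&m); }
    void lock() { pthread_mutex_lock(&m); }
    void unlock() { pthread_mutex_unlock(&m); }
private:
    pthread_mutex_t m;
};

class Guard {                       // locks in its ctor, unlocks in its dtor
public:
    Guard(Mutex& m_) : m(m_) { m.lock(); }
    ~Guard() { m.unlock(); }
private:
    Mutex& m;
};

class Account {
public:
    Account() : balance(0) {}
    void deposit(int amount)
    {
        Guard g(mtx);               // "synchronized" for this method body
        balance += amount;
    }                               // unlocked here, even on an early return
private:
    Mutex mtx;
    int balance;
};

But that only guards each method body; it doesn't stop another thread from
getting in between two such calls.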

The only solution seems to be a global instance that keeps track of
locking/unlocking your objects (as suggested by some of you).

Stephan

David Holmes

Jun 26, 1998

Stephan Schaefer <scha...@graphics.cs.uni-bonn.de> wrote in article
<6mt1mn$r...@news.rhrz.uni-bonn.de>...

> Even if each
> member function starts with a call to lock the object and ends with a call
> to unlock the object, the access to the object could be interrupted by
> another thread just before calling the lock function, right? So what's
> really missing here is something like Java's 'synchronized' keyword.

Locking at the start of each method and unlocking at the end is exactly
what occurs in a Java synchronized method. The only difference is that it
is done implicitly in Java and explicitly in C++.

What is the problem with having the thread interrupted before acquiring the
lock? Although it is inside the object, it hasn't yet done anything that
would affect other threads accessing that object.

David

Sean Kelley

Jun 28, 1998

I missed the beginning of this thread, so I'm not sure if a
technique such as the following has been discussed. Many years ago I wrote
the following code, a templated class that overloads the "->"
operator. When a method on the underlying class is called through the "->"
operator, a temporary locking object is created and the underlying method
is invoked (via the semantics of operator->()). When the method returns,
the temporary object is destroyed, releasing the lock.

It seems to me that this could be utilized to make a "monitor" for
an entire class. You can't be selective about which methods to synchronize,
as you can in Java.

The class depends on the correct destruction of temporary objects
(i.e. at the end of the enclosing expression). This rules out using
compilers such as SunPro 4.2. The gnu compilers work. I haven't tried
Visual C++ 5.0.

Any suggestions, comments would be welcome.

Sean.

----

#include <iostream.h>

// WrapRef: temporary object returned by Wrap<T,L>::operator->().
// Its constructor acquires the lock and its destructor releases it,
// so the lock is held for the lifetime of the temporary, i.e. until
// the end of the enclosing full expression.
template <class T, class L>
class WrapRef {
public:
    WrapRef(T* data_, L* lck_);
    WrapRef(const WrapRef<T,L>&);
    ~WrapRef();
    const T* operator->() const;
    T* operator->();
private:
    T* data;
    L* lck;

    WrapRef<T,L> operator=(const WrapRef<T,L>&);   // assignment disallowed
};

// Wrap: owns the wrapped object and its lock.
template <class T, class L>
class Wrap {
public:
    Wrap(T* data_);
    ~Wrap();
    WrapRef<T,L> operator->();
private:
    L lck;
    T* data;

    Wrap<T,L>& operator=(const Wrap<T,L>&);        // copying disallowed
    Wrap(const Wrap<T,L>&);
};

template <class T, class L>
inline WrapRef<T,L>::WrapRef(T* data_, L* lck_) : data(data_), lck(lck_)
{
    lck->lock();
}

// Note: a copy shares the lock pointer but does not lock again, so if
// the compiler really copies the temporary, the lock() and unlock()
// calls will not balance.
template <class T, class L>
inline WrapRef<T,L>::WrapRef(const WrapRef<T,L>& r) : data(r.data), lck(r.lck)
{
}

template <class T, class L>
inline WrapRef<T,L>::~WrapRef()
{
    lck->unlock();
}

template <class T, class L>
inline const T* WrapRef<T,L>::operator->() const
{
    return data;
}

template <class T, class L>
inline T* WrapRef<T,L>::operator->()
{
    return data;
}

template <class T, class L>
inline Wrap<T,L>::Wrap(T* data_) : lck(), data(data_)
{
}

template <class T, class L>
inline Wrap<T,L>::~Wrap()
{
    delete data;
}

template <class T, class L>
inline WrapRef<T,L> Wrap<T,L>::operator->()
{
    return WrapRef<T,L>(data, &lck);
}

class Lock {
public:
    Lock() { cout << "lock created" << endl; }
    ~Lock() { cout << "lock destroyed" << endl; }
    virtual void lock() { cout << "locked" << endl; }
    virtual void unlock() { cout << "unlocked" << endl; }
};

class Data {
public:
    Data(int arg_) : arg(arg_) { cout << "data constructed" << endl; }
    ~Data() { cout << "data destroyed" << endl; }
    void set(int val_) { arg = val_; cout << "set to " << arg << endl; }
    int get() const { cout << "get " << arg << endl; return arg; }
private:
    int arg;
};

int main()
{
    Wrap<Data,Lock> wrappedData(new Data(34));
    int result;

    wrappedData->set(92);
    result = wrappedData->get();
    cout << "result = " << result << endl;
    return 0;
}


--
Sean M. Kelley Decision-Science Applications
ske...@dsava.com 1110 N. Glebe Rd. Suite 400
ske...@dgsys.com Arlington, VA 22201
(703)243-2500 #include <std_disclaimer.h>

Benjamin Herrenschmidt

Jun 29, 1998

Joe Seigh <se...@bose.com> wrote:

> The case we're concerned about is when thread 1 initially has no
> references to the object. How does it safely increment the
> reference count in that case?

I think that the main problem is that when dealing with objects that can
be referenced from multiple threads, you either need to clearly define
the "owner" thread that creates the first instance and that is
responsible for giving instances to other threads (via messages, for
example), or you need to have some kind of "object repository" that
maintains references to objects. This repository uses a mutex to
protect its lookup/creation mechanism (provided that you have a way to
uniquely identify the object you are looking for).
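
A minimal sketch of such a repository (every identifier here is invented):

#include <map>
#include <string>
#include <pthread.h>

struct Object {
    int refs;
    /* ... payload ... */
};

// Toy object repository: lookup and reference-taking happen under one
// mutex, so a client can never catch an object between "found" and
// "referenced".
class Repository {
public:
    Repository() { pthread_mutex_init(&m, 0); }

    // Look up by unique id, adding a reference under the lock.
    Object* acquire(const std::string& id)
    {
        pthread_mutex_lock(&m);
        std::map<std::string, Object*>::iterator it = objects.find(id);
        Object* p = (it == objects.end()) ? 0 : it->second;
        if (p)
            p->refs++;             // safe: refs only touched under this mutex
        pthread_mutex_unlock(&m);
        return p;                  // null means "no such object anymore"
    }

    // Drop a reference; the entry disappears when the last one goes.
    void release(const std::string& id)
    {
        pthread_mutex_lock(&m);
        std::map<std::string, Object*>::iterator it = objects.find(id);
        if (it != objects.end() && --it->second->refs == 0) {
            delete it->second;
            objects.erase(it);
        }
        pthread_mutex_unlock(&m);
    }

private:
    std::map<std::string, Object*> objects;
    pthread_mutex_t m;
};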

Anyway, this is not a simple problem that can be solved from within the
shared object itself. Experience has shown me that the only reliable way is
to have a "third party" object or thread that is responsible for
maintaining the references for clients.


--
E-Mail: <mailto:bh...@calva.net>
BenH. Web : <http://calvaweb.calvacom.fr/bh40/>
