Hi,
Has anyone tried to implement that? I tried it and ran into problems on a
two-processor machine. The problem seems to be that a decref() call can
delete an object while another thread (on the 2nd processor!) is starting to
make a copy of it. Of course, decref and incref are encapsulated by a
CriticalSection object (Win32) in my implementation.
Stephan
Actually, using InterlockedIncrement and InterlockedDecrement is
the standard Win32 technique for thread-safe reference counting.
--
/George V. Reilly mailto:g...@halcyon.com
ATL, Vim (Vi IMproved), Active Server Pages: http://www.halcyon.com/gvr/
Co-author, "Beginning ATL COM Programming", Wrox Press, 1998
InterlockedIncrement and InterlockedDecrement aren't safe if the
reference count is kept in the data object proper. It's possible
that between the time that a pointer to an object is fetched and
the increment of the reference count within that object, for that
object to have been freed and reallocated from that heap, in which
case you've just corrupted another data object.
You need an additional mechanism to ensure that the
increment operation can be performed safely. This might be a
global lock or some other proprietary solution.
--
Joe Seigh
If the ref count is 1, and you drop that pointer... you'll
decrement the count to 0 and free the object. Is it possible
for another thread to be adding a pointer to that object at
exactly that time?
No.
No other thread has a pointer to it, because the ref count is
1! (The pointers are mutable, shared data, hence they do need
to be protected by a mutex. Thus only one thread at a time can
use a given pointer.)
(Am I missing anything here? Don't think so....)
Personally, I'm not going to be using ref counting anytime soon.
I'm a garbage collector kinda guy. For all but a small set of cases
(i.e. the kind of stuff I work on) GCs are faster, simpler, better.
Geodesic Systems has an MT GC for C, C++ even. By replacing all that
complicated goo for keeping track of active objects, X11R6 runs ~25%
faster with a GC.
GC good! RC bad! GC good! RC bad! 4 legs good! 2 legs bad!
-Bil
Joe Seigh wrote:
> InterlockedIncrement and InterlockedDecrement aren't safe if the
> reference count is kept in the data object proper. It's possible
> that between the time that a pointer to an object is fetched and
> the increment of the reference count within that object, for that
> object to have been freed and reallocated from that heap, in which
> case you've just corrupted another data object.
>
> You need an additional mechanism to ensure that the
> increment operation can be performed safely. This might be a
> global lock or some other proprietary solution.
>
> --
> Joe Seigh
--
================
B...@LambdaCS.com
http://www.LambdaCS.com
Lambda Computer Science
555 Bryant St. #194
Palo Alto, CA,
94301
Phone/FAX: (650) 328-8952
> Personally, I'm not going to be using ref counting anytime soon.
> I'm a garbage collector kinda guy. For all but a small set of cases
> (ie the kind of stuff I work on) GCs are faster, simpler, better.
OK, I'll bite :-). Personally, I'm not going to be using garbage
collectors anytime soon. I'm an explicit deallocation kinda guy. For
all cases and implementations I have tested so far, explicit
deallocation is faster and considerably more memory-efficient than
garbage collection. See:
http://www.dent.med.uni-muenchen.de/~wmglo/malloc-slides.html
[Note: I'm not saying it's always better, and I certainly admit it
isn't simpler. I'm also not advocating reference counting.]
> Geodesic systems has an MT GC for C, C++ even. By replacing all that
> complicated goo for keeping track of active objects, X11R6 runs ~25%
> faster with a GC.
This I would like to understand in more detail: For all X servers and
X applications (meaning: lots of interaction) I've ever examined, the
time spent in allocation functions (malloc, free, realloc, etc.) was
much less than 3% (usually even less than 0.5%) of the application run
time.
So how could I expect to gain 25% in speed just by switching to a
garbage collector? Or do you mean there is a version of X11R6 that
is specially tuned for a GC and runs 25% faster than code derived from
the sample implementation?
> GC good! RC bad! GC good! RC bad! 4 legs good! 2 legs bad!
GC may be a good thing, but (for me, at least) it comes with a much
too heavy price tag.
Regards,
Wolfram.
Using abstract pointers still has the same problem. It just adds an
extra step of indirection. Once the actual pointer is fetched you still
have the possible race condition. Unless of course you put the reference
count in the abstract pointer itself and that abstract pointer is never
deallocated.
--
Joe Seigh
Ok, here's a picture of the possible race condition. Thread 1 wants
to reference an object and thread 2 wants to dereference the same object.
Thread x is any thread besides thread 1 that is allocating a new object.

Thread 1: fetches the contents of pointer p into local storage, p'.

Thread 2: atomically decrements the reference count, sets p to null,
  sees that the reference count is zero, and deallocates the object.

Thread x: allocates the old object's storage from the heap and
  incidentally sets what was the old reference count to some
  non-zero value.

Thread 1: atomically increments the "reference count" pointed to by p',
  sees that the old value of the "reference count" was non-zero.
  Must be ok, right?

Wrong. Not only has thread 1 misinterpreted the alleged reference count, it has
corrupted some other object entirely.
Note. We've not made it clear whether we are distinguishing between thread
references and references by shared global pointers. The original posting
didn't make that distinction and we've maintained its lack. But the basic
principle holds nonetheless.
At this point, I think we need to restate the problem to clarify the issues
involved. There are really two levels of synchronization involved. One is
the possibly simultaneous updating of the reference counts. The interlocked
updates take care of that problem but they don't take care of the other problem,
the possible race condition.
That other problem is better known as the readers/writers problem. Here the
reader threads are those threads adjusting the reference counts to account
for thread references, and the writer threads are those threads adjusting the
reference counts to account for global shared references.
Using that scheme, a "reader" thread can always safely reference an object
because that object's reference count will always be at least 1 for the
shared global reference to it. And that global reference can't be deleted
by a "writer" thread while "reader" threads hold the readers/writers "lock".
Unfortunately, having to acquire a read lock every time a thread wants to
reference an object through a shared global pointer can be quite expensive.
So other schemes have been developed to deal with this.
One such scheme was used in what was then VM/XA. The VM/XA kernel is
non-preemptive. Its kernel threads run until they explicitly preempt by
exiting to the dispatcher, a yield in effect.
A scheme was developed whereby a "writer" thread that wanted to delete
an object first deleted all global references to that object, and
then delayed deallocating the object until all processors (running
threads) had exited to the dispatcher at least once. "Reader" threads
wanting to safely reference an object under this scheme had to do it
without a loss of control, i.e. without exiting to the dispatcher. Since the
object wouldn't be deallocated and possibly reused as long as they didn't lose
control, the access to that object was safe.
One of the typical uses of this scheme was to put a spin lock
within an object. The spin locks acted like reference counts in this
case since they prevented the object from being deallocated while it
was being used. The delay scheme allowed the spin lock acquire to safely
run even if the object was about to be deallocated. To deallocate the
object, a thread got the spin lock, deleted the pointer to the object,
set the lock state as "destroyed", and scheduled the deallocation of
the object. Subsequent attempts to get that lock by other threads
still referencing that object failed (lock destroyed) and re-fetched
the global pointer to get a new object or null.
This explanation is probably a bit brief and sketchy, but it serves to
illustrate that the race condition being discussed here is real,
was recognized as a problem, had significant performance implications,
and was dealt with in a real-life commercial multi-processing operating system.
I should point out also that this particular scheme is patented by IBM
and if you want to use it you have to license it from IBM. The patent
only covers the particular delay mechanism not the idea of delaying
a safe interval in general. There are other ways of doing a delay but
not as good, IMO.
And of course there are other schemes for dealing with this problem.
And garbage collection obviates the problem entirely. And I think
my explanation has just made gc look a whole lot more attractive. :-)
--
Joe Seigh
You're correct. This would be wrong. But that's why you can't do this.
It's not even RC specific. You need to hold that mutex. Shared data requires mutexes.
There's no choice on this part. You have to hold that mutex.
> Unfortunately, having to acquire a read lock every time a thread wants to
> reference an object through a share global pointer can be quite expensive.
Yup.
> One such scheme was used in what was then VM/XA. The VM/XA kernel is
> non-preemptive. Its kernel threads run until they explicitly preempt by
> exiting to the dispatcher, a yield in effect.
Most interesting! But I don't think I want to maintain that code.
> And of course there are other schemes for dealing with this problem.
> And garbage collection obviates the problem entirely. And I think
> my explanation has just made gc look a whole lot more attractive. :-)
D'accord!
-Bil
--
Joe Seigh
Ok. If you had a PowerPC you could safely increment the reference count using
something like the following.
loop1:  lwz    rx, ptr      # load shared global pointer value
        # test rx for null not shown here
loop2:  lwarx  ry, 0, rx    # load reserved on the reference count
        sync                # memory barrier
        addi   ry, ry, 1    # increment reference count
        lwz    rz, ptr      # re-fetch the shared global pointer
        cmpw   rx, rz       # check if it has changed
        bne-   loop1        # if changed, retry from the top
        stwcx. ry, 0, rx    # store conditional of the reference count
        bne-   loop2        # retry if reference count update failed
Any writer thread deleting or changing the shared global pointer would
first change the pointer and then decrement the reference count.
For those not familiar with load reserved and store conditional, load reserved
puts a "reserve" for the current processor on the storage location the load
occurred from. Store conditional fails if the "reserve" was lost, i.e. among
other things, if some other processor stored into that storage location in the
interval since the "reserve" was set.
The recheck of the global shared pointer guarantees that your reserve occurred
before the global shared pointer was changed and thus before the decrement of
the reference count. If the object was deallocated, the increment of the
reference count would either fail on the global pointer recheck or on the
store conditional.
--
Joe Seigh
One week ago I posted this message and received lots of comments - so it
seems to be an interesting topic!
After all, I think the problem, in terms of thread-safe 'automatic'
memory management, is probably best solved with a garbage collector. I don't
want to discuss garbage collectors vs. reference counting here, but once you
have a garbage collector it seems to be simpler to make it thread-safe than
to make a reference-counting scheme thread-safe (I'm sticking to C++ here!).
Back to the topic of this newsgroup, however: the core problem for me is
that in C++ there is no way for an object to lock itself completely. How can
an object make sure that only one thread has access to it? Even if each
member function starts with a call to lock the object and ends with a call
to unlock the object, the access to the object could be interrupted by
another thread just before calling the lock function, right? So what's
really missing here is something like Java's 'synchronized' keyword.
The only solution seems to be a global instance that keeps track of
locking/unlocking your objects (as suggested by some of you).
Stephan
Locking at the start of each method and unlocking at the end is exactly
what occurs in a Java synchronized method. The only difference is that it
is done implicitly in Java and explicitly in C++.
What is the problem with having the thread interrupted before acquiring the
lock? Although it is inside the object, it hasn't yet done anything that
would affect other threads accessing that object.
David
I missed the beginning of this thread, so I'm not sure if a
technique such as the following has been discussed. Many years ago I wrote
the following code which is a templated class which overloads the "->"
operator. When a method on the underlying class is called through the "->"
operator, a temporary locking object is created and the underlying method
is invoked (via the semantics of operator->() ). When the method returns
the temporary object is destroyed, releasing the lock.
It seems to me that this could be utilized to make a "monitor" for
an entire class. You can't be selective in which methods to synchronize as
you can in Java.
The class depends on the correct destruction of temporary objects
(i.e. at the end of the enclosing full expression). This rules out using
compilers such as SunPro 4.2. The gnu compilers work. I haven't tried
Visual C++ 5.0.
Any suggestions, comments would be welcome.
Sean.
----
#include <iostream>
using std::cout;
using std::endl;
template <class T,class L>
class WrapRef {
public:
WrapRef(T* data_,L* lock_);
WrapRef(const WrapRef<T,L>&);
~WrapRef();
const T* operator->() const;
T* operator->();
private:
T* data;
L* lck;
WrapRef<T,L> operator=(const WrapRef<T,L>&);
};
template <class T,class L>
class Wrap {
public:
Wrap(T* data_);
~Wrap();
WrapRef<T,L> operator->();
private:
L lck;
T *data;
Wrap<T,L>& operator=(const Wrap<T,L>&);
Wrap(const Wrap<T,L>&);
};
template <class T,class L>
inline WrapRef<T,L>::WrapRef(T* data_,L* lck_) : data(data_),lck(lck_)
{
lck->lock();
}
template <class T,class L>
inline WrapRef<T,L>::WrapRef(const WrapRef<T,L>& r) : data(r.data),lck(r.lck)
{
// transfer ownership of the lock to the copy, so unlock() runs only once
const_cast<WrapRef<T,L>&>(r).lck = 0;
}
template <class T,class L>
inline WrapRef<T,L>::~WrapRef()
{
if (lck) lck->unlock();
}
template <class T,class L>
inline const T* WrapRef<T,L>::operator->() const
{
return data;
}
template <class T,class L>
inline T* WrapRef<T,L>::operator->()
{
return data;
}
template <class T,class L>
inline Wrap<T,L>::Wrap(T *data_) : lck(),data(data_)
{
}
template <class T,class L>
inline Wrap<T,L>::~Wrap()
{
delete data;
}
template <class T,class L>
inline WrapRef<T,L> Wrap<T,L>::operator->()
{
return WrapRef<T,L>(data,&lck);
}
class Lock {
public:
Lock() { cout << "lock created" << endl; }
~Lock() { cout << "lock destroyed" << endl; }
virtual void lock() { cout << "locked" << endl; }
virtual void unlock() { cout << "unlocked" << endl; }
};
class Data {
public:
Data(int arg_) : arg(arg_) { cout << "data constructed" << endl; }
~Data() { cout << "data destroyed" << endl; }
void set(int val_) { arg = val_; cout << "set to " << arg << endl; }
int get() const { cout << "get " << arg << endl; return arg; }
private:
int arg;
};
int main()
{
Wrap<Data,Lock> wrappedData(new Data(34));
int result;
wrappedData->set(92);
result = wrappedData->get();
cout << "result = " << result << endl;
return 0;
}
--
Sean M. Kelley Decision-Science Applications
ske...@dsava.com 1110 N. Glebe Rd. Suite 400
ske...@dgsys.com Arlington, VA 22201
(703)243-2500 #include <std_disclaimer.h>
> The case we're concerned about is when thread 1 initially has no
> references to the object. How does it safely increment the
> reference count in that case?
I think that the main problem is that when dealing with objects that can
be referenced from multiple threads, you either need to clearly define
the "owner" thread that creates the first instance and that is
responsible for giving instances to other threads (via messages for
example) or you need to have some kind of "object repository" that
maintains references to objects. This repository uses a mutex to
protect its lookup/creation mechanism (provided that you have a way to
uniquely identify the object you are looking for).
Anyway, this is not a simple problem that can be solved from within the
shared object itself. Experience has shown me that the only reliable way is
to have a "third party" object or thread that is responsible for
maintaining the references for clients.
--
E-Mail: <mailto:bh...@calva.net>
BenH. Web : <http://calvaweb.calvacom.fr/bh40/>