I need a fast mechanism to increment, decrement and test an integer value
for implementing a reference counting class in a multithreaded environment.
Using a mutex would be possible, but mutexes cause too much overhead for a
simple count++ or count--.
Does anyone know a faster solution?
The 2 methods to synchronize look like this:
void _ref()
{
    m_refCount++;
}

void _unref()
{
    if (--m_refCount <= 0)
        delete this;
}
Wolfgang
If, after considering the above arguments, it's still a problem,
there's probably some platform/system-specific mojo you can use
for this (some low-level primitive you can access in a [perhaps
very] platform-specific way).
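For instance, on Win32 the Interlocked* calls give an atomic increment and
decrement without a mutex (a rough sketch only, assuming Win32; other
systems have their own, differently named primitives):

#include <windows.h>   // Win32-specific

class RefCounted
{
public:
    RefCounted() : m_refCount(1) {}   // creator holds the first reference
    void _ref()
    {
        InterlockedIncrement(&m_refCount);            // atomic ++
    }
    void _unref()
    {
        if (InterlockedDecrement(&m_refCount) == 0)   // atomic --, returns new count
            delete this;
    }
protected:
    virtual ~RefCounted() {}          // deleted only via _unref()
private:
    volatile LONG m_refCount;
};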
HTH,
--ag
>
> Does anyone know a faster solution?
>
> The 2 methods to synchronize look like this:
>
> void _ref()
> {
> m_refCount++;
> }
>
> void _unref()
> {
> if (--m_refCount <= 0)
> delete this;
> }
>
> Wolfgang
--
Artie Gold, Austin, TX (finger the cs.utexas.edu account for more
info)
mailto:ag...@bga.com or mailto:ag...@cs.utexas.edu
--
Clone Bernie!
> I need a fast mechanism to increment, decrement and test an integer value
> for implementing a reference counting class in a multithreaded environment.
IMHO, even with some "fast and safe" mechanism to increment, decrement and
test an integer value (e.g. compare_and_swap, etc.) it will NOT result in a
correct solution on some platforms; you would have a problem with respect to
destructors and memory visibility on multiprocessors with memory read
reordering:
> void _unref()
> {
> if (--m_refCount <= 0)
> delete this;
> }
Your version has no read memory barrier; it needs to look something like

void _unref()
{
    if (--m_refCount <= 0) {
        READ_MEMORY_BARRIER();
        delete this;
    }
}

otherwise the destructor(s) could see/use garbage!
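The same barrier placement can be sketched with C++ atomics (a sketch only,
assuming a compiler that provides std::atomic; the acquire fence plays the
role of READ_MEMORY_BARRIER):

#include <atomic>

class RefCounted
{
public:
    void _ref()
    {
        // an increment made through an already-owned reference needs no ordering
        m_refCount.fetch_add(1, std::memory_order_relaxed);
    }
    void _unref()
    {
        // 'release' makes this thread's prior writes visible before the count
        // drops; the 'acquire' fence on the last-reference path is the read
        // barrier that must precede the delete
        if (m_refCount.fetch_sub(1, std::memory_order_release) == 1)
        {
            std::atomic_thread_fence(std::memory_order_acquire);
            delete this;
        }
    }
protected:
    virtual ~RefCounted() {}
private:
    std::atomic<int> m_refCount{1};
};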
regards,
alexander.
Wolfgang Platzer wrote:
>
> Hi,
>
> I need a fast mechanism to increment, decrement and test an integer value
> for implementing a reference counting class in a multithreaded environment.
> Using a mutex would be possible, but mutexes cause too much overhead for a
> simple count++ or count--.
>
(hmm..., my first posting seems to have gone nowhere. It will probably show
up when this one does).
If these reference counts are contained in the objects, how are you going
to guarantee that the object will still be there when you try to increment
the count? It's possible for it to get deallocated between the time you got
the address of the object and the time you attempt to increment the count.
This is a simple race condition problem; it has nothing to do with memory
visibility and reordering.
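A hypothetical sequence to illustrate the race (g_shared and the thread
bodies are made up for the example):

struct Obj
{
    void _ref();     // as in the original post
    void _unref();   // deletes the object when the count reaches zero
};

Obj* g_shared = 0;   // a raw pointer reachable from more than one thread

void thread_A()
{
    Obj* p = g_shared;   // [1] fetch the address
                         //     <-- thread B can run here
    p->_ref();           // [3] may touch memory that has already been freed
}

void thread_B()
{
    Obj* old = g_shared; // [2] drop what happens to be the last reference
    g_shared = 0;
    old->_unref();       //     the object is gone before A reaches [3]
}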
Joe Seigh
Wolfgang Platzer wrote:
>
> Hi,
>
> I need a fast mechanism to increment, decrement and test an integer value
> for implementing a reference counting class in a multithreaded environment.
> Using a mutex would be possible, but mutexes cause too much overhead for a
> simple count++ or count--.
>
> Does anyone know a faster solution?
If the reference count is in the object, how do you ensure that the object
doesn't get deleted just as you are about to increment the reference count?
Joe Seigh
In many cases, reference-counting semantics will take care of the problem you
mention, because one normally only performs an AddRef on an object to which the
thread already has at least one counted reference. Under normal circumstances,
somebody will do something like this:
my_refcounted_obj_ptr->AddRef();
which means that the owner (function/thread) that performs the AddRef already
has one (counted) outstanding reference (the 'my_refcounted_obj_ptr' pointer).
As long as the current thread 'owns' that reference, it can safely perform the
AddRef, knowing that no other thread can bring the reference count to zero.
I can think of two general scenarios where this does not hold:
1. Transferring the pointer to another thread, as in the next example:

    ...
    my_refcounted_obj_ptr = CreateNewObjectWithRefCountIsOne();
    StartAThreadThatUsesThisObject( my_refcounted_obj_ptr );
    my_refcounted_obj_ptr->Release(); // DANGER, we don't know if the new
                                      // thread has AddReffed yet!!!
    ...
This scenario is hard to prevent in a software-only manner, but can be dealt
with in design guidelines such as 'always AddRef on behalf of the receiving
thread when transferring refcounted pointers'. In this example, it would mean
that the 'StartAThreadThatUsesThisObject' function should AddRef the pointer
before starting the new thread, as sketched below. Please note that the
problem here is not so much that the Release is done at the same time as the
AddRef, but rather that the Release can even be done _before_ the AddRef.
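A sketch of that guideline (SpawnThread, ThreadBody and RefCountedObject are
made-up names standing in for whatever the real thread API and interface are):

void ThreadBody(void* arg)
{
    RefCountedObject* obj = static_cast<RefCountedObject*>(arg);
    // ... use obj ...
    obj->Release();                  // drop the reference taken on our behalf
}

void StartAThreadThatUsesThisObject(RefCountedObject* obj)
{
    obj->AddRef();                   // AddRef on behalf of the new thread,
                                     // *before* the thread can possibly run
    SpawnThread(ThreadBody, obj);    // hypothetical thread-creation call
}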
2. During construction of the object. Typically this can be solved by having
a special creation function which creates the object and does one AddRef on
behalf of the client, the way CoCreateInstance(...) works.
In conclusion this means that when taking into account the circumstances described
above, the AddRef and Release functions may be called simultaneously, but there
will never be a delete of the object 'just before' the AddRef starts.
Kind Regards,
Danny Havenith.
[...]
> I can think of two general scenarios where this does not hold :
> 1. Transferring the pointer to another thread, as in the next example:
> ...
> my_refcounted_obj_ptr = CreateNewObjectWithRefCountIsOne();
> StartAThreadThatUsesThisObject( my_refcounted_obj_ptr);
> my_refcounted_obj_ptr->Release(); // DANGER, we don't know if the new thread has
> //AddReffed yet!!!
>...
>This scenario is hard to prevent in a software-only manner, but can be dealt with
>in design guidelines such as 'always AddRef on behalf of the receiving thread when
>transferring refcounted pointers'.
IMHO, this scenario is easy to prevent in a software-only manner: use "smart"
pointers (instead of "raw" pointers). Explicit (public) AddRef/Release is
dangerous.
thread_safe_smart_ptr< MyObject > my_obj_ptr = new MyObject();
StartAThreadThatUsesThisObject( my_obj_ptr );
also, I think that using compare_and_swap for ++ and -- with just one
READ_MEMORY_BARRIER (in the "--counter == 0" case) would result in a correct
solution with less overhead than mutex.lock/unlock. unfortunately, neither
compare_and_swap nor READ_MEMORY_BARRIER is portable :(
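As a sketch of that shape (compare_and_swap and READ_MEMORY_BARRIER are only
declared here, since no portable versions exist):

bool compare_and_swap( volatile int* addr, int oldval, int newval ); // hypothetical
void READ_MEMORY_BARRIER();                                          // hypothetical

class Obj
{
public:
    Obj() : m_refCount(1) {}
    void _ref()
    {
        int old;
        do { old = m_refCount; }
        while ( !compare_and_swap( &m_refCount, old, old + 1 ) );
    }
    void _unref()
    {
        int old;
        do { old = m_refCount; }
        while ( !compare_and_swap( &m_refCount, old, old - 1 ) );
        if ( old == 1 )              // this call dropped the last reference
        {
            READ_MEMORY_BARRIER();   // the single barrier, paid only on this path
            delete this;
        }
    }
private:
    volatile int m_refCount;
};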
regards,
alexander.
I'm writing all of this because one of the remarks in this (discussion)
thread seemed to suggest that reference counting is not suited for
multi-threaded cases, while I've experienced that it can be very useful,
especially in multi-threaded designs.
Danny Havenith wrote:
> > my_refcounted_obj_ptr = CreateNewObjectWithRefCountIsOne();
> > StartAThreadThatUsesThisObject( my_refcounted_obj_ptr);
> > my_refcounted_obj_ptr->Release(); // DANGER, we don't know if the new thread has
>
> I fully agree that explicit addref/release is not the way to do it. The point I was
> trying to make was the fact that a newly created thread should receive an object that
> has already been addreffed on its behalf, because otherwise the function that starts
> the thread cannot be sure that the addref has been done. For illustration, I used the
> explicit release.
yup. however, I think that threads should not "receive objects"; instead,
smart pointers should be used to track all references to objects. If
multiple threads can access some refcounted object, then they have private
copies of the smart pointer and/or they operate on one or more shared
copies of the smart pointer (which could be part of some other shared
object(s)); threads do not need to be counted -- copies of the smart
pointer do.
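for illustration (reusing the hypothetical thread_safe_smart_ptr and MyObject
names from above; start_thread and process() stand in for whatever really
creates the thread and does the work):

void worker( thread_safe_smart_ptr< MyObject > obj )   // receives its own copy
{
    obj->process();
    // the copy is destroyed on return and drops exactly one reference
}

void starter()
{
    thread_safe_smart_ptr< MyObject > obj = new MyObject();
    start_thread( worker, obj );   // the new thread's copy of the smart pointer
                                   // is made before the thread can possibly run
    // 'obj' still holds a reference here, so the object cannot disappear
    // underneath the new thread; no explicit AddRef/Release anywhere
}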
> I'm writing all of this because one of the remarks in this (discussion)
> thread seemed to suggest that reference counting is not suited for
> multi-threaded cases, while I've experienced that it can be very useful,
> especially in multi-threaded designs.
agree. the point I was trying to make was that atomicity is just one of two
issues with respect to correctness of ref. counting in the MT environment.
Visibility of eventual updates for proper destruction is the other. Currently
only lock/unlock can fulfill both requirements in a portable way. However, I
believe that it could be done with less overhead and still be portable,
because:

a) read-only refcounted objects do not have a problem with respect to
visibility/destruction. A simple compare_and_swap or similar would work fine
for them (and that is already part of portable locks/spinlocks).

b) IMHO, it is probably possible to define a portable interface for
read-write refcounted objects, e.g.:
int pthread_rwo_refcounter_inc( pthread_rwo_refcounter_t* );
int pthread_rwo_refcounter_dec( pthread_rwo_refcounter_t* );

// pthread_rwo_refcounter_t Obj::refCount

void _ref()
{
    pthread_rwo_refcounter_inc( &pObj->refCount );
}

void _unref()
{
    if ( PTHREAD_REFCOUNTER_NULL ==
         pthread_rwo_refcounter_dec( &pObj->refCount ) )
        delete pObj;
}
so that in the scenario:

Thread A:
    lock( mutex );
    pObj->update( ... );   // [U]
    unlock( mutex );       // [0]
    .
    .
    .
    /* implicit call */
    _unref();              // t1 [1]

---

Thread B:
    .
    .
    .
    /* implicit call */
    _unref();              // t2 > t1 [2] delete -> ~Obj();
    .
    .
    .
it would be guaranteed that

a) the write/decrement in [1] will become visible to other threads (B) after,
or at the same time as, the preceding updates [U] in that thread (A): the
write/decrement [1] should not be reordered across the WRITE_MEMORY_BARRIER
built into unlock() [0];

b) the reads (in ~Obj()) [2] should not be reordered across _unref(): a
READ_MEMORY_BARRIER should precede the "delete".
or am I missing something or presuming something
which could not be done in a portable way with
less overhead than lock/unlock??
regards,
alexander.
The question arises because the design/implementation is not suited to your
use case. To handle your posit, there are a number of ways to ensure that the
objects being pointed to only get deleted when it is "safe".

First, a quick and dirty way is to use a guard in all methods where the
"hole" exists; in other words, escalate the lock further out than an
instruction boundary. This would also require overloading the delete method
on a class-by-class basis.

Second, a more elegant way is to put any object whose reference count reaches
0 into a generation heap (mark and sweep) and delete it after a reasonable
amount of time/events/whatever have expired. Studies of object
birth/activity/death show a fairly tight temporal grouping, which makes this
approach generally robust.
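A very rough sketch of that second idea (all names below are invented for
illustration, and the pending list would itself need locking in real code):

#include <list>

struct Obj { virtual ~Obj() {} };      // stand-in for the refcounted class

class Collector
{
public:
    void defer( Obj* dead )            // called where 'delete this' used to be
    {
        m_pending.push_back( dead );
    }
    void sweep()                       // run later, once a "reasonable amount
    {                                  // of time/events/whatever" has expired
        for ( std::list<Obj*>::iterator i = m_pending.begin();
              i != m_pending.end(); ++i )
            delete *i;
        m_pending.clear();
    }
private:
    std::list<Obj*> m_pending;
};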
--
Frank V. Castellucci
http://corelinux.sourceforge.net
OOA/OOD/C++ Standards and Guidelines for Linux
I worked on a patent that did this kind of thing in the mid 80's at IBM.
It was more than just robust, though. IBM customers tended to get upset
if they didn't get 100% system up time.
Joe Seigh