Is this correct implementation of COM style Release:
void Release()
{
// no atomic if single owner release the object
if (m_ref == 1 || atomic_decrement(&m_ref) == 0) {
delete this;
}
}
Memory visibility is guaranteed by the synchronization used to acquire
reference to the object.
It's quite common to have single owner of ref counted object for which
the above might be effective optimization.
Rani
Yes, it's correct in general. If a thread sees that he is the only
owner of an object then he can delete the object w/o any further
"examination". However you still have to preserve correct ordering,
something along the lines of:
if (m_ref.load(std::memory_order_relaxed) == 1 || m_ref.fetch_add
(-1, std::memory_order_release) == 0)
{
std::atomic_thread_fence(std::memory_order_acquire);
delete this;
}
--
Dmitry V'jukov
This is similar to the share_ptr correspondence.
I don't think that there is case in which memory visibility is an
issue since coherency is guaranteed when the thread got the reference
(access protected using some synchronization that guarantees
visibility).
BTW - same reasoning seem to work for the first AddRef (for objects
that constructed from ref-count==0 for the favor of "sp = new T").
Rani
The thread which drops to zero needs to synchronize with the other threads
that previously decremented the counter:
________________________________________________________
struct object
{
word32 m_state1; // = 0
word32 m_state2; // = 0
uword32 m_count; // =2
void thread_a()
{
m_state1 = 666;
// release membar
if (m_count == 1 || atomic_faa(&m_count, -1) == 1)
{
// we need acquire membar here to sync with
// thread b's store to `m_state2'.
// acquire membar
assert(m_state2 == -666);
}
}
void thread_b()
{
m_state2 = -666;
// release membar
if (m_count == 1 || atomic_faa(&m_count, -1) == 1)
{
// we need acquire membar here to sync with
// thread a's store to `m_state1'.
// acquire membar
assert(m_state1 == 666);
}
}
};
________________________________________________________
This example can trip the assertions if you omit the acquire membar.
So there is write to m_state1 without any additional synchronization?
Seems like broken code for COM style ref-counting:
// use some synchronization to get reference
// use the object
pObject->Release();
// no m_state1 assumptions in destructor (final delete)
Rani
> So there is write to m_state1 without any additional synchronization?
Yeah. Why should it require synchronization right at the point of mutation
when only one thread writes to it? It's much better to defer the acquire
until after the drop-to-zero condition wrt a general purpose reference
counting implementation.
> Seems like broken code for COM style ref-counting:
> // use some synchronization to get reference
> // use the object
> pObject->Release();
> // no m_state1 assumptions in destructor (final delete)
It's not broken. The drop-to-zero condition should make mutations performed
by threads fully visible to the destructor in a general purpose reference
counting algorithm. This requires acquire semantics...
OK. I get your point. Many thanks for the clarification (and being
patient).
Here a use case that I commonly encounter:
class CMyData : public CRefObject // COM style AddRef/Release
{
public:
// implicit cleanup, call close if needed (last ref)
~CRefObject() { if (m_hTimer) Close(); }
// explicit cleanup (not dependent on ref-counting)
// only one caller allowed
void Close()
{
// wait and delete m_hTimer
m_hTimer = NULL;
}
};
Rani
Only if *this is immutable and atomic_decrement() provides 'sink load'
barrier (part of release barrier).
For mutable *this you'd need something along the lines of
void Release() {
size_t ref = m_ref;
if (ref > 1) {
atomic_thread_fence(memory_order_release);
do {
if (atomic_compare_exchange_weak_explicit(&m_ref, &ref, ref - 1,
memory_order_relaxed, memory_order_relaxed)) return;
} while (ref > 1);
}
assert(ref == 1);
atomic_thread_fence(memory_order_acquire);
delete this;
}
or (the above is best for Power/Sparc-RMO, not quite so for Itanium)
void Release() {
size_t ref = m_ref;
while (ref > 1)
if (atomic_compare_exchange_weak_explicit(&m_ref, &ref, ref - 1,
memory_order_relaxed, memory_order_release)) return;
assert(ref == 1);
atomic_thread_fence(memory_order_acquire);
delete this;
}
regards,
alexander.
Thanks for the info.
I think that the conservative InterlockedDecrement handles all this
being a full barrier.
Can you also show implementation of AddRef?
I don't think that it requires any barrier and atomic increment is
sufficient.
Thanks,
Rani
Yep but it is unnecessary heavy. InterlockedDecrementIfGreaterThanOne()
with release on success and acquire on failure would do it much better.
>
> Can you also show implementation of AddRef?
> I don't think that it requires any barrier and atomic increment is
> sufficient.
Yup.
void AddRef() {
atomic_fetch_add_explicit(&m_ref, 1, memory_order_relaxed);
}
regards,
alexander.
Much better? what is the perf diff compared with full barrier?
>
> > Can you also show implementation of AddRef?
> > I don't think that it requires any barrier and atomic increment is
> > sufficient.
>
> Yup.
Thanks.
> void AddRef() {
> atomic_fetch_add_explicit(&m_ref, 1, memory_order_relaxed);
Is this the same as InterlockedIncreament on x86/x64 (running windows)
as the memory model is conservative and assure visibility regardless
of interlocked-ops? If not, is there a way cheaper than interlocked-op
to express your AddRef on x86/x64?
For some reason I suspect that my original works, by accident, on
windows (or otherwise tons of such code will break).
Thanks,
Rani
It depends on performance test case of course.
>
> >
> > > Can you also show implementation of AddRef?
> > > I don't think that it requires any barrier and atomic increment is
> > > sufficient.
> >
> > Yup.
>
> Thanks.
>
> > void AddRef() {
> > atomic_fetch_add_explicit(&m_ref, 1, memory_order_relaxed);
>
> Is this the same as InterlockedIncreament on x86/x64 (running windows)
> as the memory model is conservative and assure visibility regardless
> of interlocked-ops? If not, is there a way cheaper than interlocked-op
> to express your AddRef on x86/x64?
No way unless x86/x64 already introduced CHEAPLOCK prefix instruction or
something like that. ;-)
regards,
alexander.