> - VZOOM allows a thread to acquire basically any number of persistent
> references to a data object without using any atomic operations
> and/or StoreLoad style memory barriers.
I copied this from here:
http://groups.google.ru/group/comp.programming.threads/browse_frm/thread/dc39b17aed1220c3/a525016336dbdbaa?lnk=gst&q=Chris+Thomasson+vZOOM+acquire&rnum=6&hl=ru#a525016336dbdbaa
The question is:
Do you mean that:
1. A thread can acquire references to a data object without using any
atomic operations and/or StoreLoad style memory barriers.
Or do you mean:
2. A thread can acquire references to a data object without using any
atomic_op and/or StoreLoad membars _amortized_.
I.e. the first and some later acquire operations on the data object
require atomic_op and/or StoreLoad membars, and all other acquire
operations don't.
Dmitriy V'jukov
It goes like this:
Lock-Based PDR uses 2, and the Lock-Free PDR w/ automatic epoch detection
uses 1.
> It goes like this:
>
> Lock-Based PDR uses 2, and the Lock-Free PDR w/ automatic epoch detection
> uses 1.
Is this assertion correct wrt Lock-Free PDR w/ automatic epoch
detection:
A reader thread can start, acquire a reference to a shared object, release
the reference (really release it, i.e. the object can be deleted), and
finish, and no atomic ops and no StoreLoad membars will be executed.
Is this right?
Dmitriy V'jukov
PDR by definition will not delete objects until they are no longer
referenced. The PDR implementation takes into account the memory
model, compiler and platform dependencies.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
> PDR by definition will not delete objects until they are no longer
> referenced.
But PDR can do the opposite: not delete an object even though it is no
longer referenced, if some cost amortization scheme is used.
I want to know whether Chris Thomasson is cheating when he says that a
reader can acquire objects w/o atomic_ops and store_load_membars, i.e.
whether some kind of powerful cost amortization scheme is used that
amortizes across acquires/releases/epochs/objects... Or whether a reader
really can do "the full cycle" and truly not issue such operations at all...
Dmitriy V'jukov
The acquire semantics are usually gotten with dependent loads. The
release semantics are amortized by the PDR scheme. You don't need
to do a store/load membar before dropping the reference. The PDR
scheme makes sure that at least one was done by every thread before
deleting the object. The store/loads don't necessarily have to be
explicitly done by each thread. They can be implicit with common
thread activity.
There's no cheating. Just neat tricks.
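
A minimal sketch of the reader side described above, assuming C++11 atomics. The names (Node, g_shared, publish, read_value) are made up for illustration and are not from atomic_ptr or vZOOM. memory_order_acquire is used as a portable stand-in for a dependent load; it is at least as strong, and on most hardware a dependent load needs no barrier instruction at all.

```cpp
#include <atomic>
#include <cassert>

struct Node { int value; };

// Shared pointer published by a writer (hypothetical name).
std::atomic<Node*> g_shared{nullptr};

// Writer: the object is fully initialised before this call; the release
// store makes that initialisation visible to any reader that sees the pointer.
void publish(Node* n) {
    g_shared.store(n, std::memory_order_release);
}

// Reader: one load of the pointer, then a dereference that depends on it.
// No atomic read-modify-write and no StoreLoad barrier is issued here.
// Returns -1 if nothing has been published yet.
int read_value() {
    Node* n = g_shared.load(std::memory_order_acquire);
    return n ? n->value : -1;
}
```

What this sketch does not show is the release side: something else (the PDR scheme) still has to guarantee that the Node is not freed while a reader may be using it.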
> The acquire semantics are usually gotten with dependent loads. The
> release semantics are amortized by the PDR scheme. You don't need
> to do a store/load membar before dropping the reference. The PDR
> scheme makes sure that at least one was done by every thread before
> deleting the object. The store/loads don't necessarily have to be
> explicitly done by each thread. They can be implicit with common
> thread activity.
>
> There's no cheating. Just neat tricks.
It seems that I don't understand something...
Here is code from your atomic_ptr:
template<typename T>
atomic_ptr_ref<T>* atomic_ptr<T>::getrefptr() {
    ref<T> oldval, newval;
    oldval.ecount = xxx.ecount;
    oldval.ptr = xxx.ptr;
    do {
        /* bump the ephemeral count, keep the same pointer */
        newval.ecount = oldval.ecount + 1;
        newval.ptr = oldval.ptr;
    }
    while (atomic_cas(&xxx, &oldval, &newval) == 0);
    return atomic_load_depends(&oldval.ptr);
}
This is the "acquire" operation, and here we have a CAS that must issue
a store_load_membar...
And the same goes for Chris's pc_sample...
What am I missing?
Dmitriy V'jukov
The CAS isn't required to have any store/load semantics. It's just used
to update the ephemeral count in the pointer.
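
A sketch of what "a CAS with no store/load semantics" can look like in C++11, where memory_order_relaxed asks for atomicity only. The packed layout below (count in the high 32 bits, a slot index standing in for the pointer in the low 32 bits) and all names are assumptions for illustration, not the layout atomic_ptr uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical packed reference: high 32 bits = ephemeral count,
// low 32 bits = slot index standing in for the object pointer.
std::atomic<std::uint64_t> g_ref{0};

constexpr std::uint64_t kCountOne = std::uint64_t{1} << 32;

// Bump the ephemeral count and return the slot index that was read in the
// same atomic operation. Relaxed ordering requests no barrier semantics.
std::uint32_t get_ref_slot() {
    std::uint64_t oldval = g_ref.load(std::memory_order_relaxed);
    while (!g_ref.compare_exchange_weak(oldval, oldval + kCountOne,
                                        std::memory_order_relaxed,
                                        std::memory_order_relaxed)) {
        // A failed CAS refreshes oldval with the current value; just retry.
    }
    return static_cast<std::uint32_t>(oldval & 0xFFFFFFFFu);
}

// Current ephemeral count, for inspection only.
std::uint32_t ephemeral_count() {
    return static_cast<std::uint32_t>(
        g_ref.load(std::memory_order_relaxed) >> 32);
}
```

Whether the relaxed request saves anything depends on the hardware: on x86 a locked instruction acts as a full barrier whatever ordering is asked for, while architectures with weaker models can emit the bare atomic operation.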
Maybe it's not required, but does any real-world platform support a CAS
that doesn't imply (at least) a store-load barrier? Or, more to the point,
that doesn't imply the overhead that a store-load barrier would have?
--
David Hopwood <david....@industrial-designer.co.uk>
Yes because the reader thread in your question actually executes the
barriers we need in order to track the epoch(s). The thread really does not
need to do "anything explicitly" EXCEPT use memory barrier and interlocked
RMW operation free *distributed proxy reference counting (DPRC) or Joe
Seighs memory barrier free SMR algorithms acquire and release references to
shared data-structures.
The reason you need to use DPRC or SMR is because you need something to
track object lifetimes in a way that allows them to persist across multiple
successive epochs.
DPRC is different than SMR in that SMR relies on hazard pointers per-thread,
and DPRC uses arranged per-thread counter arrays. So, DPRC can manage
multiple objects without adding anything to the per-thread structure; SMR
cannot really do this as-is.
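
For contrast, here is a sketch of the classic hazard-pointer acquire that plain SMR is built on. This is the textbook form, not Joe Seigh's barrier-free variant and not DPRC, whose internals are not described in this thread. All names and the fixed slot array are assumptions. Note the one slot per thread: protecting a second object at the same time would need a second slot, which is the per-thread cost mentioned above.

```cpp
#include <atomic>
#include <cassert>

struct Node { int value; };

constexpr int kMaxThreads = 4;                 // hypothetical fixed limit
std::atomic<Node*> g_head{nullptr};            // shared pointer being protected
std::atomic<Node*> g_hazard[kMaxThreads] = {}; // one hazard slot per thread

// Publish the pointer we intend to use, then re-check that it is still
// current. The seq_cst fence is the StoreLoad barrier: the hazard store
// must be globally visible before g_head is read again.
Node* hp_acquire(int tid) {
    Node* p = g_head.load(std::memory_order_relaxed);
    for (;;) {
        g_hazard[tid].store(p, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        Node* q = g_head.load(std::memory_order_acquire);
        if (q == p) return p;  // unchanged, so p is now protected
        p = q;                 // changed under us; protect the new value
    }
}

void hp_release(int tid) {
    g_hazard[tid].store(nullptr, std::memory_order_release);
}

// Reclaimer side: a node may be freed only if no thread's slot holds it.
bool is_protected(const Node* n) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    for (auto& h : g_hazard)
        if (h.load(std::memory_order_acquire) == n) return true;
    return false;
}
```

The fence inside hp_acquire is exactly the per-acquire cost that the barrier-free schemes discussed in this thread aim to move out of the reader's path.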
---
*: this is what I named the internal counting algorithm that vZOOM PDR, both
the lock-based and lock-free, is based on.
[...]
The automatic epoch detection makes the claims valid on all the compatible
platforms, which include a lot of the more "popular ones". Threads have to
go through various states in the scheduler, execute syscalls, user-to-kernel
transitions or whatever during their lifetimes. We can make use of "highly
non-portable" methods to organize these per-thread activities into an epoch
based polling environment, which can then be used to implement robust PDR.
You can even program signals in a way that implements this type of epoch
detection scheme...
You can get naked atomic-ops with the SPARC. You have to add the memory
barriers explicitly in the assembly code.
> Yes because the reader thread in your question actually executes the
> barriers we need in order to track the epoch(s). The thread really does
> not need to do "anything explicitly" EXCEPT use memory barrier and
> interlocked RMW operation free *distributed proxy reference counting
> (DPRC) or Joe Seighs memory barrier free SMR algorithms acquire and
> release references to shared data-structures.
A thread can acquire or release references to shared data-structures by
using memory barrier AND interlocked RMW free algorithms such as DPRC or
Joe Seigh's SMR.
:^)
atomic_ptr_plus and pc_sample are not examples of an automated epoch
detection scheme. They're just normal lock-free proxy collectors, which are
a form of PDR. The term 'PDR' could also apply to atomically thread-safe
reference counted pointers such as atomic_ptr, or:
http://appcore.home.comcast.net/vzoom/refcount/
> You can even program signals in a way that implements this type of epoch
> detection scheme...
Humm... Actually the signal-based epoch detector is a fairly good option
when you can't poll the OS. The general idea is that any signal handler
which executes a membar can be identified as a quiescent state by a
dedicated polling entity. You use the polling entity to issue the signals,
and then poll all the state in order to detect any quiescent periods.
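
The polling idea can be sketched as follows. This is only a generic quiescent-state detector under stated assumptions, not vZOOM's mechanism: signal delivery is replaced by an explicit note_quiescent() call, the thread count is fixed, and all names are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int kThreads = 3;  // hypothetical fixed thread count

// One counter per thread. A thread bumps its own counter whenever it passes
// a point known to have executed a memory barrier (in the scheme described
// above, that point would be the signal handler).
std::array<std::atomic<std::uint64_t>, kThreads> g_quiescent{};

// Only the owning thread writes its counter, so a plain load followed by a
// store is enough; no atomic read-modify-write is needed.
void note_quiescent(int tid) {
    std::uint64_t v = g_quiescent[tid].load(std::memory_order_relaxed);
    g_quiescent[tid].store(v + 1, std::memory_order_release);
}

using Snapshot = std::array<std::uint64_t, kThreads>;

// Polling entity: record where every thread currently stands.
Snapshot take_snapshot() {
    Snapshot s{};
    for (int i = 0; i < kThreads; ++i)
        s[i] = g_quiescent[i].load(std::memory_order_acquire);
    return s;
}

// An epoch has elapsed once every thread has moved past the snapshot.
bool epoch_elapsed(const Snapshot& s) {
    for (int i = 0; i < kThreads; ++i)
        if (g_quiescent[i].load(std::memory_order_acquire) == s[i])
            return false;
    return true;
}
```

Objects retired before take_snapshot() could be reclaimed once epoch_elapsed() returns true. A real detector would also have to deal with threads that are blocked or have exited, which this sketch ignores.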
> The CAS isn't required to have any store/load semantics. It's just used
> to update the ephemeral count in the pointer.
My brain is about to explode...
I always thought of it this way:
1. We increment the counter (actually: load counter, increment, store
counter)
2. Load the pointer to the object
3. Load the object itself
Yes, we need a DependentLoad membar between 2 and 3.
And we also need a StoreLoad membar between 1 and 2, because we don't
want the counter store and the pointer load to be reordered...
And you say that we need only the DependentLoad membar here...
Let me try to explain this:
While loading the counter in step 1, we also load the pointer to the object
(you use a 64-bit CAS in atomic_ptr).
So the load of the object can't happen earlier than the increment of the
counter, provided we use a DependentLoad membar...
The point is that the counter and the pointer to the object are loaded in
one atomic operation. Right?
And what about the implementation of Chris Thomasson's pc_sample? There is
code like this:
pc_deferred_t* pc_inc( pc_t *_this )
{
    /* inc master count */
    return PC_SYS_GETPTR( atomic_xaddword( &_this->pc_anchor, 1 ) );
}
Here we use a 32-bit atomic operation.
One 32-bit word stores the pointer to the object, and the low bits contain
the counter.
Hence, strictly speaking, there is no pointer load here; there is a load of
some value that turns into a pointer after some manipulation. Does the
processor detect the data dependency here?
Dmitriy V'jukov
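
The packing in question can be sketched like this. The layout is an assumption for illustration (64-byte alignment, count in the low 6 bits); the thread does not show pc_sample's real PC_SYS_GETPTR macro or its bit layout, and the names here are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout: objects are 64-byte aligned, so the low 6 bits of the
// address are always zero and can hold a small count.
constexpr std::uintptr_t kCountMask = 63;

struct alignas(64) Deferred { int id; };

std::atomic<std::uintptr_t> g_anchor{0};

void set_anchor(Deferred* d) {
    g_anchor.store(reinterpret_cast<std::uintptr_t>(d),
                   std::memory_order_release);
}

// One fetch_add both increments the count and returns the old word.
// The pointer is then computed from that returned value by masking, so the
// later dereference still depends on the value the atomic op loaded.
Deferred* inc_and_get() {
    std::uintptr_t old = g_anchor.fetch_add(1, std::memory_order_acquire);
    return reinterpret_cast<Deferred*>(old & ~kCountMask);
}

// Count currently stored in the low bits, for inspection only.
unsigned anchor_count() {
    return static_cast<unsigned>(
        g_anchor.load(std::memory_order_relaxed) & kCountMask);
}
```

The sketch does not handle the count overflowing into the pointer bits after 63 increments; a real collector has to drain or swap the anchor before that happens.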
> atomic_ptr_plus and pc_sample are not examples of an automated epoch
> detection scheme. They're just normal lock-free proxy collectors, which
> are a form of PDR. The term 'PDR' could also apply to atomically
> thread-safe reference counted pointers such as atomic_ptr, or:
> http://appcore.home.comcast.net/vzoom/refcount/
I see. I was thinking that vZOOM is like atomic_ptr_plus and
pc_sample...
Now I understand that vZOOM is something like quiescent-state-based
reclamation or epoch-based reclamation...
Dmitriy V'jukov
The dependent loads are data dependencies. The data might get moved
around a bit after the initial load. It's always safe to move the
dependent load "membar" onto one of the later loads.
Yup.