http://cpt.pastebin.com/f30d8ab51
http://groups.google.com/group/relacy/web
This proxy collector is an N = 2 wait-free proxy garbage collector, where N
stands for the number of collector objects used to perform collection
cycles. The overhead in user objects is a single pointer which is used to
link them into a collector's deferment list. As for example usage, here is
the lock-free stack code I am using in C++0x:
_____________________________________________________________________
class stack
{
    std::atomic<node*> m_head;

public:
    stack()
    :   m_head(NULL)
    {
    }

public:
    // push a node; the release on the successful CAS publishes the
    // node's contents to any thread that later observes it through
    // an acquire operation on m_head.
    void push(node* n)
    {
        node* head = m_head.load(std::memory_order_relaxed);

        do
        {
            n->m_next.store(head, std::memory_order_relaxed);
        }
        while (! m_head.compare_exchange_weak(
                     head,
                     n,
                     std::memory_order_release));
    }

    // detach the whole list in a single shot.
    node* flush()
    {
        return m_head.exchange(NULL, std::memory_order_acquire);
    }

    // pop a single node, or return NULL if the stack is empty.
    node* pop()
    {
        node* head = m_head.load(std::memory_order_acquire);
        node* xchg;

        do
        {
            if (! head) return NULL;

            xchg = head->m_next.load(std::memory_order_relaxed);
        }
        while (! m_head.compare_exchange_weak(
                     head,
                     xchg,
                     std::memory_order_acquire));

        return head;
    }
};
_____________________________________________________________________
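The `node' type itself is not shown above; a minimal definition that the
stack and the Relacy test below assume would look something like this (just
a sketch, the real node of course carries whatever payload you want to
defer):
_____________________________________________________________________
struct node
{
    // intrusive link used both by the stack and by the proxy
    // collector's deferment list; this is the single pointer of
    // per-object overhead mentioned above.
    std::atomic<node*> m_next;

    node()
    :   m_next(NULL)
    {
    }

    // ... user payload goes here ...
};
_____________________________________________________________________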
Notice that I am not using DWCAS (double-width compare-and-swap)? It's
simply not needed when you send nodes through the proxy collector; deferring
the nodes takes care of ABA. Here is the Relacy test class which
demonstrates a simple usage case:
_____________________________________________________________________
#define ITERS 7
#define DEFER 4
#define THREADS 4
struct proxy_test
:   rl::test_suite<proxy_test, THREADS>
{
    typedef proxy<DEFER> proxy_type;

    proxy_type g_proxy;
    stack g_stack;

    void thread(unsigned int tidx)
    {
        unsigned count = 0;

        for (unsigned int i = 0; i < ITERS; ++i)
        {
            g_stack.push(new node);

            if (! (i % 2))
            {
                proxy_type::collector& c = g_proxy.acquire();
                g_proxy.collect(c, g_stack.pop());
                g_proxy.release(c);

                // remember how many nodes this thread already popped.
                ++count;
            }
        }

        if (! (tidx % 2))
        {
            node* n = g_stack.flush();

            proxy_type::collector& c = g_proxy.acquire();

            while (n)
            {
                node* next = n->m_next.load(std::memory_order_acquire);
                g_proxy.collect(c, n);
                n = next;
            }

            g_proxy.release(c);
        }

        // pop and defer whatever this thread has not accounted for yet.
        for (unsigned int i = count; i < ITERS; ++i)
        {
            proxy_type::collector& c = g_proxy.acquire();
            g_proxy.collect(c, g_stack.pop());
            g_proxy.release(c);
        }
    }
};
_____________________________________________________________________
Notice how I am surrounding access to `stack::pop()' with the acquisition
and release of a `proxy_type::collector' object? This ensures that any nodes
within the stack will remain fully accessible within the acquire/release
pair. You can defer popped nodes for subsequent destruction or reuse by
calling the `proxy_type::collect()' procedure within an acquire/release
scope. By deferring nodes and only destroying or reusing them once they are
quiescent, you can solve ABA and memory reclamation in one shot. Everything
just works. I will describe the algorithm itself in greater detail in
another post, probably later on today...
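For instance, a reader thread could traverse the stack in place, without
popping anything at all, along these lines (just a sketch; it assumes a
hypothetical `peek()' accessor that simply does an acquire load of `m_head',
which the stack class above does not expose):
_____________________________________________________________________
void read_all(proxy_test::proxy_type& p, stack& s)
{
    // the collector reference pins every node that was reachable
    // when acquire() returned: removers defer nodes through the
    // proxy, and deferred nodes are only destroyed once the region
    // goes quiescent, so this traversal can never touch freed memory.
    proxy_test::proxy_type::collector& c = p.acquire();

    for (node* n = s.peek(); n;)
    {
        node* next = n->m_next.load(std::memory_order_acquire);

        // ... inspect the node's payload here ...

        n = next;
    }

    p.release(c);
}
_____________________________________________________________________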
Enjoy!
;^)
Don't you need memory_order_release there as well - to 'publish' the
new value of head == NULL?
Otherwise another thread could call flush and still see head != NULL,
even though it used acquire - it was not matched with release. (?)
> node* pop()
> {
> node* head = m_head.load(std::memory_order_acquire);
> node* xchg;
>
> do
> {
> if (! head) return NULL;
>
> xchg = head->m_next.load(std::memory_order_relaxed);
> }
>
> while (! m_head.compare_exchange_weak(
> head,
> xchg,
> std::memory_order_acquire));
>
Same here?
> return head;
> }};
>
> Don't you need memory_order_release there as well - to 'publish' the
> new value of head == NULL?
No. This value is properly published by the atomic RMW itself.
> Otherwise another thread could call flush and still see head != NULL,
> even though it used acquire - it was not matched with release. (?)
No. I would only need acquire/release semantics if I were swapping the head
of the list with a non-NULL pointer. The acquire/release semantics are about
visibility wrt user data and internal node state, not the anchor of the list
itself. That is handled by the atomic RMW.
> > node* pop()
> > {
> > node* head = m_head.load(std::memory_order_acquire);
> > node* xchg;
> >
> > do
> > {
> > if (! head) return NULL;
> >
> > xchg = head->m_next.load(std::memory_order_relaxed);
> > }
> >
> > while (! m_head.compare_exchange_weak(
> > head,
> > xchg,
> > std::memory_order_acquire));
> >
> Same here?
I don't think so... If the next pointer is non-NULL then I am just
"publishing" an item that was already inserted into the list with proper
release semantics.
> > return head;
> > }};
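To spell the pairing out explicitly (just a sketch; `m_payload' here stands
in for whatever user data lives in a node, it is not part of the code
above):
_____________________________________________________________________
// producer: the payload store happens before the release CAS in
// push(), so it is published together with the node.
void producer(stack& s)
{
    node* n = new node;
    n->m_payload = 42; // plain store, made visible by push()'s release.
    s.push(n);
}

// consumer: the acquire on m_head (the CAS in pop() or the exchange
// in flush()) synchronizes with that release, so the payload read is
// guaranteed to observe 42. The NULL that flush() installs carries no
// data of its own, which is why it needs no release.
void consumer(stack& s)
{
    for (node* n = s.flush(); n;)
    {
        node* next = n->m_next.load(std::memory_order_acquire);
        int value = n->m_payload; // always sees 42.
        (void)value;
        n = next; // (destruction/deferral omitted for brevity.)
    }
}
_____________________________________________________________________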
<sigh> Right. There's no other data to worry about reordering with.
I've been thinking about this stuff too much lately - you start to
second guess yourself!
>
> > Same here?
>
> I don't think so... If the next pointer is non-NULL then I am just
> "publishing" an item that was already inserted into the list with proper
> release semantics.
>
Of course.
[...]
Here is pseudo-code with some comments that might be easier to read:
< please excuse any typos! ;^) >
______________________________________________________________________
#define THRESHOLD 10
// represents a deferred object.
struct node
{
    node* m_next;
};

void destroy_nodes(node*);

// manages 2 collector objects and coordinates the overall
// deferment process.
struct proxy
{
    // defers objects for destruction.
    struct collector
    {
        node* m_defer;          // = NULL; the deferment list.
        uword32 m_defer_count;  // = 0; the deferment count.
        uword32 m_ref_count;    // = 0; the inner count.

        ~collector()
        {
            destroy_nodes(m_defer);
        }
    };

    uword32 m_current;      // = 0; current collector and outer count.
    uword32 m_quiesce;      // = 0; quiescence process binary semaphore.
    node* m_defer;          // = NULL; the main deferment backlink.
    collector m_collect[2]; // the collector objects.

    ~proxy()
    {
        destroy_nodes(m_defer);
    }

    // acquires a reference to the current collector object.
    collector& acquire()
    {
        // acquire reference to the current collector object.
        return m_collect[FETCH_ADD(&m_current, 0x20U) & 0xFU];
    }

    // releases a reference to the previously acquired collector
    // object `c'.
    void release(collector& c)
    {
        // release reference to the previously acquired collector
        // object.
        uword32 ref_count = FETCH_SUB(&c.m_ref_count, 0x20U);

        // check for drop-to-zero _and_ quiescence cycle
        // completion conditions.
        if ((ref_count & 0xFFFFFFF0U) == 0x30U)
        {
            // we are in a completed quiescence cycle!

            // get nodes from previous cycle.
            node* n = m_defer;

            // transfer nodes from current cycle.
            m_defer = c.m_defer;

            // reset the collector.
            c.m_defer = NULL;
            c.m_defer_count = 0;
            c.m_ref_count = 0;

            // allow another thread to start a cycle.
            STORE(&m_quiesce, 0);

            // destroy nodes from previous cycle.
            destroy_nodes(n);

            // That's all folks! ;^)
        }
    }

    // collects nodes into the collector object `c'.
    void collect(collector& c, node* n)
    {
        // link node into the collector's deferment list.
        n->m_next = SWAP(&c.m_defer, n);

        // bump the count of deferred nodes and check for
        // threshold limit.
        if ((FETCH_ADD(&c.m_defer_count, 1) + 1) >= THRESHOLD)
        {
            // try to acquire the quiescence process bin-sema.
            if (! SWAP(&m_quiesce, 1))
            {
                // Select the next collector.
                uword32 old = LOAD(&m_current);
                old = (old & 0x1U) ? 0 : 0x1U;

                // Swap the outer counter with a zero count, install
                // next collector index and get previous collector.
                old = SWAP(&m_current, old);
                collector& oc = m_collect[old & 0xFU];

                // Decode the reference count of the previous outer
                // counter and transfer it into the old collector's
                // inner count.
                FETCH_ADD(&c.m_ref_count, (old & 0xFFFFFFF0U) + 0x10U);
            }
        }
    }
};
______________________________________________________________________
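To decode the magic numbers a bit, here is how the counters are laid out
(same conventions as the pseudo-code above; this is just the encoding
spelled out, the code itself is authoritative):
______________________________________________________________________
// rough layout of m_current and collector::m_ref_count:
//
//   bits 0-3  (0x0000000FU) : index of the current collector (0 or 1)
//   bit  4    (0x00000010U) : quiescence-cycle-started flag
//   bits 5-31 (0xFFFFFFE0U) : reference count, in units of 0x20
//
// acquire() adds 0x20 to m_current (the outer count); release()
// subtracts 0x20 from the collector's m_ref_count (the inner count).
// The quiescing thread swaps m_current and adds the old outer count
// plus the 0x10 flag into the inner count, so the final release sees
// (ref_count & 0xFFFFFFF0U) == 0x30U: one last reference (0x20) plus
// the flag (0x10), and it can finish the cycle.
______________________________________________________________________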
Here is the pseudo-code on pastebin just in case your newsreader mangles
it:
http://cpt.pastebin.com/f17f8a052
IMHO, it's a very simple and straightforward proxy collector algorithm; any
questions?
;^)
> // collects nodes into the collector object `c'.
> void collect(collector& c, node* n)
> {
>
> // link node into the collectors deferment list.
> n->m_next = SWAP(&c.m_defer, n);
>
>
> // bump the count of deferred nodes and check for
> // threshold limit.
> if ((FETCH_ADD(&c.m_defer_count) + 1) >= THRESHOLD)
> {
>
> // try to acquire the quiescence process bin-sema.
> if (! SWAP(&m_quiesce, 1))
> {
>
> // Select the next collector.
> uword32 old = LOAD(&m_current);
> old = (old & 0x1U) ? 0 : 0x1U;
>
>
> // Swap the outer counter with a zero count, install
> // next collector index and get previous collector.
> old = SWAP(&m_current, old);
> collector& oc = m_collect[old & 0xFU];
>
>
> // Decode the reference count of the previous outer
> // counter and decrement the old collectors inner count.
> FETCH_ADD(&c.m_ref_count, (old & 0xFFFFFFF0U) + 0x10U);
Actually, I can strip some unnecessary/redundant logic out of this
`collect()' procedure because the collector `oc' shall be the same collector
as `c'. assert(&oc == &c) should always pass. Here is another test in Relacy
which shows how reader threads can traverse nodes within a lock-free stack:
http://cpt.pastebin.com/f537f2c7c
This code has a simpler `prv_quiesce_begin()' procedure with the assertion
that I mentioned in place.
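In pseudo-code terms, the stripped-down quiescence start boils down to
something like this (a sketch only; see the pastebin link above for the
actual `prv_quiesce_begin()'):
______________________________________________________________________
// called from `collect()' once the deferment threshold is hit. Since
// the acquired collector `c' is always the current collector at this
// point, we can work on `c' directly and drop the separate `oc'
// lookup.
void prv_quiesce_begin(collector& c)
{
    // try to acquire the quiescence process bin-sema.
    if (! SWAP(&m_quiesce, 1))
    {
        // select the next collector.
        uword32 next = (LOAD(&m_current) & 0x1U) ? 0 : 0x1U;

        // install it and grab the previous outer counter.
        uword32 old = SWAP(&m_current, next);

        // the previous collector is the one we already hold.
        // assert(&m_collect[old & 0xFU] == &c);

        // transfer the outer count into `c' and flag the cycle.
        FETCH_ADD(&c.m_ref_count, (old & 0xFFFFFFF0U) + 0x10U);
    }
}
______________________________________________________________________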
BTW, this code relies on a patched version of Relacy. There is a little bug
in Relacy 2.0 that can trigger non-repeatable false data-race errors in
long-running tests which take many iterations to finally pass. I will ask
Dmitriy if I can include the patch he so kindly provided along with my
further posts of long-running tests, until he uploads it to his website:
http://groups.google.com/group/relacy/web
The patched files simply correct the issue for me.
Thanks Dmitriy!
:^)
Here it is:
http://groups.google.com/group/relacy/browse_frm/thread/a253bb6b22df8b94
--
Dmitriy V'jukov
http://cpt.pastebin.com/f71480694
is another version of the proxy GC algorithm. This one allows any thread to
start a quiescence process without having to acquire a collector object
first. It also allows a thread to check whether its previously acquired
collector object has a quiescence process in progress; here is more
context:
http://article.gmane.org/gmane.comp.lib.boost.devel/198747
(read all...)
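Given the counter encoding above, such a check could boil down to testing
the 0x10 flag on the collector's inner count (just a sketch, `quiescing()'
is a name I am making up here; see the link for the actual code):
______________________________________________________________________
// returns true if a quiescence process has already been started on
// the collector `c', i.e. the outer count (plus the 0x10 flag) has
// been transferred into its inner count.
bool quiescing(collector const& c)
{
    return (LOAD(&c.m_ref_count) & 0x10U) != 0;
}
______________________________________________________________________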