a new wait-free proxy gc algorithm...

Chris M. Thomasson

unread,

Jan 10, 2010, 1:54:31 PM1/10/10

to

Here is a model of the algorithm and an example usage case (e.g., solving
ABA and memory reclamation in the classic lock-free stack algorithm) in
Relacy 2.0:

http://cpt.pastebin.com/f30d8ab51

http://groups.google.com/group/relacy/web

This proxy collector is an N = 2 based wait-free proxy garbage collector
where N stands for the number of collector objects used to perform
collection cycles. The overhead in user objects is a single pointer which
used to link it into a collector's deferment list. As for example usage,
here is the lock-free stack code I am using in C++0x:
_____________________________________________________________________
class stack
{
std::atomic<node*> m_head;

public:
stack()
: m_head(NULL)
{

}

public:
void push(node* n)
{
node* head = m_head.load(std::memory_order_relaxed);

do
{
n->m_next.store(head, std::memory_order_relaxed);
}

while (! m_head.compare_exchange_weak(
head,
n,
std::memory_order_release));
}

node* flush()
{
return m_head.exchange(NULL, std::memory_order_acquire);
}

node* pop()
{
node* head = m_head.load(std::memory_order_acquire);
node* xchg;

do
{
if (! head) return NULL;

xchg = head->m_next.load(std::memory_order_relaxed);
}

while (! m_head.compare_exchange_weak(
head,
xchg,
std::memory_order_acquire));

return head;
}
};
_____________________________________________________________________

Notice that I am not using DWCAS? It's simply not needed when you send nodes
through the proxy collector. Here is the Relacy test class which
demonstrates a simple usage case:
_____________________________________________________________________
#define ITERS 7
#define DEFER 4
#define THREADS 4

struct proxy_test
: rl::test_suite<proxy_test, THREADS>
{
typedef proxy<DEFER> proxy_type;

proxy_type g_proxy;
stack g_stack;

void thread(unsigned int tidx)
{
unsigned count = 0;

for (unsigned int i = 0; i < ITERS; ++i)
{
g_stack.push(new node);

if (! (i % 2))
{
proxy_type::collector& c = g_proxy.acquire();
g_proxy.collect(c, g_stack.pop());
g_proxy.release(c);
}
}

if (! (tidx % 2))
{
node* n = g_stack.flush();
proxy_type::collector& c = g_proxy.acquire();

while (n)
{
node* next = n->m_next.load(std::memory_order_acquire);
g_proxy.collect(c, n);
n = next;
}

g_proxy.release(c);
}

for (unsigned int i = count; i < ITERS; ++i)
{
proxy_type::collector& c = g_proxy.acquire();
g_proxy.collect(c, g_stack.pop());
g_proxy.release(c);
}
}
};
_____________________________________________________________________

Notice how I am surrounding access to the `stack::pop()' by acquiring and
releasing a `proxy_type::collector' object? This ensures that any nodes
within the stack will be fully accessible within the acquire/release pair.
You can defer popped nodes for subsequent destruction or reuse by calling
the `proxy_type::collect()' procedure within an acquire/release scope. By
deferring and only destroying or reuse nodes until they are quiescent, you
can solve ABA and memory reclamation in one shot. Everything just works. I
will describe the algorithm itself in greater detail in another post,
probably later on today...

Enjoy!

;^)

frege

unread,

Jan 10, 2010, 11:00:31 PM1/10/10

to

On Jan 10, 1:54 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> node* flush()
> {
> return m_head.exchange(NULL, std::memory_order_acquire);
> }
>

Don't you need memory_order_release there as well - to 'publish' the
new value of head == NULL?
Otherwise another thread could call flush and still see head != NULL,
even though it used acquire - it was not matched with release. (?)

> node* pop()
> {
> node* head = m_head.load(std::memory_order_acquire);
> node* xchg;
>
> do
> {
> if (! head) return NULL;
>
> xchg = head->m_next.load(std::memory_order_relaxed);
> }
>
> while (! m_head.compare_exchange_weak(
> head,
> xchg,
> std::memory_order_acquire));
>

Same here?

> return head;
> }};
>

Chris M. Thomasson

unread,

Jan 11, 2010, 3:30:41 AM1/11/10

to

"frege" <gottlo...@gmail.com> wrote in message
news:37d6406c-d3ed-488b...@m16g2000yqc.googlegroups.com...

On Jan 10, 1:54 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > node* flush()
> > {
> > return m_head.exchange(NULL, std::memory_order_acquire);
> > }
> >

> Don't you need memory_order_release there as well - to 'publish' the
> new value of head == NULL?

No. This value is properly published by the atomic RMW itself.

> Otherwise another thread could call flush and still see head != NULL,
> even though it used acquire - it was not matched with release. (?)

No. I would only need acquire/release semantics if I was swapping the head
of the list with a non-NULL pointer. The acquire/release semantics are about
visibility wrt user data and internal node state, not the anchor of list
itself. That is handled by atomic RMW.

> > node* pop()
> > {
> > node* head = m_head.load(std::memory_order_acquire);
> > node* xchg;
> >
> > do
> > {
> > if (! head) return NULL;
> >
> > xchg = head->m_next.load(std::memory_order_relaxed);
> > }
> >
> > while (! m_head.compare_exchange_weak(
> > head,
> > xchg,
> > std::memory_order_acquire));
> >

> Same here?

I don't think so... If the next pointer is non-NULL then I am just
"publishing" an item that was already inserted into the list with proper
release semantics.

> > return head;
> > }};

frege

unread,

Jan 11, 2010, 12:56:00 PM1/11/10

to

On Jan 11, 3:30 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "frege" <gottlobfr...@gmail.com> wrote in message

>
> news:37d6406c-d3ed-488b...@m16g2000yqc.googlegroups.com...
> On Jan 10, 1:54 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
> > > node* flush()
> > > {
> > > return m_head.exchange(NULL, std::memory_order_acquire);
> > > }
>
> > Don't you need memory_order_release there as well - to 'publish' the
> > new value of head == NULL?
>
> No. This value is properly published by the atomic RMW itself.
>
> > Otherwise another thread could call flush and still see head != NULL,
> > even though it used acquire - it was not matched with release. (?)
>
> No. I would only need acquire/release semantics if I was swapping the head
> of the list with a non-NULL pointer. The acquire/release semantics are about
> visibility wrt user data and internal node state, not the anchor of list
> itself. That is handled by atomic RMW.
>

<sigh> Right. There's no other data to worry about reordering with.
I've been thinking about this stuff too much lately - you start to
second guess yourself!

>
> > Same here?
>
> I don't think so... If the next pointer is non-NULL then I am just
> "publishing" an item that was already inserted into the list with proper
> release semantics.
>

Of course.

Chris M. Thomasson

unread,

Jan 14, 2010, 8:20:52 PM1/14/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:blp2n.1852$Mv3....@newsfe05.iad...

> Here is a model of the algorithm and an example usage case (e.g., solving
> ABA and memory reclamation in the classic lock-free stack algorithm) in
> Relacy 2.0:
>
>
> http://cpt.pastebin.com/f30d8ab51

[...]

Here is pseudo-code with some comments that might be easier to read:

< please excuse any typos! ;^) >
______________________________________________________________________
#define THRESHOLD 10

// represents a deferred object.
struct node
{
node* m_next;
};

void destroy_nodes(nodes*);

// manages 2 collector objects and coordinates the overall
// deferment process.
struct proxy
{

// defers objects for destruction.
struct collector
{
node* m_defer; // = NULL; the deferment list.
uword32 m_defer_count; // = 0; the deferment count.
uword32 m_ref_count; // = 0; the inner count.

~collector()
{
destroy_nodes(m_defer);
}
};

uword32 m_current; // = 0; current collector and outer count.
uword32 m_quiesce; // = 0; quiescence process binary semaphore.
node* m_defer; // = NULL; the main deferment backlink.
collector m_collect[2]; // the collector objects.

~proxy()
{
destroy_nodes(m_defer);
}

// acquires a reference to the current collector object.
collector& acquire()
{

// acquire reference to the current collector object.
return m_collect[FETCH_ADD(&m_current, 0x20U) & 0xFU];
}

// releases a reference to the previously acquired collector
// object `c'.
void release(collector& c)
{

// release reference to the previously acquired collector
// object.
uword32 ref_count = FETCH_SUB(&c.m_ref_count, 0x20U);

// check for drop-to-zero _and_ quiescence cycle
// completion conditions.
if ((ref_count & 0xFFFFFFF0U) == 0x30U)
{

// we are in a completed quiescence cycle!

// get nodes from previous cycle.
node* n = m_defer;

// transfer nodes from current cycle.
m_defer = c.m_defer;

// reset the collector.
c.m_defer = NULL;
c.m_defer_count = 0;
c.m_ref_count = 0;

// allow another thread to start a cycle.
STORE(&m_quiesce, 0);

// destroy nodes from previous cycle.
destroy_nodes(n);

// That's all folks! ;^)
}
}

// collects nodes into the collector object `c'.
void collect(collector& c, node* n)
{

// link node into the collectors deferment list.
n->m_next = SWAP(&c.m_defer, n);

// bump the count of deferred nodes and check for
// threshold limit.
if ((FETCH_ADD(&c.m_defer_count) + 1) >= THRESHOLD)
{

// try to acquire the quiescence process bin-sema.
if (! SWAP(&m_quiesce, 1))
{

// Select the next collector.
uword32 old = LOAD(&m_current);
old = (old & 0x1U) ? 0 : 0x1U;

// Swap the outer counter with a zero count, install
// next collector index and get previous collector.
old = SWAP(&m_current, old);
collector& oc = m_collect[old & 0xFU];

// Decode the reference count of the previous outer
// counter and decrement the old collectors inner count.
FETCH_ADD(&c.m_ref_count, (old & 0xFFFFFFF0U) + 0x10U);
}
}
}
};
______________________________________________________________________

Here is the pseudo-code in pastebin just in case you're newsreader mangles
it:

http://cpt.pastebin.com/f17f8a052

IMHO, it's a very simple and straightforward proxy collector algorithm; any
questions?

;^)

Chris M. Thomasson

unread,

Jan 14, 2010, 11:34:57 PM1/14/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message

news:anP3n.28182$0U1....@newsfe16.iad...

> "Chris M. Thomasson" <n...@spam.invalid> wrote in message
> news:blp2n.1852$Mv3....@newsfe05.iad...
>> Here is a model of the algorithm and an example usage case (e.g., solving
>> ABA and memory reclamation in the classic lock-free stack algorithm) in
>> Relacy 2.0:
>>
>>
>> http://cpt.pastebin.com/f30d8ab51
>
> [...]
>
> Here is pseudo-code with some comments that might be easier to read:
>
>
>
>
> < please excuse any typos! ;^) >
> ______________________________________________________________________

[...]

> // collects nodes into the collector object `c'.
> void collect(collector& c, node* n)
> {
>
> // link node into the collectors deferment list.
> n->m_next = SWAP(&c.m_defer, n);
>
>
> // bump the count of deferred nodes and check for
> // threshold limit.
> if ((FETCH_ADD(&c.m_defer_count) + 1) >= THRESHOLD)
> {
>
> // try to acquire the quiescence process bin-sema.
> if (! SWAP(&m_quiesce, 1))
> {
>
> // Select the next collector.
> uword32 old = LOAD(&m_current);
> old = (old & 0x1U) ? 0 : 0x1U;
>
>
> // Swap the outer counter with a zero count, install
> // next collector index and get previous collector.
> old = SWAP(&m_current, old);
> collector& oc = m_collect[old & 0xFU];
>
>
> // Decode the reference count of the previous outer
> // counter and decrement the old collectors inner count.
> FETCH_ADD(&c.m_ref_count, (old & 0xFFFFFFF0U) + 0x10U);

Actually, I can strip some unnecessary/redundant logic out of this
`collect()' procedure because the collector `oc' shall be the same collector
as `c'. assert(&oc == &c) should always pass. Here is another test in Relacy
which shows how reader threads can traverse nodes within a lock-free stack:

http://cpt.pastebin.com/f537f2c7c

This codes has a simpler `prv_quiesce_begin()' procedure with the assertion
that I mentioned in place.

Chris M. Thomasson

unread,

Jan 14, 2010, 11:49:33 PM1/14/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message

news:WeS3n.14712$YP1....@newsfe15.iad...
[...]

> Actually, I can strip some unnecessary/redundant logic out of this
> `collect()' procedure because the collector `oc' shall be the same
> collector as `c'. assert(&oc == &c) should always pass. Here is another
> test in Relacy which shows how reader threads can traverse nodes within a
> lock-free stack:
>
>
> http://cpt.pastebin.com/f537f2c7c

BTW, this code relies on a patched version of Relacy. There is a little bug
within Relacy 2.0 that can trigger non-repeatable false data-race errors in
long running tests which take many iterations to finally pass. I will ask
Dmitriy if I can include the patch he so very kindly provided me with along
with my further posts of long running tests until he uploads it up to his
website:

http://groups.google.com/group/relacy/web

The patched files simply correct the issue for me.

Thanks Dmitriy!

:^)

Dmitriy Vyukov

unread,

Jan 15, 2010, 6:49:55 AM1/15/10

to

On Jan 15, 7:49 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Chris M. Thomasson" <n...@spam.invalid> wrote in messagenews:WeS3n.14712$YP1....@newsfe15.iad...

> [...]
>
> > Actually, I can strip some unnecessary/redundant logic out of this
> > `collect()' procedure because the collector `oc' shall be the same
> > collector as `c'. assert(&oc == &c) should always pass. Here is another
> > test in Relacy which shows how reader threads can traverse nodes within a
> > lock-free stack:
>
> >http://cpt.pastebin.com/f537f2c7c
>
> BTW, this code relies on a patched version of Relacy. There is a little bug
> within Relacy 2.0 that can trigger non-repeatable false data-race errors in
> long running tests which take many iterations to finally pass. I will ask
> Dmitriy if I can include the patch he so very kindly provided me with along
> with my further posts of long running tests until he uploads it up to his
> website:
>
> http://groups.google.com/group/relacy/web

Here it is:
http://groups.google.com/group/relacy/browse_frm/thread/a253bb6b22df8b94

--
Dmitriy V'jukov

Chris M. Thomasson

unread,

Jan 23, 2010, 7:45:48 PM1/23/10

to

Here:

http://cpt.pastebin.com/f71480694

is another version of the proxy gc algorithm. This one allows for any thread
to start a quiescence process without having to acquire a collector object
first. It also allows for a thread to check to see if it's previously
acquired collector object has a quiescence process in progress; here is more
context:

http://article.gmane.org/gmane.comp.lib.boost.devel/198747
(read all...)