
Proxy collector question


Toebs Douglass

Jan 16, 2011, 1:35:33 PM

I've spent the last two weeks thinking about deferred reclamation.

I'm currently thinking about proxy collection.

My current thought is this; you have a proxy collector object, which is
a ref count and an add-only lock-free singly-linked list.

Threads put their delete candidates into this list. When a thread goes
into a non-quiescent state, it notes the current proxy object and
increments the ref counter on that proxy object. When it goes quiescent,
it derefs.

When a thread decides it's time for reclamation, it raises a flag (so no
other thread will also do so at the same time) and gets on with
reclamation - it makes a new proxy object, atomically swaps it with the
original and then puts the old proxy object into a single-threaded
doubly-linked list of proxy objects. (Threads note which proxy object
they're using when they go non-quiescent, so they will deref the correct
object.)

The thread then scans that list and any proxy object which has gone down
to a ref count of zero can have all its delete candidates reclaimed (and
the proxy object can be deallocated).

Now, this is a bad design because you need two atomic ops per non-
quiescent RCU state, but it is a simple design, which is where I'm at
right now - learning curve.

The unresolved problem, however, is how to swap the current proxy object
for a new proxy object.

The proxy object consists of its ref count and its list. So a thread
which wants to use it gets the addy of the proxy object and THEN
increments the ref count.

There is, I think, a race condition, whereby a thread gets the proxy
object address, then doesn't run for a while due to OS scheduling,
during which time another thread performs reclamation - it swaps in a
new object and puts the old object on the list of old objects. But if
that old object already has a ref count of 0, it will be deallocated.
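
To spell out the interleaving (thread names here are just for
illustration):

1. Reader thread R loads the address of the current proxy object, then
is descheduled before it can increment the ref count.
2. Reclaimer thread W swaps in a new proxy object and moves the old one
onto the list of old proxy objects.
3. W sees the old proxy object's ref count is 0, reclaims its delete
candidates and deallocates it.
4. R wakes up and increments a ref count in freed memory, then goes on
to read objects which may already be gone.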

So, clearly, I'm missing something - this isn't quite how proxy
collection is done. Any words from the wise on whether I'm on the right
line of thought?

Dmitriy Vyukov

Jan 17, 2011, 12:15:11 AM

I would suggest you print out Chris Thomasson's original
pc_sample and study in detail how it works:
http://webpages.charter.net/appcore/misc/pc_sample_h.html
http://webpages.charter.net/appcore/misc/pc_sample_c.html
Also search this group for "pc_sample".

--
Dmitriy V'jukov
All about lock-free algorithms, multicore, scalability, parallel
computing and related topics
www.1024cores.net

Toebs Douglass

Jan 17, 2011, 2:15:59 AM

In article <c32e34f8-5026-44f2-b5e0-1fef62b2b4f3@1g2000yqz.googlegroups.com>, dvy...@gmail.com says...

> > So, clearly, I'm missing something - this isn't quite how proxy
> > collection is done. Any words from the wise on whether I'm on the
> > right line of thought?

> I would suggest you print out Chris Thomasson's original
> pc_sample and study in detail how it works:
> http://webpages.charter.net/appcore/misc/pc_sample_h.html
> http://webpages.charter.net/appcore/misc/pc_sample_c.html
> Also search this group for "pc_sample".

Thanks, Dmitriy!

Looking at it with my bleary 8am morning eyes right now =-)

Dmitriy Vyukov

Jan 17, 2011, 2:23:01 AM

On Jan 17, 10:15 am, Toebs Douglass <a...@b.com> wrote:
> In article <c32e34f8-5026-44f2-b5e0-1fef62b2b4f3@1g2000yqz.googlegroups.com>, dvyu...@gmail.com says...

>
> > > So, clearly, I'm missing something - this isn't quite how proxy
> > > collection is done. Any words from the wise on whether I'm on the
> > > right line of thought?
> > I would suggest you print out Chris Thomasson's original
> > pc_sample and study in detail how it works:

At least that's what I did :)

> > http://webpages.charter.net/appcore/misc/pc_sample_h.html
> > http://webpages.charter.net/appcore/misc/pc_sample_c.html
> > Also search this group for "pc_sample".
>
> Thanks, Dmitriy!
>
> Looking at it with my bleary 8am morning eyes right now =-)

Perhaps you would do better to start with the concept of Differential
Reference Counting:
http://www.1024cores.net/home/lock-free-algorithms/object-life-time-management/differential-reference-counting
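
For anyone reading along, the core idea of the article, boiled down to
a hedged C11 sketch (the anchor layout and names are illustrative, and
a real implementation needs the double-word CAS to be genuinely
lock-free, e.g. cmpxchg16b on x86-64):

#include <stdatomic.h>
#include <stdint.h>

struct proxy;

/* the "anchor": pointer and outer count packed into ONE atomic unit,
   so reading the pointer and taking a reference is a single operation */
struct anchor
{
  struct proxy *ptr;
  uintptr_t outer;   /* acquisitions made through this anchor */
};

_Atomic struct anchor g_anchor;

/* acquire: bump the outer count and obtain the pointer in one CAS -
   the window between "get address" and "increment count" is gone,
   because the count travels with the pointer */
struct proxy *drc_acquire( void )
{
  struct anchor old = atomic_load( &g_anchor ), new;
  do
  {
    new = old;
    new.outer += 1;
  }
  while( !atomic_compare_exchange_weak(&g_anchor, &old, new) );
  return old.ptr;
}

On release, a thread decrements an "inner" count held in the proxy
object itself; when the reclaimer swaps in a new anchor, it "combines"
the retired anchor's outer count into the old proxy's inner count, and
the proxy is only freed once the inner count shows that every acquire
has been matched by a release.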

Dmitriy Vyukov

Jan 17, 2011, 2:28:40 AM

On Jan 16, 9:35 pm, Toebs Douglass <a...@b.com> wrote:
> I've spent the last two weeks thinking about deferred reclamation.
>
> I'm currently thinking about proxy collection.
>
> My current thought is this; you have a proxy collector object, which is
> a ref count and an add-only lock-free singly-linked list.

Centralized mutable data structures are a multicore anti-pattern.


> Threads put their delete candidates into this list.  When a thread goes
> into a non-quiescent state, it notes the current proxy object and
> increments the ref counter on that proxy object.  When it goes quiescent,
> it derefs.
>
> When a thread decides it's time for reclamation, it raises a flag (so no
> other thread will also do so at the same time) and gets on with
> reclamation - it makes a new proxy object, atomically swaps it with the
> original and then puts the old proxy object into a single-threaded
> doubly-linked list of proxy objects.  (Threads note which proxy object
> they're using when they go non-quiescent, so they will deref the correct
> object.)

Don't fall into that trap. A thread MUST defer an object to the most
current epoch as determined AFTER the object is made inaccessible. Not
to some previous epoch.
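
In code terms, the point is purely about ordering (a hedged sketch;
unlink_node and defer_to are hypothetical helpers, and current_proxy is
from the sketch in the first post):

#include <stdatomic.h>

struct node;
struct proxy;
extern _Atomic(struct proxy *) current_proxy;

void unlink_node( struct node *n );                /* hypothetical */
void defer_to( struct proxy *p, struct node *n );  /* hypothetical */

/* WRONG: samples the proxy/epoch BEFORE the unlink - a reader which
   finds n after the sample can still be holding it when the sampled
   (older) epoch is retired */
void retire_wrong( struct node *n )
{
  struct proxy *p = atomic_load( &current_proxy );  /* too early */
  unlink_node( n );
  defer_to( p, n );
}

/* RIGHT: unlink first, then defer to whatever epoch is current
   AFTER the node has become unreachable */
void retire_right( struct node *n )
{
  unlink_node( n );
  defer_to( atomic_load( &current_proxy ), n );
}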


> The thread then scans that list and any proxy object which has gone down
> to a ref count of zero can have all its delete candidates reclaimed (and
> the proxy object can be deallocated).
>
> Now, this is a bad design because you need two atomic ops per non-
> quiescent RCU state, but it is a simple design, which is where I'm at
> right now - learning curve.

It depends on what operation you are implementing. If you are
implementing an O(1) operation (like queue enqueue/dequeue) then it's
costly. However if you are implementing an O(N) operation (like list
traversal), then 2 mutations of shared state are amortized over N.
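
For example, a traversal of a 1,000-node list pays the same 2 atomic
operations as a single dequeue - 0.002 atomic operations per node
visited - so the constant all but disappears, whereas for the dequeue
those 2 operations can easily double the cost of the whole operation.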

Toebs Douglass

Jan 17, 2011, 5:49:56 PM

In article <ba124c90-03b2-431c-a741-bb59f4ccfe56@1g2000yqz.googlegroups.com>, dvy...@gmail.com says...

> On Jan 16, 9:35 pm, Toebs Douglass <a...@b.com> wrote:

> > My current thought is this; you have a proxy collector object, which is
> > a ref count and an add-only lock-free singly-linked list.
>
> Centralized mutable data structures are a multicore anti-pattern.

Man... don't gimmi that! lols :-)

You can just say "it's bad" :-)

More seriously, I know. Anything which has to be shared ends up being a
bottleneck. Like plain ref counting only works up to a hundred or so
CPUs.

I know per-thread lists are necessary for scaling (so, basically, you
have to do it), but the proxy collector with a ref count is nice because
you no longer have to keep track of every thread's quiescent state
(which is what I think I need to do in the per-list approach, although
I'm wondering if I can get around it using the same sort of technique as
a proxy object).

> > When a thread decides it's time for reclamation, it raises a flag (so no
> > other thread will also do so at the same time) and gets on with
> > reclamation - it makes a new proxy object, atomically swaps it with the
> > original and then puts the old proxy object into a single-threaded
> > doubly-linked list of proxy objects.  (Threads note which proxy object
> > they're using when they go non-quiescent, so they will deref the correct
> > object.)
>
> Don't fall into that trap. A thread MUST defer an object to the most
> current epoch as determined AFTER the object is made inaccessible. Not
> to some previous epoch.

Don't worry - it doesn't happen. Threads always use the CURRENT proxy
object - my concern was what happens if a thread gets the current proxy
object, only to find it replaced underneath it.

> > The thread then scans that list and any proxy object which has gone down
> > to a ref count of zero can have all its delete candidates reclaimed (and
> > the proxy object can be deallocated).
> >
> > Now, this is a bad design because you need two atomic ops per non-
> > quiescent RCU state, but it is a simple design, which is where I'm at
> > right now - learning curve.
>
> It depends on what operation you are implementing. If you are
> implementing an O(1) operation (like queue enqueue/dequeue) then it's
> costly. However if you are implementing an O(N) operation (like list
> traversal), then 2 mutations of shared state are amortized over N.

It's expensive - it'd be for every operation, e.g. enqueue, dequeue,
push, pop, etc. But hey, learning curve stuff here.

James

Jan 17, 2011, 6:41:00 PM

"Toebs Douglass" <a...@b.com> wrote in message
news:MPG.279d58d77...@news-europe.giganews.com...

> I've spent the last two weeks thinking about deferred reclamation.
>
> I'm currently thinking about proxy collection.
[...]

Here is the proxy collector code that I have "extensively examined" in the
past:

http://webpages.charter.net/appcore/misc/pc_sample_h_v1.html

IMVHO, it's fairly straightforward. He is using DWCAS to make things
really simple. You should closely examine the actual reference counting
that's going on here. Chris is incrementing the outer count by 2 and
decrementing the inner count by 2, and when a "proxy swap" occurs he
adds 1 during the "combining" phase of differential reference counting
(refer to the _excellent_ article posted by Dmitriy; basically, the
outer count is added to the inner count) to change the parity of the
inner count to mean "collection in progress". I think inc/dec by 2 plus
the +1 in the swap is a nice, decent trick - there is a sketch of the
arithmetic at the end of this post. Anyway, here is Chris's "old" proxy
collector, according to him:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/3ded0e3c5496eaa0/04f8548488759196


And here is the most recent one I can find (IMHO, a pretty interesting
algorithm):

http://cpt.pastebin.com/f71480694


It's implemented using Dmitriy's Relacy Race Detector:

http://www.1024cores.net/home/relacy-race-detector
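
To make the counting trick concrete, here is how I read the arithmetic,
as a hedged C11 sketch (the constants and names are mine, not Chris's):

#include <stdatomic.h>
#include <stdbool.h>

#define REF_STEP     2  /* normal references move the counts in steps of 2 */
#define SWAPPED_FLAG 1  /* low bit set = this proxy has been swapped out   */

/* acquire: outer += REF_STEP, done with DWCAS on the anchor (not shown) */

/* release: a reader subtracts REF_STEP from the proxy's inner count */
long pc_release( atomic_long *inner )
{
  return atomic_fetch_sub( inner, REF_STEP ) - REF_STEP;
}

/* proxy swap: fold the retired anchor's outer count into the inner
   count, and flip the inner count's parity in the same addition */
long pc_combine( atomic_long *inner, long outer )
{
  long add = outer + SWAPPED_FLAG;
  return atomic_fetch_add( inner, add ) + add;
}

/* the proxy is dead once every acquire (arriving as +2 via the
   combined outer count) has been matched by a release (-2) AND the
   parity flag says it has been swapped out - so an unused proxy that
   is still current (inner == 0) is never confused with a dead one */
bool pc_dead( long inner )
{
  return inner == SWAPPED_FLAG;
}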


Dmitriy Vyukov

Jan 18, 2011, 2:20:22 AM

On Jan 18, 1:49 am, Toebs Douglass <a...@b.com> wrote:
> > > My current thought is this; you have a proxy collector object, which is
> > > a ref count and an add-only lock-free singly-linked list.
>
> > Centralized mutable data structures are a multicore anti-pattern.
>
> Man... don't gimmi that!  lols  :-)
>
> You can just say "it's bad" :-)
>
> More seriously, I know.  Anything which has to be shared ends up being a
> bottleneck.  Like plain ref counting only works up to a hundred or so
> CPUs.
>
> I know per-thread lists are necessary for scaling (so, basically, you
> have to do it), but the proxy collector with a ref count is nice because
> you no longer have to keep track of every thread's quiescent state
> (which is what I think I need to do in the per-list approach, although
> I'm wondering if I can get around it using the same sort of technique as
> a proxy object).

I'm not sure what you mean here, but what I mean is as follows.
Each thread has a private defer list.
Reference counting is still used for epoch change detection, but it
has nothing to do with actual object reclamation; it merely announces
that "the current epoch is now X+1". At the first opportunity, threads
notice the change and flush their private defer lists.
So one has both per-thread lists and simple centralized epoch change
detection.
Perhaps it must also include some form of "reclamation helping". That
is, if a thread fails to flush its private list in a timely fashion,
some other thread should help it.
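
As a hedged C11 sketch of that shape (all names are illustrative, and
the ref-counted epoch detection itself is elided behind current_epoch):

#include <stdatomic.h>
#include <stddef.h>

struct deferred
{
  struct deferred *next;
  void *object;
};

atomic_uint current_epoch;  /* advanced by the epoch-detection machinery */

/* per-thread private state - no shared mutation on the defer path */
_Thread_local struct deferred *defer_list;
_Thread_local unsigned defer_epoch;

void defer( struct deferred *d )
{
  d->next = defer_list;  /* private list, so no atomics needed */
  defer_list = d;
}

/* called at any convenient point, e.g. on going quiescent: if the
   epoch has moved on since this list was opened, the detection
   machinery has already established that no reader can still see
   these objects, so the whole private list can be flushed */
void maybe_flush( void (*reclaim)(void *) )
{
  unsigned e = atomic_load( &current_epoch );
  if( e == defer_epoch )
    return;
  for( struct deferred *d = defer_list, *n; d != NULL; d = n )
  {
    n = d->next;
    reclaim( d->object );
  }
  defer_list = NULL;
  defer_epoch = e;
}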

--
Dmitriy V'jukov

Toebs Douglass

Jan 19, 2011, 5:48:44 PM

In article <b61812fd-abd9-4e6a-a9cc-476ede...@m13g2000yqb.googlegroups.com>, dvy...@gmail.com says...

> On Jan 18, 1:49 am, Toebs Douglass <a...@b.com> wrote:

> I'm not sure what you mean here,

*nod* it's not explained - and with these things, they're complex enough
that you really have to explain them properly if you want someone to
know what you're thinking.

> Each thread has a private defer list.
> Reference counting is still used for epoch change detection, but it
> has nothing to do with actual object reclamation; it merely announces
> that "the current epoch is now X+1". At the first opportunity, threads
> notice the change and flush their private defer lists.

This is indeed what I have designed.

> So one has both per-thread lists and simple centralized epoch change
> detection.

Except that at the moment, epochs are on a per-thread basis; there is
shared state indicating whether or not a thread is currently active in
the data structure, and a count of how many times the thread has become
quiescent; each thread, when it has say 100 objects in its private defer
list, compares the global state with a local copy it made earlier - and
that comparison leads to the decision whether or not to increment the
epoch for that thread.

But I think this can be done more efficiently. Thinking about this
currently.
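
As a hedged sketch of that per-thread check (the array bound and names
are illustrative, and memory ordering is glossed over):

#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 64

/* shared per-thread state, as described above */
struct thread_state
{
  atomic_bool active;           /* currently inside the data structure  */
  atomic_uint quiescent_count;  /* times this thread has gone quiescent */
};

struct thread_state threads[MAX_THREADS];

/* a reclaiming thread snapshots the counts when it opens a defer
   batch; the batch is safe once every thread has either been seen
   inactive or has gone quiescent at least once since the snapshot */
bool batch_safe( const unsigned snapshot[MAX_THREADS] )
{
  for( int i = 0; i < MAX_THREADS; i++ )
  {
    if( !atomic_load(&threads[i].active) )
      continue;
    if( atomic_load(&threads[i].quiescent_count) == snapshot[i] )
      return false;  /* may still be inside the old epoch */
  }
  return true;
}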

> Perhaps it must also include some form of "reclamation helping". That
> is, if a thread fails to flush its private list in a timely fashion,
> some other thread should help it.

Well, I think that has to happen anyway, because when a thread ends, it
may not be in a position to fully reclaim. Although it strikes me now
that you could just keep that thread around and then, say every second,
get it to check for a new epoch (and so try to reclaim more).

James

Jan 20, 2011, 9:48:02 AM

"James" <n...@spam.invalid> wrote in message
news:ih2mnk$f5k$1...@speranza.aioe.org...

> "Toebs Douglass" <a...@b.com> wrote in message
> news:MPG.279d58d77...@news-europe.giganews.com...
>> I've spent the last two weeks thinking about deferred reclamation.
>>
>> I'm currently thinking about proxy collection.

Perhaps you should read this when you get some time, Toebs:

http://www.rdrop.com/users/paulmck/RCU/hart_ipdps06.pdf

[...]


Dmitriy Vyukov

Jan 21, 2011, 2:32:24 AM

On Jan 20, 1:48 am, Toebs Douglass <a...@b.com> wrote:

> > Perhaps it must also include some form of "reclamation helping". That
> > is, if a thread fails to flush its private list in a timely fashion,
> > some other thread should help it.
>
> > Well, I think that has to happen anyway, because when a thread ends, it
> > may not be in a position to fully reclaim.  Although it strikes me now
> > that you could just keep that thread around and then, say every second,
> > get it to check for a new epoch (and so try to reclaim more).

Been there, done that.
Toebs, when you introduce implicit waiting under the covers like that
*always* think about potential induced deadlocks. And the answer is
usually *No*.

enter_non_quiescent_region();
...
th = create_thread(...);
...
join_thread(th);
...
leave_non_quiescent_region();

Ouch!
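
(Spelled out: if the created thread - or anything it calls - ever
waits, under the covers, for the current epoch to retire, it cannot
make progress until its parent goes quiescent; but the parent cannot
leave its non-quiescent region until join_thread returns. Neither side
can ever move.)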

Toebs Douglass

Jan 22, 2011, 6:24:59 AM

In article <7e2a125f-62cc-451c-a1a1-941702cd0542@e20g2000vbn.googlegroups.com>, dvy...@gmail.com says...

> On Jan 20, 1:48 am, Toebs Douglass <a...@b.com> wrote:

> > Well, I think that has to happen anyway, because when a thread ends, it
> > may not be in a position to fully reclaim.  Although it strikes me now
> > that you could just keep that thread around and then, say every second,
> > get it to check for a new epoch (and so try to reclaim more).
>
> Been there, done that.
> Toebs, when you introduce implicit waiting under the covers like that
> *always* think about potential induced deadlocks. And the answer is
> usually *No*.
>
> enter_non_quiescent_region();
> ...
> th = create_thread(...);
> ...
> join_thread(th);
> ...
> leave_non_quiescent_region();
>
> Ouch!

Very ouch :-)

Right now - but maybe not in the future - that's not a problem for me.
I'm using RCU only for deferred reclamation and the non-quiescent
regions are all inside data structure function calls which cannot be
subdivided by the user, e.g. "pop from stack", "enqueue to queue", etc.

However, I am thinking about a singly-linked list, and there you need to
be able to iterate over the list, which implies a cursor, which implies
a long-term non-quiescent state, and I'm not sure that can work at all
anyway... but it would also have the potential to be a problem in the
way you describe here.
