Lock-free algorithms and reference counting

Joe Seigh

unread,

Nov 9, 2006, 12:31:16 PM11/9/06

to

Does anyone know of any write lock-free algorithms where
on a modification it's not known whether an object in a
collection has been removed or not? For example with
lock-free fifo and lifo queue when you remove a node
it's no longer in the queue.

I'm wondering whether refcounting buys you anything.
It would seem to be unnecessary overhead. I'm
thinking of dropping or rather, not putting it in,
support for refcounting in RCU+SMR. I can't think
of a reason why it would be needed.

This is not about refcounting in general or for other
specific uses.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Joe Seigh

unread,

Nov 13, 2006, 2:34:53 PM11/13/06

to

Joe Seigh wrote:
> Does anyone know of any write lock-free algorithms where
> on a modification it's not known whether an object in a
> collection has been removed or not? For example with
> lock-free fifo and lifo queue when you remove a node
> it's no longer in the queue.
>
> I'm wondering whether refcounting buys you anything.
> It would seem to be unnecessary overhead. I'm
> thinking of dropping or rather, not putting it in,
> support for refcounting in RCU+SMR. I can't think
> of a reason why it would be needed.
>

Well refcounting support in RCU+SMR is out then.

I tested a proxy collector using RCU+SMR and realized that
you need acquire and release membars since you can't
use dependent load on the proxy since it doesn't point
into the data struture. That sort of slows it down
to where it's only as fast or 10% faster than refcounted
proxy collector.

I also figured out how to make proxy collectors support
write lock-free, not just read lock-free. Except the
only write lock-free algorithms I can think of off
hand are lock-free lifo and fifo queues where non
proxy PDR would be better.

Chris Thomasson

unread,

Nov 14, 2006, 7:49:28 PM11/14/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:crednWmmTa3PVcXY...@comcast.com...

> Joe Seigh wrote:
>> Does anyone know of any write lock-free algorithms where
>> on a modification it's not known whether an object in a
>> collection has been removed or not? For example with
>> lock-free fifo and lifo queue when you remove a node
>> it's no longer in the queue.
>>
>> I'm wondering whether refcounting buys you anything.
>> It would seem to be unnecessary overhead. I'm
>> thinking of dropping or rather, not putting it in,
>> support for refcounting in RCU+SMR. I can't think
>> of a reason why it would be needed.
>>
>
> Well refcounting support in RCU+SMR is out then.
>
> I tested a proxy collector using RCU+SMR and realized that
> you need acquire and release membars since you can't
> use dependent load on the proxy since it doesn't point
> into the data struture. That sort of slows it down
> to where it's only as fast or 10% faster than refcounted
> proxy collector.

My patent application has something called 'distributed proxy reference
counting (DPRC)' which makes use of per-thread counter arrays. The polling
logic takes this information into account before I makes any decisions wrt
calling a deferred objects "status" function pointer. It only checks the
per-thread counters after epoch, therefore reader threads can modify their
counter arrays without membars in the actual proxy critical-region. The DPRC
algorithm describes the "proxy critical-region" in the abstract so that it
can be used with proxy collector logic that is based on vZOOM, RCU+SMR, RCU,
SMR, atomic counting, ect ect...

> I also figured out how to make proxy collectors support
> write lock-free, not just read lock-free.

Our collector algorithms are already work with lock-free stack or queue,
hashed stack/queue ect... Are you referring to a special kind of writer
algorithm?

Read-Only Iteration: lock-free lifo
---------
ref = acquire proxy-ref
traverse lock-free stack ...
release proxy-ref(ref)

Writer Produce :
lock-free lifo push
---------
normal writer lock-free stack push ...

Writer Consume :
lock-free lifo pop
---------
ref = acquire proxy-ref
do writer lock-free stack ...
which implys reads...
to pop an objX...
ref->defer_next_epoch(objX)...
release proxy-ref(ref)...

That's how I would do it.

Chris Thomasson

unread,

Nov 14, 2006, 8:36:53 PM11/14/06

to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:EIqdnSKPE_tk_cfY...@comcast.com...

> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:crednWmmTa3PVcXY...@comcast.com...
>> Joe Seigh wrote:
>>> Does anyone know of any write lock-free algorithms where
>>> on a modification it's not known whether an object in a
>>> collection has been removed or not? For example with
>>> lock-free fifo and lifo queue when you remove a node
>>> it's no longer in the queue.
>>>
>>> I'm wondering whether refcounting buys you anything.
>>> It would seem to be unnecessary overhead. I'm
>>> thinking of dropping or rather, not putting it in,
>>> support for refcounting in RCU+SMR. I can't think
>>> of a reason why it would be needed.
>>>
>>
>> Well refcounting support in RCU+SMR is out then.
>>
>> I tested a proxy collector using RCU+SMR and realized that
>> you need acquire and release membars since you can't
>> use dependent load on the proxy since it doesn't point
>> into the data struture. That sort of slows it down
>> to where it's only as fast or 10% faster than refcounted
>> proxy collector.
>
> My patent application has something called 'distributed proxy reference
> counting (DPRC)' which makes use of per-thread counter arrays.

I think I posted some pseudo-code that described DPRC some more on this
group a while ago...

Chris Thomasson

unread,

Nov 21, 2006, 9:10:46 PM11/21/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:EvydnTZJTp6a9M7Y...@comcast.com...

> Does anyone know of any write lock-free algorithms where
> on a modification it's not known whether an object in a
> collection has been removed or not? For example with
> lock-free fifo and lifo queue when you remove a node
> it's no longer in the queue.

For lock-free linked-lists you have to identify the nodes removed from it as
logically deleted, or actually deleted. This information is kind of required
for any concurrent operations that either push or remove nodes...

> I'm wondering whether refcounting buys you anything.
> It would seem to be unnecessary overhead. I'm
> thinking of dropping or rather, not putting it in,
> support for refcounting in RCU+SMR. I can't think
> of a reason why it would be needed.

Well, I created by reference counting algorithm so that I could search
through a data-structure and return multiple references to nodes that
matched the search criteria. For instance, I can do searches like this:

// search results
struct my_result_t {
// list of refs to matching nodes
node *front;
};

int my_multi_search(list_t *l, my_result_t *r, // criteria) {
int count = 0;
r->front = 0;

do {
// search, inc count and fill the result
// with references to every node we find.

} while(searching());

return count;
}

Do you know of any other reference counting that can do this without using
any atomic operations or memory barriers? IMHO, very lightweight reference
counting, like DPRC, can be fairly useful when you want to return results
from searches, and other various activities..

Joe Seigh

unread,

Nov 21, 2006, 10:21:47 PM11/21/06

to

Chris Thomasson wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:EvydnTZJTp6a9M7Y...@comcast.com...
>
>>Does anyone know of any write lock-free algorithms where
>>on a modification it's not known whether an object in a
>>collection has been removed or not? For example with
>>lock-free fifo and lifo queue when you remove a node
>>it's no longer in the queue.
>
>
> For lock-free linked-lists you have to identify the nodes removed from it as
> logically deleted, or actually deleted. This information is kind of required
> for any concurrent operations that either push or remove nodes...
>
>

Right, so refcounting wouldn't be of any additional benefit since
you already know what node has to be deallocated at some point.

>
>
>
>>I'm wondering whether refcounting buys you anything.
>>It would seem to be unnecessary overhead. I'm
>>thinking of dropping or rather, not putting it in,
>>support for refcounting in RCU+SMR. I can't think
>>of a reason why it would be needed.
>
>
> Well, I created by reference counting algorithm so that I could search
> through a data-structure and return multiple references to nodes that
> matched the search criteria. For instance, I can do searches like this:
>
>

[...]

>
> Do you know of any other reference counting that can do this without using
> any atomic operations or memory barriers? IMHO, very lightweight reference
> counting, like DPRC, can be fairly useful when you want to return results
> from searches, and other various activities..
>
>

Read references are a different thing. I can handle them without having to
put in any special logic for RCU+SMR. I just need to document how it's
used. Also maybe put together some PDR based refcount pointers, the don't
increment if zero ones like rcuref has.

It's all on the list of stuff to do at some point.

Chris Thomasson

unread,

Nov 22, 2006, 11:37:04 PM11/22/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:EvydnTZJTp6a9M7Y...@comcast.com...

> Does anyone know of any write lock-free algorithms where
> on a modification it's not known whether an object in a
> collection has been removed or not? For example with
> lock-free fifo and lifo queue when you remove a node
> it's no longer in the queue.
>
> I'm wondering whether refcounting buys you anything.
> It would seem to be unnecessary overhead. I'm
> thinking of dropping or rather, not putting it in,
> support for refcounting in RCU+SMR. I can't think
> of a reason why it would be needed.
>
> This is not about refcounting in general or for other
> specific uses.

I think I just came up with a way to do efficient 100% lock-free atomic
reference counting without DWCAS! It works on SPARC 32/64! Humm, not sure
what to do with it just yet...

:O

Chris Thomasson

unread,

Nov 25, 2006, 1:09:38 PM11/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:2Z6dnfZrjIwxXP7Y...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:EvydnTZJTp6a9M7Y...@comcast.com...

[...]

> Read references are a different thing. I can handle them without having
> to
> put in any special logic for RCU+SMR. I just need to document how it's
> used. Also maybe put together some PDR based refcount pointers, the don't
> increment if zero ones like rcuref has.
>
> It's all on the list of stuff to do at some point.

Can you return an arbitrary number of references to nodes per-search? Did
you create a dynamic hazard pointer implementation? (e.g., adjust number of
hazards at runtime.)

I found a way around extending the SMR algorithm in order to return
plurality of references to plurality of shared data-structures per search...

Joe Seigh

unread,

Nov 25, 2006, 1:32:11 PM11/25/06

to

Chris Thomasson wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:2Z6dnfZrjIwxXP7Y...@comcast.com...
>
>>Chris Thomasson wrote:
>>
>>>"Joe Seigh" <jsei...@xemaps.com> wrote in message
>>>news:EvydnTZJTp6a9M7Y...@comcast.com...
>
> [...]
>
>
>>Read references are a different thing. I can handle them without having
>>to
>>put in any special logic for RCU+SMR. I just need to document how it's
>>used. Also maybe put together some PDR based refcount pointers, the don't
>>increment if zero ones like rcuref has.
>>
>>It's all on the list of stuff to do at some point.
>
>
> Can you return an arbitrary number of references to nodes per-search? Did
> you create a dynamic hazard pointer implementation? (e.g., adjust number of
> hazards at runtime.)
>
> I found a way around extending the SMR algorithm in order to return
> plurality of references to plurality of shared data-structures per search...
>
>

You can do PDR based refcounting, e.g. w/ RCU (rcuref), or with SMR or whatever.
The stuff I dropped support for was using refcounting to manage the global
references rather than the read references by threads. I don't think it's
needed since you know when the last global reference is dropped anyhow.

You would want some form of refcounting for long term read references by
threads since using large numbers of hazard pointers for this is problematic
not only from the point of managing large numbers of hazard pointers but
from the fact they're not really copyable (they are but only in the same
order the hazard pointer scan is in, which probably won't be convenient
for the application).

Chris Thomasson

unread,

Nov 25, 2006, 1:59:08 PM11/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:-badnTaJFcjmFvXY...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:2Z6dnfZrjIwxXP7Y...@comcast.com...
>>
>>>Chris Thomasson wrote:
>>>
>>>>"Joe Seigh" <jsei...@xemaps.com> wrote in message
>>>>news:EvydnTZJTp6a9M7Y...@comcast.com...
>>
>> [...]
>>
>>
>>>Read references are a different thing. I can handle them without having
>>>to
>>>put in any special logic for RCU+SMR. I just need to document how it's
>>>used. Also maybe put together some PDR based refcount pointers, the
>>>don't
>>>increment if zero ones like rcuref has.
>>>
>>>It's all on the list of stuff to do at some point.
>>
>>
>> Can you return an arbitrary number of references to nodes per-search? Did
>> you create a dynamic hazard pointer implementation? (e.g., adjust number
>> of hazards at runtime.)
>>
>> I found a way around extending the SMR algorithm in order to return
>> plurality of references to plurality of shared data-structures per
>> search...
>
> You can do PDR based refcounting, e.g. w/ RCU (rcuref), or with SMR or
> whatever.

Using atomic operations right? I got a way to do it without them, or memory
barriers... Its heavily distributed per-thread reference counting.
Per-thread proxy counting... E.g., more than one object can share the same
proxy counter... The PDR scan logic takes this into account by iterating a
generation of deferred objects against the per-thread counters... A very
simple hash on the pointer value gets to the counter. The counter is can
also be in per-object metadata... It allows a thread to grab multiple
references to an arbitrary number of nodes. The references are also
copyable. This is done through an indirection of the proxy counters...

Also, I should mention another form of reference counting I invented here:

http://groups.google.com/group/comp.arch/browse_frm/thread/24c40d42a04ee855

What do you think of the reference counting algorithm I created for my
memory allocator?

P.S. -- This allocator is one of the fastest and most scaleable memory
allocators out there... It was included in my initial round vZOOM
submission... It outperforms Hoard by a wide margin... It outperforms these
algorithms by a wide margin:

http://portal.acm.org/citation.cfm?id=996893.996848

http://portal.acm.org/citation.cfm?id=773039.512451&coll=GUIDE&dl=ACM...

http://citeseer.ist.psu.edu/johnson91concurrent.html

Luckily, the patent application for this is underway... I had to do it. This
memory allocator is $FUXKING blazing fast!

Any thoughts?

Joe Seigh

unread,

Nov 25, 2006, 2:30:32 PM11/25/06

to

The problem with proxy collectors is their granularity. With refcount
based pointers you can keep the graularity to a minimum. And they
would be used for relatively long references so overhead probaby isn't
a big issue.

Chris Thomasson

unread,

Nov 25, 2006, 2:48:05 PM11/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:kYidnYFeZ5i0BPXY...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:-badnTaJFcjmFvXY...@comcast.com...
>>>Chris Thomasson wrote:
>>>>"Joe Seigh" <jsei...@xemaps.com> wrote in message
>>>>news:2Z6dnfZrjIwxXP7Y...@comcast.com...
>>>>>Chris Thomasson wrote:
>>>>>>"Joe Seigh" <jsei...@xemaps.com> wrote in message
>>>>>>news:EvydnTZJTp6a9M7Y...@comcast.com...
[...]

>> Using atomic operations right? I got a way to do it without them, or
>> memory barriers... Its heavily distributed per-thread reference counting.
>> Per-thread proxy counting... E.g., more than one object can share the
>> same proxy counter...

[...]

>
> The problem with proxy collectors is their granularity.

The granularity of my reference counting is comparable with normal reference
counting. I can use the counting to make objects persist outside of a proxy
garbage collected region. Then the granularity of the proxy collector is not
that big of an issue. However, I see your point.

> With refcount
> based pointers you can keep the graularity to a minimum. And they
> would be used for relatively long references so overhead probaby isn't
> a big issue.

Well, if they are frequently and perhaps persistently searching and
operating on the references returns from the search, then the overhead would
become an issue, no?

Joe Seigh

unread,

Nov 25, 2006, 2:58:54 PM11/25/06

to

Chris Thomasson wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message

> news:kYidnYFeZ5i0BPXY...@comcast.com...

>
>
>>With refcount
>>based pointers you can keep the graularity to a minimum. And they
>>would be used for relatively long references so overhead probaby isn't
>>a big issue.
>
>
> Well, if they are frequently and perhaps persistently searching and
> operating on the references returns from the search, then the overhead would
> become an issue, no?
>
>

How many simultaneous references are going to be held and for how long?
The proxy stuff I usually make scoped so you know the interval over
which the references are held. You can make the interval as long as
you want but you tie up resources longer. If that's not acceptable
then you go to reference counting. I'm keeping it simple with a few well
specified trade-offs for now. If I make it more complicated I don't
have a realistic way of testing it. I'm pushing it on the stuff I
have now.

Chris Thomasson

unread,

Nov 25, 2006, 3:22:45 PM11/25/06

to

"Chris Thomasson" <cri...@comcast.net> wrote in message

news:DNCdncvZXbKgv_jY...@comcast.com...

It had a race-condition... Luckily, it was fixable... Humm. This counting
algorithm seems to be working out... Humm...

Chris Thomasson

unread,

Nov 25, 2006, 3:32:34 PM11/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:86WdnRztmdjoPfXY...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:kYidnYFeZ5i0BPXY...@comcast.com...
>>
>>
>>>With refcount
>>>based pointers you can keep the graularity to a minimum. And they
>>>would be used for relatively long references so overhead probaby isn't
>>>a big issue.
>>
>>
>> Well, if they are frequently and perhaps persistently searching and
>> operating on the references returns from the search, then the overhead
>> would become an issue, no?
>>
>>
> How many simultaneous references are going to be held and for how long?

Maybe 20, 30, 100 nodes meet the criteria for a search... Perhaps I need to
operate on all of them. Maybe I release all but one, and hold it for a
longer period of time. I was thinking of a usage pattern like this contrived
example:

--------------

// local vzoom object
vzoom_t *vzthis = vzoom_self();

for (;;) {

// search return
node_t *result[0x100];
node_t *holds[0x100];
int depth, hdepth = 0;

// so the search in collected region
vzoom_proxy_t *ref = vzthis->acquire_proxy(); {
search(vzthis, &result, &depth);
ref->release();
}

// examine an work on the results
for(int i = 0; i < depth; ++i) {
if (search_op_result(result[i])) {
vzoom_copy(vzthis, &result[i], &holds[hdepth]);
++hdepth;
}
}

// release work
vzoom_proxy_t *ref = vzthis->acquire_proxy(); {
ref->drop_and_release(result[i]), depth);
}

// do something else...

// whatever...

// release work
vzoom_proxy_t *ref = vzthis->acquire_proxy(); {
ref->drop_and_release(holds[hdepth], hdepth);
}

// do something else...

// whatever...
}

> The proxy stuff I usually make scoped so you know the interval over
> which the references are held.

Yes. I use lightweight refcounting to get objects to persist outside of the
proxy scope.

> You can make the interval as long as
> you want but you tie up resources longer. If that's not acceptable
> then you go to reference counting.

Right. It just, what kind of reference counting algorithm? This can all be
done without atomic operations...

> I'm keeping it simple with a few well
> specified trade-offs for now. If I make it more complicated I don't
> have a realistic way of testing it. I'm pushing it on the stuff I
> have now.

I hear ya.

;^)