Distributed Proxy Reference Counting: The VZOOM Algorithm...

Chris Thomasson

unread,

Sep 18, 2005, 1:14:33 AM9/18/05

to

I created a highly portable and efficient method for "Virtually
Zero-Overhead Object Management" in multithreaded programs. It basically
allows an application thread in a plurality of application threads, to
acquire a plurality of "persistent" and/or "non-persistent" references to a
plurality of data objects that are shared between the plurality of
application threads, without using any atomic operations and/or StoreLoad
style memory barriers. Also, it allows for a plurality of persistent
references to exist across two or more successive "application thread
epochs"...

Believe it or not, this can be achieved in a portable manner by relying on
basic C compiler instruction reordering rules regarding POSIX mutexs. I
don't feel like linking 40+ pages of documentation and figures here if
nobody's interested; However, I am willing to discuss a particular
embodiment of VZOOM with those that are...

:)

BTW, VZOOM (designed in January 2005) does not infringe on any prior art,
claims or teachings made by the RCU or SMR patents. Thank God!

Oh yeah, its a lot more flexible than SMR or RCU...

;)

Joe Seigh

unread,

Sep 18, 2005, 8:16:39 AM9/18/05

to

Chris Thomasson wrote:
> I created a highly portable and efficient method for "Virtually
> Zero-Overhead Object Management" in multithreaded programs. It basically
> allows an application thread in a plurality of application threads, to
> acquire a plurality of "persistent" and/or "non-persistent" references to a
> plurality of data objects that are shared between the plurality of
> application threads, without using any atomic operations and/or StoreLoad
> style memory barriers. Also, it allows for a plurality of persistent
> references to exist across two or more successive "application thread
> epochs"...

[...]

>
>
> BTW, VZOOM (designed in January 2005) does not infringe on any prior art,
> claims or teachings made by the RCU or SMR patents. Thank God!
>
> Oh yeah, its a lot more flexible than SMR or RCU...
>
> ;)
>
>

Patent it! Or have you already applied for one? "plurality" is patent speak.
BTW, you can also patent improvements on things already patented. If the
improvements are on other people's patents, they're called blocking patents,
otherwise they're called defensive patents (I think).

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Chris Thomasson

unread,

Sep 19, 2005, 2:35:13 AM9/19/05

to

>> BTW, VZOOM (designed in January 2005) does not infringe on any prior art,
>> claims or teachings made by the RCU or SMR patents. Thank God!
>>
>> Oh yeah, its a lot more flexible than SMR or RCU...
>>
>> ;)
> Patent it! Or have you already applied for one? "plurality" is patent
> speak.
> BTW, you can also patent improvements on things already patented. If the
> improvements are on other people's patents, they're called blocking
> patents,
> otherwise they're called defensive patents (I think).

Yes, I have filed for a patent. I figured I needed to do it if I wanted to
be able to use it at all. I basically designed VZOOM in a way that gets
around some annoying limitations in RCU, and some major scalability and
performance problems with SMR. I figured out that SMR and RCU are not very
practical for a "general solution" to the reader/writer problem. For those
who are interested, let me very briefly explain why:

RCU
-------

The main problem with RCU is it simply doesn't allow a reference to exist
over a "quiescent state". This can be very dangerous and tedious to design
around. Calling a function that might go into a "quiescent state" can be a
potentially disastrous and annoying situation. Also, RCU is simply no good
for high numbers of sustained writes. The RCU teachings were not designed
for high numbers of writes, and some implementations can run out of memory
during such conditions...

- VZOOM allows a thread to acquire a "persistent" reference to a data
object. In RCU speak, persistent references can exist across multiple
quiescent points. Plus, VZOOM can efficiently chew on very high numbers of
sustained writes. It also allows the user to adjust the depth limits of the
queues at runtime, so you can use it to manage the memory of lock-free
algorithms that are optimized for high numbers of writes.

SMR
------

The main problems with SMR is that real world implementations require a
nasty StoreLoad style memory barrier for every single acquired reference,
and it was not designed to allow a thread to acquire an arbitrary number of
references. If a thread tries to grab a new reference, and all of its hazard
pointers are already used, your kind of screwed. You have to allocate more
pointers, which brings in the added overhead of a "hazard pointer allocation
scheme". Also, you also have to think about how an abundance of freshly
allocated hazard pointers ( some might be in use, some are just about to be
used, ect... ) will be presented to the "scanning logic". A solution will
most certainly add more overhead via memory barriers and possibly atomic
operations...

- VZOOM allows a thread to acquire basically any number persistent
references to a data object without using any atomic operations and/or
StoreLoad style memory barriers.

As you can see, VZOOM != SMR and/or RCU. It was designed from the ground up
to be flexible enough to be used as a generic solution for many portable
user-space multithreaded applications that are interested in drastically
reducing overhead with regard to threads referencing shared data objects. It
is also a highly distributed algorithm. This makes it work well with NUMA
architectures. There is also an embodiment of VZOOM that can be used in
multithreaded OS Kernels, but it was really designed to be used in
user-space. I could go on and on...

:)

Any thoughts?

Chris Thomasson

unread,

Sep 19, 2005, 6:05:36 AM9/19/05

to

> I figured out that SMR and RCU are not very
> practical for a "general solution" to the reader/writer problem. For those
> who are interested, let me very briefly explain why:

I also believe that SUN has a patent application out on something called the
"Repeat Offender Problem (ROP)". AFAICT it uses a DWCAS to place a "guard"
on a data object. It looks like SUN's algorithm may be more expensive than
SMR...

Any idea why SUN may be using DWCAS for this?

Joe Seigh

unread,

Sep 19, 2005, 8:14:53 AM9/19/05

to

ROP is similar to SMR in some aspects. Sun has gone beyond DWCAS (or DCAS)
at this point. They're into KCSS (k compare, single swap) and transactional
memory for obstruction-free (not lock-free) algorithms.

Chris Thomasson

unread,

Sep 19, 2005, 6:55:21 AM9/19/05

to

> ROP is similar to SMR in some aspects. Sun has gone beyond DWCAS (or
> DCAS)
> at this point. They're into KCSS (k compare, single swap)

IIRC, they used DWCAS to implement KCSS. I believe it had a lot of atomic
operations under the hood, both ROP and KCSS; better off using an optimized
mutex?

;)

> and transactional
> memory for obstruction-free (not lock-free) algorithms.

http://groups.google.com/group/comp.programming.threads/msg/e43fce88eb0775c8

Does "obstruction-free" rely on backoff algorithms in order to avoid
live-lock? BTW, How are they managing the memory for all of that... Do they
use Java GC?

Joe Seigh

unread,

Sep 19, 2005, 9:14:26 AM9/19/05

to

Chris Thomasson wrote:
>>ROP is similar to SMR in some aspects. Sun has gone beyond DWCAS (or
>>DCAS)
>>at this point. They're into KCSS (k compare, single swap)
>
>
> IIRC, they used DWCAS to implement KCSS. I believe it had a lot of atomic
> operations under the hood, both ROP and KCSS; better off using an optimized
> mutex?
>

Just regular CAS AFAICT.

>
>
>
>
>>and transactional
>>memory for obstruction-free (not lock-free) algorithms.
>
>
> http://groups.google.com/group/comp.programming.threads/msg/e43fce88eb0775c8
>
> Does "obstruction-free" rely on backoff algorithms in order to avoid
> live-lock? BTW, How are they managing the memory for all of that... Do they
> use Java GC?
>

They're write "obstruction-free" so they don't need GC. Obstruction-free
means the contention level isn't high enough to cause problems so no
back off is needed.

Chris Thomasson

unread,

Sep 19, 2005, 7:46:25 AM9/19/05

to

> They're write "obstruction-free" so they don't need GC.

I see.

> Obstruction-free means the contention level isn't high enough to cause
> problems so no
> back off is needed.

Humm... IMHO, "obstruction-free" algorithms seem a bit expensive and
unstable under pressure. Plus, the implementations I remember seemed to use
a lot of CAS loops to get the job done.