You can use this logic for any GC scheme, not just RCU, e.g. hazard pointers,
and proxy GC. And if the proxy GC were refcounted and *all* of the deleted
objects were used as proxy objects, I suppose you could use the proxy refcount.
You wouldn't even need the check-for-zero (or less-than-zero) logic, since
the refcount would be guaranteed to stay greater than zero. Except it would
mess up the GC granularity somewhat. So best to keep it simple.
For anyone who got lost at this point, typically if you do reader lock-free
lookups, you have to return by value. If you return by reference you need
a mechanism to protect the returned object. Typically you use reference
counting with modified logic to take into account that attempts to increment
the refcount may take place after you've deleted the object from the collection.
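That modified increment can be sketched in a few lines, assuming a C++11
`std::atomic` counter (the name `ref_acquire` is mine, not from any particular
library): the attempt fails once the count has already dropped to zero, which
is the signal that the object has been removed and condemned.

```cpp
#include <atomic>

// Sketch: increment the refcount only if it is still positive.
// Returns false if the object has already been logically deleted
// (count reached zero), so the reader must retry its lookup.
bool ref_acquire(std::atomic<long>& count)
{
    long old = count.load(std::memory_order_relaxed);
    while (old > 0) {
        if (count.compare_exchange_weak(old, old + 1,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed))
            return true;   // reference safely taken
        // on failure, old was reloaded; loop and re-check
    }
    return false;          // too late: object already condemned
}
```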
Anyway, one more thing I should think about integrating into atomic-ptr-plus.
When you get lemons, you make lemonade.
When you get hardware, you make software.
Or maybe not. The rcuref stuff allows copyable refcounted pointers. I'm
not sure why they wanted to do that. But it has subtle effects that I
don't think they realize are there, or will until it's too late. But
that's their problem I suppose.
Humm, very very interesting...
Indeed, I must admit that when I first glanced at your post, Joe, I thought I was
going to find something that Linux did which could possibly infringe on my
patent application claims. "Luckily, for me at least", the Linux folks, wrt
rcuref.h, seem to be using actual "atomic-ops/membars" to adjust the
reference counters. From what I can immediately gather, they are using some
form of "per-object counting", in other words "the counters are 'shared' "!
That's probably why atomic-ops are being used. They even use lock-hashing
when *cmpxchg is not available...
IMHO, non-distributed counting logic, like the stuff in rcuref.h, is at odds
with the spirit of RCU in general. RCU is all about reducing reader
overheads to basically zero. Unfortunately, they definitely do not seem to
have a zero-overhead ref-counting solution... I guess they still haven't
"discovered" the basic "trick" involved with zero-overhead ref-counting...
Humm, the VZOOM claims that block various RCU and/or SMR improvements could
turn out to be fairly useful after all... Anyway...
I really do wonder why Linux would seemingly subject reader threads to
atomic-ops/membars when they simply did not have to? Heck, their solution
would even render your RCU+SMR suboptimal. All the efforts that you put into
removing the SMR membars would basically be wasted if somebody used it to
guard a reference counting scheme that relied on atomic-ops! Yikes!...
I have not benchmarked VZOOM against the type of ref-counting that Linux has
gone with. Well, now I guess I am going to have to... My guess is that it
will completely blow it away wrt "reads per-second, per-thread"...
What do you think about all of this?
> I mentioned in the LKML that instead of
> atomic_inc_not_zero() logic,
Ahh, a basic rule wrt Proxy-GC based reference counting: Trying to increment
a refcount that is '< 1' means that the object has probably already been
"deferred for destruction", by RCU in this case. Is that what you are
getting at here... Removing the need for CAS? If so:
* They could possibly remove rcuref.h's lock-hashing scheme if they did not
have to use CAS to detect a '< 1' refcount condition...
Also, if they used the type of ref-counting that VZOOM uses, they would not
really need to even worry about this condition simply because the ref-counts
are specially checked by the polling logic itself...
> For anyone who got lost at this point, typically if you do reader
> lookups, you have to return by value. If you return by reference you need
> a mechanism to protect the returned object.
Yes, I solve this issue by using a zero-overhead distributed counting
mechanism, that will be documented during the next couple of days.
Basically, reader threads can return as many object references as they want,
everything is meticulously and efficiently tracked...
Any thoughts anybody?
Virtually Zero-Overhead Object Management (VZOOM)
For what it's worth, here is my atomic ref-counted pointer scheme that is
based on the original SMR algorithm:
This design is basically "crowbar proof", but it suffers from a TON of extra
overhead as you will see when I put it up against VZOOM in a demo
application that is going to be released in the next couple of days...
RCU read reference can't be held over a quiescent state (preemption usually)
so RCU can only be used for relatively short term references. If you want
longer term you want to use something else like reference counting which is
better for long term. Reference counting also has better granularity. If a
processor/thread didn't go through quiescent states frequently, deferred frees
would pile up. So RCU has relatively poor granularity but really low overhead.
Since RCU is already in the picture, it is used to solve the how to safely
increment the reference count problem.
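That handoff can be sketched roughly as follows (userspace C++, with the RCU
read-side primitives reduced to no-op stubs so only the shape is visible;
`node`, `acquire_long_term`, and the stub names are all mine): RCU keeps the
object from being freed for the duration of the short critical section, and
the conditional increment keeps it alive after we leave it.

```cpp
#include <atomic>
#include <cstddef>

struct node {
    int key = 0;
    std::atomic<long> refs{1};     // 1 == owned by the collection
};

// Stubs standing in for the real RCU read-side primitives.
inline void rcu_read_lock_stub()   {}
inline void rcu_read_unlock_stub() {}

// Take a long-term reference under a short RCU critical section.
// Returns nullptr if the object is already being reclaimed.
node* acquire_long_term(node* n)
{
    rcu_read_lock_stub();          // *n cannot be freed in here
    bool got = false;
    if (n) {
        long old = n->refs.load(std::memory_order_relaxed);
        while (old > 0) {
            if (n->refs.compare_exchange_weak(old, old + 1,
                    std::memory_order_acquire,
                    std::memory_order_relaxed)) {
                got = true;
                break;
            }
        }
    }
    rcu_read_unlock_stub();        // reference (if any) outlives this
    return got ? n : nullptr;
}
```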
I'm dropping the issue and hoping everybody stops paying attention (assuming
they had started paying attention in the first place :) )
> This design is basically "crowbar proof"
For those who are interested in how to use it, the following crude C++
wrapper wraps the SMR-based low-level atomic ref-counting C API. I posted the basic
algorithm a while back:
It has all the basic operations that you would expect in a lock-free
smart-pointer. Including the ability to XCHG/CAS smart-pointers. Its simple
counting scheme is compatible with virtually any existing proxy-gc design.
Basically, it has to use a special method, which Joe has already addressed,
to "acquire" references. However, in my smart-pointer implementation the
only time the special logic is actually used is when a thread "initially"
tries to acquire a copy of a refcount. After a thread has initially
incremented the refcount, it is then allowed to use a "normal"
atomic_inc/dec for subsequent adjustments...
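A toy sketch of that two-phase pattern, assuming C++11 atomics (this is my
illustration of the idea, with names of my own choosing, not the actual
SMR-based implementation): only the *initial* acquisition from a shared
location uses the guarded increment; a thread that already holds a reference
knows the count is at least one, so copies can use a plain increment.

```cpp
#include <atomic>

struct counted {
    std::atomic<long> refs{1};     // 1 == owned by the collection
};

class local_ref {
    counted* p_ = nullptr;
public:
    // Initial acquire: may fail if the object is already condemned.
    static local_ref acquire(counted* p) {
        local_ref r;
        if (p) {
            long old = p->refs.load(std::memory_order_relaxed);
            while (old > 0) {
                if (p->refs.compare_exchange_weak(old, old + 1,
                        std::memory_order_acquire,
                        std::memory_order_relaxed)) {
                    r.p_ = p;
                    break;
                }
            }
        }
        return r;
    }
    local_ref() = default;
    // Copy: we already hold a reference, so the count is >= 1 and a
    // "normal" atomic increment is sufficient.
    local_ref(const local_ref& o) : p_(o.p_) {
        if (p_) p_->refs.fetch_add(1, std::memory_order_relaxed);
    }
    local_ref& operator=(const local_ref&) = delete;  // keep sketch simple
    ~local_ref() {
        if (p_) p_->refs.fetch_sub(1, std::memory_order_acq_rel);
    }
    explicit operator bool() const { return p_ != nullptr; }
    counted* get() const { return p_; }
};
```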
> Reference counting also has better granularity. If a
> processor/thread didn't go through quiescent states frequently, deferred frees
> would pile up.
Yes. AFAICT, the RCU teachings did not seem to address this issue; however,
as you know, there are several solutions to the potential problem. VZOOM can
use several methods to decide when it's okay to block writer threads when the
number of per-thread and/or per-polling-thread deferred objects hits a
configurable "high-watermark". A simple distributed scheme based on
per-thread condvars or semaphores can be used to allow writer threads to
wait for per-thread and/or per-polling-thread synchronization-epochs, or for
the total number of per-thread and/or per-polling-thread deferred objects to
hit a certain configurable "low-watermark"...
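One way to picture the watermark idea, stripped of the condvars and the
per-thread machinery (everything below is my own invention for illustration,
not VZOOM's actual mechanism): a writer defers objects until the high
watermark is reached, then must wait for a synchronization epoch and drain
back down to the low watermark before deferring more.

```cpp
#include <cstddef>
#include <vector>

// Toy sketch of high/low-watermark throttling for deferred frees.
class deferred_pool {
    std::vector<void*> pending_;
    std::size_t high_, low_;
public:
    deferred_pool(std::size_t high, std::size_t low)
        : high_(high), low_(low) {}

    // Returns false when the high watermark is hit: the caller must
    // wait for a synchronization epoch and then call drain().
    bool try_defer(void* obj) {
        if (pending_.size() >= high_) return false;
        pending_.push_back(obj);
        return true;
    }

    // Called after a grace period: reclaim objects until we are back
    // at the low watermark. Returns the number reclaimed.
    std::size_t drain() {
        std::size_t freed = 0;
        while (pending_.size() > low_) {
            pending_.pop_back();   // stand-in for the actual free
            ++freed;
        }
        return freed;
    }

    std::size_t size() const { return pending_.size(); }
};
```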
> So RCU has relatively poor granularity but really low overhead.
Unfortunately, IMHO, the ref-counting method they are using renders RCU's
low overhead features basically meaningless...
> Since RCU is already in the picture, it is used to solve the how to safely
> increment the reference count problem.
Humm, I wonder if a Linux OS implementer would even be "slightly interested"
in an algorithm like VZOOM... I mean, it's 100% compatible with all existing
"RCU-compliant" algorithms, and it allows for refs to exist across quiescent
Do you think that "they" might be into the "business" of "borrowing"
somebody's ideas?.... Perhaps, even our ideas... Humm...