You can get rid of the expensive memory barrier when you set hazard pointers
by combining it with a version of RCU that uses context switches (voluntary
and involuntary), signals, syscalls or anything that causes processor
serialization as quiescent states. You delay checking the hazard pointers
until you have an RCU checkpoint (i.e. all threads have quiesced at least once).
If you're using a polling for SMR/hazard pointers this won't cause any appreciable
extra delay.
The processor serialization from the RCU quiesce points ensure that the hazard
pointer operations are seen in correct order without requiring a memory barrier.
However, you still need to ensure the compiler correctly orders the instructions.
This will probably require inline assembler. On pipelined architectures setting
a hazard pointer is pretty close to a raw pointer load, about 2X on powerpc.
On Solaris and possibly Aix, you can use /proc to get context switching counts
for threads to use as quiesce points. For other unices and Linux, you will need
to periodically signal each thread to execute a signal handler that increments
a signal count to use as quiesce points. For windows, there's no asynchronous
signal facility or /proc, but there's a hack that will work. I won't disclose it since
I don't support windows and I want to see how long it takes someone else
to figure it out.
If you can determine thread state, i.e. running or not running, you can use the
not running as a quiesce state. This takes care of some forward progress
problems RCU has with suspended threads not quiescing periodically if
they've been indefinitely suspended.
The advantage of this over pure RCU is that it allows you to hold local
references for indefinite periods of time. With RCU you have to keep
the references short since RCU is a proxy GC and a thread not quiescing
holds up GC for everything.
Comparison with deferred reference counting. Deferred reference counting
has the problem that in C/C++ it has no control over how local references
are expressed and used, unlike Java VM's. Although not likely, a pointer
doesn't have to like like a pointer to something, it could be an offset or
a hash of something. More problematic is a pointer put into a locally
owned heap object. The thread stack only scan would never catch this
since the reference isn't on the stack. And although a stack GC scan is
cheaper than scanning the heap plus stacks, it's still expensive. Hazard
pointers get around this problem by explicity and formally declaring local
references/usage.
There's an alternative method of pointer usage declation where you'd
simply store the pointer into the hazard pointer w/o the reload and
recheck, and at scan time check the stack frame register save
area as well as the hazard pointers. This is cheaper than scanning
the whole stack but as the hazard pointer reload and recheck are very
cheap on a pipelined cpu to begin with, I don't think it's worth it.
On Z architecture you could use a MVC instruction to store the global
pointer into the hazard pointer and then load from the hazard pointer
rather than the usual hazard pointer logic, since the MVC instruction
isn't interruptable. I don't know if it's any faster but you could do it.
I think that's it.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.