The original message for this can be found here:
https://groups.google.com/forum/#!original/comp.arch/k1Qr520dcDk/4vl_jiVRBQAJ
within the following thread:
https://groups.google.com/d/topic/comp.arch/k1Qr520dcDk/discussion
I am wondering how bad this hack can really be in C++. Take careful note
of the SPECIAL macro... Here is the original text:
Fwiw, lets see if I can create some pseudo code on the fly right here of
a _highly_ experimental atomic "mostly" lock-free stack I did a while
back... Pardon any typos... So, this stack should be "different" in that
it uses a loopless atomic exchange operation, well on the x86 anyway,
for the push and pop, or flush to be more accurate. LOCK XCHG, or isn't
LOCK implied for XCHG on x86? I think so, but whatever. atomic
exchange... Here we go!
Lets create a special address that comes in handy later...
Membars aside for a moment:
// hackish!
#define SPECIAL ((node*)0xDEADBEEF)
Lets create a node:
struct node
{
node* next;
// user defined payload
};
Okay. Lets create a stack:
struct stack
{
node* head; // initialize with a nullptr
};
Fine. Lets create a function that adds a new node to this stack:
void stack_push(stack* self, node* n)
{
n->next = SPECIAL;
node* previous = XCHG(&self->head, n);
n->next = previous;
}
Okay! That's it for push. Nice and simple, basically wait-free and gets
LIFO order: Okay for a stack. Now, lets allow one to remove all items
from this stack at once:
node* stack_flush(stack* self)
{
return XCHG(&self->head, nullptr);
}
Perfect! A thread can grab all of the nodes at once. Pretty fast, and
wait-free. Now... What can we do with the list of nodes returned from
stack_flush, assuming its not nullptr? Well.. We can process it! Lets
define something called user_process_node. Its a function that takes a
pointer to a node as input, and does whatever a user wants with its user
defined payload. I do not even need to define it, just declare its
interface. So, user_process_node(node*) is it.
Now, lets create a "driver" that eventually calls into
user_process_node, call it node_process. It takes a list of nodes and
executes each one of them... So, something like...
Humm, well, before I show the driver, perhaps I should show a very
important function first, call it node_next_wait. Perhaps, the most
important. Its going to turn people off for a moment, but it can be sort
of, "amortized" across the complexity of user processing in a sense.
node* node_next_wait(node* n)
{
node* next = nullptr;
while ((next = n->next) == SPECIAL) backoff();
return next;
}
Well, there ya go. This is blocking after all, however its still
interesting to me, it returns the next pointer. Okay, lets define the
driver function node_process:
void node_process(node* head)
{
node* n = head;
while (n)
{
// user processing, _before_ we wait...
// should soak up some time...
user_process_node(n);
// wait for the next work
n = node_next_wait(n);
}
}
Okay, this can processes a list of nodes using wait free atomics for the
stack, and use some blocking, spinning on nodes, however, the user
processing should act as a sort of backoff anyway, minimizing actual
spins doing nothing.
Think of something like:
node* work = stack_flush(work_stack);
node_process(work);
Well, if work is not nullptr, then actual user work is going to be
performed.
[...]
I have this coded up in C++ somewhere. I need to find it.