low-overhead mpsc queue

2776 views
Skip to first unread message

Dmitriy V'jukov

unread,
May 13, 2008, 3:05:46 AM5/13/08
to lock-free
-------------------------------------------------------------
Non-intrusive version:
-------------------------------------------------------------

struct mpscq_node_t
{
mpscq_node_t* volatile next;
void* state;

};

struct mpscq_t
{
mpscq_node_t* volatile head;
mpscq_node_t* tail;

};

void mpscq_create(mpscq_t* self, mpscq_node_t* stub)
{
stub->next = 0;
self->head = stub;
self->tail = stub;

}

void mpscq_push(mpscq_t* self, mpscq_node_t* n)
{
n->next = 0;
mpscq_node_t* prev = XCHG(&self->head, n); // serialization-point
wrt producers
prev->next = n; // serialization-point wrt consumer

}

mpscq_node_t* mpscq_pop(mpscq_t* self)
{
mpscq_node_t* tail = self->tail;
mpscq_node_t* next = tail->next; // serialization-point wrt
producers
if (next)
{
self->tail = next;
tail->state = next->state;
return tail;
}
return 0;

}


-------------------------------------------------------------
And here is intrusive version:
-------------------------------------------------------------

struct mpscq_node_t
{
mpscq_node_t* volatile next;

};

struct mpscq_t
{
mpscq_node_t* volatile head;
mpscq_node_t* tail;
mpscq_node_t stub;

};

#define MPSCQ_STATIC_INIT(self) {&self.stub, &self.stub, {0}}

void mpscq_create(mpscq_t* self)
{
self->head = &self->stub;
self->tail = &self->stub;
self->stub.next = 0;

}

void mpscq_push(mpscq_t* self, mpscq_node_t* n)
{
n->next = 0;
mpscq_node_t* prev = XCHG(&self->head, n);
//(*)
prev->next = n;

}

mpscq_node_t* mpscq_pop(mpscq_t* self)
{
mpscq_node_t* tail = self->tail;
mpscq_node_t* next = tail->next;
if (tail == &self->stub)
{
if (0 == next)
return 0;
self->tail = next;
tail = next;
next = next->next;
}
if (next)
{
self->tail = next;
return tail;
}
mpscq_node_t* head = self->head;
if (tail != head)
return 0;
mpscq_push(self, &self->stub);
next = tail->next;
if (next)
{
self->tail = next;
return tail;
}
return 0;

}

Disadvantages:
- Push function is blocking wrt consumer. I.e. if producer blocked in
(*), then consumer is blocked too. Fortunately 'window of
inconsistency' is extremely small - producer must be blocked exactly
in (*). Actually it's disadvantage only as compared with totally lock-
free algorithm. It's still much better lock-based algorithm.

Advantages:
+ Wait-free and fast producers. One XCHG is maximum what one can get
with multi-producer non-distributed queue.
+ Extremely fast consumer. On fast-path it's atomic-free, XCHG
executed per node batch, in order to grab 'last item'.
+ No need for node order reversion. So pop operation is always "O(1)".
+ Intrusive. No need for additional internal nodes.
+ ABA-free.
+ No need for PDR!

WOW!

I am ready to suffer the disadvantage provided all advantages. No need
for PDR means that one can use this algorithm out-of-the-box. No need
for thread registration/deregistration, periodic activity, deferred
garbage etc.

Original post in comp.programming.threads (see for discussion):
http://groups.google.ru/group/comp.programming.threads/browse_frm/thread/33f79c75146582f3

Dmitriy V'jukov
Reply all
Reply to author
Forward
0 new messages