lock free queue implementation

wolfie2x

unread,

Jul 21, 2008, 11:48:57 AM7/21/08

to

this is a simplified version of a lock free queue I wrote (single
writer, single reader).
(I have tested it on Linux+Intel and Solarisx86+AMD64 multicore)

* are the memory barriers correct/sufficient ?
* are the voliatiles really necessary? (not much of a performance
difference though)
* this seems to work perfectly even without the barriers & volatiles.
why?
* any other thoughts?

// -------------- CODE ---------------------------

class XNode1
{
public:
XNode1() {p_Data = 0; p_Next = 0;}
void Reset() {p_Data = 0; p_Next = 0;}

void* p_Data;
XNode1* p_Next;
};

class XLockFreeQueue1
{
public:
XLockFreeQueue1();
~XLockFreeQueue1();
void Push(void* pData);
void* Pop();

XNode1* p_Head;
volatile XNode1* p_Tail;
volatile XNode1* p_Read; // last read position.
};

XLockFreeQueue1::XLockFreeQueue1()
{
XNode1* pNode = new XNode1();
p_Head = pNode;
p_Tail = pNode;
p_Read = pNode;
}

void XLockFreeQueue1::Push(void* pData)
{
XNode1* pNode;
if (p_Head == p_Read)
{
pNode = new XNode1();
}
else
{
pNode = p_Head;
p_Head = p_Head->p_Next;
pNode->Reset();
}

pNode->p_Data = pData;
p_Tail->p_Next = pNode;
__asm__ __volatile__("":::"memory"); // make sure the next
instruction happens last.
p_Tail = pNode;
}

void* XLockFreeQueue1::Pop()
{
void* pData = 0;
//if (p_Read->p_Next) // this will fail on solaris+AMD64(8-core)!!
if (p_Read != p_Tail)
{
p_Read = p_Read->p_Next;
__asm__ __volatile__("":::"memory"); // do we need this?
pData = p_Read->p_Data;
}
return pData;
}

thx
Sampath Tilakumara

David Schwartz

unread,

Jul 22, 2008, 10:24:31 AM7/22/08

to

On Jul 21, 8:48 am, wolfie2x <wolfi...@gmail.com> wrote:

> * are the memory barriers correct/sufficient ?

They are sufficient on every current x86 CPU, but they are not
guaranteed sufficient by the architecture guidelines. Current x86 CPUs
do speculative fetches, and compiler-only memory barriers don't work
around there. However, all x86 CPUs (at least that I know of, as of
today) peg these speculative fetches to the cache line. If the cache
line is invalidated, so is the fetch.

Intel has, in the past, specifically stated that this may change on
future x86 CPUs. However, it is unlikely that they would do so because
so much Windows code assumes this is so. (And probably a lot of non-
Windows code as well.)

DS

Chris M. Thomasson

unread,

Jul 30, 2008, 2:52:14 AM7/30/08

to

"wolfie2x" <wolf...@gmail.com> wrote in message
news:475d72e8-3b76-4cd1...@x36g2000prg.googlegroups.com...

> this is a simplified version of a lock free queue I wrote (single
> writer, single reader).
> (I have tested it on Linux+Intel and Solarisx86+AMD64 multicore)
>
> * are the memory barriers correct/sufficient ?

You have _zero_ explicit memory barriers in the code. Luckily, a single
producer/consumer queue does not need explicit membars on x86. However, if
you tried to port this over to another architecture... Well, you have
problems with memory visibility on both the producer and consumer.

> * are the voliatiles really necessary? (not much of a performance
> difference though)

volatile will only get you into trouble; use assembly language for the code.

Read this entire thread:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/29ea516c5581240e

> * this seems to work perfectly even without the barriers & volatiles.
> why?

__asm__ __volatile__("":::"memory");

acts as a compiler barrier only.

> * any other thoughts?

AFAICT, you have a massive memory leak. There also might be a subtle
race-condition wrt the `p_Read' variable, I need to examine it some more...

> // -------------- CODE ---------------------------

[...]

lock free queue implementation - thoughts?

wolfie2x

David Schwartz

Chris M. Thomasson