Hi,
You may also find the following work interesting:
http://www.springerlink.com/content/5q160889g79j5533/
and the companion presentation slides:
http://calvados.di.unipi.it/storage/talks/2012_SPSC_Europar.pdf
None of the SPSC queue implementations presented in the work
use RMW operations, and all of them are intrusive.
One queue is bounded (named SPSC, aka the FastForward queue)
and two are unbounded (the dSPSC and the uSPSC).
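To make the "no RMW" point concrete, here is a minimal sketch of the FastForward idea behind the bounded SPSC queue. It is written with C++11 acquire/release atomics rather than the explicit write barriers a real implementation may use. The class and method names are hypothetical (not FastFlow's API), and it assumes exactly one producer thread, one consumer thread, and non-null pointer payloads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// FastForward-style bounded SPSC queue of non-null pointers (sketch).
// No shared head/tail indices: a nullptr slot means "empty", so the
// producer and the consumer only ever communicate through the slots.
class BoundedSpscQueue {
public:
    explicit BoundedSpscQueue(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
        for (auto& s : slots_) s.store(nullptr, std::memory_order_relaxed);
    }

    // Producer only. Returns false if the queue is full (or item is nullptr).
    bool push(void* item) {
        if (item == nullptr) return false;  // nullptr is the empty-slot marker
        if (slots_[pwrite_].load(std::memory_order_acquire) != nullptr)
            return false;  // next slot not yet consumed: queue full
        slots_[pwrite_].store(item, std::memory_order_release);
        pwrite_ = (pwrite_ + 1) % slots_.size();
        return true;
    }

    // Consumer only. Returns false if the queue is empty.
    bool pop(void** out) {
        void* item = slots_[pread_].load(std::memory_order_acquire);
        if (item == nullptr) return false;  // nothing published yet: empty
        slots_[pread_].store(nullptr, std::memory_order_release);  // free slot
        *out = item;
        pread_ = (pread_ + 1) % slots_.size();
        return true;
    }

private:
    std::vector<std::atomic<void*>> slots_;
    // Private indices, padded apart (64-byte cache lines assumed).
    alignas(64) std::size_t pwrite_ = 0;  // touched by the producer only
    alignas(64) std::size_t pread_ = 0;   // touched by the consumer only
};
```

Because each thread writes only its own index plus the slots, neither side needs a compare-and-swap or fetch-and-add; the price is that nullptr cannot be enqueued as a value.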
The dSPSC queue is a node-based SPSC queue with a node cache
(basically the Michael & Scott two-lock MPMC queue algorithm
optimized for the SPSC case).
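A similarly hedged sketch of the dSPSC idea: a Michael & Scott style list with a dummy node, where the locks disappear because the producer only touches the tail and the consumer only touches the head. The node cache is assumed here to be a small fixed-size FastForward-style ring through which the consumer hands retired nodes back to the producer; the class name and the cache size of 64 are illustrative, not taken from FastFlow.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Node-based unbounded SPSC queue with a node cache (sketch of the dSPSC idea).
class DynamicSpscQueue {
    struct Node {
        explicit Node(void* d) : data(d), next(nullptr) {}
        void* data;
        std::atomic<Node*> next;
    };
    static constexpr std::size_t kCacheSize = 64;  // illustrative value

public:
    DynamicSpscQueue() {
        for (auto& s : cache_) s.store(nullptr, std::memory_order_relaxed);
        head_ = tail_ = new Node(nullptr);  // dummy node, as in Michael & Scott
    }
    ~DynamicSpscQueue() {  // call only when neither thread is active
        while (head_ != nullptr) {
            Node* next = head_->next.load(std::memory_order_relaxed);
            delete head_;
            head_ = next;
        }
        for (auto& s : cache_) delete s.load(std::memory_order_relaxed);
    }
    DynamicSpscQueue(const DynamicSpscQueue&) = delete;
    DynamicSpscQueue& operator=(const DynamicSpscQueue&) = delete;

    // Producer only: append at the tail. Never fails.
    void push(void* item) {
        Node* n = allocNode();
        n->data = item;
        n->next.store(nullptr, std::memory_order_relaxed);
        tail_->next.store(n, std::memory_order_release);  // publish the node
        tail_ = n;
    }

    // Consumer only: take from the head. Returns false if the queue is empty.
    bool pop(void** out) {
        assert(out != nullptr);
        Node* next = head_->next.load(std::memory_order_acquire);
        if (next == nullptr) return false;
        *out = next->data;
        Node* old = head_;
        head_ = next;  // `next` becomes the new dummy node
        releaseNode(old);
        return true;
    }

private:
    // Producer side of the cache: reuse a retired node, or allocate one.
    Node* allocNode() {
        Node* n = cache_[cread_].load(std::memory_order_acquire);
        if (n == nullptr) return new Node(nullptr);  // cache empty
        cache_[cread_].store(nullptr, std::memory_order_release);
        cread_ = (cread_ + 1) % kCacheSize;
        return n;
    }

    // Consumer side of the cache: hand a retired node back, or free it.
    void releaseNode(Node* n) {
        if (cache_[cwrite_].load(std::memory_order_acquire) != nullptr) {
            delete n;  // cache full
            return;
        }
        cache_[cwrite_].store(n, std::memory_order_release);
        cwrite_ = (cwrite_ + 1) % kCacheSize;
    }

    Node* head_;  // consumer only
    Node* tail_;  // producer only
    std::array<std::atomic<Node*>, kCacheSize> cache_;  // consumer -> producer
    std::size_t cread_ = 0;   // producer only
    std::size_t cwrite_ = 0;  // consumer only
};
```

Recycling nodes through the cache keeps `new`/`delete` off the common path once the queue reaches a steady state; when the cache is empty or full the sketch simply falls back to the allocator.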
The uSPSC queue combines the SPSC and the dSPSC queues in order
to obtain a buffer-based (cache-oblivious) unbounded lock-free SPSC queue
(I already know that Dmitry does not much like this kind of
'software engineering approach' ;) ).
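A rough sketch of the uSPSC idea: the producer fills a bounded FastForward-style ring and, when it is full, starts a new one; the consumer drains rings in order. Two simplifications are assumptions of this sketch, not features of the published algorithm: rings are chained through an atomic `next` pointer instead of being passed through a dSPSC queue, and drained rings are deleted instead of being recycled through a pool. All names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Unbounded SPSC queue built from a chain of bounded rings (uSPSC-like sketch).
class UnboundedSpscQueue {
    struct Buffer {  // one FastForward-style ring
        explicit Buffer(std::size_t n) : slots(n) {
            assert(n > 0);
            for (auto& s : slots) s.store(nullptr, std::memory_order_relaxed);
        }
        bool push(void* item) {  // fails if the next slot is still occupied
            if (slots[pwrite].load(std::memory_order_acquire) != nullptr) return false;
            slots[pwrite].store(item, std::memory_order_release);
            pwrite = (pwrite + 1) % slots.size();
            return true;
        }
        bool pop(void** out) {  // fails if the next slot is empty
            void* item = slots[pread].load(std::memory_order_acquire);
            if (item == nullptr) return false;
            slots[pread].store(nullptr, std::memory_order_release);
            *out = item;
            pread = (pread + 1) % slots.size();
            return true;
        }
        std::vector<std::atomic<void*>> slots;
        std::size_t pwrite = 0;  // producer only
        std::size_t pread = 0;   // consumer only
        std::atomic<Buffer*> next{nullptr};  // newer ring, set by the producer
    };

public:
    explicit UnboundedSpscQueue(std::size_t bufferSize)
        : bufferSize_(bufferSize), wbuf_(new Buffer(bufferSize)), rbuf_(wbuf_) {}
    ~UnboundedSpscQueue() {  // call only when neither thread is active
        while (rbuf_ != nullptr) {
            Buffer* next = rbuf_->next.load(std::memory_order_relaxed);
            delete rbuf_;
            rbuf_ = next;
        }
    }
    UnboundedSpscQueue(const UnboundedSpscQueue&) = delete;
    UnboundedSpscQueue& operator=(const UnboundedSpscQueue&) = delete;

    // Producer only. Fails only for nullptr (the empty-slot marker).
    bool push(void* item) {
        if (item == nullptr) return false;
        if (wbuf_->push(item)) return true;
        Buffer* b = new Buffer(bufferSize_);  // current ring full: start a new one
        b->push(item);                        // cannot fail on a fresh ring
        wbuf_->next.store(b, std::memory_order_release);  // publish it
        wbuf_ = b;
        return true;
    }

    // Consumer only. Returns false if the queue is empty.
    bool pop(void** out) {
        if (rbuf_->pop(out)) return true;
        Buffer* next = rbuf_->next.load(std::memory_order_acquire);
        if (next == nullptr) return false;  // no newer ring: queue empty
        // The producer stopped writing rbuf_ before publishing `next`,
        // so one re-check catches items written just before the switch.
        if (rbuf_->pop(out)) return true;
        delete rbuf_;  // drained for good
        rbuf_ = next;
        return rbuf_->pop(out);
    }

private:
    std::size_t bufferSize_;
    Buffer* wbuf_;  // producer only
    Buffer* rbuf_;  // consumer only
};
```

The second `rbuf_->pop()` in `pop()` is the subtle step: the consumer may observe an empty slot just before the producer fills it and moves on to a new ring, so once `next` is visible the old ring must be checked once more before it is discarded.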
You can find the actual C++ implementations of the queues in the FastFlow
source code (SourceForge svn).
Hope this helps.
Regards,
Massimo