http://relacy.pastebin.com/f35048b7a
simulates a user-space RCU algorithm that is driven by a polling thread. 
Reader threads simply execute a synchronization operation on an 
episodic/periodic basis. Alternatively, a reader thread can dynamically 
"opt out" of the RCU polling process altogether. A reader thread would only 
do this before a long blocking operation or lengthy number crunching. Imagine 
a reader thread whose sole purpose is to execute search requests that are 
frequently generated by very many users on a database server:
_______________________________________________________________
#define READER_SEARCH_SYNC_RATIO 256
void reader_search_thread(size_t this_rcu_idx)
{
    g_rcu.activate(this_rcu_idx);
    for (unsigned long i = 1 ;; ++i)
    {
        if (! try_to_pop_search_requests())
        {
            // really slow path!
            g_rcu.deactivate(this_rcu_idx);
            wait_for_search_requests();
            g_rcu.activate(this_rcu_idx);
        }
        execute_search_requests();
        if (! (i % READER_SEARCH_SYNC_RATIO))
        {
            // slow path
            g_rcu.sync(this_rcu_idx);
        }
    }
    g_rcu.deactivate(this_rcu_idx);
}
_______________________________________________________________
This would execute a StoreLoad membar once every 256 search operations under 
load; that's pretty good... 
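For readers who want to see what the reader-side calls could look like, here is a rough sketch. It is NOT the pastebin code; it is a simplified epoch/quiescent-state scheme written only to illustrate the `activate'/`deactivate'/`sync' usage above. The class name `polling_rcu_sketch', the fixed reader count, and the poller-side `advance'/`quiesced' functions are all assumptions made for illustration. Everything is seq_cst to keep the sketch simple, and epoch wraparound is ignored.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a polling-driven RCU registry (not the original code).
class polling_rcu_sketch
{
    static constexpr std::size_t MAX_READERS = 4;  // assumed fixed reader count
    static constexpr unsigned long OFFLINE = 0;    // slot value of an opted-out reader

    std::atomic<unsigned long> m_epoch;              // advanced by the polling thread
    std::atomic<unsigned long> m_seen[MAX_READERS];  // last epoch each reader observed

public:
    polling_rcu_sketch() : m_epoch(1)
    {
        for (std::size_t i = 0; i < MAX_READERS; ++i)
            m_seen[i].store(OFFLINE, std::memory_order_relaxed);
    }

    // Reader side: announce a quiescent state. The trailing fence is the
    // StoreLoad barrier that the reader pays for once per sync.
    void sync(std::size_t idx)
    {
        m_seen[idx].store(m_epoch.load(std::memory_order_seq_cst),
                          std::memory_order_seq_cst);
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }

    // Reader side: opt back in to polling (same work as a sync in this sketch).
    void activate(std::size_t idx) { sync(idx); }

    // Reader side: opt out, so the poller never waits on this reader.
    void deactivate(std::size_t idx)
    {
        m_seen[idx].store(OFFLINE, std::memory_order_seq_cst);
    }

    // Poller side: start a new grace period and return its epoch.
    unsigned long advance()
    {
        return m_epoch.fetch_add(1, std::memory_order_seq_cst) + 1;
    }

    // Poller side: true once every active reader has observed `epoch'.
    bool quiesced(unsigned long epoch) const
    {
        for (std::size_t i = 0; i < MAX_READERS; ++i)
        {
            unsigned long seen = m_seen[i].load(std::memory_order_seq_cst);
            if (seen != OFFLINE && seen < epoch) return false;
        }
        return true;
    }
};

// Single-threaded walk-through of one reader and the poller.
inline void poll_demo()
{
    polling_rcu_sketch rcu;
    rcu.activate(0);
    unsigned long e = rcu.advance();
    assert(! rcu.quiesced(e));   // reader 0 has not synced yet
    rcu.sync(0);
    assert(rcu.quiesced(e));     // reader 0 passed a quiescent state
    e = rcu.advance();
    rcu.deactivate(0);
    assert(rcu.quiesced(e));     // opted-out readers are skipped
}
```

In this sketch the poller only ever waits on readers that are active, which is why a reader opts out before blocking in something like `wait_for_search_requests()'.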
public:
    thread_safe_stack()
    :   m_head(NULL)
    {
    }
public:
    void push(T* nhead, T* ntail)
    {
        T* head = m_head.load(std::memory_order_relaxed);
        do
        {
            VAR(ntail->m_next) = head;
        }
        while (! m_head.compare_exchange_weak(
                 head,
                 nhead,
                 std::memory_order_release));
    }
    T* flush(T* head = NULL)
    {
        return m_head.exchange(head, std::memory_order_acquire);
    }
};
_________________________________________________________________
The `thread_safe_stack<T>::flush()' function should have 
`memory_order_acq_rel' semantics when `head' is not NULL:
    T* flush(T* head = NULL)
    {
        if (! head)
        {
            return m_head.exchange(NULL, std::memory_order_acquire);
        }
        return m_head.exchange(head, std::memory_order_acq_rel);
    }
`head' is always NULL in the test case, so nothing was found. BTW, the 
reason the membar is needed is that you're actually consuming and producing 
in a single atomic operation when `head' is not NULL.
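Here is a small self-contained sketch of the stack with the corrected `flush()', in case anyone wants to play with it outside of Relacy. It uses plain `std::atomic' and a plain `m_next' pointer instead of Relacy's `VAR()' wrapper. The `node' type and the `flush_demo()' function are made up for illustration, and the demo is single-threaded, so it only shows the API behavior, not the memory ordering itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical node type; anything with an `m_next' pointer would do.
struct node
{
    node* m_next;
    int   m_value;
};

template<typename T>
class thread_safe_stack
{
    std::atomic<T*> m_head;

public:
    thread_safe_stack() : m_head(nullptr) {}

    // Push the pre-linked chain [nhead ... ntail] onto the stack.
    void push(T* nhead, T* ntail)
    {
        T* head = m_head.load(std::memory_order_relaxed);
        do
        {
            ntail->m_next = head;
        }
        while (! m_head.compare_exchange_weak(
                 head,
                 nhead,
                 std::memory_order_release,
                 std::memory_order_relaxed));
    }

    // Swap the whole stack for `head' and return the old chain.
    T* flush(T* head = nullptr)
    {
        if (! head)
        {
            // consume only: acquire the nodes published by push()
            return m_head.exchange(nullptr, std::memory_order_acquire);
        }
        // consume the old chain AND publish the new one: acquire + release
        return m_head.exchange(head, std::memory_order_acq_rel);
    }
};

// Single-threaded walk-through of both flush() paths.
inline void flush_demo()
{
    node a = { nullptr, 1 }, b = { nullptr, 2 }, c = { nullptr, 3 };
    thread_safe_stack<node> s;
    s.push(&a, &a);
    s.push(&b, &b);
    node* old = s.flush(&c);        // non-NULL path: installs c, returns b -> a
    assert(old == &b && old->m_next == &a);
    assert(s.flush() == &c);        // NULL path: returns c, leaves stack empty
    assert(s.flush() == nullptr);
}
```

The release half matters on the non-NULL path because whoever later acquires `m_head' has to see the writes made to the nodes in the chain that was just installed.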