My question came up while implementing a spin-wait,
where a thread needs to wait for another thread to complete a short piece of work,
so short that locking/releasing a mutex might take longer than the work itself.
The problem, as I see it, is that the standard does not explicitly define a
clear way to refresh the CPU cache before re-loading an atomic variable.
In my example code below, atomic_oneway_flag::spin_wait_flag has 3 suggested
implementations. They are equivalent w.r.t. memory ordering as seen by user
code, but they might not be equal in speed.
Specifically, it seems that in practice (correct me if I'm wrong)
implementation 2 of atomic_oneway_flag::spin_wait_flag below
is faster/better than implementation 1,
i.e. spinning on load( ..., memory_order_acquire ) rather than on a relaxed
load followed by a single fence( memory_order_acquire ).
Now, is it somehow implied by the standard that an acquire operation might
make the following loads observe a new value sooner than a relaxed operation would?
I cannot see that it is.
Thus, I would expect to write something like implementation 3 below, using an
operation that explicitly and specifically refreshes the cache for the next
relaxed load.
#include <atomic>

class atomic_oneway_flag
{
    //data
    private: std::atomic<bool> m;
    //ctors
    public: atomic_oneway_flag()
    :
    m(false)
    {
    }
    //methods
    public: void turn_on()
    {
        m.store( true, std::memory_order_release );
    }
    public: bool test()
    {
        bool x( m.load( std::memory_order_relaxed ) );
        if( x ) {
            std::atomic_thread_fence( std::memory_order_acquire );
        }
        return x;
    }
#if USE_IMPLEMENTATION() == 1
    public: void spin_wait_flag() //implementation 1
    {
        while( true ) {
            bool x( m.load( std::memory_order_relaxed ) );
            if( x ) {
                std::atomic_thread_fence( std::memory_order_acquire );
                return;
            }
        }
    }
#elif USE_IMPLEMENTATION() == 2
    public: void spin_wait_flag() //implementation 2
    {
        while( true ) {
            bool x( m.load( std::memory_order_acquire ) );
            /* if x is false acquire might cause */
            /* cpu to refresh faster for next load */
            if( x ) {
                return;
            }
        }
    }
#elif USE_IMPLEMENTATION() == 3
    public: void spin_wait_flag() //implementation 3
    {
        while( true ) {
            bool x( m.load( std::memory_order_relaxed ) );
            if( x ) {
                std::atomic_thread_fence( std::memory_order_acquire );
                return;
            } else {
                /* Some code that refreshes the cache for the */
                /* following relaxed load. */
                /* Supposedly std::atomic::load_memory_barrier(); */
                /* (a hypothetical name, no such function exists) */
            }
        }
    }
#else
#error "USE_IMPLEMENTATION() must be 1, 2 or 3"
#endif
};
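
To make the shape of implementation 3 concrete, here is a minimal sketch as a free function. Since I know of no standard operation that refreshes the cache, std::this_thread::yield() is used purely as a stand-in (my assumption, and not a cache refresh). The names spin_wait_flag_3 and demo_impl3 are hypothetical helpers for illustration only.

```cpp
#include <atomic>
#include <thread>

// Implementation-3 shape: relaxed spin, something in the else-branch,
// one acquire fence once the flag is seen. yield() is only a stand-in
// (assumption) for the hypothetical "refresh" operation.
inline void spin_wait_flag_3( const std::atomic<bool>& m )
{
    while( !m.load( std::memory_order_relaxed ) ) {
        std::this_thread::yield();   // placeholder, not a cache refresh
    }
    std::atomic_thread_fence( std::memory_order_acquire );
}

// Hypothetical driver: one producer publishes data, the caller waits.
bool demo_impl3()
{
    std::atomic<bool> m( false );
    int data = 0;   // plain data published through the flag

    std::thread producer( [&] {
        data = 7;                                   // the short work
        m.store( true, std::memory_order_release ); // turn_on()
    } );

    spin_wait_flag_3( m );      // returns only after the flag is seen
    bool ok = ( data == 7 );    // visible because of release + acquire fence

    producer.join();
    return ok;
}
```

Whether the else-branch should hold a yield, a CPU-specific pause hint, or nothing at all is exactly the part I cannot derive from the standard.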
//user code
atomic_oneway_flag flag;
//thread 1
... do some very short work
flag.turn_on();
//threads 2..N
flag.spin_wait_flag(); //while thread 1 does short work.
... do some work
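
For completeness, the user code above could be fleshed out roughly as follows. This is only a sketch: it repeats a reduced copy of the flag class (implementation 1), and run_demo, payload and seen are hypothetical names I introduce for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Reduced copy of the one-way flag, using implementation 1.
class atomic_oneway_flag
{
    std::atomic<bool> m{ false };
public:
    void turn_on() { m.store( true, std::memory_order_release ); }
    void spin_wait_flag()
    {
        while( !m.load( std::memory_order_relaxed ) ) {
            // keep spinning on the relaxed load
        }
        std::atomic_thread_fence( std::memory_order_acquire );
    }
};

// Hypothetical driver: returns how many waiters saw the published value.
int run_demo( int waiters )
{
    atomic_oneway_flag flag;
    int payload = 0;              // plain data published via the flag
    std::atomic<int> seen{ 0 };

    std::vector<std::thread> pool;
    for( int i = 0; i < waiters; ++i ) {
        pool.emplace_back( [&] {
            flag.spin_wait_flag();    // threads 2..N
            assert( payload == 42 );  // ordered after thread 1's write
            seen.fetch_add( 1, std::memory_order_relaxed );
        } );
    }

    payload = 42;                 // thread 1: the very short work
    flag.turn_on();

    for( auto& t : pool ) t.join();
    return seen.load();
}
```

Calling run_demo( 4 ) from main should return 4, since every waiter leaves the spin loop only after the flag has been turned on.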
regards,
itaj
--
[ comp.std.c++ is moderated. To submit articles, try posting with your ]
[ newsreader. If that fails, use mailto:std-cpp...@vandevoorde.com ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]