http://relacy.pastebin.com/f40769514
http://groups.google.com/group/relacy
(you can obtain a copy of Relacy here...)
Here is some pseudo-code of a greatly simplified version of algorithm which
can only be used to synchronizes two threads:
<pseudo-code typed in news reader>
__________________________________________________________________
// IMPORTANT NOTE: the "main thread owns" the mutex on creation...
class two_thread_mutex
{
enum constant
{
OTHER_THREAD = 0,
MAIN_THREAD = 1
};
private:
atomic_word m_state; // = MAIN_THREAD
public:
void other_thread_acquire()
{
while (ATOMIC_LOAD(&m_state) == MAIN_THREAD)
sched_yield();
MEMBAR #LoadStore | #LoadLoad;
}
void other_thread_release()
{
MEMBAR #LoadStore | #StoreStore;
ATOMIC_STORE(&m_state, MAIN_THREAD);
}
public:
void main_thread_acquire()
{
while (ATOMIC_LOAD(&m_state) == OTHER_THREAD)
sched_yield();
MEMBAR #LoadStore | #LoadLoad;
}
void main_thread_release()
{
MEMBAR #LoadStore | #StoreStore;
ATOMIC_STORE(&m_state, OTHER_THREAD);
}
};
// example usage
static int g_shared = 0;
static barrier g_barrier;
void other_thread()
{
for (;;)
{
g_barrier.other_thread_acquire();
++g_shared;
g_barrier.other_thread_release();
sched_yield();
}
}
void main_thread()
{
// since the main thread has access we
// already own the `g_shared' variable
++g_shared;
g_barrier.main_thread_release();
for (;;)
{
sched_yield();
g_barrier.main_thread_acquire();
++g_shared;
g_barrier.main_thread_release();
}
}
__________________________________________________________________
This has got to be more efficient than traditional StoreLoad based mutual
exclusion implementations... However, it's not quite the same as traditional
mutual exclusion. Each time the other thread releases ownership, it
automatically passes ownership the main thread. And vise-versa. This means
that each thread is "dependant" on the others actions when they attempt to
acquire ownership. BTW, this algorithm is easily distributed and can be used
to create StoreLoad membar free read-write synchronization. This may be
actually useful in certain situations...
Any thoughts? Can you spot any holes in it?
Thanks.
> This has got to be more efficient than traditional StoreLoad based mutual
> exclusion implementations... However, it's not quite the same as traditional
> mutual exclusion. Each time the other thread releases ownership, it
> automatically passes ownership the main thread. And vise-versa. This means
> that each thread is "dependant" on the others actions when they attempt to
> acquire ownership. BTW, this algorithm is easily distributed and can be used
> to create StoreLoad membar free read-write synchronization. This may be
> actually useful in certain situations...
>
> Any thoughts? Can you spot any holes in it?
What's the sense in a mutex that always forces the worst possible
behavior? Strict alternation is the worst conceivable mutex access
pattern.
Can you think of any situation in which this makes any sense?
DS
Agreed. However, getting rid of the expensive membar is defiantly a plus.
I was originally using a variant of it as a barrier wait algorithm for N
threads and a single "sync" thread. Here is quick example:
http://groups.google.com/group/comp.arch/msg/1deabd28115f2f3e
But that's basically reversed logic in the sense that the worker threads are
the default owner. This allows for barrier semantics. The main thread can
wait until each worker thread reaches the barrier; perform an operation on
shared state; then release all the workers from the barrier. Worker threads
can wait at the barrier and only be released when all other workers have
arrived and have been released by the main sync thread.
> Can you think of any situation in which this makes any sense?
perhaps something like:
// this would be an example of a read-write syncronization using the
primtive:
_______________________________________________________________
#define READERS 4
static barrier<READERS> g_barrier;
void reader_threads(size_t this_barrier_idx)
{
for (;;)
{
g_barrier.acquire(this_barrier_idx);
// read sensor data
g_barrier.release(this_barrier_idx);
// peform complex processing
}
}
void writer_thread()
{
// gather data from various sensor arrays
// which are frequently producing new information
// update sensor data
g_barrier.release();
for (;;)
{
// gather data from various sensor arrays
// which are frequently producing new information
g_barrier.acquire();
// update sensor data
g_barrier.release();
}
}
_______________________________________________________________
The lock accesses in the frequently executed loop are made up of simple
loads and stores. There is no performance destroying StoreLoad. Reader
threads will wait for the writer thread to produce the more recent data. The
writer is not dependant on the forward progress of the readers. The readers
only depend on the writer thread to constantly produce data. The more
frequent the data production, the faster the reaction time.
How would a traditional read write mutex design perform better in such a
scenario?
Actually, I got that ass backwards!!!!!!!! the writer is dependant on the
reader. Sorry for that non-sense!
This version of it would only be good for scenarios in which threads were
frequently acquiring/releasing the mutex in a loop.
Oh well. Anyway, the original version of the algorithm works good as a
barrier when implemented as:
http://groups.google.com/group/comp.arch/msg/1deabd28115f2f3e
;^)
The barrier variant works okay when one thread needs to spawn N jobs, do
some other work, then wait for all spawned tasks to complete and retrieve a
result.
MAIN_THREAD = 0,
OTHER_THREAD = 1
would probably be better, then for a static one it would be properly
initted by default.