simple, distributed spin-wait mutual exclusion w/o StoreLoad membar...

Chris M. Thomasson

unread,

Nov 18, 2009, 2:34:26 AM11/18/09

to

Here is the algorithm and some example uses modeled in the excellent Relacy
Race Detector 2.0:

http://relacy.pastebin.com/f40769514

http://groups.google.com/group/relacy
(you can obtain a copy of Relacy here...)

Here is some pseudo-code of a greatly simplified version of algorithm which
can only be used to synchronizes two threads:

<pseudo-code typed in news reader>
__________________________________________________________________

// IMPORTANT NOTE: the "main thread owns" the mutex on creation...
class two_thread_mutex
{
enum constant
{
OTHER_THREAD = 0,
MAIN_THREAD = 1
};

private:
atomic_word m_state; // = MAIN_THREAD

public:
void other_thread_acquire()
{
while (ATOMIC_LOAD(&m_state) == MAIN_THREAD)
sched_yield();

MEMBAR #LoadStore | #LoadLoad;
}

void other_thread_release()
{
MEMBAR #LoadStore | #StoreStore;

ATOMIC_STORE(&m_state, MAIN_THREAD);
}

public:
void main_thread_acquire()
{
while (ATOMIC_LOAD(&m_state) == OTHER_THREAD)
sched_yield();

MEMBAR #LoadStore | #LoadLoad;
}

void main_thread_release()
{
MEMBAR #LoadStore | #StoreStore;

ATOMIC_STORE(&m_state, OTHER_THREAD);
}
};

// example usage
static int g_shared = 0;
static barrier g_barrier;

void other_thread()
{
for (;;)
{
g_barrier.other_thread_acquire();
++g_shared;
g_barrier.other_thread_release();
sched_yield();
}
}

void main_thread()
{
// since the main thread has access we
// already own the `g_shared' variable

++g_shared;
g_barrier.main_thread_release();

for (;;)
{
sched_yield();
g_barrier.main_thread_acquire();
++g_shared;
g_barrier.main_thread_release();
}
}
__________________________________________________________________

This has got to be more efficient than traditional StoreLoad based mutual
exclusion implementations... However, it's not quite the same as traditional
mutual exclusion. Each time the other thread releases ownership, it
automatically passes ownership the main thread. And vise-versa. This means
that each thread is "dependant" on the others actions when they attempt to
acquire ownership. BTW, this algorithm is easily distributed and can be used
to create StoreLoad membar free read-write synchronization. This may be
actually useful in certain situations...

Any thoughts? Can you spot any holes in it?

Thanks.

David Schwartz

unread,

Nov 18, 2009, 3:33:34 AM11/18/09

to

On Nov 17, 11:34 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> This has got to be more efficient than traditional StoreLoad based mutual
> exclusion implementations... However, it's not quite the same as traditional
> mutual exclusion. Each time the other thread releases ownership, it
> automatically passes ownership the main thread. And vise-versa. This means
> that each thread is "dependant" on the others actions when they attempt to
> acquire ownership. BTW, this algorithm is easily distributed and can be used
> to create StoreLoad membar free read-write synchronization. This may be
> actually useful in certain situations...
>
> Any thoughts? Can you spot any holes in it?

What's the sense in a mutex that always forces the worst possible
behavior? Strict alternation is the worst conceivable mutex access
pattern.

Can you think of any situation in which this makes any sense?

DS

Chris M. Thomasson

unread,

Nov 18, 2009, 4:12:03 AM11/18/09

to

"David Schwartz" <dav...@webmaster.com> wrote in message
news:79c9dca6-ee07-490e...@o9g2000prg.googlegroups.com...

Agreed. However, getting rid of the expensive membar is defiantly a plus.

I was originally using a variant of it as a barrier wait algorithm for N
threads and a single "sync" thread. Here is quick example:

http://groups.google.com/group/comp.arch/msg/1deabd28115f2f3e

But that's basically reversed logic in the sense that the worker threads are
the default owner. This allows for barrier semantics. The main thread can
wait until each worker thread reaches the barrier; perform an operation on
shared state; then release all the workers from the barrier. Worker threads
can wait at the barrier and only be released when all other workers have
arrived and have been released by the main sync thread.

> Can you think of any situation in which this makes any sense?

perhaps something like:

// this would be an example of a read-write syncronization using the
primtive:
_______________________________________________________________
#define READERS 4

static barrier<READERS> g_barrier;

void reader_threads(size_t this_barrier_idx)
{
for (;;)
{
g_barrier.acquire(this_barrier_idx);

// read sensor data

g_barrier.release(this_barrier_idx);

// peform complex processing
}
}

void writer_thread()
{
// gather data from various sensor arrays
// which are frequently producing new information

// update sensor data

g_barrier.release();

for (;;)
{
// gather data from various sensor arrays
// which are frequently producing new information

g_barrier.acquire();

// update sensor data

g_barrier.release();
}
}
_______________________________________________________________

The lock accesses in the frequently executed loop are made up of simple
loads and stores. There is no performance destroying StoreLoad. Reader
threads will wait for the writer thread to produce the more recent data. The
writer is not dependant on the forward progress of the readers. The readers
only depend on the writer thread to constantly produce data. The more
frequent the data production, the faster the reaction time.

How would a traditional read write mutex design perform better in such a
scenario?

Chris M. Thomasson

unread,

Nov 18, 2009, 4:44:30 AM11/18/09

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:2TOMm.17467$ET3....@newsfe17.iad...
[...]

> // this would be an example of a read-write syncronization using the
> primtive:
> _______________________________________________________________

[...]

> _______________________________________________________________
>
> The lock accesses in the frequently executed loop are made up of simple
> loads and stores. There is no performance destroying StoreLoad. Reader
> threads will wait for the writer thread to produce the more recent data.
> The writer is not dependant on the forward progress of the readers. The
> readers only depend on the writer thread to constantly produce data. The
> more frequent the data production, the faster the reaction time.

Actually, I got that ass backwards!!!!!!!! the writer is dependant on the
reader. Sorry for that non-sense!

This version of it would only be good for scenarios in which threads were
frequently acquiring/releasing the mutex in a loop.

Oh well. Anyway, the original version of the algorithm works good as a
barrier when implemented as:

http://groups.google.com/group/comp.arch/msg/1deabd28115f2f3e

Chris M. Thomasson

unread,

Nov 18, 2009, 4:46:32 AM11/18/09

to

Please note that I created this just to show that mutual exclusion without
StoreLoad membars or fancy asymmetric sync schemes can be accomplished. How
useful it is, well, that's debatable to say the least...

;^)

Chris M. Thomasson

unread,

Nov 19, 2009, 7:46:51 AM11/19/09

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message

news:he0fmt$6b6$1...@aioe.org...

The barrier variant works okay when one thread needs to spawn N jobs, do
some other work, then wait for all spawned tasks to complete and retrieve a
result.

frege

unread,

Nov 20, 2009, 12:42:44 AM11/20/09

to

On Nov 18, 2:34 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> Here is the algorithm and some example uses modeled in the excellent Relacy
> Race Detector 2.0:
>
> http://relacy.pastebin.com/f40769514
>
> http://groups.google.com/group/relacy
> (you can obtain a copy of Relacy here...)
>
> Here is some pseudo-code of a greatly simplified version of algorithm which
> can only be used to synchronizes two threads:
>
> <pseudo-code typed in news reader>
> __________________________________________________________________
>
> // IMPORTANT NOTE: the "main thread owns" the mutex on creation...
> class two_thread_mutex
> {
> enum constant
> {
> OTHER_THREAD = 0,
> MAIN_THREAD = 1
> };
>

MAIN_THREAD = 0,
OTHER_THREAD = 1

would probably be better, then for a static one it would be properly
initted by default.