
A decent read/write mutex...


Chris M. Thomasson

Apr 16, 2017, 1:47:39 AM

Fwiw, here is some background, if interested please _read_all_:

https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/296471

Okay, I have not ported this to C++11 yet, but the fast-pathed
semaphore I posted here happens to be an integral step in the process:

https://groups.google.com/d/topic/comp.lang.c++/60IpFwqbtu8/discussion

If you read all of the data in the links, you can implement it
yourself. Is there any interest in my upcoming pure C++ std impl of
this rw-mutex? Thanks.
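
Fwiw, a fast-pathed semaphore generally boils down to the following
shape: an atomic count services uncontended waits and posts with a
single RMW, and threads fall back to a mutex/condvar slow path only
when they actually have to block. This is just a rough sketch of that
general shape, with made-up names, not the exact code from the link:
_______________________________
#include <atomic>
#include <mutex>
#include <condition_variable>

class fast_sem
{
    std::atomic<int> m_count;       // > 0: free permits; <= 0: waiters
    std::mutex m_mutex;
    std::condition_variable m_cond;
    int m_wake;                     // wakeups pending for the slow path

public:
    explicit fast_sem(int count = 0) : m_count(count), m_wake(0) {}

    void wait()
    {
        // fast path: grab a permit with a single atomic RMW
        if (m_count.fetch_sub(1, std::memory_order_acquire) > 0) return;

        // slow path: block until post() hands us a wakeup
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_wake == 0) m_cond.wait(lock);
        --m_wake;
    }

    void post()
    {
        // fast path: nobody is waiting, just return the permit
        if (m_count.fetch_add(1, std::memory_order_release) >= 0) return;

        // slow path: hand exactly one wakeup to a blocked waiter
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_wake;
        }
        m_cond.notify_one();
    }
};
_______________________________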

Alf P. Steinbach

Apr 16, 2017, 2:38:04 AM

I'm not able to follow (grok) much of what you post, but I think it's
somewhere on the cutting edge, and when you enjoy doing it then it's
worth doing regardless of interest. We the sheeple will follow later,
presumably. Because there's not much incentive to use these techniques
until they're simple to use for the average Joe Sheep, but the
techniques don't become simple until they've been used for some time,
which is where the specially interested play their rôle. ;-)


Cheers!,

- Alf

Chris M. Thomasson

Apr 17, 2017, 6:36:52 PM

On 4/15/2017 11:37 PM, Alf P. Steinbach wrote:
> On 16-Apr-17 7:47 AM, Chris M. Thomasson wrote:
>> Fwiw, here is some background, if interested please _read_all_:
>>
>> https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/296471
>>
>> Okay, I have not ported this to C++11 yet, but the fast-pathed
>> semaphore I posted here happens to be an integral step in the process:
>>
>> https://groups.google.com/d/topic/comp.lang.c++/60IpFwqbtu8/discussion
>>
>> If you read all of the data in the links, you can implement it
>> yourself. Is there any interest in my upcoming pure C++ std impl of
>> this rw-mutex? Thanks.
>
> I'm not able to follow (grok) much of what you post, but I think it's
> somewhere on the cutting edge, and when you enjoy doing it then it's
> worth doing regardless of interest.

The enjoyment factor is extremely great for me! Thank you for the kind
words, Alf. I am thinking about how to bring the idea of RCU to the
group, so I thought that showing a decent rw-mutex, and how to use it,
first would be prudent. As Bonita basically says, if one cannot use the
basics (e.g., mutex, condvar, rw-mutex, etc...), they cannot be trusted
to use anything else.


> We the sheeple will follow later,
> presumably. Because there's not much incentive to use these techniques
> until they're simple to use for the average Joe Sheep, but the
> techniques don't become simple until they've been used for some time,
> which is where the specially interested play their rôle. ;-)

Well, these types of algorithms are already in use, mainly in operating
systems, where the load and the possibility of lock contention are very
great. Audio apps, commercial databases, and games are more fertile
ground for these types of exotic algorithms. Of course, sharing is bad.
Anytime one can find an algorithm that does not need sharing, they
should go for that first, damn it! Heck, if one needs to use a mutex,
that is sharing; try to avoid that first. If one cannot, well, why not
learn how C++ provides the ability to code very efficient solutions to
the problems at hand?

Chris M. Thomasson

Apr 19, 2017, 7:28:50 PM

On the path to raw C++ that compiles on normal systems.

This is a Relacy test unit, working wrt my decent rw-mutex. Well,
sometimes these exotic atomics and membars can be used to implement your
favorite sync primitive!

C++ Relacy Test Unit
_______________________________
// Unbounded Wait-Free Reader Writer Mutex Experiment
// by Chris M. Thomasson
// In Relacy, at:
// http://www.1024cores.net/home/relacy-race-detector

#include <relacy/relacy_std.hpp>
#include <cstdio>
#include <climits>

#define mb_relaxed std::memory_order_relaxed
#define mb_consume std::memory_order_consume
#define mb_acquire std::memory_order_acquire
#define mb_release std::memory_order_release
#define mb_acq_rel std::memory_order_acq_rel
#define mb_seq_cst std::memory_order_seq_cst

#define mb_fence(mb) std::atomic_thread_fence(mb)

struct cpp_sem
{
    VAR_T(int) m_count;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    cpp_sem(int count = 0) : m_count(count)
    {
        RL_ASSERT(count > -1);
    }

    void sub(int c)
    {
        RL_ASSERT(c > 0);
        m_mutex.lock($);
        while (VAR(m_count) < c) m_cond.wait(m_mutex, $);
        VAR(m_count) -= c;
        m_mutex.unlock($);
    }

    void add(int c)
    {
        RL_ASSERT(c > 0);
        m_mutex.lock($);
        VAR(m_count) += c;
        m_mutex.unlock($);

        if (c < 2)
        {
            m_cond.notify_one($);
        }

        else
        {
            m_cond.notify_all($);
        }
    }
};

struct ct_rw_mutex
{
    std::atomic<long> m_count;           // INT_MAX = free; readers decrement;
                                         // a writer subtracts INT_MAX
    std::atomic<long> m_rdwake;          // active readers the writer waits for
    std::atomic<long> m_wrlockset_state; // writer turnstile state
    cpp_sem m_wrlockset;                 // writers queue here for ownership
    cpp_sem m_wrwset;                    // owning writer waits here for readers
    cpp_sem m_rdwset;                    // readers wait here on a writer

    ct_rw_mutex()
    :   m_count(INT_MAX),
        m_rdwake(0),
        m_wrlockset_state(1),
        m_wrwset(0),
        m_rdwset(0)
    {
    }

    ~ct_rw_mutex()
    {
        RL_ASSERT(m_count.load(mb_relaxed) == INT_MAX);
        RL_ASSERT(m_rdwake.load(mb_relaxed) == 0);
    }

    void wrlock()
    {
        // pass the writer turnstile; block if another writer owns it
        if (m_wrlockset_state.fetch_sub(1, mb_relaxed) < 1)
        {
            m_wrlockset.sub(1);
        }

        mb_fence(mb_acquire);

        // claim the lock; active readers leave m_count below INT_MAX
        long count = m_count.fetch_add(-INT_MAX, mb_relaxed);

        if (count < INT_MAX)
        {
            // if readers are still active, wait for them to drain
            if (m_rdwake.fetch_add(INT_MAX - count, mb_relaxed) +
                INT_MAX - count)
            {
                m_wrwset.sub(1);
            }
        }

        mb_fence(mb_seq_cst);
    }

    void wrunlock()
    {
        mb_fence(mb_release);

        long count = m_count.fetch_add(INT_MAX, mb_relaxed);

        // hand the turnstile to the next waiting writer, if any
        if (m_wrlockset_state.fetch_add(1, mb_relaxed) < 0)
        {
            m_wrlockset.add(1);
        }

        // Release Readers after mutex unlock,
        // Totally Experimental, works in Relacy! ;^)
        if (count < 0)
        {
            m_rdwset.add(-count);
        }
    }

    void rdlock()
    {
        long count = m_count.fetch_sub(1, mb_relaxed);

        // a writer holds the lock; wait to be released
        if (count < 1)
        {
            m_rdwset.sub(1);
        }

        mb_fence(mb_acquire);
    }

    void rdunlock()
    {
        mb_fence(mb_release);

        long count = m_count.fetch_add(1, mb_relaxed);

        // a writer is waiting; the last reader out wakes it
        if (count < 0)
        {
            mb_fence(mb_seq_cst);

            if (m_rdwake.fetch_sub(1, mb_relaxed) == 1)
            {
                m_wrwset.add(1);
            }
        }
    }
};


// rw-mutex test configuration
#define PRODUCERS 3
#define CONSUMERS 3
#define READERS 4

#define THREADS (PRODUCERS + CONSUMERS + READERS)
#define N 3
#define START 4

static_assert(
    PRODUCERS == CONSUMERS && N > 0,
    "This test wants the number of producers "
    "and consumers to be equal. "
    "Also, N needs to be greater than zero; damn it!"
);


struct ct_rw_mutex_test
:   rl::test_suite<ct_rw_mutex_test, THREADS>
{
    ct_rw_mutex g_rw_mutex;
    VAR_T(int) g_shared;

    ct_rw_mutex_test() : g_shared(START) {}

    void after()
    {
        RL_ASSERT(VAR(g_shared) == START);
    }

    void thread_producer(unsigned int tidx)
    {
        //std::printf("thread_producer::%u\n", tidx);

        for (unsigned int i = 0; i < N; ++i)
        {
            g_rw_mutex.wrlock();
            ++VAR(g_shared);
            g_rw_mutex.wrunlock();
        }
    }

    void thread_consumer(unsigned int tidx)
    {
        //std::printf("thread_consumer::%u\n", tidx);

        for (unsigned int i = 0; i < N; ++i)
        {
            g_rw_mutex.wrlock();
            --VAR(g_shared);
            g_rw_mutex.wrunlock();
        }
    }

    void thread_reader(unsigned int tidx)
    {
        //std::printf("thread_reader::%u\n", tidx);

        for (unsigned int i = 0; i < N; ++i)
        {
            g_rw_mutex.rdlock();
            int shared = VAR(g_shared);
            g_rw_mutex.rdunlock();

            // double check our read range! The bounds must be inclusive:
            // all increments can land before any decrement, and vice versa.
            RL_ASSERT(
                shared >= (-CONSUMERS * N + START) &&
                shared <= (PRODUCERS * N + START)
            );
        }
    }

    void thread(unsigned int tidx)
    {
        if (tidx < PRODUCERS)
        {
            thread_producer(tidx);
        }

        else if (tidx < PRODUCERS + CONSUMERS)
        {
            thread_consumer(tidx);
        }

        else
        {
            thread_reader(tidx);
        }
    }
};



int main()
{
    {
        rl::test_params p;

        p.iteration_count = 14000; // for existing proxy gc code
        p.execution_depth_limit = 33333;
        p.search_type = rl::random_scheduler_type;

        //p.search_type = rl::sched_bound;
        //p.search_type = rl::fair_full_search_scheduler_type;
        //p.search_type = rl::fair_context_bound_scheduler_type;

        rl::simulate<ct_rw_mutex_test>(p);
    }

    std::puts("\nTest Complete!\n");

    std::getchar();

    return 0;
}
_______________________________


Will have normal std C++ in a day or two. Posted here for all of the
wonderful contributors and readers of this fine group.

Chris M. Thomasson

Apr 24, 2017, 5:51:02 PM

[...]
> _______________________________
>
>
> Will have normal std c++ in a day or two. Posted here, for all of the
> wonderful contributors and readers of this fine group.

Take a look at the wrlock function with that damn annoying seq_cst
barrier. Well, I managed to make it conditional:
____________________________
void wrlock()
{
    // note: m_wrstate here is the member called m_wrlockset_state
    // in the full listing upthread
    if (m_wrstate.fetch_sub(1, mb_relaxed) < 1)
    {
        m_wrlockset.sub(1);
    }

    int count = m_count.fetch_add(-INT_MAX, mb_relaxed);

    if (count < INT_MAX)
    {
        if (m_rdwake.fetch_add(INT_MAX - count, mb_relaxed) + INT_MAX -
            count)
        {
            m_wrwset.sub(1);
        }

        // readers were active: the full barrier is only paid here
        mb_fence(mb_seq_cst);
    }

    else
    {
        // uncontended: an acquire fence is enough
        mb_fence(mb_acquire);
    }
}
____________________________

Big difference, especially for x86 platforms!

Yeah!
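
In the meantime, here is a minimal sketch of what a mechanical
translation of the Relacy unit to pure std C++11 might look like, using
the conditional-fence wrlock above. The semaphore is rebuilt on
std::mutex and std::condition_variable. Treat it as illustrative only,
not the final, tested port:
____________________________
// Sketch: mechanical pure std C++11 translation of the Relacy unit.
// Illustrative only; not the final, tested port.
#include <atomic>
#include <mutex>
#include <condition_variable>
#include <climits>

class cpp_sem
{
    int m_count;
    std::mutex m_mutex;
    std::condition_variable m_cond;

public:
    explicit cpp_sem(int count = 0) : m_count(count) {}

    void sub(int c)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_count < c) m_cond.wait(lock);
        m_count -= c;
    }

    void add(int c)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_count += c;
        }
        if (c < 2) m_cond.notify_one();
        else m_cond.notify_all();
    }
};

class ct_rw_mutex
{
    std::atomic<long> m_count{INT_MAX};
    std::atomic<long> m_rdwake{0};
    std::atomic<long> m_wrlockset_state{1};
    cpp_sem m_wrlockset{0};
    cpp_sem m_wrwset{0};
    cpp_sem m_rdwset{0};

public:
    void wrlock()
    {
        if (m_wrlockset_state.fetch_sub(1, std::memory_order_relaxed) < 1)
            m_wrlockset.sub(1);

        long count = m_count.fetch_add(-INT_MAX, std::memory_order_relaxed);

        if (count < INT_MAX)
        {
            if (m_rdwake.fetch_add(INT_MAX - count,
                    std::memory_order_relaxed) + INT_MAX - count)
                m_wrwset.sub(1);

            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
        else
        {
            std::atomic_thread_fence(std::memory_order_acquire);
        }
    }

    void wrunlock()
    {
        std::atomic_thread_fence(std::memory_order_release);

        long count = m_count.fetch_add(INT_MAX, std::memory_order_relaxed);

        if (m_wrlockset_state.fetch_add(1, std::memory_order_relaxed) < 0)
            m_wrlockset.add(1);

        if (count < 0) m_rdwset.add(static_cast<int>(-count));
    }

    void rdlock()
    {
        if (m_count.fetch_sub(1, std::memory_order_relaxed) < 1)
            m_rdwset.sub(1);

        std::atomic_thread_fence(std::memory_order_acquire);
    }

    void rdunlock()
    {
        std::atomic_thread_fence(std::memory_order_release);

        if (m_count.fetch_add(1, std::memory_order_relaxed) < 0)
        {
            std::atomic_thread_fence(std::memory_order_seq_cst);

            if (m_rdwake.fetch_sub(1, std::memory_order_relaxed) == 1)
                m_wrwset.add(1);
        }
    }
};
____________________________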

Chris M. Thomasson

Apr 25, 2017, 4:51:26 PM

On 4/15/2017 10:47 PM, Chris M. Thomasson wrote:
Fwiw, I am trying to think of a decent example program to build using my
rw-mutex algorithm. Humm... Perhaps something simple like reader threads
iterating through a shared std::deque, and two classes of writer threads
adding and removing items at random. We can take the total number of
reads performed during the test as an "idea" of the "reader throughput".
Actually, imvho, it would be better to take the total number of reads an
_individual_ reader thread performs, then make a list like the following
(btw, the numbers here are totally random, just typed in; the actual
test would produce real data):

// example output
reader[0] = 213213312 reads
reader[1] = 284575201 reads
reader[2] = 254488774 reads
reader[...] = ... reads

Then I will compare it to a single std::mutex based impl, where we all
can see the difference in the number of reads performed. In all of my
experience, a rw-mutex outperforms a single-mutex setup: a decent
rw-mutex will allow more reads to be performed per reader thread.
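
Here is a rough skeleton of the kind of harness I have in mind
(hypothetical names; ct_rw_mutex is the C++11 sketch upthread, and the
run length and thread counts are just placeholders):
____________________________
// Rough benchmark skeleton (hypothetical names): readers walk a shared
// std::deque and count their iterations; writers push/pop at random.
#include <deque>
#include <vector>
#include <thread>
#include <atomic>
#include <chrono>
#include <functional>
#include <cstdio>

static ct_rw_mutex g_rw_mutex;      // from the C++11 sketch upthread
static std::deque<int> g_deque;
static std::atomic<bool> g_done(false);

void reader(unsigned long& reads)
{
    while (!g_done.load(std::memory_order_relaxed))
    {
        g_rw_mutex.rdlock();
        for (int x : g_deque) (void)x; // walk the whole container
        g_rw_mutex.rdunlock();
        ++reads;
    }
}

void writer(bool adder)
{
    while (!g_done.load(std::memory_order_relaxed))
    {
        g_rw_mutex.wrlock();
        if (adder) g_deque.push_back(42);
        else if (!g_deque.empty()) g_deque.pop_front();
        g_rw_mutex.wrunlock();
    }
}

int main()
{
    unsigned long reads[4] = { 0, 0, 0, 0 };
    std::vector<std::thread> threads;

    for (unsigned long& r : reads) threads.emplace_back(reader, std::ref(r));
    threads.emplace_back(writer, true);  // the adding class of writer
    threads.emplace_back(writer, false); // the removing class of writer

    std::this_thread::sleep_for(std::chrono::seconds(10));
    g_done.store(true, std::memory_order_relaxed);
    for (std::thread& t : threads) t.join();

    for (int i = 0; i < 4; ++i)
        std::printf("reader[%d] = %lu reads\n", i, reads[i]);
}
____________________________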

Once this is completed, I can compare my rw-mutex to another way of
solving the reader-writer problem, using something called proxy garbage
collection, in pure std C++. The proxy collector will get better numbers
than the rw-mutex.

Any ideas wrt the type of test I should create?

The rw-mutex is ready to be ported to std C++, ready to be taken out of
the Relacy unit-test realm. Getting rid of that damn seq_cst barrier, by
making it conditional, was essential.

:^)

Chris M. Thomasson

Apr 25, 2017, 5:46:40 PM

On 4/25/2017 1:51 PM, Chris M. Thomasson wrote:
> On 4/15/2017 10:47 PM, Chris M. Thomasson wrote:
>> Fwiw, here is some background, if interested please _read_all_:
>>
>> https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/296471
[...]
> The rw-mutex is ready to be ported to std c++, ready to be taken out of
> the Relacy unit-test realm. Getting rid of that damn seq_cst barrier wrt
> making it conditional was essential.

Fwiw, the Windows implementation:

http://webpages.charter.net/appcore/misc/rwmutex_win_hpp.html

used the normal InterlockedIncrement:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms683614(v=vs.85).aspx

It is documented as emitting a full barrier, e.g., seq_cst. The old Win
code did not use any of the special variants like
InterlockedIncrementAcquire:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms683618(v=vs.85).aspx

This C++11 version is more efficient. Now, the interesting thing is that
we can ifdef out the seq_cst fence on x86 systems, because the LOCK
prefix provides the seq_cst relationship by default. The C++11 code on
x86 will emit an MFENCE or a dummy LOCKed atomic op for the seq_cst
fence; this is not needed on said arch. Now, on a SPARC in, say, RMO
mode, well, the seq_cst better damn well be there!
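
Concretely, the ifdef could look something like this sketch (the macro
name is made up, and whether eliding the fence is actually safe depends
on the compiler treating the LOCKed RMW as a full barrier, so verify on
your target):
____________________________
// Sketch of the x86 ifdef (made-up macro name; assumes <atomic> is
// included). The idea: on x86/x64 the preceding LOCK'ed RMW already
// acts as a full barrier, so the explicit seq_cst fence can be elided;
// on weaker archs (e.g., SPARC RMO) keep it. Verify against your
// compiler before relying on this!
#if defined(__x86_64__) || defined(_M_X64) || \
    defined(__i386__) || defined(_M_IX86)
#  define ct_seq_cst_fence() ((void)0)
#else
#  define ct_seq_cst_fence() \
      std::atomic_thread_fence(std::memory_order_seq_cst)
#endif
____________________________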

;^)

Chris M. Thomasson

May 13, 2017, 7:11:30 PM

On 4/15/2017 10:47 PM, Chris M. Thomasson wrote:
Nobody seems interested. Damn it!