
an experimental read/write mutex...


Chris M. Thomasson

Feb 14, 2019, 2:03:43 AM
Fwiw, here is an older read/write mutex of mine. I created a little
benchmark for it vs. C++17's std::shared_mutex. On MSVC 2017, my
algorithm beats std::shared_mutex pretty badly: it takes my algorithm
around 34 seconds to complete, while MSVC's std::shared_mutex takes
around 127 seconds. Wow! What a difference.

Here is the code; I still have to try it out on GCC in C++17 mode.

Can anybody run it?

https://pastebin.com/raw/xCBHY9qd
__________________________________
/* Simple, crude read/write mutex test
by: Chris M. Thomasson
__________________________________________*/



#include <thread>
#include <atomic>
#include <shared_mutex>
#include <condition_variable>
#include <iostream>
#include <functional>
#include <cassert>
#include <cstdlib>
#include <ctime>
#include <mutex>   // std::mutex, std::unique_lock (was missing)
#include <climits> // LONG_MAX (was missing)


#define THREADS 16UL
#define ITERS 10000000UL
#define COUNT (THREADS * ITERS)


// undefine to test std::shared_mutex
#define CT_TEST_FAST_MUTEX 1


// bare bones mutex/condvar based semaphore
struct ct_slow_semaphore
{
    unsigned long m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_slow_semaphore(unsigned long state) : m_state(state) {}

    void inc()
    {
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            ++m_state;
        }

        m_cond.notify_one();
    }

    void add(unsigned long addend)
    {
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_state += addend;
        }

        m_cond.notify_all();
    }

    void dec()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_state == 0) m_cond.wait(lock);
        --m_state;
    }
};




// bin-sema
struct ct_auto_reset_event
{
    bool m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_auto_reset_event() : m_state(false) {}

    void signal()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_state = true;
        m_cond.notify_one();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_state == false) m_cond.wait(lock);
        m_state = false; // auto-reset
    }
};


// just a layer over an auto-reset event
// state encoding: 0 = unlocked, 1 = locked, 2 = locked with waiters
struct ct_fast_mutex
{
    std::atomic<unsigned int> m_state;
    ct_auto_reset_event m_waitset;

    ct_fast_mutex() : m_state(0) {}

    void lock()
    {
        if (m_state.exchange(1, std::memory_order_acquire))
        {
            // contended: mark "waiters present" and park on the event
            while (m_state.exchange(2, std::memory_order_acquire))
            {
                m_waitset.wait();
            }
        }
    }

    void unlock()
    {
        if (m_state.exchange(0, std::memory_order_release) == 2)
        {
            m_waitset.signal();
        }
    }
};



// Chris M. Thomasson's Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions

#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex
{
    // shared state
    std::atomic<long> m_wrstate;
    std::atomic<long> m_count;
    std::atomic<long> m_rdwake;

    ct_slow_semaphore m_rdwset;
    ct_slow_semaphore m_wrwset;
    ct_fast_mutex m_wrlock;

    ct_rwmutex() :
        m_wrstate(1),
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0) {
    }

    // READ, pretty slim...
    void lock_shared()
    {
        if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
        {
            m_rdwset.dec();
        }
    }

    void unlock_shared()
    {
        if (m_count.fetch_add(1, std::memory_order_release) < 0)
        {
            if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
            {
                m_wrwset.inc();
            }
        }
    }

    // WRITE, more hefty
    void lock()
    {
        m_wrlock.lock();

        long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
                                       std::memory_order_acquire);

        if (count < RWMUTEX_COUNT_MAX)
        {
            long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
                                             std::memory_order_acquire);

            if (rdwake + RWMUTEX_COUNT_MAX - count)
            {
                m_wrwset.dec();
            }
        }
    }

    // write unlock
    void unlock()
    {
        long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
                                       std::memory_order_release);

        if (count < 0)
        {
            m_rdwset.add(-count);
        }

        m_wrlock.unlock();
    }
};


struct ct_shared
{
    std::atomic<unsigned long> m_state;

#if defined (CT_TEST_FAST_MUTEX)
    ct_rwmutex m_std_rwmutex;
#else
    std::shared_mutex m_std_rwmutex;
#endif

    ct_shared() : m_state(0) {}
};


void ct_thread(ct_shared& shared, std::size_t index)
{
    for (unsigned int i = 0; i < ITERS; ++i)
    {
        shared.m_std_rwmutex.lock();
        if (i % 256 == 0) std::this_thread::yield();
        shared.m_state += 1;
        shared.m_std_rwmutex.unlock();

        shared.m_std_rwmutex.lock_shared();
        if (i % 512 == 0) std::this_thread::yield();
        //shared.m_state += 1;
        shared.m_std_rwmutex.unlock_shared();
    }
}


int main()
{
    ct_shared shared;

    {
        std::thread threads[THREADS];

        // note: std::clock() measures CPU time on some platforms
        // (summed across threads), not wall-clock time
        std::clock_t start = std::clock();

        for (std::size_t i = 0; i < THREADS; ++i)
        {
            threads[i] = std::thread(ct_thread, std::ref(shared), i);
        }

        for (std::size_t i = 0; i < THREADS; ++i)
        {
            threads[i].join();
        }

        std::clock_t diff = clock() - start;

        unsigned long msec = diff * 1000 / CLOCKS_PER_SEC;

        std::cout << "msec = " << msec << "\n";
    }

    std::cout << "shared.m_state = " << shared.m_state << "\n";
    std::cout << "\n\nFin!\n";

    assert(shared.m_state == COUNT);

    return 0;
}
__________________________________

I will explain the algorithm in further detail when I get some more
time. Probably tomorrow.
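
Fwiw, since ct_rwmutex exposes lock()/unlock() and
lock_shared()/unlock_shared(), it should drop right into the standard
RAII wrappers. A minimal usage sketch, untested, assuming the code
above (g_lock and g_value are just placeholder names):
___________________________
static ct_rwmutex g_lock;
static unsigned long g_value = 0;

unsigned long read_value()
{
    // shared_lock's ctor calls g_lock.lock_shared(),
    // its dtor calls g_lock.unlock_shared()
    std::shared_lock<ct_rwmutex> lock(g_lock);
    return g_value;
}

void write_value(unsigned long v)
{
    // unique_lock's ctor calls g_lock.lock(), its dtor calls unlock()
    std::unique_lock<ct_rwmutex> lock(g_lock);
    g_value = v;
}
___________________________
Note there are no try_lock variants, so only the blocking forms of the
wrappers apply.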

Chris M. Thomasson

Feb 14, 2019, 2:34:25 AM
On 2/13/2019 11:03 PM, Chris M. Thomasson wrote:
> Fwiw, here is an older read/write mutex of mine. I created a little
> benchmark for it vs c++17's std::shared_mutex. On MSVC 2017, my
> algorithm beats std::shared_mutex pretty badly. It takes my algorithm
> around 34 seconds to complete. MSVC's std::shared_mutex takes around 127
> seconds. Wow! What a difference.
>
> Here is the code: Still have to try it out on GCC in c++17 mode.
>
> Can anybody run it?
[...]

Fwiw, I just tested it using GCC 7.3 in C++17 mode. My algorithm is
beating std::shared_mutex, big time. Around 30 seconds for mine, and
about 112 seconds for std::shared_mutex. Wow. These results are
surprising, and a bit disappointing. Why is std::shared_mutex so slow on
both GCC and MSVC on Windows?

Paavo Helde

Feb 14, 2019, 6:12:47 AM
On 14.02.2019 9:03, Chris M. Thomasson wrote:
> Fwiw, here is an older read/write mutex of mine. I created a little
> benchmark for it vs c++17's std::shared_mutex. On MSVC 2017, my
> algorithm beats std::shared_mutex pretty badly. It takes my algorithm
> around 34 seconds to complete. MSVC's std::shared_mutex takes around 127
> seconds. Wow! What a difference.

Cannot reproduce your results. On my laptop std::shared_mutex is
consistently faster:

std::shared_mutex:
msec = 14028
shared.m_state = 160000000


ct_fast_mutex:
msec = 18495
shared.m_state = 160000000

x64 Release build
Microsoft Visual Studio Professional 2017 Version 15.8.9
Processor: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 2701 Mhz, 2
Core(s), 4 Logical Processor(s)


mvor...@gmail.com

Feb 14, 2019, 9:52:01 AM
Chris, I rewrote it to this; let me know if it's still good :)



class rw_fast_mutex
{
public:
    rw_fast_mutex()
        : m_wrstate(1), m_count(INT_MAX), m_rdwake(0),
          m_rdwset(0), m_wrwset(0), m_wrmtx(0) {}

    void read_lock()
    {
        if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
            m_rdwset.wait();
    }

    void read_unlock()
    {
        if (m_count.fetch_add(1, std::memory_order_release) < 0)
            if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
                m_wrwset.post();
    }

    void write_lock()
    {
        if (m_wrstate.fetch_sub(1, std::memory_order_acquire) < 1)
            m_wrmtx.wait();
        int count = m_count.fetch_add(-INT_MAX, std::memory_order_acquire);
        if (count < INT_MAX)
        {
            int rdwake = m_rdwake.fetch_add(INT_MAX - count, std::memory_order_acquire);
            if (rdwake + INT_MAX - count)
                m_wrwset.wait();
        }
    }

    void write_unlock()
    {
        int count = m_count.fetch_add(INT_MAX, std::memory_order_release);
        if (count < 0)
            m_rdwset.add(-count);
        if (m_wrstate.fetch_add(1, std::memory_order_release) < 0)
            m_wrmtx.post();
    }

private:
    std::atomic<int> m_wrstate;
    std::atomic<int> m_count;
    std::atomic<int> m_rdwake;

    semaphore m_rdwset;
    semaphore m_wrwset;
    semaphore m_wrmtx;
};
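
The `semaphore` type above isn't shown; it's assumed to be a counting
semaphore with wait(), post() and add(). A minimal mutex/condvar sketch
that fits that interface (any counting semaphore of this shape will do):

#include <mutex>
#include <condition_variable>

class semaphore
{
public:
    explicit semaphore(int count) : m_count(count) {}

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_count < 1) m_cond.wait(lock);
        --m_count;
    }

    void post()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_count;
        }
        m_cond.notify_one();
    }

    void add(int addend)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_count += addend;
        }
        m_cond.notify_all();
    }

private:
    int m_count;
    std::mutex m_mutex;
    std::condition_variable m_cond;
};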

mvor...@gmail.com

Feb 14, 2019, 12:47:40 PM
On Thursday, February 14, 2019 at 2:03:43 AM UTC-5, Chris M. Thomasson wrote:
here it is: http://blog.vorbrodt.me/?p=680 :)

Chris M. Thomasson

Feb 14, 2019, 3:34:04 PM
Strange. I am using the latest version of MSVC 2017 Express. They must be
using something different in the Professional version. When you get some
time, can you explore what std::shared_mutex is using in the
Professional version? Also, give GCC in C++17 mode a test.

Chris M. Thomasson

Feb 14, 2019, 3:39:10 PM
I cannot get std::shared_mutex to beat my "fast" rwlock on MSVC Express
or GCC 7.3 in C++17 mode. Humm... Something odd is going on... Time to
engage my thinking cap. ;^)

Chris M. Thomasson

Feb 14, 2019, 3:57:20 PM
On 2/14/2019 9:47 AM, mvor...@gmail.com wrote:
> On Thursday, February 14, 2019 at 2:03:43 AM UTC-5, Chris M. Thomasson wrote:
>> Fwiw, here is an older read/write mutex of mine. I created a little
>> benchmark for it vs c++17's std::shared_mutex. On MSVC 2017, my
>> algorithm beats std::shared_mutex pretty badly. It takes my algorithm
>> around 34 seconds to complete. MSVC's std::shared_mutex takes around 127
>> seconds. Wow! What a difference.
>>
>> Here is the code: Still have to try it out on GCC in c++17 mode.
>>
>> Can anybody run it?
>>
>> https://pastebin.com/raw/xCBHY9qd
[...]
>> __________________________________
>>
>> I will explain the algorithm in further detail when I get some more
>> time. Probably tomorrow.
>
> here it is: http://blog.vorbrodt.me/?p=680 :)

Need to examine it closer, but it looks okay at first glance. Now, the
version I included wrt the benchmark code is a little "different" from
the first version I showed you. See if you can tell the difference:

https://pastebin.com/raw/xCBHY9qd

Both of them work, and have different ways to gain write access. One is
using a "fast" semaphore, one is using the "fast" mutex.

Btw, std::shared_mutex should always beat this because it should be
using the best implementation available for the platform. I am having
trouble understanding why I cannot get std::shared_mutex to beat it
using my GCC 7.3 and MSVC 2017 (community edition).

Can you run the benchmark and give some results?


https://pastebin.com/raw/xCBHY9qd

Actually, I need to make the output on my benchmark more informative.
Right now, it does not tell the user what algorithm is being profiled,
just gives a time in milliseconds.

Argh.

Melzzzzz

Feb 14, 2019, 4:00:29 PM
On 2019-02-14, Chris M. Thomasson <invalid_chris_t...@invalid.com> wrote:
> Fwiw, here is an older read/write mutex of mine. I created a little
> benchmark for it vs c++17's std::shared_mutex. On MSVC 2017, my
> algorithm beats std::shared_mutex pretty badly. It takes my algorithm
> around 34 seconds to complete. MSVC's std::shared_mutex takes around 127
> seconds. Wow! What a difference.
>
> Here is the code: Still have to try it out on GCC in c++17 mode.
>
> Can anybody run it?
>
> https://pastebin.com/raw/xCBHY9qd

Your mutex is significantly faster than the g++ implementation on a Linux
system. I have g++, clang and icpc, but they all use libstdc++, so the
results are the same.



--
press any key to continue or any other to quit...

Melzzzzz

Feb 14, 2019, 4:11:48 PM
On 2019-02-14, Chris M. Thomasson <invalid_chris_t...@invalid.com> wrote:
yours:
~/.../bmaxa_data/examples >>> time ./beat1
msec = 70948
shared.m_state = 160000000


Fin!
shared:
./beat1 50.12s user 20.83s system 397% cpu 17.851 total
~/.../bmaxa_data/examples >>> vim beat1.cpp
~/.../bmaxa_data/examples >>> icpc -std=c++17 -Wall -O3 -march=native -pthread beat1.cpp -o beat1 -lstdc++
~/.../bmaxa_data/examples >>> time ./beat1
msec = 284038
shared.m_state = 160000000


Fin!
./beat1 174.65s user 109.39s system 385% cpu 1:13.68 total
~/.../bmaxa_data/examples >>> icpc -v
icpc version 19.0.0.120 (gcc version 8.2.1 compatibility)

Chris M. Thomasson

Feb 14, 2019, 4:30:27 PM
Thank you so much for your time and effort. I really do appreciate it.
Now, this is basically what I am experiencing wrt GCC 7.3 and MSVC 2017
Community on my end in Windoze land. My algorithm is beating the
standard implementations. Have not tested it in MSVC 2017 Professional.
Ahhh... Perhaps I need to test it directly against:

https://docs.microsoft.com/en-us/windows/desktop/sync/slim-reader-writer--srw--locks

I assume std::shared_mutex on Windows maps to this. Ahh shi%, this means
including "windows.h", better define WIN32_LEAN_AND_MEAN... ;^)
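
Something like this sketch should do for the wrapper; assuming the raw
Win32 SRW calls (AcquireSRWLockShared et al.), untested here:
___________________________
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

struct ct_srw_rwmutex
{
    SRWLOCK m_lock;

    ct_srw_rwmutex() { InitializeSRWLock(&m_lock); }

    void lock()          { AcquireSRWLockExclusive(&m_lock); }
    void unlock()        { ReleaseSRWLockExclusive(&m_lock); }
    void lock_shared()   { AcquireSRWLockShared(&m_lock); }
    void unlock_shared() { ReleaseSRWLockShared(&m_lock); }
};
___________________________
Same interface as ct_rwmutex, so the benchmark can swap it right in.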

Btw, my algorithm can be implemented in a much more efficient manner. I
need to pad the data-structures up to L2 cache lines, and properly align
them on the boundaries. This will increase performance by avoiding
false sharing under contention.

Thanks again Melzzzzz. :^)

Paavo Helde

Feb 14, 2019, 5:08:11 PM
std::shared_mutex appears to call the AcquireSRWLockShared() Windows SDK
function et al.

Don't have a suitable gcc at hand at the moment, maybe later.



Chris M. Thomasson

Feb 14, 2019, 5:09:54 PM
This would really help readers if m_count were on its own cache line.
Notice how the "fast", or non-blocking, path of the readers does not
need to touch anything but m_count:
___________________________
void lock_shared()
{
    if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
    {
        m_rdwset.dec();
    }
}

void unlock_shared()
{
    if (m_count.fetch_add(1, std::memory_order_release) < 0)
    {
        if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
        {
            m_wrwset.inc();
        }
    }
}
___________________________


Yeah, an implementation wrt proper padding and alignment of data would
help out quite a bit. m_count in particular.
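
A sketch of the padding idea, assuming C++17's
std::hardware_destructive_interference_size is available (falling back
to an assumed 64 bytes otherwise), untested:
___________________________
#include <cstddef>
#include <new>

#if defined(__cpp_lib_hardware_interference_size)
static constexpr std::size_t ct_cacheline =
    std::hardware_destructive_interference_size;
#else
static constexpr std::size_t ct_cacheline = 64; // assumed fallback
#endif

struct ct_rwmutex_padded
{
    // the readers' hot word gets a cache line all to itself...
    alignas(ct_cacheline) std::atomic<long> m_count;

    // ...and the rest of the state lives on other lines
    alignas(ct_cacheline) std::atomic<long> m_rdwake;
    alignas(ct_cacheline) ct_slow_semaphore m_rdwset;
    alignas(ct_cacheline) ct_slow_semaphore m_wrwset;
    alignas(ct_cacheline) ct_fast_mutex m_wrlock;

    ct_rwmutex_padded() :
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0) {}
};
___________________________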



Chris M. Thomasson

Feb 14, 2019, 5:12:38 PM
Perfect. Okay, so I need to examine what std::shared_mutex on MSVC
2017 Community does. Or just bite the bullet and test against SRW
directly. But still, GCC and MSVC should both be using the best damn
thing they can; they should beat my rwmutex algorithm. I cannot make
that happen on my system; however, I will test directly against SRW wrt
windows.h, cough... ;^)

Ian Collins

Feb 14, 2019, 5:14:05 PM
Definitely significantly faster!
$ g++ -Ofast -std=c++17 x.cc -lpthread
$ ./a.out
msec = 73294
shared.m_state = 160000000


Without CT_TEST_FAST_MUTEX

$ g++ -Ofast -std=c++17 x.cc -lpthread
ian@ryzen:/tmp$ ./a.out
msec = 309352
shared.m_state = 160000000

$ g++ --version
g++ (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0

--
Ian.

Chris M. Thomasson

Feb 14, 2019, 6:34:13 PM
Okay. I just tested it against a raw SRW Windows lock in the following code:

https://pastebin.com/raw/HgYzCDSQ
(please try it...)

My algorithm still beats it very badly. Mine takes around 34 seconds,
while Windows 10 SRW takes around 122 seconds.

My algorithm kind of slaughters it.

Chris M. Thomasson

Feb 14, 2019, 6:41:53 PM
Thanks Ian. My algorithm also "destroys" Windows SRW on my end. Fwiw
here is the code, _beware_ it needs windows.h:

https://pastebin.com/raw/HgYzCDSQ

Wow. Perhaps, my experimental read/write mutex is actually worth
something. :^)

Chris M. Thomasson

Feb 14, 2019, 10:44:20 PM
On 2/14/2019 12:57 PM, Chris M. Thomasson wrote:
> On 2/14/2019 9:47 AM, mvor...@gmail.com wrote:
>> On Thursday, February 14, 2019 at 2:03:43 AM UTC-5, Chris M. Thomasson
>> wrote:
>>> Fwiw, here is an older read/write mutex of mine. [...]
>>
>> here it is: http://blog.vorbrodt.me/?p=680 :)
>
> Need to examine it closer, but looks okay at first glance. Now, the
> version I included wrt the benchmark code is a little "different" than
> the first version I showed you. See if you can tell the difference:
>
> https://pastebin.com/raw/xCBHY9qd
>
> Both of them work, and have different ways to gain write access. One is
> using a "fast" semaphore, one is using the "fast" mutex.

Wrt your blog, you should put up both versions for clarity. Add the one
from my crude benchmark code:

https://pastebin.com/raw/xCBHY9qd

It is performing well against std::shared_mutex. Also, it should
outperform the other one that you currently have on there.

The difference is in the way write access is acquired and released; take
a close look.

Chris M. Thomasson

Feb 14, 2019, 10:45:06 PM
On 2/14/2019 1:11 PM, Melzzzzz wrote:
> yours:
> msec = 70948
> [...]

Thanks Melzzzzz.

Melzzzzz

Feb 15, 2019, 12:35:47 AM
Hm, when using clang's libc++, the shared_mutex case won't finish even
after 17 minutes ;)

Paavo Helde

Feb 15, 2019, 12:40:07 AM
On 15.02.2019 1:34, Chris M. Thomasson wrote:
>
> My algorithm still beats it very badly. Mine takes around 34 seconds
> ish, while Windows 10 SRW takes around 122 ish seconds.

These numbers look so bad that they could be from a build with no
optimizations; but you wouldn't ever measure performance on a
non-optimized build, would you.

On my machine both times are below 20 seconds.

Chris M. Thomasson

Feb 15, 2019, 4:00:22 PM
Really!?!?! Seriously, WTF! Did it just deadlock? Is there any CPU
activity on the process during these 17 minutes? My algorithm works fine,
and was verified with Relacy Race Detector multiple times.

Chris M. Thomasson

Feb 15, 2019, 4:09:16 PM
Just MSVC release mode with /Ox. So far, I cannot get shared_mutex or
SRW to beat my code, on my machine.

Melzzzzz

Feb 15, 2019, 4:15:26 PM
There is CPU activity but look:
~/.../bmaxa_data/examples >>> clang -stdlib=libc++ -std=c++17 -Wall -O3 -march=native -pthread beat.cpp -o beat -lc++
~/.../bmaxa_data/examples >>> time ./beat
msec = 1159
shared.m_state = 1600000


Fin!
./beat 0.87s user 0.29s system 470% cpu 0.247 total
~/.../bmaxa_data/examples >>> vim beat.cpp
~/.../bmaxa_data/examples >>> clang -stdlib=libc++ -std=c++17 -Wall -O3 -march=native -pthread beat.cpp -o beat -lc++
~/.../bmaxa_data/examples >>> time ./beat
msec = 53918
shared.m_state = 1600000


Fin!
./beat 29.97s user 23.95s system 430% cpu 12.534 total

Chris M. Thomasson

Feb 15, 2019, 4:30:06 PM
Wait a minute. Did mine win or lose here? I am not sure what the
timings for:

msec = 1159

and

msec = 53918

map to. This is my fault because I am not showing the user what test
they are running! Bad. It is not showing whether CT_TEST_FAST_MUTEX
is defined or not. Sorry. ;^o

Chris M. Thomasson

Feb 15, 2019, 4:31:34 PM
[...]


Ahhh, you reduced the number of iterations. Nice! Okay.

Melzzzzz

Feb 15, 2019, 4:38:05 PM
Yours is 1159.

>
> msec = 1159
>
> and
>
> msec = 53918
>
> map to. This is my fault because I am not showing the user what test
> they are running! Bad. It is not showing the user if CT_TEST_FAST_MUTEX
> is defined or not. Sorry. ;^o


Chris M. Thomasson

Feb 16, 2019, 6:13:43 PM
On 2/13/2019 11:03 PM, Chris M. Thomasson wrote:
> Fwiw, here is an older read/write mutex of mine. I created a little
> benchmark for it vs c++17's std::shared_mutex. On MSVC 2017, my
> algorithm beats std::shared_mutex pretty badly. It takes my algorithm
> around 34 seconds to complete. MSVC's std::shared_mutex takes around 127
> seconds. Wow! What a difference.
>
> Here is the code: Still have to try it out on GCC in c++17 mode.
[...]

Fwiw, I exposed this to the $#%# nest of reddit:

https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/

Also, I edited my code to actually show the user what test they are
running!

The only change is the addition of some "helpful" code in main:
________________________
int main()
{
    ct_shared shared;

    std::cout << "Testing: ";

#if defined (CT_TEST_FAST_MUTEX)
    std::cout << "Chris M. Thomasson's Experimental Read/Write Mutex\n\n";
#else
    std::cout << "std::shared_mutex\n\n";
#endif

    {
        std::thread threads[THREADS];

        std::clock_t start = std::clock();

        for (std::size_t i = 0; i < THREADS; ++i)
        {
            threads[i] = std::thread(ct_thread, std::ref(shared), i);
        }

        for (std::size_t i = 0; i < THREADS; ++i)
        {
            threads[i].join();
        }

        std::clock_t diff = clock() - start;

        unsigned long msec = diff * 1000 / CLOCKS_PER_SEC;

        std::cout << "msec = " << msec << "\n";
    }

    std::cout << "shared.m_state = " << shared.m_state << "\n";
    std::cout << "\n\nFin!\n";

    assert(shared.m_state == COUNT);

    return 0;
}
________________________

So, now one can see what the hell they are doing!

Chris M. Thomasson

Feb 16, 2019, 6:17:04 PM
On 2/15/2019 1:37 PM, Melzzzzz wrote:
[...]
>> Wait a minute. Did mine win or lose here? I am not sure what the
>> timings for:
>
> Yours is 1159.

Yeee-haw! :^)

I do not want to get ahead of myself, but I wonder if perhaps the Linux
guys just might be interested in my work.

Melzzzzz

Feb 16, 2019, 7:16:39 PM
On 2019-02-16, Chris M. Thomasson <invalid_chris_t...@invalid.com> wrote:
> [...]
>>> Wait a minute. Did mine win or lose here? I am not sure what the
>>> timings for:
>>
>> Yours is 1159.
>
> Yeee-haw! :^)
>
> I do not want to get ahead of myself, but I wonder if perhaps the Linux
> guys just might be interested in my work.

I think that the libc++ and libstdc++ guys might. That is not Linux-only;
rather, they are the two major free C++ implementations.

Chris M. Thomasson

Feb 16, 2019, 7:32:26 PM
On 2/16/2019 4:16 PM, Melzzzzz wrote:
>> [...]
>>>> Wait a minute. Did mine win or lose here? I am not sure what the
>>>> timings for:
>>>
>>> Yours is 1159.
>>
>> Yeee-haw! :^)
>>
>> I do not want to get ahead of myself, but I wonder if perhaps the Linux
>> guys just might be interested in my work.
>
> I think that the libc++ and libstdc++ guys might.

Not exactly sure how to present it to them. Any insight?


> That is not Linux-only; rather, they are the two major free C++
> implementations.

They might be interested in building a read/write mutex without using
compare-and-swap, if anything at all. They can use futex without CAS;
much better, no explicit looping. Fwiw, Reddit is a shi% storm, trolls
galore, so I managed to remember where I mentioned my invention back in
2009 over on an Intel message board:

https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/296471

Some of the links are dead, back in 2009... However, this one still works:

http://pastebin.com/f3d6140eb

My algorithm is composed of simple increments, decrements and additions;
no CAS, and no loops in the actual struct ct_rwmutex code. Btw, being
finally able to port it over to 100% pure C++11 is very nice! Back in
the day, I had to create my own asm code for different architectures wrt
the atomics, membars and compiler barriers. Ahhh, the good ol' days... ;^)
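
For the curious, here is one reading of the counter protocol in the
posted ct_rwmutex, as comments; this is an interpretation of the code,
not the full explanation I still owe the thread:
___________________________
// m_count starts at RWMUTEX_COUNT_MAX, meaning "no writer, no readers".
//
// reader lock:    old = m_count.fetch_add(-1, acquire);
//                 // old < 1 means a writer already subtracted
//                 // RWMUTEX_COUNT_MAX, so park on m_rdwset.
//
// writer lock:    take m_wrlock (one writer at a time), then
//                 old = m_count.fetch_add(-RWMUTEX_COUNT_MAX, acquire);
//                 // RWMUTEX_COUNT_MAX - old = readers in flight;
//                 // publish that count into m_rdwake and, if any are
//                 // still outstanding, wait once on m_wrwset.
//
// reader unlock:  old = m_count.fetch_add(1, release);
//                 // old < 0 means a writer is waiting; the reader
//                 // that drops m_rdwake from 1 to 0 wakes it.
//
// writer unlock:  old = m_count.fetch_add(RWMUTEX_COUNT_MAX, release);
//                 // -old readers queued up while we held the lock;
//                 // release them all at once via m_rdwset.add(-old).
___________________________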

Melzzzzz

Feb 17, 2019, 2:50:57 AM
On 2019-02-17, Chris M. Thomasson <invalid_chris_t...@invalid.com> wrote:
>>> [...]
>>>>> Wait a minute. Did mine win or lose here? I am not sure what the
>>>>> timings for:
>>>>
>>>> Yours is 1159.
>>>
>>> Yeee-haw! :^)
>>>
>>> I do not want to get ahead of myself, but I wonder if perhaps the Linux
>>> guys just might be interested in my work.
>>
>> I think that the libc++ and libstdc++ guys might.
>
> Not exactly sure how to present it to them. Any insight?

For libstdc++, the place to start is: https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html
For libc++: https://llvm.org/docs/Phabricator.html

Chris M. Thomasson

Feb 17, 2019, 5:55:47 PM
On 2/14/2019 9:39 PM, Paavo Helde wrote:
You might be interested in this: some timings from Dmitry Vyukov, who
works at Google. They are more in line with yours:
https://groups.google.com/d/msg/lock-free/zzZX4fvtG04/39IVBrrJAwAJ

https://groups.google.com/d/topic/lock-free/zzZX4fvtG04/discussion
(read all...)



Chris M. Thomasson

Feb 17, 2019, 5:56:37 PM
Btw, I know Dmitry from many years ago. He is _very_ smart, and a nice
friend to have.

Chris M. Thomasson

Feb 17, 2019, 6:06:30 PM
On 2/13/2019 11:03 PM, Chris M. Thomasson wrote:
> Fwiw, here is an older read/write mutex of mine. I created a little
> benchmark for it vs c++17's std::shared_mutex. On MSVC 2017, my
> algorithm beats std::shared_mutex pretty badly. It takes my algorithm
> around 34 seconds to complete. MSVC's std::shared_mutex takes around 127
> seconds. Wow! What a difference.
>
> Here is the code: Still have to try it out on GCC in c++17 mode.
>
> Can anybody run it?
>
> https://pastebin.com/raw/xCBHY9qd
> __________________________________
[...]
> __________________________________
>
> I will explain the algorithm in further detail when I get some more
> time. Probably tomorrow.

I need to code up a different type of benchmark: one that measures how
many reads and writes per second, per thread, can be performed on a
shared data-structure, like a list or something.
Readers would iterate the whole list.

Writers would push and pop items from the list.

How many reads and writes can be completed per second, per thread, if
the test is run for a fixed amount of time...
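
Something with roughly this shape, a rough sketch of the planned reader
loop (an assumed harness, not the actual benchmark code yet):
___________________________
#include <atomic>

template <class RWMutex, class List>
unsigned long ct_reader_loop(RWMutex& m, List& list,
                             std::atomic<bool>& run)
{
    unsigned long reads = 0;

    // the main thread flips `run` to false after the fixed window
    while (run.load(std::memory_order_relaxed))
    {
        m.lock_shared();
        for (auto& node : list) { (void)node; } // iterate the whole list
        m.unlock_shared();
        ++reads;
    }

    // summed across threads and divided by elapsed seconds
    return reads;
}
___________________________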

Chris M. Thomasson

Feb 18, 2019, 1:21:45 AM
Before that, I need to create another type of benchmark: one that
measures reads/writes per second, per thread, wrt readers iterating
through a shared data-structure. Writers would push and pop from it. The
algorithm that allows for the most reads (primarily), or writes, per
second per thread wins. Looking forward to seeing your results, if you
decide to run the new code. It should be up sometime in a day or two. :^)

mvor...@gmail.com

Feb 18, 2019, 2:35:04 AM
On Thursday, February 14, 2019 at 2:03:43 AM UTC-5, Chris M. Thomasson wrote:
> Fwiw, here is an older read/write mutex of mine. I created a little
> benchmark for it vs c++17's std::shared_mutex.
[...]




Tested your latest code on a 2018 i5 MacBook Pro and 3 different compilers:

*************************************
* GCC -Ofast -march=native -lstdc++ *
*************************************

Testing: Chris M. Thomasson's Experimental Read/Write Mutex

msec = 46171
shared.m_state = 160000000


Fin!

******************************************
* Apple CLANG -Ofast -march=native -lc++ *
******************************************

Testing: Chris M. Thomasson's Experimental Read/Write Mutex

msec = 40027
shared.m_state = 160000000


Fin!

*****************************************
* LLVM CLANG -Ofast -march=native -lc++ *
*****************************************

Testing: Chris M. Thomasson's Experimental Read/Write Mutex

msec = 37518
shared.m_state = 160000000


Fin!



VS SHARED MUTEX:

Ran for 15 minutes, then I stopped it. It's not even close.

Chris M. Thomasson

Feb 18, 2019, 4:22:32 PM
On 2/13/2019 11:03 PM, Chris M. Thomasson wrote:
> Fwiw, here is an older read/write mutex of mine. I created a little
> benchmark for it vs c++17's std::shared_mutex. On MSVC 2017, my
> algorithm beats std::shared_mutex pretty badly. It takes my algorithm
> around 34 seconds to complete. MSVC's std::shared_mutex takes around 127
> seconds. Wow! What a difference.
>
> Here is the code: Still have to try it out on GCC in c++17 mode.
>
> Can anybody run it?
>
> https://pastebin.com/raw/xCBHY9qd
> __________________________________
> /* Simple, crude read/write mutex test
>    by: Chris M. Thomasson
[...]

Holy moly! Read all of the following post:



https://groups.google.com/forum/#!original/lock-free/zzZX4fvtG04/ebVWl0BCBAAJ



WOW! Btw, Dmitry Vyukov works at Google and develops the Go language!
Why did I fail to put a license on my work? Damn it.



Here is his response when I asked him what he thought about my algorithm:



Dmitry Vyukov:

I think it's one of the best algorithms out there. The complete fairness
for both readers and writers is notable. And the wait-free fast-path for
readers. It still suffers from high cache line contention even on
read-only workload, but it's as good as one can get with a centralized
design (i.e. not per-thread/cpu distributed stuff which has own
problems). I would expect a CAS-loop-based read lock to degrade
significantly under high read load. Btw your algorithm is used as the
standard Go RWMutex for the past 7+ years:
https://github.com/golang/go/commit/daaf29cf932 (I should have mentioned
your authorship somewhere!) As for potential improvements I can only
think of optimizing uncontented writer lock/unlock to be 1 RMW each,
i.e. not offloading writer competition resolution to a separate mutex,
but rather incorporate it into the same atomic variable readers and
writers use for synchronization with each other. Do you think it's
possible? With CAS-loop? Or maybe with some smart atomic_fetch_or? Wait,
atomic_fetch_or is compiled to a CAS loop on x86 anyway. These new C/C++
atomic interfaces sometimes make me forget that.

Chris M. Thomasson

Feb 18, 2019, 4:43:45 PM
On 2/17/2019 11:34 PM, mvor...@gmail.com wrote:
> On Thursday, February 14, 2019 at 2:03:43 AM UTC-5, Chris M. Thomasson wrote:
>> Fwiw, here is an older read/write mutex of mine. I created a little
>> benchmark for it vs c++17's std::shared_mutex. On MSVC 2017, my
>> algorithm beats std::shared_mutex pretty badly. It takes my algorithm
>> around 34 seconds to complete. MSVC's std::shared_mutex takes around 127
>> seconds. Wow! What a difference.
>>
>> Here is the code: Still have to try it out on GCC in c++17 mode.
>>
>> Can anybody run it?
>>
>> https://pastebin.com/raw/xCBHY9qd
[...]
> Tested your latest code on 2018 i5 Mac Book Pro and 3 different compilers:
>
> *************************************
> * GCC -Ofast -march=native -lstdc++ *
> *************************************
>
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
>
> msec = 46171
> shared.m_state = 160000000
>
>
> Fin!
>
> ******************************************
> * Apple CLANG -Ofast -march=native -lc++ *
> ******************************************
>
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
>
> msec = 40027
> shared.m_state = 160000000

[...]
> *****************************************
> * LLVM CLANG -Ofast -march=native -lc++ *
> *****************************************
>
> Testing: Chris M. Thomasson's Experimental Read/Write Mutex
>
> msec = 37518
> shared.m_state = 160000000
[...]
> VS SHARED MUTEX:
>
> Ran for 15 minutes then i stopped it. It's not even close.
>

Wow. Thank you so much for taking the time to test it for yourself. I
really do appreciate it. It seems that some of the std::shared_mutex
impls are built around a pure mutex/condvar, so they are only slow
pathed (always blocking). My algorithm tries to create fast
(non-blocking) paths that can cleverly "skip" calls into the "slow"
paths, so to speak.

I am working on another type of benchmark code. Will get back to you.

Thanks again.

Chris M. Thomasson

Feb 18, 2019, 11:55:08 PM
Fwiw, here is such a test using Relacy Race Detector, you know, just to
keep things in line:

https://groups.google.com/d/msg/lock-free/zzZX4fvtG04/-SFh9r_JAQAJ

https://pastebin.com/raw/xCBHY9qd
(link to raw code)

Readers iterate all the nodes in struct ct_simple_stack.

Writers push and pop from it.

Next step is coding it up in pure 100% standard C++11. Well, C++17 wrt
comparing it against std::shared_mutex... ;^)

_________________________
// Experimental Read-Write Mutex Test
// More "realistic" test, in Relacy...
// By: Chris M. Thomasson
//_______________________________________________


//#define RL_DEBUGBREAK_ON_ASSERT
//#define RL_MSVC_OUTPUT
//#define RL_FORCE_SEQ_CST
//#define RL_GC


#include <relacy/relacy_std.hpp>
#include <iostream>


// Simple macro based redirection of the verbose std membars.
#define CT_MB_ACQ std::memory_order_acquire
#define CT_MB_REL std::memory_order_release
#define CT_MB_RLX std::memory_order_relaxed
#define CT_MB_ACQ_REL std::memory_order_acq_rel
#define CT_MB_SEQ_CST std::memory_order_seq_cst
#define CT_MB(mp_0) std::atomic_thread_fence(mp_0)


#define READERS 4
#define WRITERS 2
#define THREADS (READERS + WRITERS)
#define ITERS 3




////////////////


// bare bones mutex/condvar based semaphore
struct ct_slow_semaphore
{
    VAR_T(unsigned long) m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_slow_semaphore(unsigned long state) : m_state(state) {}

    void inc()
    {
        m_mutex.lock($);
        ++VAR(m_state);
        m_mutex.unlock($);
        m_cond.notify_one($);
    }

    void add(unsigned long addend)
    {
        m_mutex.lock($);
        VAR(m_state) += addend;
        m_mutex.unlock($);
        m_cond.notify_all($);
    }

    void dec()
    {
        m_mutex.lock($);
        while (VAR(m_state) == 0) m_cond.wait(m_mutex, $);
        --VAR(m_state);
        m_mutex.unlock($);
    }
};




// bin-sema
struct ct_auto_reset_event
{
    bool m_state;
    std::mutex m_mutex;
    std::condition_variable m_cond;

    ct_auto_reset_event() : m_state(false) {}

    void signal()
    {
        m_mutex.lock($);
        m_state = true;
        m_cond.notify_one($);
        m_mutex.unlock($);
    }

    void wait()
    {
        m_mutex.lock($);
        while (m_state == false) m_cond.wait(m_mutex, $);
        m_state = false; // auto-reset
        m_mutex.unlock($);
    }
};

// [the ct_fast_mutex and ct_rwmutex definitions appear to have been
// truncated from this post; they are the same algorithms shown earlier
// in the thread, with the atomics expressed through Relacy's macros]

////////////////


struct ct_simple_stack
{
    struct node
    {
        VAR_T(node*) m_next;
        VAR_T(unsigned int) m_tid;

        node(unsigned int tid) : m_tid(tid) {}
    };

    VAR_T(node*) m_head;

    ct_simple_stack() : m_head(nullptr) {}

    void push(node* n)
    {
        VAR(n->m_next) = VAR(m_head);
        VAR(m_head) = n;
    }

    node* pop()
    {
        node* n = VAR(m_head);

        if (n)
        {
            VAR(m_head) = VAR(n->m_next);
        }

        return n;
    }

    node* flush()
    {
        node* n = VAR(m_head);
        VAR(m_head) = nullptr;
        return n;
    }
};


void ct_destroy_nodes(ct_simple_stack& s)
{
    ct_simple_stack::node* n = s.flush();

    while (n)
    {
        ct_simple_stack::node* next = VAR(n->m_next);
        delete n;
        n = next;
    }
}



// Relacy Stack Test...
struct ct_fast_mutex_test
    : rl::test_suite<ct_fast_mutex_test, THREADS>
{
    ct_rwmutex g_rwmutex;
    ct_simple_stack g_stack;

    void before()
    {
    }

    void after()
    {
        ct_destroy_nodes(g_stack);
    }

    // reader
    void reader(unsigned int tidx)
    {
        g_rwmutex.lock_shared();

        for (unsigned int i = 0; i < ITERS; ++i)
        {
            ct_simple_stack::node* h = VAR(g_stack.m_head);

            while (h)
            {
                ct_simple_stack::node* next = VAR(h->m_next);

                if (i % 512 == 0)
                {
                    rl::yield(1, $);
                }

                h = next;
            }
        }

        g_rwmutex.unlock_shared();
    }

    // writer
    void writer(unsigned int tidx)
    {
        g_rwmutex.lock();
        g_stack.push(new ct_simple_stack::node(tidx));
        g_stack.push(new ct_simple_stack::node(tidx));
        g_rwmutex.unlock();

        g_rwmutex.lock();
        ct_simple_stack::node* n = g_stack.pop();
        g_rwmutex.unlock();

        if (n)
        {
            // destroy
            delete n;
        }
    }

    void thread(unsigned int tidx)
    {
        if (tidx < READERS)
        {
            reader(tidx);
        }
        else
        {
            writer(tidx);
        }
    }
};



// Test away... Or fly? Humm...
int main()
{
    {
        rl::test_params p;

        p.iteration_count = 10000000;
        //p.execution_depth_limit = 33333;
        //p.search_type = rl::sched_bound;
        //p.search_type = rl::fair_full_search_scheduler_type;
        //p.search_type = rl::fair_context_bound_scheduler_type;

        rl::simulate<ct_fast_mutex_test>(p);
    }

    return 0;
}
_________________________

Chris M. Thomasson

Feb 20, 2019, 12:56:19 AM
On 2/17/2019 11:34 PM, mvor...@gmail.com wrote:
Fwiw, I have almost completed my new benchmark. The timings that I am
getting measure a different thing: how many reads and writes can be
completed within a given time frame. Well, here are some initial results:

Testing: Chris M. Thomasson's Experimental Read/Write Mutex

Raw Reads: 3086724
Raw Writes: 59651
reads_per_tick = 308309
writes_per_tick = 5958
Ticks = 10.0118


Testing: std::shared_mutex

Raw Reads: 110922
Raw Writes: 29502
reads_per_tick = 11071
writes_per_tick = 2944
Ticks = 10.019


For a simple test of 10 seconds, my algorithm allows for 3086724 reads
and 59651 writes to be completed, while std::shared_mutex allows for
only 110922 reads and 29502 writes. Damn! Look at the difference in the
read rate per tick! Wow. I can't wait for you, and others, to test it and
tell me the results, good or bad. Heck, std::shared_mutex should beat my
work, wouldn't ya think? Humm...

sa...@portsip.com

May 5, 2019, 4:35:54 AM
Hi Chris,

I'm looking for a fast read/write lock for my project; would you please share your latest implementation with me, so I can do a benchmark and give feedback?

Thanks

Chris M. Thomasson

May 5, 2019, 6:30:54 PM
On 5/5/2019 1:35 AM, sa...@portsip.com wrote:
> Hi Chris,
>
> I'm looking for a fast read/write lock in my project, would you please share me your latest implementation then I can do benchmark and feedback.
>

Here is my latest version 0.1:

https://pastebin.com/raw/1QtPCGhV
(raw text link, no ads...)

It should compile right up and run. If you undefine CT_TEST_FAST_MUTEX
it will run the test against std::shared_mutex.

> On Wednesday, February 20, 2019 at 1:56:19 PM UTC+8, Chris M. Thomasson wrote:
>> On 2/17/2019 11:34 PM, mvor...@gmail.com wrote:
>>> On Thursday, February 14, 2019 at 2:03:43 AM UTC-5, Chris M. Thomasson wrote:
[...]

sa...@portsip.com

May 6, 2019, 3:57:07 AM
Thanks, it's really good!

Chris M. Thomasson

May 9, 2019, 5:44:57 PM
On 5/6/2019 12:56 AM, sa...@portsip.com wrote:
> Thanks, it's really good!

Thank you for giving it a go. When you get some time, can you show your
results? :^)