my rwmutex algorithm for Linux...

Chris M. Thomasson

unread,

May 26, 2009, 2:38:34 PM5/26/09

to

Here is a full example implementation:

I have tested this on Fedore Core. However, it should compile on any IA32
based system with GCC and PThreads (with support for `sem_post()'). If
your having any troubles, please let me know.

I am interested in hearing about the performance differences between the
native PThread read/write mutex vs. my algorithm. Last time I checked the
NPTL source code, it used internal mutexs for read access. Therefore, mine
should have a performance "advantage".

Any comments, good or bad, are welcome.

Anyway, enjoy!

:^D

Maxim Yegorushkin

unread,

May 29, 2009, 5:19:37 AM5/29/09

to

On May 26, 7:38 pm, "Chris M. Thomasson" <n...@name.invalid> wrote:
> Here is a full example implementation:
>
> http://pastebin.com/f3d6140eb
>
> I have tested this on Fedore Core. However, it should compile on any IA32
> based system with GCC and PThreads (with support for `sem_post()'). If
> your having any troubles, please let me know.
>
> I am interested in hearing about the performance differences between the
> native PThread read/write mutex vs. my algorithm. Last time I checked the
> NPTL source code, it used internal mutexs for read access. Therefore, mine
> should have a performance "advantage".

Have you benchmarked it to validate your guesses?

--
Max

Chris M. Thomasson

unread,

May 29, 2009, 2:10:29 PM5/29/09

to

"Maxim Yegorushkin" <maxim.ye...@gmail.com> wrote in message
news:a7c04724-895b-42de...@r13g2000vbr.googlegroups.com...

Not yet. However, if the implmentation is anything like:

http://www.opensource.apple.com/source/Libc/Libc-498/pthreads/pthread_rwlock.c

or glibc ver 2.10.1 which has code like:

/* Acquire read lock for RWLOCK. */
int
__pthread_rwlock_rdlock (rwlock)
pthread_rwlock_t *rwlock;
{
int result = 0;
/* Make sure we are along. */
lll_lock (rwlock->__data.__lock, rwlock->__data.__shared);

[...]
}

Then, my algorithm should outperform them because if multiple reader threads
hit those implementations above, they will all be contending for the single
mutex. On the other hand, my algorithm will allow the readers to gain
wait-free bounded-time access. Also, my algorithm is 100% starvation free.

I will code up a little test to see if my algorithm as any performance
enhancements on my Fedora Core installation. It should be ready sometime
today. I think I will set it up as a single linked list with multiple
readers iterating over it as fast as they can, and a couple of writers that
periodically add/remove items. I will calculate how many iterations
per-reader-thread-per-second are achieved.

Chris M. Thomasson

unread,

May 29, 2009, 3:54:50 PM5/29/09

to

Here is a VERY CRUDE test program:

http://pastebin.com/f7d6677c6

This mainly tests read performance. Writes are VERY few and far between.
So writer-to-reader contention will probably never show up. Anyway, here
is the command line I used to compile my version:

gcc -Wall -pedantic -DNDEBUG -O3 rwmutex.c -lpthread

And here is the one I used for the NPTL:

gcc -Wall -pedantic -DNDEBUG -DUSE_PTHREAD_NATIVE_RWLOCK -O3 rwmutex.c -lpthread

Here is the relevant portion of the output I get from my version:

TEST RUN 1 of 2 RUNNING...
READERS ACHIEVED 249999 ITERATIONS PER-THREAD-PER-SECOND

TEST RUN 1 of 2 COMPLETED!!!
------------------------------------
TEST RUN 2 of 2 RUNNING...
READERS ACHIEVED 249999 ITERATIONS PER-THREAD-PER-SECOND

TEST RUN 2 of 2 COMPLETED!!!
------------------------------------

And here is the output I get from the NPTL version:

TEST RUN 1 of 2 RUNNING...
READERS ACHIEVED 133333 ITERATIONS PER-THREAD-PER-SECOND

TEST RUN 1 of 2 COMPLETED!!!
------------------------------------
TEST RUN 2 of 2 RUNNING...
READERS ACHIEVED 133333 ITERATIONS PER-THREAD-PER-SECOND

TEST RUN 2 of 2 COMPLETED!!!
------------------------------------

As you can see, my version is getting 116666 more reads per-second-per-
thread. AFAICT, that is fairly significant.

Chris M. Thomasson

unread,

Jun 5, 2009, 7:20:59 AM6/5/09

to

"Chris M. Thomasson" <n...@name.invalid> wrote in message
news:e4XTl.103675$bi7....@newsfe07.iad...

> On Fri, 29 May 2009 02:19:37 -0700, Maxim Yegorushkin wrote:
>
>> On May 26, 7:38 pm, "Chris M. Thomasson" <n...@name.invalid> wrote:
>>> Here is a full example implementation:
>>>
>>> http://pastebin.com/f3d6140eb
>>>
>>> I have tested this on Fedore Core. However, it should compile on any
>>> IA32 based system with GCC and PThreads (with support for
>>> `sem_post()'). If your having any troubles, please let me know.
>>>
>>> I am interested in hearing about the performance differences between
>>> the native PThread read/write mutex vs. my algorithm. Last time I
>>> checked the NPTL source code, it used internal mutexs for read access.
>>> Therefore, mine should have a performance "advantage".
>>
>> Have you benchmarked it to validate your guesses?
>
> Here is a VERY CRUDE test program:
>
> http://pastebin.com/f7d6677c6
>
> This mainly tests read performance. Writes are VERY few and far between.
> So writer-to-reader contention will probably never show up. Anyway, here
> is the command line I used to compile my version:

> [...]

> As you can see, my version is getting 116666 more reads per-second-per-
> thread. AFAICT, that is fairly significant.

WOW!!!

My non-starvation bounded-time wairt-free fast-path reader rw-algorithm,
which was designed specifically for windows (pre-vista), is getting
extremely similar performance numbers on most recent versions of
Linux/Solaris/OpenSolaris with most recent version of GCC indeed!

Cool!

Perhaps a hundred thousand more reads per-thread-per-second!!!!!!

My algorithm seems to rock and roll!

I test on Fedora 10/Most Recent OpenSoalris/Solaris/Windows XP Most
Recent GCC.... Waiting for Fedora 11, Leonidas!

GCC, well, awesome!

;D