I have tested this on Fedore Core. However, it should compile on any IA32
based system with GCC and PThreads (with support for `sem_post()'). If
your having any troubles, please let me know.
I am interested in hearing about the performance differences between the
native PThread read/write mutex vs. my algorithm. Last time I checked the
NPTL source code, it used internal mutexs for read access. Therefore, mine
should have a performance "advantage".
Any comments, good or bad, are welcome.
Anyway, enjoy!
:^D
Have you benchmarked it to validate your guesses?
--
Max
Not yet. However, if the implmentation is anything like:
http://www.opensource.apple.com/source/Libc/Libc-498/pthreads/pthread_rwlock.c
or glibc ver 2.10.1 which has code like:
/* Acquire read lock for RWLOCK. */
int
__pthread_rwlock_rdlock (rwlock)
pthread_rwlock_t *rwlock;
{
int result = 0;
/* Make sure we are along. */
lll_lock (rwlock->__data.__lock, rwlock->__data.__shared);
[...]
}
Then, my algorithm should outperform them because if multiple reader threads
hit those implementations above, they will all be contending for the single
mutex. On the other hand, my algorithm will allow the readers to gain
wait-free bounded-time access. Also, my algorithm is 100% starvation free.
I will code up a little test to see if my algorithm as any performance
enhancements on my Fedora Core installation. It should be ready sometime
today. I think I will set it up as a single linked list with multiple
readers iterating over it as fast as they can, and a couple of writers that
periodically add/remove items. I will calculate how many iterations
per-reader-thread-per-second are achieved.
Here is a VERY CRUDE test program:
This mainly tests read performance. Writes are VERY few and far between.
So writer-to-reader contention will probably never show up. Anyway, here
is the command line I used to compile my version:
gcc -Wall -pedantic -DNDEBUG -O3 rwmutex.c -lpthread
And here is the one I used for the NPTL:
gcc -Wall -pedantic -DNDEBUG -DUSE_PTHREAD_NATIVE_RWLOCK -O3 rwmutex.c -lpthread
Here is the relevant portion of the output I get from my version:
TEST RUN 1 of 2 RUNNING...
READERS ACHIEVED 249999 ITERATIONS PER-THREAD-PER-SECOND
TEST RUN 1 of 2 COMPLETED!!!
------------------------------------
TEST RUN 2 of 2 RUNNING...
READERS ACHIEVED 249999 ITERATIONS PER-THREAD-PER-SECOND
TEST RUN 2 of 2 COMPLETED!!!
------------------------------------
And here is the output I get from the NPTL version:
TEST RUN 1 of 2 RUNNING...
READERS ACHIEVED 133333 ITERATIONS PER-THREAD-PER-SECOND
TEST RUN 1 of 2 COMPLETED!!!
------------------------------------
TEST RUN 2 of 2 RUNNING...
READERS ACHIEVED 133333 ITERATIONS PER-THREAD-PER-SECOND
TEST RUN 2 of 2 COMPLETED!!!
------------------------------------
As you can see, my version is getting 116666 more reads per-second-per-
thread. AFAICT, that is fairly significant.
WOW!!!
My non-starvation bounded-time wairt-free fast-path reader rw-algorithm,
which was designed specifically for windows (pre-vista), is getting
extremely similar performance numbers on most recent versions of
Linux/Solaris/OpenSolaris with most recent version of GCC indeed!
Cool!
Perhaps a hundred thousand more reads per-thread-per-second!!!!!!
My algorithm seems to rock and roll!
I test on Fedora 10/Most Recent OpenSoalris/Solaris/Windows XP Most
Recent GCC.... Waiting for Fedora 11, Leonidas!
GCC, well, awesome!
;D