
rwlock using pthread_cond


Hovik Melikyan

Sep 16, 2002, 6:58:44 AM
Hi,

This might be interesting to those who write portable multithreaded
programs.

I was looking for a read/write lock implementation based on the other
pthreads primitives for systems which lack built-in rwlock, e.g. Mac
OS X. I came across two implementations on the Net (I omit many others
since they evidently mimic these two, along with their bugs and
weaknesses) and thoroughly tested them both. The test program
started 50 threads, each of which repeatedly tried to acquire either
a read or a write lock (chosen at random, at a 1-to-5 ratio).

The first one can be found in one of the AIX tutorials:
http://www.unet.univie.ac.at/aix/aixprggd/genprogc/complex_synchronization_objects.htm
. Many other articles and sites refer to this page. This
implementation tries to give higher priority to writers, however, it
contains a bug that makes this code unusable. Take a look at function
rwlock_lock_read() on the page above: the waiting condition must be

while ((rwl->lock_count < 0) || (rwl->waiting_writers))
    pthread_cond_wait(&rwl->rcond, &rwl->lock);

instead of &&.
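For reference, the corrected reader path might look like this in full. This is only a sketch in the spirit of the tutorial's code, reusing its field names (lock_count < 0 means a writer is active, waiting_writers counts queued writers); the init/unlock half is omitted:

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  rcond;      /* readers wait here */
    pthread_cond_t  wcond;      /* writers wait here */
    int lock_count;             /* >0: active readers; -1: active writer */
    int waiting_writers;
} rwlock_t;

void rwlock_lock_read(rwlock_t *rwl)
{
    pthread_mutex_lock(&rwl->lock);
    /* Wait while a writer is active OR writers are queued.  With &&,
     * a reader could proceed while a writer is still active (whenever
     * waiting_writers happens to be 0), or slip past queued writers;
     * || blocks the reader in both cases. */
    while (rwl->lock_count < 0 || rwl->waiting_writers)
        pthread_cond_wait(&rwl->rcond, &rwl->lock);
    rwl->lock_count++;
    pthread_mutex_unlock(&rwl->lock);
}
```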

The other one was posted in this newsgroup a few months ago by Sean
P.Burke: http://groups.google.com/groups?selm=82fzyzjq2k.fsf%40dev0.welkyn.com
. Sean's rwlock module is not writer-priority, however, it is
reentrant. A few modifications to the source make Sean's
implementation writer-priority and, according to my tests, a little
faster, but NOT reentrant (please see the patch below).

Regards,

--
Hovik Melikyan


*** rwlock.c.ORIG Mon Sep 16 15:44:19 2002
--- rwlock.c Mon Sep 16 15:47:05 2002
***************
*** 51,57 ****
*/

/* Wait while writers are active */
! while (lock->active_readers < 0)
pthread_cond_wait(&lock->ok_to_read, &lock->mutex);

lock->pending_readers--;
--- 51,57 ----
*/

/* Wait while writers are active */
! while (lock->active_readers < 0 || lock->pending_writers > 0)
pthread_cond_wait(&lock->ok_to_read, &lock->mutex);

lock->pending_readers--;
***************
*** 186,192 ****
/* Signal waiters while holding the mutex, to meet the
* requirements of our pthread_cond_wait emulation for WIN32.
*/
! if (lock->pending_readers != 0)
/* Wake all readers.
*/
pthread_cond_broadcast(&lock->ok_to_read);
--- 186,192 ----
/* Signal waiters while holding the mutex, to meet the
* requirements of our pthread_cond_wait emulation for WIN32.
*/
! if (lock->pending_writers == 0)
/* Wake all readers.
*/
pthread_cond_broadcast(&lock->ok_to_read);

Sean P. Burke

Sep 16, 2002, 1:43:09 PM

ho...@moon.yerphi.am (Hovik Melikyan) writes:

Thank you for subjecting my readers/writer lock to a rigorous
test. I have some questions about your conclusions, though.

My implementation was intended to allow a single writer thread
to alternate with a group of reader threads for access to the
lock - an attempt at "fairness". Your change appears to give
writers priority in all cases, which may be preferable, but
how does it become non-reentrant?

-SEan

Hovik Melikyan

Sep 17, 2002, 4:15:15 AM
Hi Sean,

sbu...@dev0.welkyn.com (Sean P. Burke) wrote in message news:<82ptvds...@dev0.welkyn.com>...


>
> My implementation was intended to allow a single writer thread
> to alternate with a group of reader threads for access to the
> lock - an attempt at "fairness".

Do you think "ladies first" is fair, especially if the female
population prevails? (I mean, if readers are ladies.) :) Seriously,
from the point of view of practical programming it's much easier to
give writers exclusive privileges and forget about the problem of
starvation than to make the user of your module think about
whether his/her application will starve with your rwlock. With your
scheme, I'll have to test and evaluate the number of concurrent reads
and writes in my app to see if writes are possible at all. The
writer-priority scheme does not burden the developer with problems
of this kind. (How about making your policy conditional at run-time,
btw? I mean, one additional bool flag, something like
writer_prioritized.)
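That run-time switch could be sketched roughly like this. The struct and the flag name writer_prioritized are my assumptions in the spirit of Sean's module, not his actual code:

```c
/* Hypothetical sketch of a run-time priority switch for an rwlock.
 * Field names loosely follow Sean's module; writer_prioritized is
 * an invented flag chosen once at creation time. */
typedef struct {
    int active_readers;      /* -1 while a writer holds the lock */
    int pending_writers;
    int writer_prioritized;  /* 1: writers always go first */
} rwlock;

/* The condition a reader's pthread_cond_wait loop would test. */
int reader_must_wait(const rwlock *lock)
{
    if (lock->active_readers < 0)
        return 1;   /* a writer is active */
    if (lock->writer_prioritized && lock->pending_writers > 0)
        return 1;   /* writer-priority mode: defer to queued writers */
    return 0;       /* "fair" mode: only an active writer blocks readers */
}
```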

> Your change appears to give
> writers priority in all cases, which may be preferable, but
> how does it become non-reentrant?

Frankly, I don't remember how I came to this conclusion. Most likely I
was wrong by saying it is not reentrant.

BTW, as someone who writes threading algorithms, what do you think
about the rwlock implementation for Win32 found in APR (Apache
Portable Runtime)? Win32 lacks posix-like conditions, but it offers
Event instead. APR uses two Event objects and a mutex to implement
rwlocks, but I'm not sure I understand this code and how it
prioritizes the locks. The thing is that I'm writing a portable
library for multithreading and networking and I need to port the
rwlock interface to Win32 too.

Below is the APR rwlock module for Win32. Any comments and
explanations would be greatly appreciated. Thanks.

--
Hovik Melikyan


static apr_status_t thread_rwlock_cleanup(void *data)
{
    apr_thread_rwlock_t *rwlock = data;
    CloseHandle(rwlock->readevent);
    CloseHandle(rwlock->finished);
    CloseHandle(rwlock->writemutex);
    return APR_SUCCESS;
}

APR_DECLARE(apr_status_t) apr_thread_rwlock_create(apr_thread_rwlock_t **rwlock,
                                                   apr_pool_t *pool)
{
    (*rwlock) = apr_palloc(pool, sizeof(**rwlock));
    (*rwlock)->pool = pool;
    (*rwlock)->readevent = CreateEvent(NULL, TRUE, FALSE, NULL);
    (*rwlock)->finished = CreateEvent(NULL, FALSE, TRUE, NULL);
    (*rwlock)->writemutex = CreateMutex(NULL, FALSE, NULL);
    (*rwlock)->counter = -1;
    (*rwlock)->wrcounter = 0;
    return APR_SUCCESS;
}

APR_DECLARE(apr_status_t) apr_thread_rwlock_rdlock(apr_thread_rwlock_t *rwlock)
{
    if (InterlockedIncrement(&rwlock->counter) == 0) {
        WaitForSingleObject(rwlock->finished, INFINITE);
        SetEvent(rwlock->readevent);
    }
    WaitForSingleObject(rwlock->readevent, INFINITE);
    return APR_SUCCESS;
}

APR_DECLARE(apr_status_t) apr_thread_rwlock_wrlock(apr_thread_rwlock_t *rwlock)
{
    WaitForSingleObject(rwlock->writemutex, INFINITE);
    WaitForSingleObject(rwlock->finished, INFINITE);
    rwlock->wrcounter++;
    return APR_SUCCESS;
}

APR_DECLARE(apr_status_t) apr_thread_rwlock_unlock(apr_thread_rwlock_t *rwlock)
{
    if (rwlock->wrcounter) {
        /* If wrcounter is > 0, then we must have a writer lock */
        rwlock->wrcounter--;
        SetEvent(rwlock->finished);
        ReleaseMutex(rwlock->writemutex);
    }
    else {
        if (InterlockedDecrement(&rwlock->counter) < 0) {
            ResetEvent(rwlock->readevent);
            SetEvent(rwlock->finished);
        }
    }
    return APR_SUCCESS;
}

APR_DECLARE(apr_status_t) apr_thread_rwlock_destroy(apr_thread_rwlock_t *rwlock)
{
    return apr_pool_cleanup_run(rwlock->pool, rwlock, thread_rwlock_cleanup);
}

Alexander Terekhov

Sep 17, 2002, 6:54:28 AM

Hovik Melikyan wrote:
[...]

> BTW, as someone who writes threading algorithms, what do you think
> about the rwlock implementation for Win32 found in APR (Apache
> Portable Runtime)? Win32 lacks posix-like conditions, but it offers
> Event instead. APR uses two Event objects and a mutex to implement
> rwlocks, but I'm not sure I understand this code and how it
> prioritizes the locks. The thing is that I'm writing a portable
> library for multithreading and networking and I need to port the
> rwlock interface to Win32 too.

http://sources.redhat.com/pthreads-win32
(Win32 condvars emulation, to begin with)

http://groups.google.com/groups?selm=3B166244.F923B993%40web.de
(Subject: Re: how to distinguish between read and write locks?)

"....
for example (version w/o condition variables and with
"no read/write preference" policy; pseudo-code): ...."

> Below is the APR rwlock module for Win32. Any comments and
> explanations would be greatly appreciated. Thanks.

Comment: <as usual> MS-threading stuff is UTTERLY BRAIN-DEAD
and, for ENTIRELY that reason, this rwlock is totally broken
as well (unless I'm just missing and/or misunderstanding
something).

Explanation...

[...]


> APR_DECLARE(apr_status_t) apr_thread_rwlock_create(apr_thread_rwlock_t
> **rwlock,
> apr_pool_t *pool)
> {
> (*rwlock) = apr_palloc(pool, sizeof(**rwlock));
> (*rwlock)->pool = pool;
> (*rwlock)->readevent=CreateEvent(NULL,TRUE,FALSE,NULL);

That's manual reset event used by the first/last reader to
allow/disallow "more readers" to come in.

> (*rwlock)->finished = CreateEvent(NULL,FALSE,TRUE,NULL);

That's a binary semaphore (MS auto-reset event); it's acquired/
released by the first/last reader and by writers (the recursive
write case is broken, BTW).

> (*rwlock)->writemutex = CreateMutex(NULL,FALSE,NULL);

Well, that's MS "Mutex" {recursive} beast and it's used here to
exclude concurrent writes.

> (*rwlock)->counter = -1;

Count of pending and/or active readers.

> (*rwlock)->wrcounter = 0;

That's recursive write locking count (again, it's broken for > 1).

> return APR_SUCCESS;
> }
>
> APR_DECLARE(apr_status_t) apr_thread_rwlock_rdlock(apr_thread_rwlock_t
> *rwlock)
> {
> if (InterlockedIncrement(&rwlock->counter) == 0) {

This MS-thing should really provide ACQUIRE memory sync.
semantics in order to allow unlock to see proper value of
rwlock->wrcounter variable -- and you should really ask MS
folks whether it really works that way... ;-)

Anyway, here we've got a race condition with respect to
unlock (last reader in the "previous" readers batch).

> WaitForSingleObject(rwlock->finished, INFINITE);

Here's the point that decides who's next -- either a whole
bunch of readers (the first one will release all the others on
the next line and, AFAICS, a continuous stream of subsequent
readers WILL result in writer starvation here) or some
writer.

> SetEvent(rwlock->readevent);

Return is missing here, sort of. ;-)

> }
> WaitForSingleObject(rwlock->readevent,INFINITE);

Here pending readers wait to be released by the
first reader in the batch, which is granted access
via the "finished" sema.

> return APR_SUCCESS;
> }
>
> APR_DECLARE(apr_status_t) apr_thread_rwlock_wrlock(apr_thread_rwlock_t
> *rwlock)
> {
> WaitForSingleObject(rwlock->writemutex,INFINITE);

Mutual exclusion among writers.

> WaitForSingleObject(rwlock->finished, INFINITE);

Write lock acquisition; here it competes with some lucky
reader -- first one in the batch of readers (and priorities
don't really work for that reason, BTW).

> rwlock->wrcounter++;

Recursive write lock counting.

> return APR_SUCCESS;
> }
>
> APR_DECLARE(apr_status_t) apr_thread_rwlock_unlock(apr_thread_rwlock_t
> *rwlock)
> {
> if (rwlock->wrcounter) {

``What am I?'' (might well be broken for readers, see above)

> /* If wrcounter is > 0, then we must have a writer lock */
> rwlock->wrcounter--;
> SetEvent(rwlock->finished);

BUG. if ( 0 == --rwlock->wrcounter ) SetEvent(rwlock->finished);

> ReleaseMutex(rwlock->writemutex);
> }
> else {
> if (InterlockedDecrement(&rwlock->counter) < 0) {

Bye-bye. Here's the major problem: another batch of readers
(all but first one) may easily pass lock() prior to ResetEvent(
rwlock->readevent) below... with concurrent access granted to
some pending writer (on "finished" sema) as well -- result of
SetEvent(rwlock->finished) below.

> ResetEvent(rwlock->readevent);
> SetEvent(rwlock->finished);

Last reader kinda turns-off-the-lights (race condition).

Hmmm. Am I missing and/or misunderstanding something?

regards,
alexander.

--
You Could Have Won >>ONE MILLION<< At http://sources.redhat.com/pthreads-win32

You May Also Want To BUY Microsoft Windows >>UNIX<< for $50-$99 (plus $30/CAL)

http://www.infoworld.com/articles/pl/xml/02/07/29/020729plmsu.xml
(REVIEWS: A nod toward Unix... PROS: + Bypasses Win32 API for performance
+ Superior performance to Win32 API
+ Price is unbeatable)

ftp://ftp.microsoft.com/developr/Interix/sfu30/POSIX1Conformance.doc
(Microsoft Windows Services for UNIX 3.0 Interix POSIX.1 Conformance Document)

ftp://ftp.microsoft.com/developr/Interix/sfu30/POSIX2Conformance.doc
(Microsoft Windows Services for UNIX 3.0 Interix POSIX.2 Conformance Document)

http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/windows2000serv/deploy/sfu/sfuposix.asp
(Writing POSIX-Standard Code)

http://www.microsoft.com/windows/sfu/default.asp
(Services for UNIX Home)

http://www.microsoft.com/windows/sfu/productinfo/trial/default.asp
(Trial Software Windows Services for UNIX 3.0)

http://www.microsoft.com/windows/sfu/support/xnews/default.asp
(microsoft.public.servicesforunix.general)

#include <disclaim.er>

Sean P. Burke

Sep 17, 2002, 12:13:22 PM

ho...@moon.yerphi.am (Hovik Melikyan) writes:

> Hi Sean,
>
> sbu...@dev0.welkyn.com (Sean P. Burke) wrote in message news:<82ptvds...@dev0.welkyn.com>...
> >
> > My implementation was intended to allow a single writer thread
> > to alternate with a group of reader threads for access to the
> > lock - an attempt at "fairness".
>
> Do you think "ladies first" is fair, especially if the female
> population prevails? (I mean, if readers are ladies.) :) Seriously,
> from the point of view of practical programming it's much easier to
> give writers exclusive privileges and forget about the problem of
> starvation than to make the user of your module think about
> whether his/her application will starve with your rwlock. With your
> scheme, I'll have to test and evaluate the number of concurrent reads
> and writes in my app to see if writes are possible at all. The
> writer-priority scheme does not burden the developer with problems
> of this kind. (How about making your policy conditional at run-time,
> btw? I mean, one additional bool flag, something like
> writer_prioritized.)

I wrote "fairness" in quotes because people often disagree about
what is "fair", and not just in the field of computer software! :^>

To be serious, this lock shouldn't starve anyone. The intention is
that when a writer releases a lock, any currently pending readers
are allowed in. If there are also pending writers, then any new
readers attempting to acquire the lock will wait until the next
writer has finished, so that no thread is locked out indefinitely.
If you find that the code doesn't do that, please let me know.

> > Your change appears to give
> > writers priority in all cases, which may be preferable, but
> > how does it become non-reentrant?
>
> Frankly, I don't remember how I came to this conclusion. Most likely I
> was wrong by saying it is not reentrant.

The "fairness" policy I described above will deadlock if readers
take locks recursively. Probably that's what you were thinking of.

> BTW, as someone who writes threading algorithms, what do you think
> about the rwlock implementation for Win32 found in APR (Apache
> Portable Runtime)? Win32 lacks posix-like conditions, but it offers
> Event instead. APR uses two Event objects and a mutex to implement
> rwlocks, but I'm not sure I understand this code and how it
> prioritizes the locks. The thing is that I'm writing a portable
> library for multithreading and networking and I need to port the
> rwlock interface to Win32 too.
>
> Below is the APR rwlock module for Win32. Any comments and
> explanations would be greatly appreciated. Thanks.

I'd like to help, but the only thing I know about Win32 threads
is that pthread-win32 runs on them.

-SEan

Hovik Melikyan

Sep 18, 2002, 7:07:55 AM
Thanks for your comments, Alexander.

Alexander Terekhov <tere...@web.de> wrote in message >

> http://sources.redhat.com/pthreads-win32
> (Win32 condvars emulation, to begin with)
>

This is a condvar implementation, which in turn is based on two
semaphores and a mutex. As a result, you have 4 semaphores and 4
mutexes for a single rwlock object! Hmmm... I'd rather think about
using native Win32 objects than take an anti-Microsoft stance or
whatever. The reality is that I want my daemon to work on Win32 too,
and I don't want it to be just a clumsy 'emulated' thing with a bunch
of soft pillows, that is, emulation DLLs. After all, while implicitly
promoting Unix, Apache and MySQL still run perfectly on almost
anything, even on a cell phone :) and maybe that's why they're number
one, aren't they?

In fact, if we can make sure the APR code works, we can suggest it for
pthreads-win32 too. Let's see if we are on the right track at all.

So, Event is a binary semaphore with two different modes of operation:
when signaling a manual-reset event, it releases all threads waiting
for the event; when signaling an auto-reset event, it lets only one
thread in and remains in a non-signaled state. As you can see, an
Event's behaviour somewhat resembles pthread_cond_signal and
pthread_cond_broadcast, but of course it's not quite the same.
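To make the comparison concrete, here is a rough sketch of both Event flavours on top of a POSIX condvar. This is my own illustration, not the pthreads-win32 code, and the auto-reset path deliberately glosses over wakeup-ordering subtleties:

```c
#include <pthread.h>

/* Toy emulation of a Win32 Event with a mutex + condvar. */
typedef struct {
    pthread_mutex_t m;
    pthread_cond_t  c;
    int signaled;
    int manual_reset;   /* 1: manual-reset, 0: auto-reset */
} event_t;

void event_set(event_t *e)
{
    pthread_mutex_lock(&e->m);
    e->signaled = 1;
    if (e->manual_reset)
        pthread_cond_broadcast(&e->c);  /* release every waiter */
    else
        pthread_cond_signal(&e->c);     /* release at most one */
    pthread_mutex_unlock(&e->m);
}

void event_wait(event_t *e)
{
    pthread_mutex_lock(&e->m);
    while (!e->signaled)
        pthread_cond_wait(&e->c, &e->m);
    if (!e->manual_reset)
        e->signaled = 0;   /* auto-reset consumes the signal */
    pthread_mutex_unlock(&e->m);
}
```

A manual-reset event stays signaled after a wait; an auto-reset event goes back to non-signaled as soon as one waiter gets through.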

I'm re-writing the APR code as a kind of OO pseudo-code. I'll
then try to answer your comments. (Actually, I did write a wrapper
class for this event/mutex implementation on Windows and tested it.
Surprisingly, it works perfectly :) even under very stressful
conditions.)

Cast:
writelock - a mutex
oktoread - a manual reset event
finished - an auto-reset event
readercnt - initialized to -1
writercnt - initialized to 0

1: void rwlock::rdlock()
2: {
3: if (atomic_increment(&readercnt) == 0)
4: {
5: finished.wait();
6: oktoread.post();
7: }
8: oktoread.wait();
9: }
10:
11: void rwlock::wrlock()
12: {
13: writelock.lock();
14: finished.wait();
15: writercnt++;
16: }
17:
18: void rwlock::unlock()
19: {
20: if (writercnt != 0)
21: {
22: writercnt--;
23: finished.post();
24: writelock.unlock();
25: }
26: else if (atomic_decrement(&readercnt) < 0)
27: {
28: oktoread.reset();
29: finished.post();
30: }
31: }

>
> > if (InterlockedIncrement(&rwlock->counter) == 0) {
>
> This MS-thing should really provide ACQUIRE memory sync.
> semantics in order to allow unlock to see proper value of
> rwlock->wrcounter variable -- and you should really ask MS
> folks whether it really works that way... ;-)
>

Are you talking about InterlockedIncrement? If so, indeed, it's a
strange thing that behaves differently on the Win95 and NT families.
But either way, you can safely test the result for zero and for its
sign on any platform (it's a CPU compatibility issue). As for not
using atomic_inc or dec for writercnt, it's still safe: a reader can
never see anything other than 0 in writercnt (on line 20).
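In portable terms, the guarantee being relied on here (only the zero-ness and sign of the returned value) can be sketched with C11 atomics. This is my illustration, not APR's code:

```c
#include <stdatomic.h>

/* Behaves like InterlockedIncrement on NT: returns the new value.
 * On Win9x only the sign/zero of the return value was meaningful,
 * which is all the APR rdlock/unlock paths actually test. */
long atomic_increment(atomic_long *p)
{
    return atomic_fetch_add(p, 1) + 1;
}

long atomic_decrement(atomic_long *p)
{
    return atomic_fetch_sub(p, 1) - 1;
}
```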

>
> Anyway, here we've got a race condition with respect to
> unlock (last reader in the "previous" readers batch).
>

Could you explain why?

>
> > WaitForSingleObject(rwlock->finished, INFINITE);
>
> Here's the point to decided who's next -- either a whole
> bunch of readers [the first one will release all others
> (next line) and, AFAICS, continuous stream of subsequent
> readers WILL result in writers starvation here) or some
> writer.
>

This is a reader-priority scheme, but not because of that. Take a look
at lines 23 and 29: both the writer and the last reader make
finished.post(), however, the next writer is stuck on a mutex, rather
than on the 'finished' event, and that's why readers will come in
first.

> > WaitForSingleObject(rwlock->writemutex,INFINITE);
>
> Mutual exclusion among writers.
>
> > WaitForSingleObject(rwlock->finished, INFINITE);
>
> Write lock acquisition; here it competes with some lucky
> reader -- first one in the batch of readers (and priorities
> don't really work for that reason, BTW).
>

Right, the next finished.post() will randomly choose a reader or a
writer, which makes this scheme somewhat "fairer" than just
reader-priority. Hence, what we actually have here is: writers ALWAYS
release readers when they finish, and readers MAY release writers or
readers (presumably, depending on the order they locked the 'finished'
event). Isn't it "fair"?

>
> > if (rwlock->wrcounter) {
>
> ``What am I?'' (might well be broken for readers, see above)
>

Actually not, see above.

> > /* If wrcounter is > 0, then we must have a writer lock */
> > rwlock->wrcounter--;
> > SetEvent(rwlock->finished);
>
> BUG. if ( 0 == --rwlock->wrcounter ) SetEvent(rwlock->finished);
>

I don't think so. This code is protected with a mutex, so wrcounter
can never be > 1. It's just either 0 or 1.

>
> > ReleaseMutex(rwlock->writemutex);
> > }
> > else {
> > if (InterlockedDecrement(&rwlock->counter) < 0) {
>
> Bye-bye. Here's the major problem: another batch of readers
> (all but first one) may easily pass lock() prior to ResetEvent(
> rwlock->readevent) below... with concurrent access granted to
> some pending writer (on "finished" sema) as well -- result of
> SetEvent(rwlock->finished) below.
>

Nope! The thing is that if there are readers waiting on line 8, it
means readercnt is > 0 and thus, the exiting reader on line 26 will
not execute lines 28/29. On the other hand, if readercnt is 0, there
seems to be a race condition between lines 3 and 26, but the order of
wait's and post's on both event objects actually prevents it.

> > ResetEvent(rwlock->readevent);
> > SetEvent(rwlock->finished);
>
> Last reader kinda turns-off-the-lights (race condition).
>

Why?


The other question is whether this rwlock is reentrant. And besides,
how can you change the priorities here, if at all possible. Other than
that it seems pretty good to me.

Thanks,

--
Hovik Melikyan

Hovik Melikyan

Sep 18, 2002, 7:23:41 AM
sbu...@dev0.welkyn.com (Sean P. Burke) wrote in message news:<82hegos...@dev0.welkyn.com>...

>
> To be serious, this lock shouldn't starve anyone. The intention is
> that when a writer releases a lock, any currently pending readers
> are allowed in. If there are also pending writers, then any new
> readers attempting to acquire the lock will wait until the next
> writer has finished, so that no thread is locked out indefinitely.
>

... which means, if there are many writers (more than readers)
continuously approaching the point of locking, they will have priority
over poor readers and they will never let them in.

My concern is that in the real world of networking any "unfairness" of
such kind opens a possibility of denial-of-service attacks. If someone
knows how you distribute priorities between threads, he can
deliberately cause starvation and hence degradation of your service.
The best thing for the developer is to know that the service he writes
can not be broken that way.

Regards,

--
Hovik Melikyan

Alexander Terekhov

Sep 18, 2002, 8:43:57 AM

Hovik Melikyan wrote:
>
> Thanks for your comments, Alexander.
>
> Alexander Terekhov <tere...@web.de> wrote in message >
> > http://sources.redhat.com/pthreads-win32
> > (Win32 condvars emulation, to begin with)
> >
>
> This is a condvar implementation, which in its turn is based on two
> semaphores and a mutex. As a result, you have 4 semaphores and 4
> mutex'es for a single rwlock object! Hmmm... I'd rahter think about
> involving native Win32 objects than sticking on anti-microsoft or
> whatever.

Have you also noticed that pthreads-win32's rwlock object is
actually built along the lines of: < from my previous reply >

: http://groups.google.com/groups?selm=3B166244.F923B993%40web.de


: (Subject: Re: how to distinguish between read and write locks?)
:
: "....
: for example (version w/o condition variables and with
: "no read/write preference" policy; pseudo-code): ...."

where cndSharedAccessCompleted could be replaced easily with
binary sema (ms-auto-reset event), thread cancellation aside
for a moment?

http://sources.redhat.com/cgi-bin/cvsweb.cgi/pthreads/implement.h?rev=1.88&content-type=text/x-cvsweb-markup&cvsroot=pthreads-win32

http://sources.redhat.com/cgi-bin/cvsweb.cgi/pthreads/pthread_rwlock_unlock.c?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=pthreads-win32

http://sources.redhat.com/cgi-bin/cvsweb.cgi/pthreads/pthread_rwlock_wrlock.c?rev=1.2&content-type=text/x-cvsweb-markup&cvsroot=pthreads-win32

http://sources.redhat.com/cgi-bin/cvsweb.cgi/pthreads/ptw32_rwlock_cancelwrwait.c?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=pthreads-win32

> The reality is [...Unix, Apache, etc...]

The "reality" is also this:

http://groups.google.com/groups?selm=%23AsFvRtHCHA.2144%40tkmsftngp13
(Subject: Re: Apache won't compile)

"....the new config.guess and config.sub understand about Interix3 and
are doing pretty well. ...."

> So, Event is a binary semaphore with two different modes of operation:
> when signaling a manual-reset event, it releases all threads waiting
> for the event; when signaling an auto-reset event, it lets only one
> thread in and remains in a non-signaled state.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/setevent.asp

"The state of a manual-reset event object remains signaled until
it is set explicitly to the nonsignaled state by the ResetEvent
function. Any number of waiting threads, or threads that
subsequently begin wait operations for the specified event
object by calling one of the wait functions, can be released
while the object's state is signaled.

The state of an auto-reset event object remains signaled until
a single waiting thread is released, at which time the system
automatically sets the state to nonsignaled. If no threads are
waiting, the event object's state remains signaled.

Setting an event that is already reset has no effect."

> As you can see Event's
> behaviour is somewhat resembling pthread_cond_signal and
> pthread_cond_broadcast, but of course it's not quite the same.

http://groups.google.com/groups?selm=3CE0BCD3.EF695748%40web.de
(Subject: Re: The implementation of condition variables in pthreads-win32)

> I'm re-writing the APR code to some kind of a OO pseudo-code. I'll try
> then to answer to your comments. (Actually I did write a wrapper class
> for this event/mutex implementation on Windows and tested it.
> Surprisingly it works perfectly :) even under very stressing

> conditions.) ....

Yeah, and you can easily break it with one or two Sleeps added,
I guess. ;-)

> > > if (InterlockedIncrement(&rwlock->counter) == 0) {
> >
> > This MS-thing should really provide ACQUIRE memory sync.
> > semantics in order to allow unlock to see proper value of
> > rwlock->wrcounter variable -- and you should really ask MS
> > folks whether it really works that way... ;-)
> >
>

> Are you talking about InterlockedIncrement? ....

http://groups.google.com/groups?threadm=c29b5e33.0202140541.7df2bb9d%40posting.google.com
(Subject: Re: Can multiple threads set a global variable simultaneously?)

> > Anyway, here we've got a race condition with respect to
> > unlock (last reader in the "previous" readers batch).
> >
>
> Could you explain why?

See below.

> > > WaitForSingleObject(rwlock->finished, INFINITE);
> >
> > Here's the point to decided who's next -- either a whole
> > bunch of readers [the first one will release all others
> > (next line) and, AFAICS, continuous stream of subsequent
> > readers WILL result in writers starvation here) or some
> > writer.
> >
>
> This is a reader-priority scheme, but not because of that. Take a look
> at lines 23 and 29: both the writer and the last reader make
> finished.post(), however, the next writer is stuck on a mutex, rather
> than on the 'finished' event, and that's why readers will come in
> first.

Not really. See the event implementation pseudo-code (link above).

> > > WaitForSingleObject(rwlock->writemutex,INFINITE);
> >
> > Mutual exclusion among writers.
> >
> > > WaitForSingleObject(rwlock->finished, INFINITE);
> >
> > Write lock acquisition; here it competes with some lucky
> > reader -- first one in the batch of readers (and priorities
> > don't really work for that reason, BTW).
> >
>
> Right, the next finished.post() will randomly choose a reader or a
> writer, which makes this scheme somewhat "fairer" than just
> reader-priority. Hence, what we actually have here is: writers ALWAYS
> release readers when they finish, and readers MAY release writers or
> readers (presumably, depending on the order they locked the 'finished'
> event). Isn't it "fair"?

See rwlock implementation with one single "entry" lock ("try"-
readers aside) which is supposed to make it "kinda fair" (link
above).

> > > if (rwlock->wrcounter) {
> >
> > ``What am I?'' (might well be broken for readers, see above)
> >
>
> Actually not, see above.

Right. "see above" ;-)

> > > /* If wrcounter is > 0, then we must have a writer lock */
> > > rwlock->wrcounter--;
> > > SetEvent(rwlock->finished);
> >
> > BUG. if ( 0 == --rwlock->wrcounter ) SetEvent(rwlock->finished);
> >
>
> I don't think so. This code is protected with a mutex, so writecnt can
> never be > 1. It's just either 0 or 1.

The code is protected with MS >>recursive<< mutex. To make it
"correct" (other problems aside for a moment), you'd need to
add more "if":

APR_DECLARE(apr_status_t) apr_thread_rwlock_wrlock(apr_thread_rwlock_t *rwlock)
{
    WaitForSingleObject(rwlock->writemutex, INFINITE);

    if ( 0 == rwlock->wrcounter )
        WaitForSingleObject(rwlock->finished, INFINITE);

    rwlock->wrcounter++;
    return APR_SUCCESS;
}

> > > ReleaseMutex(rwlock->writemutex);


> > > }
> > > else {
> > > if (InterlockedDecrement(&rwlock->counter) < 0) {
> >
> > Bye-bye. Here's the major problem: another batch of readers
> > (all but first one) may easily pass lock() prior to ResetEvent(
> > rwlock->readevent) below... with concurrent access granted to
> > some pending writer (on "finished" sema) as well -- result of
> > SetEvent(rwlock->finished) below.
> >
>
> Nope! The thing is that if there are readers waiting on line 8, it
> means readercnt is > 0 and thus, the exiting reader on line 26 will
> not execute lines 28/29. On the other hand, if readercnt is 0, there
> seems to be a race condition between lines 3 and 26, but the order of
> wait's and post's on both event objects actually prevents it.
>
> > > ResetEvent(rwlock->readevent);
> > > SetEvent(rwlock->finished);
> >
> > Last reader kinda turns-off-the-lights (race condition).
> >
>
> Why?

Try to imagine what could happen if you'd add this:

: > > > ReleaseMutex(rwlock->writemutex);


: > > > }
: > > > else {
: > > > if (InterlockedDecrement(&rwlock->counter) < 0) {

----------------> Sleep( rand() ); // Let's BREAK it...

> The other question is whether this rwlock is reentrant.

What do you mean?

regards,
alexander.

Sean P. Burke

Sep 18, 2002, 11:34:11 AM

ho...@moon.yerphi.am (Hovik Melikyan) writes:

> sbu...@dev0.welkyn.com (Sean P. Burke) wrote in message news:<82hegos...@dev0.welkyn.com>...
> >
> > To be serious, this lock shouldn't starve anyone. The intention is
> > that when a writer releases a lock, any currently pending readers
> > are allowed in. If there are also pending writers, then any new
> > readers attempting to acquire the lock will wait until the next
> > writer has finished, so that no thread is locked out indefinitely.
> >
>
> ... which means, if there are many writers (more than readers)
> continuously approaching the point of locking, they will have priority
> over poor readers and they will never let them in.

Perhaps you overlooked this subtlety - we set lock->active_readers
to be -1 in the case of an active writer, so the write-lock operation
waits on the ok_to_write condition if the lock is held by anyone:

while (lock->valid && lock->active_readers)
    pthread_cond_wait(&lock->ok_to_write, &lock->mutex);

When a writer thread releases the lock, it favors pending
readers:

if (lock->pending_readers != 0)
    /* Wake all readers. */
    pthread_cond_broadcast(&lock->ok_to_read);
else
    /* Wake a single writer. */
    pthread_cond_signal(&lock->ok_to_write);

Now, I suppose that a new writer thread could grab the lock
in a narrow window after the old writer releases the mutex,
but before a reader thread waking from the cond_wait can
acquire the mutex. I would expect this to be rare.

> My concern is that in the real world of networking any "unfairness" of
> such kind opens a possibility of denial-of-service attacks. If someone
> knows how you distribute priorities between threads, he can
> deliberately cause starvation and hence degradation of your service.
> The best thing for the developer is to know that the service he writes
> can not be broken that way.

Well, there's no need to speculate, since we can test the code.
Were you able to demonstrate lockout of readers in your testing?

-SEan

Hovik Melikyan
Sep 19, 2002, 4:00:48 AM
sbu...@dev0.welkyn.com (Sean P. Burke) wrote in message
> Well, there's no need to speculate, since we can test the code.
> Were you able to demonstrate lockout of readers in your testing?

Well, I used a program that starts some number of threads. Each thread
iteratively locks the rwlock object for reading or writing, randomly
chosen, but with the prespecified ratio, e.g. 1 writer and 5 readers,
or vice versa. Readers plot dots, and writers plot w's. The picture
you get when running the test gives an approximate notion of what's going
on, who's starving, etc. I tried both your code and my modifications to
it. The actual output is much longer; I'm giving just a few lines from
each run:

SEAN, readers/writers = 5/1

w........ww...w........w..w..ww............w...w..w..w..........w....w.........
......ww..........................w........w..w......w.....w.ww..w...ww.w..w..w

SEAN, readers/writers = 1/5

.www.w.www..w.wwwwww..wwwwwwww.wwwwwwwwwwww.wwwwww..wwwwwwwwwww.ww.w.wwwwwwwwww
wwwwwwww..wwwwww..w..wwwwwwwwww.w.www.wwww..wwwwwww.w.wwwww.wwwwwww.wwwwwww.w.w

HOVIK, readers/writers = 5/1

....wwwwwwwwwww.........................................wwwwwwwwwwwww..........
....................www.................................wwwwwww................

HOVIK, readers/writers = 1/5

wwwwwwwwwwwwwwwwwwwww..............................wwwwwwwwwwwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww.......


In your case we see much better distribution than in mine. However,
yours is running approx. 3-4 times slower (naturally). The maximum
number of readers that entered the critical section simultaneously is
twice as large in my case.

All in all, despite its slowness, a better distribution in your case
is convincing. I'm going to use your algorithm in my library, if you
allow. It's an open-source library and can be found at
http://www.melikyan.com/ptypes/

Thanks,

--
Hovik Melikyan

-----------------------------------------------------------------------------

//
// rwlock test
//

const int rw_max_threads = 30;
const int rw_max_tries = 30;
const int rw_max_delay = 10;
const int rw_rw_ratio = 5;
const bool rw_swap = false;


class rwthread: public thread
{
protected:
virtual void execute();
public:
rwthread(): thread(false) {}
virtual ~rwthread() { waitfor(); }
};


rwlock rw;

int reader_cnt = 0;
int writer_cnt = 0;
int total_writers = 0;
int total_readers = 0;
int max_readers = 0;


int prand(int max)
{
return rand() % max;
}


void rwthread::execute()
{
for(int i = 0; i < rw_max_tries; i++)
{
psleep(prand(rw_max_delay));
bool writer = prand(rw_rw_ratio) == 0;
if (writer ^ rw_swap)
{
rw.wrlock();
pout.put('w');
if (pincrement(&writer_cnt) > 1)
fatal(0xa0, "Writer: Huh?! Writers in here?");
pincrement(&total_writers);
}
else
{
rw.rdlock();
pout.put('.');

// we have a race condition here, so max_readers
// wouldn't be that precise.
int t;
if ((t = pincrement(&reader_cnt)) > max_readers)
max_readers = t;

if (writer_cnt > 0)
fatal(0xa1, "Reader: Huh?! Writers in here?");
pincrement(&total_readers);
}

psleep(prand(rw_max_delay));
if (writer ^ rw_swap)
pdecrement(&writer_cnt);
else
pdecrement(&reader_cnt);

rw.unlock();
}
}


void rwlock_test()
{
// a bug in MSC does not allow us to declare
// an array of static thread objects :(
rwthread* threads[rw_max_threads];

srand((unsigned)time(0));

int i;
for(i = 0; i < rw_max_threads; i++)
{
threads[i] = new rwthread();
threads[i]->start();
}
for(i = 0; i < rw_max_threads; i++)
delete threads[i];

pout.putf("\nmax readers: %d\n", max_readers);
}

t...@cs.ucr.edu
Sep 19, 2002, 5:24:50 AM
Hovik Melikyan <ho...@moon.yerphi.am> wrote:
+ sbu...@dev0.welkyn.com (Sean P. Burke) wrote in message
+> Well, there's no need to speculate, since we can test the code.
+> Were you able to demonstrate lockout of readers in your testing?

+ Well, I used a program that starts some number of threads. Each thread
+ iteratively locks the rwlock object for reading or writing, randomly
+ chosen, but with the prespecified ratio, e.g. 1 writer and 5 readers,
+ or vice versa. Readers plot dots, and writers plot w's. The picture
+ you get when running the test gives an approximate notion of what's going
+ on, who's starving etc. I tried both your code and my modifications to
+ it. Actually the test is much longer, I'm giving just a few lines from
+ each run:

+ SEAN, readers/writers = 5/1

+ w........ww...w........w..w..ww............w...w..w..w..........w....w.........
+ ......ww..........................w........w..w......w.....w.ww..w...ww.w..w..w

+ SEAN, readers/writers = 1/5

+ .www.w.www..w.wwwwww..wwwwwwww.wwwwwwwwwwww.wwwwww..wwwwwwwwwww.ww.w.wwwwwwwwww
+ wwwwwwww..wwwwww..w..wwwwwwwwww.w.www.wwww..wwwwwww.w.wwwww.wwwwwww.wwwwwww.w.w

+ HOVIK, readers/writers = 5/1

+ ....wwwwwwwwwww.........................................wwwwwwwwwwwww..........
+ ....................www.................................wwwwwww................

+ HOVIK, readers/writers = 1/5

+ wwwwwwwwwwwwwwwwwwwww..............................wwwwwwwwwwwwwwwwwwwwwwwwwwww
+ wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww.......


+ In your case we see much better distribution than in mine. However,
+ yours is running approx. 3-4 times slower (naturally). The maximum
+ number of readers that entered the critical section simultaneously is
+ twice as large in my case.

Hmmmmm. I have difficulty accounting for that "3-4 times slower". If
I understand correctly, each algorithm takes the same amount of time
to accomplish the writing tasks, but Hovik's writers-get-priority
policy leads to about half as many batches of concurrent readers.

Even if Hovik's algorithm cut reading time to zero, I can't see a
factor of 3-4 speedup. What am I missing?

Tom Payne

David Schwartz
Sep 19, 2002, 6:00:30 AM
Hovik Melikyan wrote:

> ... which means, if there are many writers (more than readers)
> continuously approaching the point of locking, they will have priority
> over poor readers and they will never let them in.

If there are many writers, more than readers, you shouldn't be using an
rwlock.



> My concern is that in the real world of networking any "unfairness" of
> such kind opens a possibility of denial-of-service attacks. If someone
> knows how you distribute priorities between threads, he can
> deliberately cause starvation and hence degradation of your service.

If you don't want writer priority, you probably don't want an rwlock.

> The best thing for the developer is to know that the service he writes
> can not be broken that way.

If you are really concerned about 'fairness', you probably want a
mode-switching rwlock. It has a more complex state machine than a
regular rwlock.

Essentially, what it does is it allows one writer through, then all
presently waiting readers. As soon as a write lock is requested, no new
readers are granted the lock. But two consecutive write locks are not
granted if there are waiting readers.

So it basically alternates. One writer, all waiting readers, one
writer, all waiting readers. Where 'waiting' means that they've been
waiting since before the writer released the lock.

DS

Hovik Melikyan
Sep 19, 2002, 10:14:22 AM
> The state of an auto-reset event object remains signaled until
> a single waiting thread is released, at which time the system
> automatically sets the state to nonsignaled. If no threads are
> waiting, the event object's state remains signaled.
>

The description is incomplete: "If no threads are waiting, the event
object's state remains signaled" ... until the next waiting thread is
released, which resets it. Moreover, waits and posts are NOT balanced
the way they are for an ordinary semaphore: setting the event twice
does not necessarily release two threads.

This is important for understanding the implementation of rwlock on
Win32.
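
The non-balanced behavior is easy to model in a few lines. Below is a hypothetical single-threaded sketch of an auto-reset event (a flag guarded by a mutex, plus a non-blocking try-wait added purely for demonstration; this models the documented semantics, it is not how Win32 implements events):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a Win32 auto-reset event: a single flag,
   not a counter, so posts do not accumulate like semaphore posts. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            signaled;
} auto_event_t;

void event_init(auto_event_t *e) {
    pthread_mutex_init(&e->mutex, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

/* SetEvent equivalent: setting an already-signaled event changes nothing. */
void event_post(auto_event_t *e) {
    pthread_mutex_lock(&e->mutex);
    e->signaled = true;
    pthread_cond_signal(&e->cond);
    pthread_mutex_unlock(&e->mutex);
}

/* Non-blocking wait: consume the signal if present, auto-resetting it. */
bool event_trywait(auto_event_t *e) {
    pthread_mutex_lock(&e->mutex);
    bool got = e->signaled;
    e->signaled = false;
    pthread_mutex_unlock(&e->mutex);
    return got;
}
```

Posting twice in a row and then try-waiting twice releases only one "waiter": the first try-wait consumes the signal and the second finds the event already reset.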

>
> > > > if (InterlockedIncrement(&rwlock->counter) == 0) {
> > >
> > > This MS-thing should really provide ACQUIRE memory sync.
> > > semantics in order to allow unlock to see proper value of
> > > rwlock->wrcounter variable -- and you should really ask MS
> > > folks whether it really works that way... ;-)
> > >
> >
> > Are you talking about InterlockedIncrement? ....
>
> http://groups.google.com/groups?threadm=c29b5e33.0202140541.7df2bb9d%40posting.google.com
> (Subject: Re: Can multiple threads set a global variable simultaneously?)
>

InterlockedXXX rely on 'LOCK XADD', an i486+ instruction. LOCK is for
SMP, actually. I don't understand why I should care about how the
cache works. LOCK locks the main memory bus for the next
instruction, which means any instruction with LOCK should work
exactly the same way as without it on a single-CPU system. The CPU
_should_ write through, validate the cache or the main memory or
whatever it is doing, I don't care, to work AS EXPECTED.

(Frankly, if I would become anti-something, I would start with Intel
rather than with Microsoft).

> http://msdn.microsoft.com/library/default.asp?url=/library/en- us/dllproc/synchro_11df.asp
>
> Does NOT look very convincing to me!
>
> ("Processors can be instructed to force their memory
> caches to agree with main memory..." w.r.t mem.sync
> IS JUST PLAIN MS-STYLE-BULLSHIT, AFAIK/AFAICT/IMHO!)
>

All is not bullshit that smells :) As you can see, they are solving
the problem with InterlockedExchange, because it issues a lock on the
main memory (LOCK XCHG) and thus validates its contents.

>
> > > > if (rwlock->wrcounter) {
> > >
> > > ``What am I?'' (might well be broken for readers, see above)
> > >
> >
> > Actually not, see above.
>
> Right. "see above" ;-)
>

Wrong... (Does anyone look above in this list, after all?!) :)

> > > > /* If wrcounter is > 0, then we must have a writer lock */
> > > > rwlock->wrcounter--;
> > > > SetEvent(rwlock->finished);
> > >
> > > BUG. if ( 0 == --rwlock->wrcounter ) SetEvent(rwlock->finished);
> > >
> >
> > I don't think so. This code is protected with a mutex, so writecnt can
> > never be > 1. It's just either 0 or 1.
>
> The code is protected with MS >>recursive<< mutex. To make it
> "correct" (other problems aside for a moment), you'd need to
> add more "if":
>

... only if the 'finished' event is recursive too. I'm not sure it is.
I'm gonna play with it a little to see what it's doing... Kind of
'experimental programming', which reminds me of my true speciality -
experimental physics :)

>
> Try to imagine what could happen if you'd add this:
>
> : > > > ReleaseMutex(rwlock->writemutex);
> : > > > }
> : > > > else {
> : > > > if (InterlockedDecrement(&rwlock->counter) < 0) {
> ----------------> Sleep( rand() ); // Let's BREAK it...
>

I tried it, and again, it worked fine. With 100 threads, or just 2 or
even 1. With Sleep or sleepless - it still works. How many threads and
what situation do you need to break it, exactly?

Regards,

--
Hovik Melikyan

Hovik Melikyan
Sep 19, 2002, 11:00:00 AM
David Schwartz <dav...@webmaster.com> wrote in message news:<3D89A03E...@webmaster.com>...

>
> If there are many writers, more than readers, you shouldn't be using an
> rwlock.
>

Why shouldn't I? If writers prevail, I would give them higher priority
but still, I want my readers to do their job simultaneously as they
don't bother each other when reading the shared resource.

>
> > The best thing for the developer is to know that the service he writes
> > can not be broken that way.
>
> If you are really concerned about 'fairness', you probably want a
> mode-switching rwlock. It has a more complex state machine than a
> regular rwlock.
>

Agreed. And a few more words about breaking a service that uses an
rwlock.

Let's say you have an open service with two public functions: Fr and
Fw for reading a shared resource and for writing to it respectively.
If I know that you are using any of the 'unfair' schemes, say,
w-priority, I can open two connections to your service and start
calling Fw repeatedly so that the calls always overlap. Other users
calling Fr will wait forever. Notice, I need only 2 connections, not
1000.

> Essentially, what it does is it allows one writer through, then all
> presently waiting readers. As soon as a write lock is requested, no new
> readers are granted the lock. But two consecutive write locks are not
> granted if there are waiting readers.
>

I think that's exactly what Sean's algorithm is doing. Correct me if
I'm mistaken.

Regards,

--
Hovik Melikyan

Hovik Melikyan
Sep 19, 2002, 11:14:42 AM
t...@cs.ucr.edu wrote in message news:<amc552$nn6$1...@glue.ucr.edu>...

>
> Hmmmmm. I have difficulty accounting for that "3-4 times slower". If
> I understand correctly, each algorithm takes the same amount of time
> to accomplish the writing tasks, but Hovik's writers-get-priority
> policy leads to about half as many batches of concurrent readers.
>

This comes from real timing tests. In Sean's case the maximum number
of readers is always half the total number of threads, whereas in my
case they are equal. In other words, my algorithm (so to speak; it's
not actually mine) exploits read concurrency more effectively. As for
"3-4 times", as I said, it comes from the tests; I can't tell you exactly
why it's just 3-4, although there must be some explanation, of course.

--
Hovik Melikyan

Alexander Terekhov
Sep 19, 2002, 11:33:18 AM

Hovik Melikyan wrote:
[...]
> InterlockedXXX rely on 'LOCK XADD', an i486+ instruction. ....

Really? How fascinating.

http://search.microsoft.com/default.asp?qu=%22non%2Dx86+systems%22&boolean=ALL&nq=NEW&so=RECCNT&p=1&ig=01&i=00&i=01&i=02&i=03&i=04&i=05&i=06&i=07&i=08&i=09&i=10&i=11&i=12&i=13&i=14&i=15&i=16&i=17&i=18&i=19&i=20&i=21&i=22&i=23&i=24&i=25&i=26&i=27&i=28&i=29&i=30&i=31&i=32&i=33&i=34&i=35&i=36&i=37&i=38&i=39&i=40&i=41&siteid=us/dev
(Search Results for ""non-x86 systems"" using All words)

> > : > > > if (InterlockedDecrement(&rwlock->counter) < 0) {
> > ----------------> Sleep( rand() ); // Let's BREAK it...
> >
>
> I tried it, and again, it worked fine. With 100 threads, or just 2 or
> even 1. With Sleep or sleepless - it still works. How many threads and
> what situation do you need to break it, exactly?

Nah, let's first settle MS-InterlockedXXX-class-action-lawsuit, okay? ;-)

regards,
alexander.

Sean P. Burke
Sep 19, 2002, 1:46:01 PM

ho...@moon.yerphi.am (Hovik Melikyan) writes:

> David Schwartz <dav...@webmaster.com> wrote in message news:<3D89A03E...@webmaster.com>...
> >
> > If there are many writers, more than readers, you shouldn't be using an
> > rwlock.
> >
>
> Why shouldn't I? If writers prevail, I would give them higher priority
> but still, I want my readers to do their job simultaneously as they
> don't bother each other when reading the shared resource.

Look at the overhead of the readers/writer lock: in the uncontended
case, you must do two mutex lock/unlock operations for each access
to the protected object, compared to a single lock/unlock if you use
a simple mutex. When writers contend for a lock, the overhead is
three mutex_lock/unlock cycles plus a cond_wait and cond_signal.

Mutex lock and unlock operations are expensive to the extent that they
operate on shared data, and can therefore induce cache ping-pong, and
they invoke memory barriers on most SMP machines. I think that testing
would reveal that many applications of rwlocks could be replaced by
simple mutexes with a net gain in performance.


> > > The best thing for the developer is to know that the service he writes
> > > can not be broken that way.
> >
> > If you are really concerned about 'fairness', you probably want a
> > mode-switching rwlock. It has a more complex state machine than a
> > regular rwlock.
> >
>
> Agreed. And a few more words about breaking a service that uses an
> rwlock.
>
> Let's say you have an open service with two public functions: Fr and
> Fw for reading a shared resource and for writing to it respectively.
> If I know that you are using any of the 'unfair' schemes, say,
> w-priority, I can open two connections to your service and start
> calling Fw repeatedly so that the calls always overlap. Other users
> calling Fr will wait forever. Notice, I need only 2 connections, not
> 1000.
>
> > Essentially, what it does is it allows one writer through, then all
> > presently waiting readers. As soon as a write lock is requested, no new
> > readers are granted the lock. But two consecutive write locks are not
> > granted if there are waiting readers.
> >
>
> I think that's exactly what Sean's algorithm is doing. Correct me if
> I'm mistaken.

I copied the algorithm from Uresh Vahalia's "UNIX Internals: The New
Frontiers", ISBN 0-13-101908-2, a fascinating book that I recommend
most heartily. Curt Schimmel's "UNIX Systems for Modern Architectures",
ISBN 0-201-63338-8, is also an excellent read in the same area.

-SEan

Drazen Kacar
Sep 19, 2002, 2:05:46 PM
Sean P. Burke wrote:

> Look at the overhead of the readers/writer lock: in the uncontended
> case, you must do two mutex lock/unlock operations for each access
> to the protected object, compared to a single lock/unlock if you use
> a simple mutex. When writers contend for a lock, the overhead is
> three mutex_lock/unlock cycles plus a cond_wait and cond_signal.
>
> Mutex lock and unlock operations are expensive to the extent that they
> operate on shared data, and can therefore induce cache ping-pong, and
> they invoke memory barriers on most SMP machines. I think that testing
> would reveal that many applications of rwlocks could be replaced by
> simple mutexes with a net gain in performance.

Is that still true if one uses say, pthread_rwlock_* functions? The thread
library should be able to optimize somewhat, but I don't know how
effective that is compared to the implementation built on a mutex and
condition variables.

--
.-. .-. I don't work here. I'm a consultant.
(_ \ / _)
| da...@willfork.com
|

David Schwartz
Sep 19, 2002, 3:45:06 PM
Hovik Melikyan wrote:

> > If there are many writers, more than readers, you shouldn't be using an
> > rwlock.

> Why shouldn't I?

Because the added complexity will not gain you anything unless there
are more readers than writers.

> If writers prevail, I would give them higher priority
> but still, I want my readers to do their job simultaneously as they
> don't bother each other when reading the shared resource.

But they won't unless there are more readers than writers. They'll wind
up waiting for the writers to finish.



> > > The best thing for the developer is to know that the service he writes
> > > can not be broken that way.

> > If you are really concerned about 'fairness', you probably want a
> > mode-switching rwlock. It has a more complex state machine than a
> > regular rwlock.

> Agreed. And few more words about breaking a service that uses an
> rwlock.

> Let's say you have an open service with two public functions: Fr and
> Fw for reading a shared resource and for writing to it respectively.
> If I know that you are using any of the 'unfair' schemes, say,
> w-priority, I can open two connections to your service and start
> calling Fw repeatedly so that the calls always overlap. Other users
> calling Fr will wait forever. Notice, I need only 2 connections, not
> 1000.

You shouldn't do this anyway. Mutexes will break the same way. In
principle, a mutex could be implemented to give the thread that was
created the earliest priority. So if you map threads to connections, a
connection that repeatedly does anything involving the mutex will starve
all other connections that want the mutex.

But I don't think mutex or lock fairness is the correct solution. I
think the solution is to write your code so that you don't care who gets
the lock. Since you decide what work is done once the lock is acquired,
you shouldn't care which thread gets it, since any thread will do
exactly what work you ask it to.

To a certain extent, rwlocks are an exception because there's a
fundamental difference between a thread that gets a read lock and a
thread that gets a write lock. So you can't sensibly code them to behave
exactly the same.

IMO, the correct and safe way to use an rwlock is in a case where you
would otherwise use a mutex, but you want the possibility that
concurrent readers will give you a performance boost. Because there's
more overhead, that performance boost should be expected to be
significant because otherwise you will wind up losing more than you
gain.

The rationale behind write priority is that if you have many more
readers than writers and you don't allow a write lock until there happen
to be no readers, you may delay threads that want the write lock for
huge amounts of time. However, unless you have a special case (for
example, where writers obsolete previous data entirely), you probably
would prefer for writers not to be able to monopolize the lock and shut
out readers either.

DS

Ziv Caspi
Sep 19, 2002, 6:23:12 PM
On Thu, 19 Sep 2002 17:33:18 +0200, Alexander Terekhov
<tere...@web.de> wrote:

I don't get your point. Obviously, on non-x86 systems, the
implementation is not 'LOCK XADD', but something that is appropriate
for the target architecture.

Ziv

t...@cs.ucr.edu
Sep 20, 2002, 2:12:26 AM
Hovik Melikyan <ho...@moon.yerphi.am> wrote:
+ t...@cs.ucr.edu wrote in message news:<amc552$nn6$1...@glue.ucr.edu>...
+>
+> Hmmmmm. I have difficulty accounting for that "3-4 times slower". If
+> I understand correctly, each algorithm takes the same amount of time
+> to accomplish the writing tasks, but Hovik's writers-get-priority
+> policy leads to about half as many batches of concurrent readers.
+>

+ This comes from real timing tests. In Sean's case the maximum number
+ of readers is always half the total number of threads, whereas in my
+ case they are equal. In other words, my algorithm (conventionally,
+ it's not mine actually) uses reads more effectively. As for "3-4
+ times", as I said, it's coming from tests, I can't tell you exactly
+ why it's just 3-4, although there must be some explanation, of course.
^^^^^^^^
I would expect far less speedup. Was it the same for the 5/1 test as
for the 1/5 test? I'm suspecting that there was some flaw in the
implementation or the testing of those algorithms.

Tom Payne

Hovik Melikyan
Sep 20, 2002, 3:30:00 AM
David Schwartz <dav...@webmaster.com> wrote in message news:<3D8A2942...@webmaster.com>...

>
> > Let's say you have an open service with two public functions: Fr and
> > Fw for reading a shared resource and for writing to it respectively.
> > [...]

>
> You shouldn't do this anyway. Mutexes will break the same way.

They won't, because mutexes work either on a FIFO or on a random
basis. If you provide a public function F which uses a simple mutex,
starvation of concurrent users will be exponential, whereas with
'unfair' Fr/Fw it is discrete: only two overlapping iterative calls to
the prioritized function will _stop_ the service, not just slow it
down.

>
> IMO, the correct and safe way to use an rwlock is in a case where you
> would otherwise use a mutex, but you want the possibility that
> concurrent readers will give you a performance boost. Because there's
> more overhead, that performance boost should be expected to be
> significant because otherwise you will wind up losing more than you
> gain.
>

Regarding the overhead: (1) you can use rwlocks in your app with the
hope that once the question of prioritization is settled and somehow
standardized, they will become efficient system-level objects; (2) in
the real-world cases I deal with currently, the protected parts of my
code may intensively use dynamic memory allocation and even disk I/O
(at least for logging), so a few more CPU instructions are nothing
compared to that.

Regards,

--
Hovik Melikyan

P.S. I'm sorry for my slowness, but actually it's not me, that's
Google posts with a big delay, from 3 to 9 hours. It's damn anti-spam
or anti-idunno-what policy... :((

David Butenhof
Sep 20, 2002, 9:32:13 AM
Drazen Kacar wrote:

"An implementation [of read-write locks] built on a mutex and condition
variables" is a poor read-write lock. Really poor. The only advantage is
that it "implements the API".

Remember that CV wakeup requires locking the mutex. Every CV wakeup. So what
happens when you broadcast to a CV to wake up a swarm of "concurrent
readers"? Well, there are various optimizations, but essentially EACH of
them, IN TURN (one at a time) will acquire the mutex, manipulate the RWlock
shared data to get out of the API, and then start executing the protected
application code.

Now, if your readers do a lot of work while holding the read lock (which is
generally a bad idea), then there's a chance that some of the other readers
will have time to wake up and progress in parallel before you're done. If
it's short, most likely you'll have serial access anyway -- exactly as if
you'd used a mutex but with more overhead.

A real implementation of pthread_rwlock_* CAN (but need not) support true
parallel scheduling of concurrent readers across a set of available CPUs.
It's not trivial, because it means the awakened readers cannot require any
locks on "the way out". Even atomic memory operations (like incrementing a
reader count) reduce concurrency by provoking memory contention (and
transfer of cache line ownership).

So the answer is that the ideal thread library implementation can do far
better than you can possibly do with anything resembling portable code.
Otherwise, you're pretty much out of luck and you might as well stick with
just a mutex.

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

Alexander Terekhov
Sep 20, 2002, 9:41:16 AM

Ziv Caspi wrote:
>
> On Thu, 19 Sep 2002 17:33:18 +0200, Alexander Terekhov
> <tere...@web.de> wrote:

[...InterlockedXXX...]

> I don't get your point.

My point is this:

http://groups.google.com/groups?threadm=c29b5e33.0202140541.7df2bb9d%40posting.google.com
(Subject: Re: Can multiple threads set a global variable simultaneously?)

> Obviously, on non-x86 systems, the
> implementation is not 'LOCK XADD', but something that is appropriate
> for the target architecture.

http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000023.html
http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000024.html
http://gee.cs.oswego.edu/dl/concurrent/java/util/concurrent/AtomicInteger.html

So,

// Java's {proposed} atomic integer increment operation: <behaviorally>
for ( ;; ) {
int i = get(); // VOLATILE-ACQUIRE
if ( attemptUpdate(i, i+1) ) // VOLATILE-RELEASE
break;
}

HTH.

regards,
alexander.

David Schwartz
Sep 20, 2002, 2:17:05 PM
Hovik Melikyan wrote:

> > You shouldn't do this anyway. Mutexes will break the same way.

> They won't, because mutexes work either on a FIFO or on a random
> basis.

Do you have a reference for this claim? It was my understanding that
which thread gets the mutex when their priorities were equal is
unspecified.

> If you provide a public function F which uses a simple mutex,
> starvation of concurrent users will be exponential, whereas with
> 'unfair' Fr/Fw it is discrete: only two overlapping iterative calls to
> the prioritized function will _stop_ the service, not just slow it
> down.

What if the mutex implementation is "thread that has the lowest thread
ID gets the mutex"?

DS

Ziv Caspi
Sep 21, 2002, 5:50:05 AM
On Fri, 20 Sep 2002 15:41:16 +0200, Alexander Terekhov
<tere...@web.de> wrote:

>> I don't get your point.
>
>My point is this:
>
>http://groups.google.com/groups?threadm=c29b5e33.0202140541.7df2bb9d%40posting.google.com
>(Subject: Re: Can multiple threads set a global variable simultaneously?)

Perhaps I'm a bit dense, but I still don't see the point. Win32's
InterlockedXxx guarantee the proper behavior for global vars. It's the
OS's job to work in a single-processor system, x86 or not, as well as
SMP system. Nobody suggests that you use LOCK XADD in your code -- you
are supposed to use InterlockedXxx instead, and are assured by the
system that it would work across architectures.

(The link to MSDN about flushing processor caches is broken, so I
can't follow that one to see what you were referring to.)

I'm not sure why Java is relevant to this discussion. It's the "job"
of Win32 to provide these guarantees; the OS is, after all, the best
place to make such decisions. This works regardless of the runtime
that called these functions.

Ziv

t...@cs.ucr.edu
Sep 21, 2002, 5:33:11 AM
David Butenhof <David.B...@compaq.com> wrote:
[...]
+ A real implementation of pthread_rwlock_* CAN (but need not) support true
+ parallel scheduling of concurrent readers across a set of available CPUs.

Hmmmmm. Extracting those readers from the ready queue (or wherever)
seems inherently serial. What am I missing?

Tom Payne

Alexander Terekhov
Sep 21, 2002, 10:03:23 AM

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/synchronization_and_multiprocessor_issues.asp
("Synchronization and Multiprocessor Issues", *MS-STYLE-BULLSHIT*)

http://www.crhc.uiuc.edu/ece412/papers/models_tutorial.pdf
(Shared Memory Consistency Models: A Tutorial)

Ziv Caspi wrote:
>
> On Fri, 20 Sep 2002 15:41:16 +0200, Alexander Terekhov

> <tere...@web.de> wrote:
>
> >> I don't get your point.
> >
> >My point is this:
> >
> >http://groups.google.com/groups?threadm=c29b5e33.0202140541.7df2bb9d%40posting.google.com
> >(Subject: Re: Can multiple threads set a global variable simultaneously?)


>
> Perhaps I'm a bit dense, but I still don't see the point. Win32's
> InterlockedXxx guarantee the proper behavior for global vars. It's the
> OS's job to work in a single-processor system, x86 or not, as well as
> SMP system. Nobody suggests that you use LOCK XADD in your code -- you
> are supposed to use InterlockedXxx instead, and are assured by the
> system that it would work across architectures.
>
> (The link to MSDN about flushing processor caches is broken, so I
> can't follow that one to see what you were referring to.)
>

> >http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000023.html
> >http://altair.cs.oswego.edu/pipermail/concurrency-interest/2002-February/000024.html
> >http://gee.cs.oswego.edu/dl/concurrent/java/util/concurrent/AtomicInteger.html

Alexander Terekhov
Sep 21, 2002, 10:41:18 AM

t...@cs.ucr.edu wrote:

>
> David Butenhof <David.B...@compaq.com> wrote:
> [...]
> + A real implementation of pthread_rwlock_* CAN (but need not) support true
> + parallel scheduling of concurrent readers across a set of available CPUs.
>
> Hmmmmm. Extracting those readers from the ready queue (or wherever)
> seems inherently serial. What am I missing?

Consider the following rwlock algorithm where pending readers stay
blocked on mutex (in addition to "kinda fairness" it allows us to
bypass condvar broadcast operation [for readers], which may somewhat
"confuse" MP-optimized mutexes, though ;-) ). Another problem here
is exactly the "inherently serial" wakeup for readers (unless they
just happen to spin-wait long enough ;-) ).

http://groups.google.com/groups?selm=3B166244.F923B993%40web.de
(Subject: Re: how to distinguish between read and write locks?)

< slightly changed/simplified version; error checking,
cancel-cleanup and code for recursive readlocks omitted >

mtxEntryLock          - mutex
mtxReadExitLock       - mutex
cndWriteAccessGranted - condvar
nReadEntryCount       - int
nReadExitCount        - int
nWriteAccessCount     - int

rdlock() {

  lock( mtxEntryLock );

  if ( INT_MAX == ++nReadEntryCount ) {
    lock( mtxReadExitLock );
    nReadEntryCount -= nReadExitCount;
    nReadExitCount = 0;
    unlock( mtxReadExitLock );
  }

  unlock( mtxEntryLock );
}

wrlock() {

  lock( mtxEntryLock );

  if ( 0 == nWriteAccessCount ) {

    lock( mtxReadExitLock );

    if ( 0 != nReadExitCount ) {
      nReadEntryCount -= nReadExitCount;
      nReadExitCount = 0;
    }

    if ( 0 != nReadEntryCount ) {

      nReadExitCount = -nReadEntryCount;
      nReadEntryCount = 0;
      do {
        wait( cndWriteAccessGranted, mtxReadExitLock );
      } while( 1 != nWriteAccessCount );
    }
    else {
      nWriteAccessCount = 1;
    }

    unlock( mtxReadExitLock );
  }
  else {                  // recursive write locking
    nWriteAccessCount++;  // via recursive mtxEntryLock
  }
}

unlock() {

  auto: bool bGrantWriteAccess

  if ( 0 == nWriteAccessCount ) {

    lock( mtxReadExitLock );

    if ( 0 == ++nReadExitCount ) {
      nWriteAccessCount = 1;
      bGrantWriteAccess = true;
    }
    else {
      bGrantWriteAccess = false;
    }

    unlock( mtxReadExitLock );

    if ( bGrantWriteAccess ) {
      signal( cndWriteAccessGranted );
    }
  }
  else {
    nWriteAccessCount--;
    unlock( mtxEntryLock );
  }
}
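
A literal pthreads rendering of the pseudocode above may make the structure clearer (my own translation, not Terekhov's code: the names like `trwl_t` are invented and the recursive-write-lock case is dropped). The point to notice is that pending readers really do stay blocked on the entry mutex, which the writer holds for the whole write:

```c
#include <limits.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t entry;      /* mtxEntryLock: a writer holds this for the
                                   whole write, so new readers block here */
    pthread_mutex_t read_exit;  /* mtxReadExitLock */
    pthread_cond_t  granted;    /* cndWriteAccessGranted */
    int read_entry;             /* nReadEntryCount */
    int read_exit_n;            /* nReadExitCount */
    int write_access;           /* nWriteAccessCount */
} trwl_t;

void trwl_init(trwl_t *l) {
    pthread_mutex_init(&l->entry, NULL);
    pthread_mutex_init(&l->read_exit, NULL);
    pthread_cond_init(&l->granted, NULL);
    l->read_entry = l->read_exit_n = l->write_access = 0;
}

void trwl_rdlock(trwl_t *l) {
    pthread_mutex_lock(&l->entry);
    if (INT_MAX == ++l->read_entry) {    /* fold exit count back in on overflow */
        pthread_mutex_lock(&l->read_exit);
        l->read_entry -= l->read_exit_n;
        l->read_exit_n = 0;
        pthread_mutex_unlock(&l->read_exit);
    }
    pthread_mutex_unlock(&l->entry);
}

void trwl_wrlock(trwl_t *l) {
    pthread_mutex_lock(&l->entry);       /* held until trwl_unlock() */
    pthread_mutex_lock(&l->read_exit);
    if (l->read_exit_n != 0) {
        l->read_entry -= l->read_exit_n;
        l->read_exit_n = 0;
    }
    if (l->read_entry != 0) {            /* readers still active: wait for
                                            the last one to grant access */
        l->read_exit_n = -l->read_entry;
        l->read_entry = 0;
        do
            pthread_cond_wait(&l->granted, &l->read_exit);
        while (l->write_access != 1);
    } else {
        l->write_access = 1;
    }
    pthread_mutex_unlock(&l->read_exit);
}

void trwl_unlock(trwl_t *l) {
    if (l->write_access == 0) {          /* reader unlock */
        int grant;
        pthread_mutex_lock(&l->read_exit);
        grant = (0 == ++l->read_exit_n); /* last reader a writer counted? */
        if (grant)
            l->write_access = 1;
        pthread_mutex_unlock(&l->read_exit);
        if (grant)
            pthread_cond_signal(&l->granted);
    } else {                             /* writer unlock */
        l->write_access = 0;
        pthread_mutex_unlock(&l->entry);
    }
}
```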

regards,
alexander.

Ziv Caspi

unread,
Sep 21, 2002, 4:19:42 PM9/21/02
to
On Sat, 21 Sep 2002 16:03:23 +0200, Alexander Terekhov
<tere...@web.de> wrote:

The MSDN article you quote is certainly inadequate. It *does*,
however, indicate what is the approach in Win32 for solving shared
memory problems -- use a Win32 function that is guaranteed to
synchronize.

Ziv

Alexander Terekhov

unread,
Sep 21, 2002, 5:44:16 PM9/21/02
to

Really? This silly MSDN article advocates "Classic-DCL"-like totally
*BROKEN* stuff [AFAICS even on IA32 under its SSE-extension-"relaxed
memory" model, *IA64* aside for a moment], to begin with.

"....
Consequently, the multiprocessor race condition above can be repaired
as follows:

void CacheComputedValue()
{
if (!fValueHasBeenComputed) {
iValue = ComputeValue();
InterlockedExchange((LONG*)&fValueHasBeenComputed, TRUE);
}
}"

Heck, they even claim that:

"The InterlockedExchange function ensures that the value of iValue is
updated for all processors before the value of fValueHasBeenComputed
is set to TRUE."

Yeah, ``inadequate'', indeed.

regards,
alexander.

Ziv Caspi

unread,
Sep 22, 2002, 8:07:59 AM9/22/02
to
On Sat, 21 Sep 2002 23:44:16 +0200, Alexander Terekhov
<tere...@web.de> wrote:

>Really? This silly MSDN article advocates "Classic-DCL"-like totally
>*BROKEN* stuff [AFAICS even on IA32 under its SSE-extension-"relaxed
>memory" model, *IA64* aside for a moment], to begin with.

I think you're missing the fact that the "relaxed" memory model in
Intel's advanced processors does *not* mean that the lock model
featured in previous processors breaks. It simply means there are now
more efficient ways to do the same thing. If there ever was a company
that "understands" binary backward compatibility, it's Intel.

See "Intel Software Developer's Manual Vol. 3: System Programming",
chapter 7 "Multiple-Processor Management", 7.1 "Locked Atomic
Operations", 7.2 "Memory Ordering", and 7.4 "Serializing
Instructions".

>"....
> Consequently, the multiprocessor race condition above can be repaired
> as follows:
>
> void CacheComputedValue()
> {
> if (!fValueHasBeenComputed) {
> iValue = ComputeValue();
> InterlockedExchange((LONG*)&fValueHasBeenComputed, TRUE);
> }
> }"

This statement in MSDN is, in fact, true. The Win32 Interlocked* model
assures callers that there's a memory barrier there. You can rely on
the above pattern working on any machine Windows is available for (and
has been shown to work under stress on up to 32 processors).

Ziv

t...@cs.ucr.edu

unread,
Sep 22, 2002, 9:14:50 AM9/22/02
to
Alexander Terekhov <tere...@web.de> wrote:

+ t...@cs.ucr.edu wrote:
+>
+> David Butenhof <David.B...@compaq.com> wrote:
+> [...]
+> + A real implementation of pthread_rwlock_* CAN (but need not) support true
+> + parallel scheduling of concurrent readers across a set of available CPUs.
+>
+> Hmmmmm. Extracting those readers from the ready queue (or wherever)
+> seems inherently serial. What am I missing?

+ Consider the following rwlock algorithm where pending readers stay
+ blocked on mutex (in addition to "kinda fairness" it allows us to
+ bypass condvar broadcast operation [for readers], which may somewhat
+ "confuse" MP-optimized mutexes, though ;-) ). Another problem here
+ is exactly the "inherently serial" wakeup for readers (unless they
+ just happen to spin-wait long enough ;-) ).

+ http://groups.google.com/groups?selm=3B166244.F923B993%40web.de
+ (Subject: Re: how to distinguish between read and write locks?)

[... code snipped ...]

An interesting implementation, but waiters on spinlocks already have
CPUs. What I'm curious about is Dave Butenhof's mention of "true
parallel scheduling of concurrent readers across a set of available
CPUs".

My question presumed that there would be a ready list, i.e., a
priority queue of runnable threads from which available CPUs would get
their threads to run. I've also seen situations where CPUs obtain
their work from a master list of all threads, where each thread has
state bits that tell whether it is runnable, running, waiting etc. I
suppose that the available CPUs could concurrently scan such a list
looking for work if each thread (descriptor) were given its own mutex.
I'm hoping that Dave knows of a better alternative.

Tom Payne

Alexander Terekhov

unread,
Sep 23, 2002, 4:24:15 AM9/23/02
to

Ziv Caspi wrote:
>
> On Sat, 21 Sep 2002 23:44:16 +0200, Alexander Terekhov
> <tere...@web.de> wrote:
>
> >Really? This silly MSDN article advocates "Classic-DCL"-like totally
> >*BROKEN* stuff [AFAICS even on IA32 under its SSE-extension-"relaxed
> >memory" model, *IA64* aside for a moment], to begin with.
>
> I think you're missing the fact that....

http://groups.google.com/groups?selm=3C3C9B63.DFDC9920%40web.de
(Subject: Re: Multi-Processor Concurrency Problem)

> >"....
> > Consequently, the multiprocessor race condition above can be repaired
> > as follows:
> >
> > void CacheComputedValue()
> > {
> > if (!fValueHasBeenComputed) {
> > iValue = ComputeValue();
> > InterlockedExchange((LONG*)&fValueHasBeenComputed, TRUE);
> > }
> > }"
>
> This statement in MSDN is, in fact, true.

<quote>

BOOL FetchComputedValue(int *piResult)
{
if (fValueHasBeenComputed)
{
*piResult = iValue;
return TRUE;
}
else
return FALSE;
}

</quote>

> The Win32 Interlocked* model assures callers that there's a memory

> barrier there. ....

http://groups.google.com/groups?threadm=c29b5e33.0202140541.7df2bb9d%40posting.google.com
(Subject: Re: Can multiple threads set a global variable simultaneously?)

http://www.crhc.uiuc.edu/ece412/papers/models_tutorial.pdf
(Shared Memory Consistency Models: A Tutorial,
see "Figure 1: What value can a read return?")

http://groups.google.com/groups?selm=G3Q49.14%24Fr2.86101%40news.cpqcorp.net
(Subject: Re: DCL and code optimization and etc.)

regards,
alexander.

David Butenhof

unread,
Sep 23, 2002, 10:21:35 AM9/23/02
to
t...@cs.ucr.edu wrote:

Sure, the threads are probably "extracted" serially; but that doesn't take
much time. There's no reason that all the extracted threads can't then be
parcelled out across a set of processors and scheduled concurrently.
(Whether that actually works out to "parallel" depends on many factors; but
at least it's POSSIBLE.)

The point is that CV/MUTEX synchronization requires that the waiters WAKE
serially -- concurrently scheduling the broadcast waiters would merely
guarantee that all but one will need to reblock on the mutex. This
PROHIBITS parallel continuation. (Simple implementations in fact may do
that; more sophisticated implementations optimize the operation by
scheduling only one thread and directly blocking the remainder on the
mutex.)

A true read-write lock cannot generally guarantee parallel continuation, but
can be designed not to prohibit parallel continuation. A CV/MUTEX protocol,
while "simple and general", cannot.
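
For contrast, here is a minimal sketch of the "simple and general" CV/mutex protocol Butenhof is describing (my illustration, not code from the thread). Every waiter woken by a signal or broadcast must re-acquire the mutex and recheck its predicate, which is exactly the serial wakeup above; note also the `||` in the reader's wait condition, the same fix the opening post made to the AIX example:

```c
/* Writer-priority read-write lock built only from a mutex and two
   condition variables.  All names are mine. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ok_read, ok_write;
    int readers;          /* active readers */
    int writing;          /* 1 while a writer holds the lock */
    int waiting_writers;  /* writer priority: readers yield to these */
} rwl_t;

void rwl_init(rwl_t *r) {
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->ok_read, NULL);
    pthread_cond_init(&r->ok_write, NULL);
    r->readers = r->writing = r->waiting_writers = 0;
}

void rwl_rdlock(rwl_t *r) {
    pthread_mutex_lock(&r->lock);
    /* '||', not '&&': wait while a writer is active OR one is queued */
    while (r->writing || r->waiting_writers)
        pthread_cond_wait(&r->ok_read, &r->lock);
    r->readers++;
    pthread_mutex_unlock(&r->lock);
}

void rwl_rdunlock(rwl_t *r) {
    pthread_mutex_lock(&r->lock);
    if (--r->readers == 0)
        pthread_cond_signal(&r->ok_write);
    pthread_mutex_unlock(&r->lock);
}

void rwl_wrlock(rwl_t *r) {
    pthread_mutex_lock(&r->lock);
    r->waiting_writers++;
    while (r->writing || r->readers)
        pthread_cond_wait(&r->ok_write, &r->lock);
    r->waiting_writers--;
    r->writing = 1;
    pthread_mutex_unlock(&r->lock);
}

void rwl_wrunlock(rwl_t *r) {
    pthread_mutex_lock(&r->lock);
    r->writing = 0;
    if (r->waiting_writers)
        pthread_cond_signal(&r->ok_write);
    else
        pthread_cond_broadcast(&r->ok_read);  /* herd reblocks on r->lock */
    pthread_mutex_unlock(&r->lock);
}
```

The broadcast in `rwl_wrunlock()` is where parallel continuation is lost: every woken reader still funnels through `r->lock` one at a time.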

Alexander Terekhov

unread,
Sep 23, 2002, 12:20:15 PM9/23/02
to

David Butenhof wrote:
[...]

> The point is that CV/MUTEX synchronization requires that the waiters WAKE
> serially -- concurrently scheduling the broadcast waiters would merely
> guarantee that all but one will need to reblock on the mutex. This
> PROHIBITS parallel continuation. (Simple implementations in fact may do
> that; more sophisticated implementations optimize the operation by
> scheduling only one thread and directly blocking the remainder on the
> mutex.)

"more sophisticated" == "wait morphing" and, probably, "simple
implementations" == "thundering herd", correct?

http://sunsite.uakom.sk/sunworldonline/swol-09-1999/swol-09-insidesolaris.html
("Kernel synchronization primitives")

regards,
alexander.

t...@cs.ucr.edu

unread,
Sep 24, 2002, 4:05:32 AM9/24/02
to
David Butenhof <David.B...@compaq.com> wrote:
+ t...@cs.ucr.edu wrote:

+> David Butenhof <David.B...@compaq.com> wrote:
+> [...]
+> + A real implementation of pthread_rwlock_* CAN (but need not) support
+> true + parallel scheduling of concurrent readers across a set of available
+> CPUs.
+>
+> Hmmmmm. Extracting those readers from the ready queue (or wherever)
+> seems inherently serial. What am I missing?

+ Sure, the threads are probably "extracted" serially; but that doesn't take
+ much time. There's no reason that all the extracted threads can't then be
+ parcelled out across a set of processors and scheduled concurrently.
+ (Whether that actually works out to "parallel" depends on many factors; but
+ at least it's POSSIBLE.)

+ The point is that CV/MUTEX synchronization requires that the waiters WAKE
+ serially -- concurrently scheduling the broadcast waiters would merely
+ guarantee that all but one will need to reblock on the mutex. This
+ PROHIBITS parallel continuation. (Simple implementations in fact may do
+ that; more sophisticated implementations optimize the operation by
+ scheduling only one thread and directly blocking the remainder on the
+ mutex.)

If I understood correctly, the OP was interested in a passive-waiting
protocol.

+ A true read-write lock cannot generally guarantee parallel continuation, but
+ can be designed not to prohibit parallel continuation. A CV/MUTEX protocol,
+ while "simple and general", cannot.

So, in a true read-write lock, readers do not recheck their predicate
after waiting, and, therefore, true read-write locks cannot be based
on possibly spurious signalling. Also, true read-write locks use the
tail-waiting optimization, where waiters do not reacquire the mutex
after waiting.


class RWlock : Monitor {

  // no spurious signals or broadcasts.
  // tail_wait() unlocks Monitor::mutex but doesn't relock it.

  Condition okToRead;
  Condition okToWrite;
  bool writing;
  int readerCount;

public:

  RWlock()
    : okToRead(this),
      okToWrite(this),
      writing(false),
      readerCount(0)
  {}

  start_read() {
    Monitor::lock();
    if ( writing || okToWrite.count() ) {
      okToRead.tail_wait() // no relocking
    }
  }

  start_write() {
    Monitor::lock();
    if ( writing || okToRead.count() ) {
      okToWrite.tail_wait() // no relocking
    }
  }

  end_read() {
    Monitor::lock();
    assert( readerCount && ! writing );
    if ( ! --readerCount ) {
      writing = true;
      okToWrite.signal()
    }
    Monitor::unlock();
  }

  end_write() {
    Monitor::lock();
    assert( writing && ! readerCount );
    if ( okToRead.count() ) {
      readerCount = okToRead.count(); // unoptimized
      writing = false;
      okToRead.broadcast()
    } else {
      okToWrite.signal()
    }
    Monitor::unlock();
  }

};

Tom Payne

Alexander Terekhov

unread,
Sep 24, 2002, 5:40:27 AM9/24/02
to

t...@cs.ucr.edu wrote:
[...]

> class RWlock : Monitor {
>
> // no spurious signals or broadcasts.
> // tail_wait() unlocks Monitor::mutex but doesn't relock it.
// tail_signal() unlocks Monitor::mutex and signals condvar.
// tail_broadcast() unlocks Monitor::mutex and broadcasts condvar.

>
> Condition okToRead;
> Condition okToWrite;
> bool writing;
> int readerCount;
>
> public:
>
> RWlock()
> : okToRead(this),
> okToWrite(this),
> writing(false),
> readerCount(0)
> {}
>
> start_read() {
> Monitor::lock();
> if ( writing || okToWrite.count() ) {
> okToRead.tail_wait() // no relocking
> }
else {
++readerCount;

}
> }
>
> start_write() {
> Monitor::lock();
> if ( writing || okToRead.count() ) {
> okToWrite.tail_wait() // no relocking
> }
else {
writing = true;

}
> }
>
> end_read() {
> Monitor::lock();
> assert( readerCount && ! writing );
/*

> if ( ! --readerCount ) {
*/
if ( ! --readerCount && okToWrite.count() ) {
> writing = true;
/*
> okToWrite.signal()
> }
> Monitor::unlock();
*/
okToWrite.tail_signal();
}
else {

Monitor::unlock();
}
> }
>
> end_write() {
> Monitor::lock();
> assert( writing && ! readerCount );
> if ( okToRead.count() ) {
> readerCount = okToRead.count(); // unoptimized
> writing = false;
/*

> okToRead.broadcast()
> } else {
> okToWrite.signal()
> }
> Monitor::unlock();
*/
okToRead.tail_broadcast();
}
else if ( okToWrite.count() ) {
okToWrite.tail_signal();
}
else {
writing = false;
Monitor::unlock();
}
> }
>
> };

---
class RWlock : Monitor {

  // no spurious signals or broadcasts.
  // tail_wait() unlocks Monitor::mutex but doesn't relock it.
  // tail_signal() unlocks Monitor::mutex and signals condvar.
  // tail_broadcast() unlocks Monitor::mutex and broadcasts condvar.

  Condition okToRead;
  Condition okToWrite;
  bool writing;
  int readerCount;

public:

  RWlock()
    : okToRead(this),
      okToWrite(this),
      writing(false),
      readerCount(0)
  {}

  start_read() {
    Monitor::lock();
    if ( writing || okToWrite.count() ) {
      okToRead.tail_wait() // no relocking
    }
    else {
      ++readerCount;
    }
  }

  start_write() {
    Monitor::lock();
    if ( writing || okToRead.count() ) {
      okToWrite.tail_wait() // no relocking
    }
    else {
      writing = true;
    }
  }

  end_read() {
    Monitor::lock();
    assert( readerCount && ! writing );
    if ( ! --readerCount && okToWrite.count() ) {
      writing = true;
      okToWrite.tail_signal();
    }
    else {
      Monitor::unlock();
    }
  }

  end_write() {
    Monitor::lock();
    assert( writing && ! readerCount );
    if ( okToRead.count() ) {
      readerCount = okToRead.count(); // unoptimized
      writing = false;
      okToRead.tail_broadcast();
    }
    else if ( okToWrite.count() ) {
      okToWrite.tail_signal();
    }
    else {
      writing = false;
      Monitor::unlock();
    }
  }

};

---

regards,
alexander.

Alexander Terekhov

unread,
Sep 24, 2002, 5:56:07 AM9/24/02
to

Alexander Terekhov wrote:
[...]
> class RWlock : Monitor {
>
> // no spurious signals or broadcasts.
> // tail_wait() unlocks Monitor::mutex but doesn't relock it.
> // tail_signal() unlocks Monitor::mutex and signals condvar.
> // tail_broadcast() unlocks Monitor::mutex and broadcasts condvar.
>
> Condition okToRead;
> Condition okToWrite;
> bool writing;
> int readerCount;
>
> public:
>
> RWlock()
> : okToRead(this),
> okToWrite(this),
> writing(false),
> readerCount(0)
> {}
>
> start_read() {
> Monitor::lock();
> if ( writing || okToWrite.count() ) {
> okToRead.tail_wait() // no relocking
> }
> else {
> ++readerCount;
Monitor::unlock();

> }
> }
>
> start_write() {
> Monitor::lock();
> if ( writing || okToRead.count() ) {
> okToWrite.tail_wait() // no relocking
> }
> else {
> writing = true;
Monitor::unlock();

> }
> }
>
> end_read() {
> Monitor::lock();
> assert( readerCount && ! writing );
> if ( ! --readerCount && okToWrite.count() ) {
> writing = true;
> okToWrite.tail_signal();
> }
> else {
> Monitor::unlock();
> }
> }
>
> end_write() {
> Monitor::lock();
> assert( writing && ! readerCount );
> if ( okToRead.count() ) {
> readerCount = okToRead.count(); // unoptimized
> writing = false;
> okToRead.tail_broadcast();
> }
> else if ( okToWrite.count() ) {
> okToWrite.tail_signal();
> }
> else {
> writing = false;
> Monitor::unlock();
> }
> }
>
> };

regards,
alexander.

Alexander Terekhov

unread,
Sep 24, 2002, 7:55:43 AM9/24/02
to

Alexander Terekhov wrote: < My bad, twice! >
[...]
> > bool writing;
> > int readerCount;
[...]

> > start_write() {
> > Monitor::lock();
> > if ( writing || okToRead.count() ) {

Yeah, "never thread [&& drink] before lunch". ;-)

< another try >

class RWlock : Monitor {

  // no spurious signals or broadcasts.
  // tail_wait() unlocks Monitor::mutex but doesn't relock it.
  // tail_signal() unlocks Monitor::mutex and signals condvar.
  // tail_broadcast() unlocks Monitor::mutex and broadcasts condvar.

  Condition okToRead;
  Condition okToWrite;
  int count; // ==-1:writing ==0:free >=+1:reading

public:

  RWlock()
    : okToRead(this),
      okToWrite(this),
      count(0)
  {}

  start_read() {
    Monitor::lock();
    if ( 0 > count || okToWrite.count() ) {
      okToRead.tail_wait() // no relocking
    }
    else {
      ++count;
      Monitor::unlock();
    }
  }

  start_write() {
    Monitor::lock();
    if ( 0 != count ) {
      okToWrite.tail_wait() // no relocking
    }
    else {
      count = -1;
      Monitor::unlock();
    }
  }

  end_read() {
    Monitor::lock();
    assert( 0 < count );
    if ( 0 == --count && okToWrite.count() ) {
      count = -1;
      okToWrite.tail_signal();
    }
    else {
      Monitor::unlock();
    }
  }

  end_write() {
    Monitor::lock();
    assert( 0 > count );
    if ( 0 != (count = okToRead.count()) ) {
      okToRead.tail_broadcast();
    }
    else if ( 0 != okToWrite.count() ) {
      count = -1;
      okToWrite.tail_signal();
    }
    else {
      Monitor::unlock();
    }
  }

};

regards,
alexander.

David Butenhof

unread,
Sep 24, 2002, 8:08:29 AM9/24/02
to
t...@cs.ucr.edu wrote:

> So, in a true read-write lock, readers do not recheck their predicate
> after waiting, and, therefore, true read-write locks cannot be based
> on possibly spurious signalling. Also, true read-write locks use the
> tail-waiting optimization, where waiters do not reacquire the mutex
> after waiting.

A read-write lock is a LOCK, not a "signalling mechanism" like a CV. The
important distinction is that the implementation knows (and controls) the
predicate, like a mutex. In contrast, a CV is designed to allow the caller
to communicate changes to a predicate about which the implementation can
know (and assume) nothing.

In a read-write lock package based on a mutex and CV, then the caller waits
for the lock (read or write) by waiting on the CV, and on wakeup must
(serially) reacquire the mutex to validate the predicate.

With a "true" read-write lock, there are various alternatives. While a
tail-wait is the "cleanest", it's not the only possibility. The
implementation might instead validate via low-level hardware mechanisms
rather than by taking a "scheduling lock", for example, using
compare-and-swap or simply memory barrier ordering.

t...@cs.ucr.edu

unread,
Sep 24, 2002, 12:20:14 PM9/24/02
to
David Butenhof <David.B...@compaq.com> wrote:
[...]
+ With a "true" read-write lock, there are various alternatives. While a
+ tail-wait is the "cleanest", it's not the only possibility. The
+ implementation might instead validate via low-level hardware mechanisms
+ rather than by taking a "scheduling lock", for example, using
+ compare-and-swap or simply memory barrier ordering.

Thanks. I'd be interested in references on implementing rwlocks via
barriers and/or compare-and-swap instructions.

Tom Payne
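
I don't have implementation sources to cite either, but a flavor of the compare-and-swap validation Butenhof mentions can be sketched with C11 atomics (everything here is my assumption: the packed state word, the `WRITER_BIT` layout, the policy of falling back to a slow path on failure). Readers take the lock with a single CAS and never touch a scheduling mutex on the fast path, so nothing serializes their continuation:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define WRITER_BIT 0x40000000  /* assumed layout: bit 30 flags an active
                                  writer; low bits count active readers */

typedef struct { atomic_int state; } fast_rwl;

bool try_rdlock(fast_rwl *l) {
    int s = atomic_load_explicit(&l->state, memory_order_relaxed);
    while (!(s & WRITER_BIT)) {
        /* acquire on success orders this reader's critical section
           after the last writer's release */
        if (atomic_compare_exchange_weak_explicit(&l->state, &s, s + 1,
                memory_order_acquire, memory_order_relaxed))
            return true;  /* reader count bumped without any mutex */
    }
    return false;  /* writer active: caller would take a slow path (elided) */
}

void rdunlock(fast_rwl *l) {
    /* release pairs with a later writer's acquire */
    atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
}

bool try_wrlock(fast_rwl *l) {
    int expected = 0;  /* only free -> writer transitions succeed */
    return atomic_compare_exchange_strong_explicit(&l->state, &expected,
               WRITER_BIT, memory_order_acquire, memory_order_relaxed);
}

void wrunlock(fast_rwl *l) {
    atomic_fetch_and_explicit(&l->state, ~WRITER_BIT, memory_order_release);
}
```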

Ziv Caspi

unread,
Sep 24, 2002, 3:50:47 PM9/24/02
to
On Mon, 23 Sep 2002 10:24:15 +0200, Alexander Terekhov
<tere...@web.de> wrote:

>http://groups.google.com/groups?selm=3C3C9B63.DFDC9920%40web.de
>(Subject: Re: Multi-Processor Concurrency Problem)

Yes, Intel doesn't guarantee it. This is beside the point -- you get
that guarantee from the Interlocked* functions. If ever Intel were to
issue a processor in which the current implementation of Interlocked*
would fail, you can be sure they'd do their best to have MS issue a
Windows release that supports it (possibly by modifying the
implementation of the Interlocked* functions).

Your logic in that post could also be taken further: Intel doesn't
guarantee anything about future processors, so not only can't we use
Interlocked functions, we *also* can't use critical-sections -- who
knows, perhaps a future processor will break Windows' current
implementation of EnterCriticalSection.

>
>> >"....
>> > Consequently, the multiprocessor race condition above can be repaired
>> > as follows:
>> >
>> > void CacheComputedValue()
>> > {
>> > if (!fValueHasBeenComputed) {
>> > iValue = ComputeValue();
>> > InterlockedExchange((LONG*)&fValueHasBeenComputed, TRUE);
>> > }
>> > }"
>>
>> This statement in MSDN is, in fact, true.
>
><quote>
>
> BOOL FetchComputedValue(int *piResult)
> {
> if (fValueHasBeenComputed)
> {
> *piResult = iValue;
> return TRUE;
> }
> else
> return FALSE;
> }
>
></quote>
>
>> The Win32 Interlocked* model assures callers that there's a memory
>> barrier there. ....
>
>http://groups.google.com/groups?threadm=c29b5e33.0202140541.7df2bb9d%40posting.google.com
>(Subject: Re: Can multiple threads set a global variable simultaneously?)

I think I already answered your concerns here.

>http://www.crhc.uiuc.edu/ece412/papers/models_tutorial.pdf
>(Shared Memory Consistency Models: A Tutorial,
> see "Figure 1: What value can a read return?")

This is irrelevant to our discussion. Win32 *assures* you that
Interlocked will put a barrier in all architectures that require it...

>http://groups.google.com/groups?selm=G3Q49.14%24Fr2.86101%40news.cpqcorp.net
>(Subject: Re: DCL and code optimization and etc.)

...*including* the Alpha architecture to which David was referring.

Ziv

t...@cs.ucr.edu

unread,
Sep 25, 2002, 2:01:45 AM9/25/02
to
Alexander Terekhov <tere...@web.de> wrote:

+ Alexander Terekhov wrote: < My bad, twice! >
+ [...]
+> > bool writing;
+> > int readerCount;
+ [...]
+> > start_write() {
+> > Monitor::lock();
+> > if ( writing || okToRead.count() ) {

+ Yeah, "never thread [&& drink] before lunch". ;-)

+ < another try >

+ class RWlock : Monitor {

+ // no spurious signals or broadcasts.
+ // tail_wait() unlocks Monitor::mutex but doesn't relock it.
+ // tail_signal() unlocks Monitor::mutex and signals condvar.
+ // tail_broadcast() unlocks Monitor::mutex and broadcasts condvar.

I don't like such naked signalling/broadcasting. A passive
Condition's queue of waiters is a shared data structure that requires
lock protection (or some very clever use of low-level instructions).
If we don't protect the Condition with the Monitor's lock, then the
Condition needs its own lock, which adds overhead.

Tom Payne

Alexander Terekhov

unread,
Sep 25, 2002, 6:30:04 AM9/25/02
to

Ziv Caspi wrote:
>
> On Mon, 23 Sep 2002 10:24:15 +0200, Alexander Terekhov
> <tere...@web.de> wrote:
>
> >http://groups.google.com/groups?selm=3C3C9B63.DFDC9920%40web.de
> >(Subject: Re: Multi-Processor Concurrency Problem)
>
> Yes, Intel doesn't guarantee it. This is besides the point -- you get
> that guarantee from the Interlocked* functions.

What guarantee, exactly?

[...]


> >> >"....
> >> > Consequently, the multiprocessor race condition above can be repaired
> >> > as follows:
> >> >
> >> > void CacheComputedValue()
> >> > {
> >> > if (!fValueHasBeenComputed) {
> >> > iValue = ComputeValue();
> >> > InterlockedExchange((LONG*)&fValueHasBeenComputed, TRUE);
> >> > }
> >> > }"
> >>
> >> This statement in MSDN is, in fact, true.
> >
> ><quote>
> >
> > BOOL FetchComputedValue(int *piResult)
> > {
> > if (fValueHasBeenComputed)
> > {
> > *piResult = iValue;
> > return TRUE;
> > }
> > else
> > return FALSE;
> > }
> >
> ></quote>
> >
> >> The Win32 Interlocked* model assures callers that there's a memory
> >> barrier there. ....
> >
> >http://groups.google.com/groups?threadm=c29b5e33.0202140541.7df2bb9d%40posting.google.com
> >(Subject: Re: Can multiple threads set a global variable simultaneously?)
>
> I think I already answered your concerns here.

Nope. Here's my "concerns":

---
Well, but *what* (if any) MEMORY SYNCHRONIZATION semantics do they
provide?

ACQUIRE (mutex.lock)?

RELEASE (mutex.unlock)?

ACQUIRE-RELEASE (both)?
---

And, BTW, I guess, now I should really expand this list:

ACQUIRE (mutex.lock)?

RELEASE (mutex.unlock)?

ACQUIRE-RELEASE (both)?

ZIV-CASPI-MAGIC ("a thing" that would make the CacheComputedValue/
                 FetchComputedValue "example" above work
                 on IPF(IA64)/etc.)?

> >http://www.crhc.uiuc.edu/ece412/papers/models_tutorial.pdf
> >(Shared Memory Consistency Models: A Tutorial,
> > see "Figure 1: What value can a read return?")
>
> This is irrelevant to our discussion. Win32 *assures* you that
> Interlocked will put a barrier in all architectures that require it...
>
> >http://groups.google.com/groups?selm=G3Q49.14%24Fr2.86101%40news.cpqcorp.net
> >(Subject: Re: DCL and code optimization and etc.)
>
> ...*including* the Alpha architecture to which David was referring.

Yeah, now read this Butenhof's article again. Please pay some special
attention to phrases in >>CAPS<<. (especially the last one. ;-) )

regards,
alexander.

Alexander Terekhov

unread,
Sep 25, 2002, 6:57:54 AM9/25/02
to

t...@cs.ucr.edu wrote:
[...]

> + class RWlock : Monitor {
>
> + // no spurious signals or broadcasts.
> + // tail_wait() unlocks Monitor::mutex but doesn't relock it.
> + // tail_signal() unlocks Monitor::mutex and signals condvar.
> + // tail_broadcast() unlocks Monitor::mutex and broadcasts condvar.
>
> I don't like such naked signalling/broadcasting.

I meant:

condition::tail_broadcast() {
  waiter* w = this->queue;
  this->queue = 0;
  unlock( this->monitor_mutex );
  /**/
  while ( 0 != w ) { /* "schedule" w... */ w = w->next; }
}

Here's another version (without "tail_wait"/"lock hand off"
for writers -- I don't like such determinism ;-) ):

class RWlock : Monitor {

  // tail_wait() [no spurious wakeups] unlocks Monitor::mutex but doesn't relock it.
  // tail_signal() unlocks Monitor::mutex and schedules ONE waiter.
  // tail_broadcast() unlocks Monitor::mutex and schedules ALL waiters.

  Condition okToRead;
  Condition okToWrite;
  int count; // ==-1:writing ==0:free >=+1:reading

public:

  RWlock()
    : okToRead(this),
      okToWrite(this),
      count(0)
  {}

  start_read() {
    Monitor::lock();
    if ( 0 > count || okToWrite.count() ) {
      okToRead.tail_wait(); // no relocking
    }
    else {
      ++count;
      Monitor::unlock();
    }
  }

  start_write() {
    Monitor::lock();
    while ( 0 != count ) {
      okToWrite.wait(); // relocking
    }
    count = -1;
    Monitor::unlock();
  }

  end_read() {
    Monitor::lock();
    assert( 0 < count );
    if ( 0 == --count && 0 != okToWrite.count() ) {
      okToWrite.tail_signal();
    }
    else {
      Monitor::unlock();
    }
  }

  end_write() {
    Monitor::lock();
    assert( 0 > count );
    if ( 0 != (count = okToRead.count()) ) {
      okToRead.tail_broadcast();
    }
    else if ( 0 != okToWrite.count() ) {
t...@cs.ucr.edu

unread,
Sep 25, 2002, 9:46:41 AM9/25/02
to
Alexander Terekhov <tere...@web.de> wrote:

+ t...@cs.ucr.edu wrote:
+ [...]
+> + class RWlock : Monitor {
+>
+> + // no spurious signals or broadcasts.
+> + // tail_wait() unlocks Monitor::mutex but doesn't relock it.
+> + // tail_signal() unlocks Monitor::mutex and signals condvar.
+> + // tail_broadcast() unlocks Monitor::mutex and broadcasts condvar.
+>
+> I don't like such naked signalling/broadcasting.

+ I meant:

+ condition::tail_broadcast() {
+ waiter* w = this->queue;
+ this->queue = 0;
+ unlock( this->monitor_mutex );
+ /**/
+ while ( 0 != w ) { /* "schedule" w...*/ w = w->next; }
+ }

Okay. That has the advantage that the lock is available while the
thundering herd of readers is being scheduled.

+ Here's another version (without "tail_wait"/"lock hand off"
+ for writers -- I don't like such determinism ;-) ):

In which case, I presume, signal works pretty much the same as
broadcast, i.e., extracts the first waiter, releases the lock, and then
schedules the waiter. Very efficient.

For shorter-term waiting, one could simply use implementations of
Condition that are based on spin-waiting.

By the way, has anyone looked at short-term/long-term condition based
on spin-down locking, where the waiter initially spins but shifts to a
queue of passive waiters after spinning for a predetermined time?

Tom Payne

Ziv Caspi

unread,
Sep 27, 2002, 5:29:46 AM9/27/02
to
On Wed, 25 Sep 2002 12:30:04 +0200, Alexander Terekhov
<tere...@web.de> wrote:

>
>Ziv Caspi wrote:
>> Yes, Intel doesn't guarantee it. This is besides the point -- you get
>> that guarantee from the Interlocked* functions.
>
>What guarantee, exactly?

That Interlocked* operations put a memory barrier.

>> I think I already answered your concerns here.
>
>Nope. Here's my "concerns":

I apologize.

>
>---
>Well, but *what* (if any) MEMORY SYNCHRONIZATION semantics do they
>provide?
>
>ACQUIRE (mutex.lock)?
>
>RELEASE (mutex.unlock)?
>
>ACQUIRE-RELEASE (both)?
>---
>
>And, BTW, I guess, now I should really expand this list:
>
>ACQUIRE (mutex.lock)?
>
>RELEASE (mutex.unlock)?
>
>ACQUIRE-RELEASE (both)?
>
>ZIV-CASPI-MAGIC ("a thing" that would make the CacheComputedValue/
>                 FetchComputedValue "example" above work
>                 on IPF(IA64)/etc.)?

The example is partial, and therefore (as I've already said)
inadequate. It explains how to put an Interlocked* operation in the
write path. Another interlocked operation is also needed in the read
path, but the article is silent on this issue when it should (IMHO)
explicitly mention it.

So ZIV-CASPI-MAGIC (in this case) (it's not actually my magic) is to
perform the read using an interlocked operation. This can be done by
InterlockedCompareExchange whose compare value and set value are the
same.

This is a known bug in the doc.

>Yeah, now read this Butenhof's article again.

I was thrown off as this thread started on x86 issues, and then
migrated to general discussion on processors that don't preserve write
order or read order. As I said, I apologize.

Alexander Terekhov

unread,
Sep 27, 2002, 7:15:10 AM9/27/02
to

Ziv Caspi wrote:
[...]

> >> Yes, Intel doesn't guarantee it. This is besides the point -- you get
> >> that guarantee from the Interlocked* functions.
> >
> >What guarantee, exactly?
>
> That Interlocked* operations put a memory barrier.

Okay. < Intel's SSE/SSE2 stuff, IA32 >

"....
The SFENCE, LFENCE, and MFENCE instructions provide a
performance-efficient way of insuring load and store memory
ordering between routines that produce weakly-ordered results
and routines that consume that data. The functions of these
instructions are as follows:

- SFENCE-Serializes all store (write) operations that
occurred prior to the SFENCE instruction in the program
instruction stream, but does not affect load operations.

- LFENCE-Serializes all load (read) operations that occurred
prior to the LFENCE instruction in the program instruction
stream, but does not affect store operations.

- MFENCE-Serializes all store and load operations that
occurred prior to the MFENCE instruction in the program
instruction stream."

Which one? ;-)

[...]


> >ZIV-CASPI-MAGIC ("a thing" that would make the CacheComputedValue/
> >                 FetchComputedValue "example" above work
> >                 on IPF(IA64)/etc.)?
>
> The example is partial, and therefore (as I've already said)
> inadequate. It explains how to put an Interlocked* operation in the
> write path. Another interlocked operation is also needed in the read

> path, .... ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^

I don't think so. In kinda-proposed "java atomic terms", you'd need:

< ``Safety'' of *concurrent* "iValue = ComputeValue()" thing ASIDE >

A) AtomicInt.get() -- ACQUIRE semantics (fValueHasBeenComputed read).
B) AtomicInt.set() -- RELEASE semantics (fValueHasBeenComputed write).

regards,
alexander.
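
The A)/B) pairing above can be sketched with C11 atomics (my illustration of the fix, not Microsoft's; the names follow the MSDN fragment and `ComputeValue` is a stand-in). The release store publishes `iValue`; the acquire load on the reader's side is the barrier the article leaves out:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int iValue;                        /* plain data, published once */
static atomic_bool fValueHasBeenComputed; /* the flag, now an atomic */

static int ComputeValue(void) { return 42; }  /* stand-in computation */

void CacheComputedValue(void) {
    if (!atomic_load_explicit(&fValueHasBeenComputed, memory_order_acquire)) {
        iValue = ComputeValue();
        /* RELEASE: the store to iValue cannot be reordered past this */
        atomic_store_explicit(&fValueHasBeenComputed, true,
                              memory_order_release);
    }
}

bool FetchComputedValue(int *piResult) {
    /* ACQUIRE: the read of iValue cannot be hoisted above this */
    if (atomic_load_explicit(&fValueHasBeenComputed, memory_order_acquire)) {
        *piResult = iValue;
        return true;
    }
    return false;
}
```

On strongly-ordered hardware both fences may compile to nothing; on weakly-ordered machines (Alpha, IA64) the reader-side acquire is what makes the pattern correct.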

Ziv Caspi

unread,
Oct 3, 2002, 3:33:07 PM10/3/02
to

None. As I said, a LOCK prefix is used. With respect to memory, it has
a similar effect as MFENCE, IIRC.

>
>[...]
>> >ZIV-CASPI-MAGIC ("a thing" that would make the CacheComputedValue/
>> > FetchComputedValueexample "example" above work
>> > on IPF(IA64)/etc.)?
>>
>> The example is partial, and therefore (as I've already said)
>> inadequate. It explains how to put an Interlocked* operation in the
>> write path. Another interlocked operation is also needed in the read
>> path, .... ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ^^^^
>
>I don't think so. In kinda-proposed "java atomic terms", you'd need:
>
>< ``Safety'' of *concurrent* "iValue = ComputeValue()" thing ASIDE >
>
>A) AtomicInt.get() -- ACQUIRE semantics (fValueHasBeenComputed read).
>B) AtomicInt.set() -- RELEASE semantics (fValueHasBeenComputed write).

I don't understand what you mean by that. In the MSDN example, there
are two independent reads from memory which can, in some
architectures, be re-ordered. That's why you need an interlocked
operation. (I believe the Alpha docs actually provide this scenario as
a code example, but I don't have them handy to make sure.)

Ziv

Alexander Terekhov

Oct 4, 2002, 7:41:17 AM

Ziv Caspi wrote:
>
> On Fri, 27 Sep 2002 13:15:10 +0200, Alexander Terekhov
> <tere...@web.de> wrote:

[...IA32's SEE/SSE2 mbars...]

> >Which one? ;-)
>
> None. As I said, a LOCK prefix is used. With respect to memory, it has
> a similar effect as MFENCE, IIRC.

Well, Intel's IA32 docu says:

"....
Beginning with the P6 family processors, when the LOCK
prefix is prefixed to an instruction and the memory area
being accessed is cached internally in the processor, the
LOCK# signal is generally not asserted. Instead, only the
processor's cache is locked. Here, the processor's cache
coherency mechanism insures that the operation is carried
out atomically with regards to memory.
...."

Can you show me something "official" related to *guarantees*
w.r.t. memory access {re-}ordering and the LOCK-thing?

The stuff I've found "merely" says that:

"....
For areas of memory where weak ordering is acceptable, the
write back (WB) memory type can be chosen. Here, reads can
be performed speculatively and writes can be buffered and
combined. For this type of memory, cache locking is performed
on atomic (locked) operations that do not split across cache
lines, which helps to reduce the performance penalty associated
with the use of the typical synchronization instructions, such
as XCHG, that lock the bus during the entire read-modify-write
operation. With the WB memory type, the XCHG instruction locks
the cache instead of the bus if the memory access is contained
within a cache line.

It is recommended that software written to run on Pentium 4,
Intel Xeon, and P6 family processors assume the processor-
ordering model or a weaker memory-ordering model. The Pentium 4,
Intel Xeon, and P6 family processors do not implement a strong
memory-ordering model, except when using the UC memory type.
Despite the fact that Pentium 4, Intel Xeon, and P6 family
processors support processor ordering, Intel does not guarantee
that future processors will support this model."

regards,
alexander.

Ziv Caspi

Oct 6, 2002, 5:12:54 AM
Alexander Terekhov <tere...@web.de> wrote in message news:<3D9D7E5D...@web.de>...
[...]

> Can you show me something "official" related to *guarantees*
> w.r.t. memory access {re-}ordering and the LOCK-thing?
[...]

Chapter 7 of the manual is particularly frustrating to read.
The guarantee is found in 7.2.4 (I've broken the lines for easier reading):

[[[
Synchronization mechanisms in multiple-processor systems may depend upon
a strong memory-ordering model.

Here, a program can use a locking instruction such as the XCHG
instruction or the LOCK prefix to insure that
a read-modify-write operation on memory is carried out atomically.

Locking operations typically operate like I/O operations in that
they wait for all previous instructions to complete
and for all buffered writes to drain to memory
(see Section 7.1.2., "Bus Locking").
]]]

Ziv

Alexander Terekhov

Oct 14, 2002, 6:40:15 AM

t...@cs.ucr.edu wrote:
[...]

> I'd be interested in references on implementing rwlocks via
> barriers and/or compare-and-swap instructions.

Check this (and related papers):

http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html

regards,
alexander.

P.S. Uhmm, ``NPTL's rwlocks''... right? ;-) ;-)

Konrad Schwarz

Oct 30, 2002, 11:53:14 AM
David Butenhof wrote:
> "An implementation [of read-write locks] built on a mutex and condition
> variables" is a poor read-write lock. Really poor. The only advantage is
> that it "implements the API".
>
> Remember that CV wakeup requires locking the mutex. Every CV wakeup. So what
> happens when you broadcast to a CV to wake up a swarm of "concurrent
> readers"? Well, there are various optimizations, but essentially EACH of
> them, IN TURN (one at a time) will acquire the mutex, manipulate the RWlock
> shared data to get out of the API, and then start executing the protected
> application code.
[...]
> So the answer is that the ideal thread library implementation can do far
> better than you can possibly do with anything resembling portable code.
> Otherwise, you're pretty much out of luck and you might as well stick with
> just a mutex.

The following is (sort-of) pseudo-code for the simplest (and therefore fastest)
implementation of a RW-lock via cv/mutex I could think of. It happens to have the
property that readers do not wait on CVs. It also has the property that writers
can starve, but I don't care about that; I feel application code that does is badly designed.

I am not sure that it is correct---what do you say?

struct rw_lock {
    mutex_t m;
    cond_t no_more_readers;
    int num_readers;
};

void
rw_lock_init (struct rw_lock *lock)
{
    mutex_init (&lock->m);
    cond_init (&lock->no_more_readers);
    lock->num_readers = 0;
}

void
rw_lock_read_lock (struct rw_lock *lock)
{
    mutex_lock (&lock->m);
    ++lock->num_readers;
    mutex_unlock (&lock->m);
}

void
rw_lock_read_unlock (struct rw_lock *lock)
{
    int num_readers;
    mutex_lock (&lock->m);
    num_readers = --lock->num_readers;
    mutex_unlock (&lock->m);
    if (!num_readers)
        cond_signal (&lock->no_more_readers);
}

void
rw_lock_write_lock (struct rw_lock *lock)
{
    for (;;) {
        mutex_lock (&lock->m);
        if (!lock->num_readers)
            return;
        cond_wait (&lock->no_more_readers, &lock->m);
    }
}

void
rw_lock_write_unlock (struct rw_lock *lock)
{
    mutex_unlock (&lock->m);
}

Alexander Terekhov

Oct 30, 2002, 12:08:17 PM

Konrad Schwarz wrote:
[...]

> The following is (sort-of) pseudo-code for the simplest (and therefore fastest)
> implementation of a RW-lock via cv/mutex I could think of. It happens to have the
> property that readers do not wait on CVs. It also has the property that writers
> can starve, but I don't care about that; I feel application code that does is badly designed.
>
> I am not sure that it is correct---what do you say?

I say: it's broken.

>
> struct rw_lock {
> mutex_t m;
> cond_t no_more_readers;
> int num_readers;
> };
>
> void
> rw_lock_init (struct rw_lock *lock)
> {
> mutex_init (&lock->m);
> cond_init (&lock->no_more_readers);
> num_readers = 0;
> }
>
> void
> rw_lock_read_lock (struct rw_lock *lock)
> {
> mutex_lock (&lock->m);
> ++lock->num_readers;
> mutex_unlock (&lock->m);
> }

And what if writer is ALREADY/STILL active at this point?

>
> void
> rw_lock_read_unlock (struct rw_lock *lock)
> {
> int num_readers;
> mutex_lock (&lock->m);
> num_readers = --lock->num_readers;
> mutex_unlock (&lock->m);
> if (!num_readers)
> cond_signal (&lock->no_more_readers);
> }
>
> void
> rw_lock_write_lock (struct rw_lock *lock)
> {
> for (;;) {
> mutex_lock (&lock->m);
> if (!lock->num_readers)
> return;
> cond_wait (&lock->no_more_readers, &lock->m);

The mutex is locked here... see line number -3 above. ;-)

> }
> }
>
> void
> rw_lock_write_unlock (struct rw_lock *lock)
> {
> mutex_unlock (&lock->m);
> }

regards,
alexander.

Patrick TJ McPhee

Oct 30, 2002, 12:06:25 PM
In article <3DC00E7...@mchpDOTsiemens.de>,
Konrad Schwarz <konradDO...@mchpDOTsiemens.de> wrote:

% void
% rw_lock_write_lock (struct rw_lock *lock)
% {
% for (;;) {
% mutex_lock (&lock->m);
% if (!lock->num_readers)
% return;
% cond_wait (&lock->no_more_readers, &lock->m);
% }
% }

This is not a correct use of the condition wait. You might try

void
rw_lock_write_lock (struct rw_lock *lock)
{
    mutex_lock (&lock->m);
    while (lock->num_readers)
        cond_wait (&lock->no_more_readers, &lock->m);
}

--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

Alexander Terekhov

Oct 30, 2002, 12:59:12 PM

Patrick TJ McPhee wrote:
>
> In article <3DC00E7...@mchpDOTsiemens.de>,
> Konrad Schwarz <konradDO...@mchpDOTsiemens.de> wrote:
>
> % void
> % rw_lock_write_lock (struct rw_lock *lock)
> % {
> % for (;;) {
> % mutex_lock (&lock->m);
> % if (!lock->num_readers)
> % return;
> % cond_wait (&lock->no_more_readers, &lock->m);
> % }
> % }
>
> This is not a correct use of the condition wait. You might try
>
> void
> rw_lock_write_lock (struct rw_lock *lock)
> {
> mutex_lock (&lock->m);
> while (lock->num_readers)
> cond_wait (&lock->no_more_readers, &lock->m);
> }

One problem of such "single mutex" implementation [where a writer keeps
the "entry" mutex locked in order to exclude readers and other writers]
is that, AFAICS, there is no way to implement tryrdlock/EBUSY with POSIX
semantics -- "[EBUSY] The read-write lock could not be acquired for
reading because a writer holds the lock or a writer with the appropriate
priority was blocked on it.". This problem and the desire to reduce a
bit contention for the same mutex WRT entering and exiting readers, has
led me to the following solution [portability/"harmless race condition"
aside]: http://groups.google.com/groups?selm=3D8C850E.DC030D52%40web.de

-------- Original Message --------
t...@cs.ucr.edu wrote:
>
> David Butenhof <David.B...@compaq.com> wrote:
> [...]
> + A real implementation of pthread_rwlock_* CAN (but need not) support true
> + parallel scheduling of concurrent readers across a set of available CPUs.
>
> Hmmmmm. Extracting those readers from the ready queue (or wherever)
> seems inherently serial. What am I missing?

Consider the following rwlock algorithm where pending readers stay
blocked on mutex (in addition to "kinda fairness" it allows us to
bypass condvar broadcast operation [for readers], which may somewhat
"confuse" MP-optimized mutexes, though ;-) ). Another problem here
is exactly the "inherently serial" wakeup for readers (unless they
just happen to spin-wait long enough ;-) ).

http://groups.google.com/groups?selm=3B166244.F923B993%40web.de


(Subject: Re: how to distinguish between read and write locks?)

< slightly changed/simplified version; error checking,
cancel-cleanup and code for recursive readlocks omitted >

mtxEntryLock          - mutex
mtxReadExitLock       - mutex
cndWriteAccessGranted - condvar
nReadEntryCount       - int
nReadExitCount        - int
nWriteAccessCount     - int

rdlock() {

  lock( mtxEntryLock );

  if ( INT_MAX == ++nReadEntryCount ) {
    lock( mtxReadExitLock );
    nReadEntryCount -= nReadExitCount;
    nReadExitCount = 0;
    unlock( mtxReadExitLock );
  }

  unlock( mtxEntryLock );
}

wrlock() {

  lock( mtxEntryLock );

  if ( 0 == nWriteAccessCount ) {

    lock( mtxReadExitLock );

    if ( 0 != nReadExitCount ) {
      nReadEntryCount -= nReadExitCount;
      nReadExitCount = 0;
    }

    if ( 0 != nReadEntryCount ) {

      nReadExitCount = -nReadEntryCount;
      nReadEntryCount = 0;
      do {
        wait( cndWriteAccessGranted, mtxReadExitLock );
      } while( 1 != nWriteAccessCount );
    }
    else {
      nWriteAccessCount = 1;
    }

    unlock( mtxReadExitLock );
  }
  else {                  // recursive write locking
    nWriteAccessCount++;  // via recursive mtxEntryLock
  }
}

unlock() {

  auto: bool bGrantWriteAccess

  if ( 0 == nWriteAccessCount ) {

    lock( mtxReadExitLock );

    if ( 0 == ++nReadExitCount ) {
      nWriteAccessCount = 1;
      bGrantWriteAccess = true;
    }
    else {
      bGrantWriteAccess = false;
    }

    unlock( mtxReadExitLock );

    if ( bGrantWriteAccess ) {
      signal( cndWriteAccessGranted );
    }
  }
  else {
    nWriteAccessCount--;
    unlock( mtxEntryLock );
  }
}

regards,
alexander.

David Butenhof

Oct 30, 2002, 1:09:46 PM
Konrad Schwarz wrote:

> David Butenhof wrote:
>> "An implementation [of read-write locks] built on a mutex and condition
>> variables" is a poor read-write lock. Really poor. The only advantage is
>> that it "implements the API".
>>
>> Remember that CV wakeup requires locking the mutex. Every CV wakeup. So
>> what happens when you broadcast to a CV to wake up a swarm of "concurrent
>> readers"? Well, there are various optimizations, but essentially EACH of
>> them, IN TURN (one at a time) will acquire the mutex, manipulate the
>> RWlock shared data to get out of the API, and then start executing the
>> protected application code.
> [...]
>> So the answer is that the ideal thread library implementation can do far
>> better than you can possibly do with anything resembling portable code.
>> Otherwise, you're pretty much out of luck and you might as well stick
>> with just a mutex.
>
> The following is (sort-of) pseudo-code for the simplest (and therefore
> fastest)
> implementation of a RW-lock via cv/mutex I could think of. It happens to
> have the
> property that readers do not wait on CVs. It also has the property that
> writers can starve, but I don't care about that; I feel application code
> that does is badly designed.

Many might instead argue that a read-write lock that starved writers is
badly designed, but I won't get into that. (I think the only real answer
depends on the needs of the particular application, and we can't apply
"general purpose" arguments to an application-specific implementation.)

Also, "simple" isn't necessarily "fast". That'll depend on internal
characteristics of the implementation, as well as how the application uses
these things. (How often are the multiple concurrent readers, how often
will they conflict with a writer, and so forth.)

> I am not sure that it is correct---what do you say?

First, you're representing the write lock by transferring control of the
internal mutex to application code -- that is, you're returning from your
code while still holding the mutex. Dangerous. Though not illegal, I'd
always label it "bad design" in any general purpose API.

However, despite adding that hazard, you really haven't solved the problem.
Sure, the readers don't wait on a CV, but they still wait, on the mutex,
and all would-be readers are SERIALIZED on that mutex to determine that
"parallel" reading is allowed. In other words, even without the reader CV,
you're still not getting parallel (or even reasonably concurrent) wakeup of
multiple readers. When a writer releases the mutex, ONE reader will acquire
the mutex, determine it can read, release the mutex... allowing the NEXT
reader to do the same... and so forth.

This is EXACTLY what happens with the reader CV, except that a mutex is
optimized for NON-contention cases and therefore may behave less
efficiently in a case like this where contention is expected (and even
desired).

Again: your code doesn't break any laws (aside from a coding error that
could be fixed), but it's likely to be no better in performance than the
CV-based model, (and may well be worse on many platforms). Finally, it
doesn't really do anything to improve the scheduling of concurrent readers.

An "ideal" read-write lock needs to be able to schedule one reader per
available processor and let them all begin execution in application code
"at the same time". Yes, this definition is loose at best on a timeshare
system, but "waking each serially" is not a reasonable or useful
approximation.

> void
> rw_lock_write_lock (struct rw_lock *lock)
> {
> for (;;) {
> mutex_lock (&lock->m);
> if (!lock->num_readers)
> return;
> cond_wait (&lock->no_more_readers, &lock->m);

When cond_wait() returns the thread owns the mutex associated with the
predicate (lock->m in this case). Because you are not releasing that lock,
and will re-lock it at the top of the loop, the second iteration will
deadlock. (To put it a different way, your implementation is far more
inclined to starve writers than you may have thought... but it'll starve
readers, too. ;-) )

Instead of this whole 'for' loop, just lock the mutex and then wait 'while
(lock->num_readers)'.

David Butenhof

Oct 30, 2002, 1:13:41 PM
Alexander Terekhov wrote:

>> I am not sure that it is correct---what do you say?
>
> I say: it's broken.

Well, yes; but...

>> void
>> rw_lock_read_lock (struct rw_lock *lock)
>> {
>> mutex_lock (&lock->m);
>> ++lock->num_readers;
>> mutex_unlock (&lock->m);
>> }
>
> And what if writer is ALREADY/STILL active at this point?

Can't be... because a writer holds lock->m. Ugly, yes; but not illegal.

>> void
>> rw_lock_write_lock (struct rw_lock *lock)
>> {
>> for (;;) {
>> mutex_lock (&lock->m);
>> if (!lock->num_readers)
>> return;
>> cond_wait (&lock->no_more_readers, &lock->m);
>
> The mutex is locked here... see line number -3 above. ;-)

This is intentional. The predicate loop is busted because it'll always
deadlock on the second iteration (re-locking lock->m)... but other than
that it'd "work". It won't solve the problem, and may behave horribly
because it depends on blocking the readers on a mutex (generally optimized
for nonblocking cases) rather than on a CV (generally optimized for
blocking cases)... but it'd work.

Alexander Terekhov

Oct 30, 2002, 2:09:30 PM

David Butenhof wrote:
[...]

> This is intentional. The predicate loop is busted because it'll always
> deadlock on the second iteration (re-locking lock->m)... but other than
> that it'd "work". It won't solve the problem, and may behave horribly
> because it depends on blocking the readers on a mutex (generally optimized
> for nonblocking cases) rather than on a CV (generally optimized for
> blocking cases)... but it'd work.

I fully agree that the mutex is kinda misused/abused here. This problem
can be "solved", however. Solution: USE SEMAPHORE! ;-) ;-) Kidding aside,
such solution might be really helpful to implement/emulate process shared
rwlocks on a system that provides semaphores only [for process shared/IPC
sync], like almost all [probably w/o exceptions] current distributions of
Linux.

regards,
alexander.

P.S. It seems to me that NPTL's rwlocks are somewhat less broken than its
current condvars, but [check this out yourself]...

E:\>where /r . /t DESIGN-rwlock.txt
2527 9-19-102 11:10a E:\nptl04\nptl\DESIGN-rwlock.txt

That's original NPTL-stuff.

E:\nptl04\nptl\DESIGN-rwlock.txt DOS Line 1/110

Reader Writer Locks pseudocode
==============================

pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
pthread_rwlock_unlock(pthread_rwlock_t *rwlock);
pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);

struct pthread_rwlock_t {

    unsigned int lock:
      - internal mutex

    unsigned int writers_preferred;
      - locking mode: 0 recursive, readers preferred
                      1 nonrecursive, writers preferred

    unsigned int readers;
      - number of read-only references various threads have

    pthread_t writer;
      - descriptor of the writer or 0

    unsigned int readers_wakeup;
      - 'all readers should wake up' futex.

    unsigned int writer_wakeup;
      - 'one writer should wake up' futex.

    unsigned int nr_readers_queued;
      - number of readers queued up.

    unsigned int nr_writers_queued;
      - number of writers queued up.
}

pthread_rwlock_rdlock(pthread_rwlock_t *rwlock)
{
    lll_lock(rwlock->lock);
    for (;;) {
        if (!rwlock->writer && (!rwlock->nr_writers_queued ||
                                !rwlock->writers_preferred))
            break;

        rwlock->nr_readers_queued++;
        lll_unlock(rwlock->lock);

        futex_wait(&rwlock->readers_wakeup, 0)

        lll_lock(rwlock->lock);
        if (!--rwlock->nr_readers_queued)
            rwlock->readers_wakeup = 0;
    }
    rwlock->readers++;
    lll_unlock(rwlock->lock);
}

pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock)
{
    int result = EBUSY;
    lll_lock(rwlock->lock);
    if (!rwlock->writer && (!rwlock->nr_writers_queued ||
                            !rwlock->writers_preferred))
        rwlock->readers++;
    lll_unlock(rwlock->lock);
    return result;
}

pthread_rwlock_wrlock(pthread_rwlock_t *rwlock)
{
    lll_lock(rwlock->lock);
    for (;;) {
        if (!rwlock->writer && !rwlock->readers)
            break;

        rwlock->nr_writers_queued++;
        lll_unlock(rwlock->lock);

        futex_wait(&rwlock->writer_wakeup, 0);

        lll_lock(rwlock->lock);
        rwlock->nr_writers_queued--;
        rwlock->writer_wakeup = 0;
    }
    rwlock->writer = pthread_self();
    lll_unlock(rwlock->lock);
}

pthread_rwlock_unlock(pthread_rwlock_t *rwlock)
{
    lll_lock(rwlock->lock);

    if (rwlock->writer)
        rwlock->writer = 0;
    else
        rwlock->readers--;

    if (!rwlock->readers) {
        if (rwlock->nr_writers_queued) {
            rwlock->writer_wakeup = 1;
            futex_wake(&rwlock->writer_wakeup, 1);
        } else
        if (rwlock->nr_readers_queued) {
            rwlock->readers_wakeup = 1;
            futex_wake(&rwlock->readers_wakeup, MAX_INT);
        }
    }

    lll_unlock(rwlock->lock);
}
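One nit in the file as quoted: pthread_rwlock_tryrdlock initializes result to EBUSY and never clears it, so it reports failure even after successfully taking the read lock. A corrected sketch in the same pseudocode style (not from the NPTL sources):

```
pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock)
{
    int result = EBUSY;
    lll_lock(rwlock->lock);
    if (!rwlock->writer && (!rwlock->nr_writers_queued ||
                            !rwlock->writers_preferred)) {
        rwlock->readers++;
        result = 0;        /* lock acquired */
    }
    lll_unlock(rwlock->lock);
    return result;
}
```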

Konrad Schwarz

Oct 31, 2002, 11:22:23 AM
Patrick TJ McPhee wrote:

> This is not a correct use of the condition wait. You might try
>
> void
> rw_lock_write_lock (struct rw_lock *lock)
> {
> mutex_lock (&lock->m);
> while (lock->num_readers)
> cond_wait (&lock->no_more_readers, &lock->m);
> }

Sorry, trivial mistake---haven't been doing any real CV/Mutex programming lately.

Konrad Schwarz

Oct 31, 2002, 11:40:23 AM
> One problem of such "single mutex" implementation [where a writer keeps
> the "entry" mutex locked in order to exclude readers and other writers]
> is that, AFAICS, there is no way to implement tryrdlock/EBUSY with POSIX
> semantics -- "[EBUSY] The read-write lock could not be acquired for
> reading because a writer holds the lock or a writer with the appropriate
> priority was blocked on it.". This problem and the desire to reduce a

I could always fake it by using mutex_trylock() or whatever its called
and failing if that fails. :-)

Konrad Schwarz

Oct 31, 2002, 12:00:24 PM
David Butenhof wrote:

> Many might instead argue that a read-write lock that starved writers is
> badly designed, but I won't get into that. (I think the only real answer
> depends on the needs of the particular application, and we can't apply
> "general purpose" arguments to an application-specific implementation.)

Just a small aside: maybe the best solution is to use (gasp :-) thread priorities:
let whoever has the highest priority in next.

> Also, "simple" isn't necessarily "fast". That'll depend on internal
> characteristics of the implementation, as well as how the application uses
> these things. (How often are the multiple concurrent readers, how often
> will they conflict with a writer, and so forth.)

> First, you're representing the write lock by transferring control of the
> internal mutex to application code -- that is, you're returning from your
> code while still holding the mutex. Dangerous. Though not illegal, I'd
> always label it "bad design" in any general purpose API.

Ain't visible in the design (of the API), only in the implementation :-)



> However, despite adding that hazzard, you really haven't solved the problem.
> Sure, the readers don't wait on a CV, but they still wait, on the mutex,
> and all would-be readers are SERIALIZED on that mutex to determine that
> "parallel" reading is allowed. In other words, even without the reader CV,
> you're still not getting parallel (or even reasonably concurrent) wakeup of
> multiple readers. When a writer releases the mutex, ONE reader will acquire
> the mutex, determine it can read, release the mutex... allowing the NEXT
> reader to do the same... and so forth.
>
> This is EXACTLY what happens with the reader CV, except that a mutex is
> optimized for NON-contention cases and therefore may behave less
> efficiently in a case like this where contention is expected (and even
> desired).

So what you are saying is that the mutex lock operation after waking up from the
condition variable broadcast is somehow more efficient than just pending on
the mutex? I can see that happening for the first waiter (lock morphing),
but the rest, I would just expect to queue up like in the pure mutex case.
Thundering herd and all that---except the solution presented doesn't have
the thundering herd problem; the readers are released serially.

I still feel that the presented solution is the best that can be done on top
of CV/Mutex.



> Again: your code doesn't break any laws (aside from a coding error that
> could be fixed), but it's likely to be no better in performance than the
> CV-based model, (and may well be worse on many platforms). Finally, it
> doesn't really do anything to improve the scheduling of concurrent readers.

See above.



> An "ideal" read-write lock needs to be able to schedule one reader per
> available processor and let them all begin execution in application code
> "at the same time". Yes, this definition is loose at best on a timeshare
> system, but "waking each serially" is not a reasonable or useful
> approximation.

So what you actually do as kernel programmer is transfer the queue of blocked readers
directly to the ready queue(s) when the writer unlocks?

David Butenhof

Oct 31, 2002, 1:29:06 PM
Konrad Schwarz wrote:

Not quite, because if the trylock fails you don't know if it's because
there's a current writer, or because a reader is just entering (or
exiting). You need to acquire the mutex to reliably determine whether
there's an active reader, but you can't do that until the writer finishes.

So, no, you can't do a tryrdlock. I discounted this since the (non-POSIX)
API proposed didn't have a tryrdlock and I presumed that it wasn't
considered relevant for the application in question.

David Butenhof

Oct 31, 2002, 1:43:53 PM
Konrad Schwarz wrote:

> David Butenhof wrote:
>> Many might instead argue that a read-write lock that starved writers is
>> badly designed, but I won't get into that. (I think the only real answer
>> depends on the needs of the particular application, and we can't apply
>> "general purpose" arguments to an application-specific implementation.)
>
> Just a small aside: maybe the best solution is to use (gasp :-) thread
> priorities: let whoever has the highest priority in next.

In fact, that's the way "good" read-write locks work. More or less. Of
course the simplistic mutex-writer version here can't do that, because the
readers and writers don't wait on the same object.

>> Also, "simple" isn't necessarily "fast". That'll depend on internal
>> characteristics of the implementation, as well as how the application
>> uses these things. (How often are the multiple concurrent readers, how
>> often will they conflict with a writer, and so forth.)
>
>> First, you're representing the write lock by transferring control of the
>> internal mutex to application code -- that is, you're returning from your
>> code while still holding the mutex. Dangerous. Though not illegal, I'd
>> always label it "bad design" in any general purpose API.
>
> Ain't visible in the design (of the API), only in the implementation :-)

Sorry, inputs, outputs, and side effects are part of the API. If the
function returns holding a mutex, that IS part of the API. You can leave it
undocumented (making the routine harder to use reliably and harder to
debug), but it's still part of the API.



>> However, despite adding that hazard, you really haven't solved the
>> problem. Sure, the readers don't wait on a CV, but they still wait, on
>> the mutex, and all would-be readers are SERIALIZED on that mutex to
>> determine that "parallel" reading is allowed. In other words, even
>> without the reader CV, you're still not getting parallel (or even
>> reasonably concurrent) wakeup of multiple readers. When a writer releases
>> the mutex, ONE reader will acquire the mutex, determine it can read,
>> release the mutex... allowing the NEXT reader to do the same... and so
>> forth.
>>
>> This is EXACTLY what happens with the reader CV, except that a mutex is
>> optimized for NON-contention cases and therefore may behave less
>> efficiently in a case like this where contention is expected (and even
>> desired).
>
> So what you are saying is that the mutex lock operation after waking up
> from the condition variable broadcast is somehow more efficient than just
> pending on
> the mutex? I can see that happening for the first waiter (lock morphing),
> but the rest, I would just expect to queue up like in the pure mutex case.
> Thundering herd and all that---except the solution presented doesn't have
> the thundering herd problem; the readers are released serially.

Readers are always released serially without a concurrent schedule
operation, as I've described for an "ideal" read-write lock. Both this
scheme and CV-based schemes release serially.

Some CV implementations might release a thundering herd on broadcast; but
then, so might an odd mutex implementation. (There's no actual rule that
only one waiter be awakened at a time... only that just one RETURN from the
mutex lock call. In fact, there's no rule that a mutex ever BLOCK in the
first place.)

So we're playing games with "maybe" and "my worst case is worse than your
worst case". I'm quoting one end of the worst case spectrum to warn that
the implementation suggested here plays against some of the design criteria
of the POSIX primitives. You're arguing that it's OK because some
implementations might not work that way. Hey -- I already said the mutex
bit isn't illegal, I merely warned that I didn't think it was a good idea.
You're free to ignore me.

> I still feel that the presented solution is the best that can be done on
> top of CV/Mutex.

And I disagree. So go and do what you want, OK? ;-)

>> Again: your code doesn't break any laws (aside from a coding error that
>> could be fixed), but it's likely to be no better in performance than the
>> CV-based model, (and may well be worse on many platforms). Finally, it
>> doesn't really do anything to improve the scheduling of concurrent
>> readers.
>
> See above.

See above.

>> An "ideal" read-write lock needs to be able to schedule one reader per
>> available processor and let them all begin execution in application code
>> "at the same time". Yes, this definition is loose at best on a timeshare
>> system, but "waking each serially" is not a reasonable or useful
>> approximation.
>
> So what you actually do as kernel programmer is transfer the queue of
> blocked readers directly to the ready queue(s) when the writer unlocks?

The point is that what you've said is absolutely routine -- you unblock
something by sticking it on a ready queue. Almost goes without saying,
right? The only trick is in making sure that there's no common contention
point on the way OUT that would cause any of those threads to merely
re-block -- either on a mutex or on an internal scheduling lock. (I didn't
say it was a SMALL trick.)

Konrad Schwarz

Oct 31, 2002, 1:46:18 PM
David Butenhof wrote:


> Not quite, because if the trylock fails you don't know if it's because
> there's a current writer, or because a reader is just entering (or
> exiting). You need to acquire the mutex to reliably determine whether
> there's an active reader, but you can't do that until the writer finishes.

But can an application tell the difference?

David Butenhof

Nov 1, 2002, 7:49:25 AM
Konrad Schwarz wrote:

That depends. If it used tryrdlock rather than a normal rdlock just because
the API was there, and it had no real need for a non-blocking lock attempt,
then, no, it probably can't tell the difference.

More importantly, though, one can easily write a portable conforming test
that will reliably diagnose this implementation bug. You just fire off a
thread that will take a write lock and never release it, then fire off
another thread that will use tryrdlock to attempt to acquire a read lock on
the same rwlock object. If tryrdlock SUCCEEDS, you've beaten the write
locker, and you wait and try again. If it returns EBUSY, then the writer
holds the lock and the test succeeds. The implementation under analysis
here will cause this test to HANG until it is killed, either manually or by
a test infrastructure timeout mechanism. The implementation has failed our
simple conformance test.

The quality control buzzer goes off, and this implementation is shunted off
the assembly line into the bit bucket. Next? ;-)

Once again, the only "saving throw" here is to claim that the intended
consumer of this rwlock package has no need for try*lock functions and that
the inability to provide said functions is therefore irrelevant. If nobody
can in good conscience make that claim, then the game is over.

David Schwartz

unread,
Nov 1, 2002, 2:39:31 PM11/1/02
to

Sure, consider an application that never uses write locks. It would
expect a trylock for a read lock to always succeed.

DS

Alexander Terekhov

unread,
Nov 4, 2002, 12:01:01 PM11/4/02
to

Alexander Terekhov also wrote
<https://listman.redhat.com/pipermail/phil-list/2002-November/000212.html>:
> phil...@redhat.com schrieb am 03.11.02 07:34:05:
> > On Sat, 2 Nov 2002, Alexander Terekhov wrote:
> >
> > > P.S. Oh, BTW, I'm just curious: who's the author of NPTL's
> > > DESIGN-rwlock.txt?
> >
> > i wrote the DESIGN-*.txt pseudocode, with fixes & suggestions from Ulrich.
> > The actual implementation in NPTL is Ulrich's.
> >
> > Ingo
>
> It seems to me that the way how you handle futex value transitions
> [0<->1] and waiting in the current rwlock is well, ``Not Good.''
>
> futex_wait(&rwlock->readers_wakeup, 0)
>
> Unless I'm just missing and/or misunderstanding something, this
> might easily degrade to busy-waiting/spinning -- the futex values
> aren't guaranteed to be zero, as far as I can see.


>
> if (!--rwlock->nr_readers_queued)
> rwlock->readers_wakeup = 0;
>
>

> This perceived problem aside, you might want to take a look at
> the discussion ["poor"/"ideal" rwlock, etc.] in the following
> c.p.t. thread:
>
> http://groups.google.com/groups?selm=e5d6e3f1.0209160258.d14d9e6%40posting.google.com
> (comp.programming.threads, Subject: rwlock using pthread_cond)

Luca Barbieri <l...@ldb.ods.org> replied
<https://listman.redhat.com/pipermail/phil-list/2002-November/000213.html>:
> On Mon, Nov 04, 2002 at 01:55:48PM +0100, Alexander Terekhov wrote:
> > phil...@redhat.com schrieb am 03.11.02 07:34:05:
> > > On Sat, 2 Nov 2002, Alexander Terekhov wrote:
> > >
> > > > P.S. Oh, BTW, I'm just curious: who's the author of NPTL's
> > > > DESIGN-rwlock.txt?
> > >
> > > i wrote the DESIGN-*.txt pseudocode, with fixes & suggestions from Ulrich.
> > > The actual implementation in NPTL is Ulrich's.
> > >
> > > Ingo
> >
> > It seems to me that the way how you handle futex value transitions
> > [0<->1] and waiting in the current rwlock is well, ``Not Good.''
> >
> > futex_wait(&rwlock->readers_wakeup, 0)
> >
> > Unless I'm just missing and/or misunderstanding something, this
> > might easily degrade to busy-waiting/spinning -- the futex values
> > aren't guaranteed to be zero, as far as I can see.
>
> I think that you are right: if the rwlock gets write-locked after
> readers are woken up, and there is more than one reader, a busy wait
> happens.
>
>
> Here is an alternative design that should solve the problem.
>
> The basic idea here is that we keep counts of threads blocking reads
> and writes and of ones queued on futexes on that variable: this has
> the side effect of making the design intuitive.
>
> This design also adds a "first come first served" algorithm where an
> unlock awakes everyone and waiting threads never block opposite-mode
> waiters (this isn't very useful, it's included mostly for
> completeness).
>
> IMHO this fixes all possible problems.
>
> The Linux/x86 assembly implementation is left as an exercise for the
> reader.


>
>
> Reader Writer Locks pseudocode
> ==============================
>
> pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
> pthread_rwlock_unlock(pthread_rwlock_t *rwlock);
> pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);
>
> struct pthread_rwlock_t {
>
> unsigned int lock:
> - internal mutex
>

> unsigned int algorithm;
> - locking mode: READERS_RULE recursive, readers preferred
> WRITERS_RULE nonrecursive, writers preferred
> FIRST_COME_FIRST_SERVED obvious


>
>
> unsigned int readers;
> - number of read-only references various threads have
>
> pthread_t writer;
> - descriptor of the writer or 0
>

> unsigned int read_blockers;
> - number of threads blocking readers
>
> unsigned int write_blockers;
> - number of threads blocking writers


>
> unsigned int nr_readers_queued;
> - number of readers queued up.
>
> unsigned int nr_writers_queued;
> - number of writers queued up.
> }
>
> pthread_rwlock_rdlock(pthread_rwlock_t *rwlock)
> {
>     lll_lock(rwlock->lock);
>
>     if (rwlock->algorithm == READERS_RULE)
>         ++rwlock->write_blockers;
>
>     for (;;) {
>         read_blockers = rwlock->read_blockers;
>         if (!read_blockers)
>             break;
>
>         ++rwlock->nr_readers_queued;
>         lll_unlock(rwlock->lock);
>         futex_wait(&rwlock->read_blockers, read_blockers);
>         lll_lock(rwlock->lock);
>         --rwlock->nr_readers_queued;
>     }
>
>     if (rwlock->algorithm != READERS_RULE)
>         ++rwlock->write_blockers;
>
>     lll_unlock(rwlock->lock);
> }
>
> pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock)
> {
>     int result = EBUSY;
>     lll_lock(rwlock->lock);
>     if (!rwlock->read_blockers)
>     {
>         ++rwlock->write_blockers;
>         result = 0;
>     }
>     lll_unlock(rwlock->lock);
>     return result;
> }
>
> rwlock_query(pthread_rwlock_t *rwlock)
> {
>     lll_lock(rwlock->lock);
>     writers = rwlock->writer ? 1 : 0;
>     readers = writers ? 0 : rwlock->write_blockers;
>     lll_unlock(rwlock->lock);
> }
>
> pthread_rwlock_wrlock(pthread_rwlock_t *rwlock)
> {
>     lll_lock(rwlock->lock);
>     if (rwlock->algorithm == WRITERS_RULE)
>         ++rwlock->read_blockers;
>
>     for (;;) {
>         write_blockers = rwlock->write_blockers;
>         if (!write_blockers)
>             break;
>
>         ++rwlock->nr_writers_queued;
>         lll_unlock(rwlock->lock);
>         futex_wait(&rwlock->write_blockers, write_blockers);
>         lll_lock(rwlock->lock);
>         --rwlock->nr_writers_queued;
>     }
>
>     if (rwlock->algorithm != WRITERS_RULE)
>         ++rwlock->read_blockers;
>
>     rwlock->writer = pthread_self();
>     ++rwlock->write_blockers;
>
>     lll_unlock(rwlock->lock);
> }
>
> pthread_rwlock_unlock(pthread_rwlock_t *rwlock)
> {
>     lll_lock(rwlock->lock);
>
>     if (rwlock->writer)
>     {
>         rwlock->writer = 0;
>         --rwlock->read_blockers;
>     }
>
>     --rwlock->write_blockers;
>
>     wake_writers = !rwlock->write_blockers && rwlock->nr_writers_queued;
>     wake_readers = !rwlock->read_blockers && rwlock->nr_readers_queued;
>
>     /* if rwlock->algorithm == FIRST_COME_FIRST_SERVED we awake everyone */
>     if (wake_writers && (!wake_readers || (rwlock->algorithm == WRITERS_RULE)))
>         futex_wake(&rwlock->write_blockers, 1);
>     if (wake_readers && (!wake_writers || (rwlock->algorithm == READERS_RULE)))
>         futex_wake(&rwlock->read_blockers, MAX_INT);
>
>     lll_unlock(rwlock->lock);
> }
>

What do you think(*)?

regards,
alexander.

(*) http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz
(Acrobat: "479 of 631" -- "futexes", etc.)

Dragan Cvetkovic

unread,
Nov 4, 2002, 12:18:44 PM11/4/02
to
Alexander Terekhov <tere...@web.de> writes:

[a large snip of 350+ lines]

> What do you think(*)?
>
> regards,
> alexander.
>
> (*) http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz
> (Acrobat: "479 of 631" -- "futexes", etc.)

Alexander, why did you post all this just to add these 6 lines above?

Or is it again for Googling?

Bye, Dragan

--
Dragan Cvetkovic,

To be or not to be is true. G. Boole No it isn't. L. E. J. Brouwer

Alexander Terekhov

unread,
Nov 4, 2002, 12:32:03 PM11/4/02
to

Dragan Cvetkovic wrote:
>
> Alexander Terekhov <tere...@web.de> writes:
>
> [a large snip of 350+ lines]

of pseudo-code, mostly.

>
> > What do you think(*)?
> >
> > regards,
> > alexander.
> >
> > (*) http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz
> > (Acrobat: "479 of 631" -- "futexes", etc.)
>
> Alexander, why did you post all this just to add these 6 lines above?

First off, I actually forgot to add two more lines:

: Check this (and related papers):
: http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html

So the total is *8* lines! ;-)

>
> Or is it again for Googling?

Well, I was just hoping that folks will find it more convenient
and easy to provide their specific comments/whatever using the
quoted stuff in their replies -- without doing copy&paste from
the "original" web pages.

regards,
alexander.

Dragan Cvetkovic

unread,
Nov 4, 2002, 12:39:08 PM11/4/02
to
Alexander Terekhov <tere...@web.de> writes:
> Well, I was just hoping that folks will find it more convenient
> and easy to provide their specific comments/whatever using the
> quoted stuff in their replies -- without doing copy&paste from
> the "original" web pages.

Well, after so many years on Usenet, at least my eyes are "trained" to just
skim over quoted text, and therefore this post of yours looked like those
awful posts that quote 100+ lines of replies just to add "Me too" or
something equally useful. That's the reason I reacted (although not in
the way you expected), since I know you can do better than that :-)

Alexander Terekhov

unread,
Nov 4, 2002, 12:54:56 PM11/4/02
to

Dragan Cvetkovic wrote:
>
> Alexander Terekhov <tere...@web.de> writes:
> > Well, I was just hoping that folks will find it more convenient
> > and easy to provide their specific comments/whatever using the
> > quoted stuff in their replies -- without doing copy&paste from
> > the "original" web pages.
>
> Well, after so many years on Usenet, at least my eyes are "trained" to just
> skim over quoted text, and therefore this post of yours looked like those
> awful posts that quote 100+ lines of replies just to add "Me too" or
> something equally useful. That's the reason I reacted (although not in
> the way you expected), since I know you can do better than that :-)

Okay, here's another try [targeted at folks with "trained" eyes]. ;-) ;-)

Alexander Terekhov wrote:
[...]


> P.S. It seems to me that NPTL's rwlocks are somewhat less broken than its
> current condvars, but [check this out yourself]...

[...]

[...]

[...]

What do you think(*)?

regards,
alexander.

(*) http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz
(Acrobat: "479 of 631" -- "futexes", etc.)

http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html
(Check also this [and related papers])

Konrad Schwarz

unread,
Nov 5, 2002, 4:40:10 AM11/5/02
to
David Butenhof schrieb:

> > So what you actually do as kernel programmer is transfer the queue of
> > blocked readers directly to the ready queue(s) when the writer unlocks?
>
> The point is that what you've said is absolutely routine -- you unblock
> something by sticking it on a ready queue. Almost goes without saying,
> right? The only trick is in making sure that there's no common contention
> point on the way OUT that would cause any of those threads to merely
> re-block -- either on a mutex or on an internal scheduling lock. (I didn't
> say it was a SMALL trick.)

Perhaps I haven't grasped what you are saying here.

I am assuming that the write-locked rw-lock has a queue of pending readers.

Under a CV/Mutex implementation, the readers need to be released serially when
the writer unlocks the rw-lock (e.g., my implementation needs to atomically increment
the count of readers, which causes the readers to queue up behind the mutex).

With a native implementation, the write-lock releasing thread can simply set the
reader count to the number of waiting reader threads and transfer the entire queue
of waiting readers to the ready queue. That's where I see the big win: instead
of doing n context switches, we need only one.

Do you mean the same thing?

Konrad Schwarz

unread,
Nov 5, 2002, 4:42:26 AM11/5/02
to
David Schwartz schrieb:

> > But can an application tell the difference?
>
> Sure, consider an application that never uses write locks. It would
> expect a trylock for a read lock to always succeed.

I thought about this some more, and you are of course right.

But I believe that in all situations where the application can tell the difference,
it wouldn't have needed the rw-lock in the first place; e.g.,
if there is no write locker, the rw-lock is useless.

Of course, this is a purely academic exercise, as David Butenhof points out.

David Butenhof

unread,
Nov 5, 2002, 8:08:35 AM11/5/02
to
Konrad Schwarz wrote:

Um, pretty much, at that level.

I was specifying not only what happens when the lock becomes available
("readying the threads") but also what happens when those threads actually
resume execution after having been "batch readied" by the write-unlock.

Specifying only how they're readied isn't enough to gain the assurance that
the read waiters can execute read-locked application code in parallel. You
also need the assurance that the synchronization code won't serialize "on
the way out" after resuming execution.
