
amortizing read-access in read/write mutex...


Chris M. Thomasson

Jan 31, 2010, 5:53:02 AM
You can amortize read access on a read/write mutex by using the following
fairly simple pattern:
______________________________________________________________
static uword32 g_pending = 0;
static rw_mutex g_rw_mutex;


void readers()
{
    g_rw_mutex.read_lock();

    for (;;)
    {
        request_type* r = try_to_get_a_read_request();

        if (! r)
        {
            g_rw_mutex.read_unlock();

            r = wait_for_a_read_request();

            g_rw_mutex.read_lock();
        }

        process_read_request(r);

        if (ATOMIC_LOAD(g_pending))
        {
            g_rw_mutex.read_unlock();

            // perhaps uncomment for "unfair" read/write mutex impls.
            // SwitchToThread() or sched_yield(); // Humm...

            g_rw_mutex.read_lock();
        }
    }

    g_rw_mutex.read_unlock();
}


void writers()
{
    for (;;)
    {
        request_type* r = wait_for_a_write_request();

        ATOMIC_INC(&g_pending);

        g_rw_mutex.write_lock();

        ATOMIC_DEC(&g_pending);

        process_write_request(r);

        g_rw_mutex.write_unlock();
    }
}
______________________________________________________________


As you can see, I am reducing the number of times I actually "need" to
acquire read access by releasing and reacquiring it only when the
`g_pending' variable is non-zero. For contrast, I believe that the
"normal" way to code the above would be something like:
______________________________________________________________
static rw_mutex g_rw_mutex;


void readers()
{
    for (;;)
    {
        request_type* r = wait_for_a_read_request();

        g_rw_mutex.read_lock();

        process_read_request(r);

        g_rw_mutex.read_unlock();
    }
}


void writers()
{
    for (;;)
    {
        request_type* r = wait_for_a_write_request();

        g_rw_mutex.write_lock();

        process_write_request(r);

        g_rw_mutex.write_unlock();
    }
}
______________________________________________________________


Yes, it's a heck of a lot simpler, but it will be acquiring and releasing
read access for every darn read request! This should perform much worse than
the amortized version, because acquiring and releasing read access is fairly
expensive in general-purpose read/write mutex implementations. The ability
to skip "unnecessary" atomic RMWs and membars is a key aspect of writing
scalable software. So, AFAICT, this amortization pattern might be worthwhile
for some applications.
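
By the way, the ATOMIC_* helpers (and the `uword32' type) above are left
undefined; here is one plausible C++11 rendering of them, replacing the
plain `g_pending' declaration (an assumption on my part, not part of the
pattern itself). Since `g_pending' is only a hint and the rw-mutex provides
the real synchronization, relaxed ordering is arguably enough:
______________________________________________________________
#include <atomic>

typedef unsigned int uword32;

static std::atomic<uword32> g_pending(0);

// relaxed ordering: g_pending is a heuristic hint only; a slightly
// stale value merely delays (or adds) one release/reacquire cycle.
#define ATOMIC_LOAD(v) ((v).load(std::memory_order_relaxed))
#define ATOMIC_INC(p) ((p)->fetch_add(1, std::memory_order_relaxed))
#define ATOMIC_DEC(p) ((p)->fetch_sub(1, std::memory_order_relaxed))
______________________________________________________________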


Any thoughts on this particular technique?

:^)

David Schwartz

Jan 31, 2010, 5:58:59 AM
On Jan 31, 2:53 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> Any thoughts on this particular technique?

I think probably the best way to do it is a function that's a no-op if
there's no pending writer and, if there is one, releases the read lock,
waits until the writer is finished, and then re-acquires the read lock.
It's good for cases where you have a number of independent things you
need to do while holding a read lock, and neither holding the read lock
through the whole list nor releasing and reacquiring it on each item
makes sense.

DS

Dmitriy Vyukov

Jan 31, 2010, 6:39:48 AM

Agree. Something along the lines of:

g_rw_mutex.lock();
while (...)
{
    do_something_useful();
    g_rw_mutex.i_am_quiescent();
}
g_rw_mutex.unlock();

Perhaps i_am_quiescent() should return a flag indicating whether the lock
was released and reacquired; the reader thread can then do some
re-caching if the mutex was reacquired.

As for the technique, I think it's very nice IF it's applicable. It
solves the problem with a little added complexity. However, it requires
cooperation from the rest of the code and from the programmer.
If you can count on periodic activity from all user threads, life
indeed becomes much easier in a multi-threaded environment (wrt lock
management, memory management, object lifetime management, etc.). Then
you can amortize basically everything and move all the overheads into
that periodic function.

--
Dmitriy V'jukov

Chris M. Thomasson

Jan 31, 2010, 6:45:33 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:801a06d8-80e6-4325...@f17g2000prh.googlegroups.com...

On Jan 31, 2:53 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > Any thoughts on this particular technique?

> I think probably the best way to do it is a function that's a no-op if
> there's no pending writer and, if there is one, releases the read lock,
> waits until the writer is finished, and then re-acquires the read lock.

Agreed. I believe I can add such a function to my read/write mutex
implementation:

http://software.intel.com/en-us/forums/showthread.php?t=65822


which could look something like this:
_____________________________________________________________________
void rdsync() throw()
{
    LONG count = *((LONG volatile*)&m_count);

    assert(count < LONG_MAX);

    if (count < 0)
    {
        // a writer is pending, which means that we need to
        // release and reacquire our read access.

        rdunlock();

        rdlock();

        // since this is a 100% fair rw_mutex, we are now guaranteed
        // that the writer has fully completed all of its actions.
    }
}
_____________________________________________________________________


That should work fine and I think I will probably model it in Relacy 2.2.


> It's good for cases where you have a number of independent things you
> need to do while holding a read lock, and neither holding the read lock
> through the whole list nor releasing and reacquiring it on each item
> makes sense.

Agreed. I do think it should be able to improve performance in usage
patterns like the one you described...

Chris M. Thomasson

Jan 31, 2010, 7:01:42 AM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:f8fb9dbe-ad42-4efa...@z26g2000yqm.googlegroups.com...

On Jan 31, 1:58 pm, David Schwartz <dav...@webmaster.com> wrote:
> > On Jan 31, 2:53 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> >
> > > Any thoughts on this particular technique?
> >
> > I think probably the best way to do it is a function that's a no-op if
> > there's no pending writer and, if there is one, releases the read lock,
> > waits until the writer is finished, and then re-acquires the read lock.
> > It's good for cases where you have a number of independent things you
> > need to do while holding a read lock, and neither holding the read lock
> > through the whole list nor releasing and reacquiring it on each item
> > makes sense.

> Agree. Something along the lines of:

> g_rw_mutex.lock();
> while (...)
> {
>     do_something_useful();
>     g_rw_mutex.i_am_quiescent();
> }
> g_rw_mutex.unlock();

> Perhaps i_am_quiescent() should return a flag indicating whether the lock
> was released and reacquired; the reader thread can then do some
> re-caching if the mutex was reacquired.

YES! That return value would be providing "essential" information to the
readers. Here, let me modify the function I am thinking about adding to my
rw_mutex impl:
_____________________________________________________________________
bool rdsync() throw()
{
    LONG count = *((LONG volatile*)&m_count);

    assert(count < LONG_MAX);

    if (count < 0)
    {
        // a writer is pending, which means that we need to
        // release and reacquire our read access.

        rdunlock();

        rdlock();

        // since this is a 100% fair rw_mutex, we are now guaranteed
        // that the writer has fully completed all of its actions.

        return true;
    }

    return false;
}
_____________________________________________________________________


Now one can do:
_____________________________________________________________________
g_rw_mutex.rdlock();

for (;;)
{
    do_something_useful();

    if (g_rw_mutex.rdsync())
    {
        // refresh cached data...
    }
}

g_rw_mutex.rdunlock();
_____________________________________________________________________


;^)


> As for the technique, I think it's very nice IF it's applicable. It
> solves the problem with a little added complexity. However it requires
> cooperation from the rest of the code and from a programmer.

Yes, this is absolutely correct. I am thinking that it should work out
fairly well in areas where it is applicable. Therefore, in those specific
cases, it actually might be worth the extra complexity and programming
effort...


> If you can count on periodic activity from all user threads, life
> indeed becomes much easier in a multi-threaded environment (wrt lock
> management, memory management, object life-time management, etc). Then
> you can amortize basically everything and move all the overheads to
> that periodic function.

Amen Brother!!!!

:^D

Chris M. Thomasson

Jan 31, 2010, 7:09:55 AM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:nie9n.31934$BV.2...@newsfe07.iad...

> "Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
> news:f8fb9dbe-ad42-4efa...@z26g2000yqm.googlegroups.com...
[...]

>> Perhaps i_am_quiescent() should return a flag indicating whether the lock
>> was released and reacquired; the reader thread can then do some
>> re-caching if the mutex was reacquired.
>
> YES! That return value would be providing "essential" information to the
> readers. Here, let me modify the function I am thinking about adding to my
> rw_mutex impl:
> _____________________________________________________________________
> bool rdsync() throw()
> {
>     LONG count = *((LONG volatile*)&m_count);
>
>     assert(count < LONG_MAX);
>
>     if (count < 0)
>     {
>         // a writer is pending, which means that we need to
>         // release and reacquire our read access.
>
>         rdunlock();
>
>         rdlock();
>
>         // since this is a 100% fair rw_mutex, we are now guaranteed
>         // that the writer has fully completed all of its actions.
>
>         return true;
>     }
>
>     return false;
> }
> _____________________________________________________________________


FWIW, the function above is very similar to the `proxy<...>::sync()'
function in my proxy collector:

http://cpt.pastebin.com/f71480694

_________________________________________________________________
collector& sync(collector& c)
{
    // check if `c' is in the middle of a quiescence process.
    if (c.m_count.load(std::memory_order_relaxed) & 0x10U)
    {
        // drop `c' and get the next collector.
        release(c);

        return acquire();
    }

    return c;
}
_________________________________________________________________


This allows for basically the exact same pattern to be applied to proxy
garbage collectors:
_________________________________________________________________
void readers()
{
    proxy_type::collector* c = &g_proxy.acquire();

    for (;;)
    {
        do_something_useful();

        c = &g_proxy.sync(*c);
    }

    g_proxy.release(*c);
}
_________________________________________________________________


The model of the algorithm in Relacy 2.2 shows the pattern in action in the
form of reader threads iterating over all the nodes contained within a
lock-free stack.

David Schwartz

Jan 31, 2010, 8:31:15 AM
On Jan 31, 3:39 am, Dmitriy Vyukov <dvyu...@gmail.com> wrote:

> Perhaps i_am_quiescent() should return a flag indicating whether the lock
> was released and reacquired; the reader thread can then do some
> re-caching if the mutex was reacquired.

I agree. It should be effectively free to add and might help the
caller decide whether it needs to do some extra processing.

I don't think the cases where this is useful are that many, but it
does have genuine use cases. Most reader/writer lock implementations
could add it with very little difficulty. Typical read/write locks
refuse new readers when a writer is pending, so they should be able to
tell whether they would refuse a read lock without much trouble.
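
For what it's worth, here is a rough sketch of that idea layered over a
plain pthread_rwlock_t, with the wrapper tracking pending writers itself
since POSIX exposes no writer-pending query (all of the names are
illustrative, not from any real implementation):

#include <pthread.h>
#include <atomic>

struct amortized_rwlock
{
    pthread_rwlock_t lock;
    std::atomic<int> pending; // writers waiting or active; a hint only

    amortized_rwlock() : pending(0) { pthread_rwlock_init(&lock, NULL); }

    void read_lock(void) { pthread_rwlock_rdlock(&lock); }
    void read_unlock(void) { pthread_rwlock_unlock(&lock); }

    void write_lock(void)
    {
        pending.fetch_add(1, std::memory_order_relaxed);
        pthread_rwlock_wrlock(&lock);
        pending.fetch_sub(1, std::memory_order_relaxed);
    }
    void write_unlock(void) { pthread_rwlock_unlock(&lock); }

    // no-op when no writer is pending; otherwise cycle our read lock so
    // the writer can get in. returns true if the lock was cycled, so the
    // caller knows to refresh anything it cached under the lock.
    bool read_sync(void)
    {
        if (pending.load(std::memory_order_relaxed) == 0)
            return false;
        pthread_rwlock_unlock(&lock);
        pthread_rwlock_rdlock(&lock);
        return true;
    }
};

Note that POSIX leaves reader/writer preference unspecified, so whether
cycling the lock actually lets the writer in promptly depends on the
implementation.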

DS

David Schwartz

Feb 5, 2010, 12:14:57 AM
I added an implementation to my Linux-specific futex-based reader/writer
lock. The fast paths are fully optimized. Any reports of problems would
be helpful.

Here's the implementation. We use a few non-standard primitives:

InterlockedGet(&x) is roughly *(volatile int *)&x, i.e. an atomic read of x.

InterlockedInc and InterlockedDec are atomic increments and decrements
that don't return anything.

InterlockedIncrement and InterlockedDecrement are atomic increments
and decrements that return zero if and only if the resulting value is
zero.

InterlockedExchange(&x, y) is an atomic *x=y that returns the original
value of *x.

InterlockedSet(&x, y) simply stores y into *x (callers add any barriers
they need).
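
One plausible rendering of these primitives with GCC's __sync builtins (a
sketch; the actual wrappers are not shown in this post):

static inline int InterlockedGet(int *p) { return __sync_fetch_and_add(p, 0); }
static inline void InterlockedInc(int *p) { __sync_fetch_and_add(p, 1); }
static inline void InterlockedDec(int *p) { __sync_fetch_and_sub(p, 1); }
static inline int InterlockedIncrement(int *p) { return __sync_add_and_fetch(p, 1); }
static inline int InterlockedDecrement(int *p) { return __sync_sub_and_fetch(p, 1); }
static inline int InterlockedExchange(int *p, int v) { return __sync_lock_test_and_set(p, v); }
static inline void InterlockedSet(int *p, int v) { *(volatile int *)p = v; } // plain store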

Also, if you're not familiar with the futex operations, we use only
two: FUTEX_WAIT and FUTEX_WAKE. For example:

sys_futex(&writers, FUTEX_WAIT, j, NULL);
This says to atomically check that writers still equals j and, if so,
block on the writers futex; if writers no longer equals j, the call
returns immediately.

sys_futex(&areaders, FUTEX_WAKE, 1, NULL);
This says to wake 1 thread that is blocked on the areaders futex.

Any integer can become a futex simply by a thread blocking on it.
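
A minimal wrapper for the sys_futex calls used below might look like this
(an assumption; the real wrapper is not shown):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <time.h>

static long sys_futex(void *addr, int op, int val,
                      const struct timespec *timeout)
{
    // futex(2) takes two further arguments (addr2, val3) that
    // FUTEX_WAIT and FUTEX_WAKE ignore.
    return syscall(SYS_futex, addr, op, val, timeout, NULL, 0);
}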

Here it is:

// Copyright David J. Schwartz <dav...@webmaster.com>, 2006-2010
// All rights reserved
// Offered without any warranty or claim of fitness for any purpose

class RWFutex
{
protected:
    int areaders; // number of active readers (active writer blocks for zero)
    int pad1;     // pads give each hot variable its own cache line
    int wreaders; // number of waiting readers
    int pad2;
    int writers;  // count of active/waiting writers (readers block for zero)
    int pad3;
    int writer;   // lock the active writer holds (other writers block for zero)
    int pad4;

private:
    RWFutex(const RWFutex &); // no implementation
    RWFutex &operator=(const RWFutex &); // no implementation

public:
    RWFutex(void) : areaders(0), wreaders(0), writers(0), writer(0) { ; }

    void ReadLock(void)
    {
        int j;
        while(1)
        {
            if(InterlockedGet(&writers)!=0)
            { // wait for no writers
                InterlockedInc(&wreaders);
                while((j=InterlockedGet(&writers))!=0)
                    sys_futex(&writers, FUTEX_WAIT, j, NULL);
                InterlockedDec(&wreaders);
            }
            InterlockedInc(&areaders); // add us as a reader
            if(InterlockedGet(&writer)==0) // did we race with a writer?
                return; // no
            if(InterlockedDecrement(&areaders)==0) // we were the only attempting reader
                if(InterlockedGet(&writers)!=0) // there is at least one waiting writer
                    sys_futex(&areaders, FUTEX_WAKE, 1, NULL);
        }
    }

    bool TryReadLock(void)
    {
        if(InterlockedGet(&writers)!=0) return false; // active/waiting writers
        InterlockedInc(&areaders);
        if(InterlockedGet(&writer)==0) // did we race?
            return true;
        if(InterlockedDecrement(&areaders)==0) // we were the only attempting reader
            sys_futex(&areaders, FUTEX_WAKE, 1, NULL);
        return false;
    }

    void ReadUnlock(void)
    {
        if(InterlockedDecrement(&areaders)!=0) // one less reader
            return; // still other readers
        if(InterlockedGet(&writers)!=0) // a writer awaits
            sys_futex(&areaders, FUTEX_WAKE, 1, NULL); // no readers now
    }

    bool CondReadUnlock(void)
    { // unlock read if there's a waiting writer; return when the writer is done
        if(InterlockedGet(&writers)==0) return false;
        ReadUnlock();
        ReadLock();
        return true;
    }

    void WriteLock(void)
    {
        int j;
        InterlockedInc(&writers);
        while(InterlockedExchange(&writer, 1)!=0) // wait out other writers
            sys_futex(&writer, FUTEX_WAIT, 1, NULL); // optimizable slow path
        // we are the only writer
        while((j=InterlockedGet(&areaders))!=0) // wait for no readers
            sys_futex(&areaders, FUTEX_WAIT, j, NULL);
    }

    bool TryWriteLock(void)
    {
        if( (InterlockedGet(&writers)!=0) || (InterlockedGet(&areaders)!=0) )
            return false;
        InterlockedInc(&writers); // add us as a writer
        if(InterlockedExchange(&writer, 1)!=0) // can we grab the lock?
        { // no, did we block anyone?
            if(InterlockedDecrement(&writers)!=0) // waiting writer
                sys_futex(&writer, FUTEX_WAKE, 1, NULL); // wake him
            else if(InterlockedGet(&wreaders)!=0) // waiting reader(s)
                sys_futex(&writers, FUTEX_WAKE, INT_MAX, NULL); // wake them
            return false;
        }
        if(InterlockedGet(&areaders)==0) // if no readers, we're done
            return true;
        InterlockedSet(&writer, 0); // we raced with a reader, release write lock
        if(InterlockedDecrement(&writers)!=0) // waiting writer
            sys_futex(&writer, FUTEX_WAKE, 1, NULL); // wake him
        else if(InterlockedGet(&wreaders)!=0) // waiting reader(s)
            sys_futex(&writers, FUTEX_WAKE, INT_MAX, NULL); // wake them
        return false;
    }

    void WriteUnlock(void)
    {
        __asm__ __volatile__("" : : : "memory"); // compiler barrier before the release store
        InterlockedSet(&writer, 0); // no active writer, no active readers
        if(InterlockedDecrement(&writers)!=0) // waiting writer
            sys_futex(&writer, FUTEX_WAKE, 1, NULL); // wake writer
        else if(InterlockedGet(&wreaders)!=0) // waiting reader(s)
            sys_futex(&writers, FUTEX_WAKE, INT_MAX, NULL); // wake readers
    }
};
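
To tie this back to the amortized-reader pattern that started the thread,
a usage sketch might look like this (process_item and refresh_cached_state
are hypothetical placeholders):

// hypothetical placeholders, not part of RWFutex:
void process_item(int i);
void refresh_cached_state(void);

static RWFutex g_lock;

void reader_thread(int num_items)
{
    g_lock.ReadLock();
    for (int i = 0; i < num_items; ++i)
    {
        process_item(i); // independent work done under the read lock
        // no-op unless a writer is waiting; if the lock was cycled,
        // anything cached under the lock must be re-validated.
        if (g_lock.CondReadUnlock())
            refresh_cached_state();
    }
    g_lock.ReadUnlock();
}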

Comments appreciated. I've had this code percolating for many years
but never actually used it in anything deployed.

DS

Chris M. Thomasson

Feb 5, 2010, 10:41:11 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:2799851d-8673-4345...@f17g2000prh.googlegroups.com...

> I added an implementation to my Linux-specific futex-based reader/writer
> lock. The fast paths are fully optimized. Any reports of problems would
> be helpful.
>
> Here's the implementation. We use a few non-standard primitives:

[...]

> Comments appreciated.

If I understand this correctly, writers can starve readers, right? I cannot
say if this is a problem or not. Well, if you have enough write activity to
starve readers then you should probably not be using a read/write mutex
anyway.

;^)


> I've had this code percolating for many years
> but never actually used it in anything deployed.

I cannot seem to find anything wrong with it right now. Humm, it would be
neat if you could directly model futex-based algorithms in Relacy.

David Schwartz

Feb 5, 2010, 12:06:44 PM
On Feb 5, 7:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> If I understand this correctly, writers can starve readers, right?

I don't think that's a meaningful statement. It does have absolute
writer priority, but:

1) If writers aren't rare, what's the point in using a reader/writer
lock? Writers must be much less common than readers or you're using
the wrong primitive.

2) So what? The only guarantee is that you get forward progress. If
the writer isn't going to make useful forward progress, why did it try
to acquire the lock anyway?

> I cannot
> say if this is a problem or not.

If it is, the problem is the choice of a reader/writer lock, IMO.

> Well, if you have enough write activity to
> starve readers then you should probably not be using a read/write mutex
> anyway.

Bingo.

> > I've had this code percolating for many years
> > but never actually used it in anything deployed.

> I cannot seem to find anything wrong with it right now. Humm, it would be
> neat if you could directly model futex-based algorithms in Relacy.

The Linux futex abilities get rather complex. But I just use the
simplest one, a combination of an atomic 'test and block' and a
'wake'.

DS

Dmitriy Vyukov

Feb 5, 2010, 12:14:54 PM
On Feb 5, 8:06 pm, David Schwartz <dav...@webmaster.com> wrote:

> > > I've had this code percolating for many years
> > > but never actually used it in anything deployed.
> > I cannot seem to find anything wrong with it right now. Humm, it would be
> > neat if you could directly model futex-based algorithms in Relacy.
>
> The Linux futex abilities get rather complex. But I just use the
> simplest one, a combination of an atomic 'test and block' and a
> 'wake'.

I added futex emulation to the todo list:
http://groups.google.com/group/relacy/web/todo-feature-list
But keep in mind that it's not that hard to add something like a futex to
Relacy manually. Ironically, you do not even need to be experienced with
multithreading to do that, because Relacy uses cooperative fibers for
simulation. There is already a waitset and a primitive to synchronize
memory between threads, so the implementation will be quite
straightforward.

--
Dmitriy V'jukov

Chris M. Thomasson

Feb 6, 2010, 10:32:53 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:c8a86bdd-c77f-4d28...@m35g2000prh.googlegroups.com...

On Feb 5, 7:41 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > If I understand this correctly, writers can starve readers, right?

> I don't think that's a meaningful statement. It does have absolute
> writer priority, but:

[...]

> > I cannot
> > say if this is a problem or not.

> If it is, the problem is the choice of a reader/writer lock, IMO.

What about a pattern that knows it will be hit with a constant stream of
read requests and may experience periods of rather "rapid" writer activity
that happens to occur on a non-deterministic basis? IMVHO, a read/write
mutex should "allow" readers in during the bursts of writer activity.

[...]

David Schwartz

Feb 7, 2010, 6:43:26 AM
On Feb 6, 7:32 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> What about a pattern that knows it will be hit with a constant stream of
> read requests and may experience periods of rather "rapid" writer activity
> that happens to occur on a non-deterministic basis? IMVHO, a read/write
> mutex should "allow" readers in during the bursts of writer activity.

I cannot imagine why. I've used reader/writer locks for an awfully
long time and never wanted anything but strict writer priority.
However, I have no problem with any implementation that guarantees
writers aren't starved in the face of large numbers of readers. Phase
alternation is fine with me, though it seems like needless extra
effort.

DS

Chris M. Thomasson

Feb 7, 2010, 7:50:28 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:99680d11-fa70-4a91...@q2g2000pre.googlegroups.com...

On Feb 6, 7:32 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > What about a pattern that knows it will be hit with a constant stream
> > of read requests and may experience periods of rather "rapid" writer
> > activity that happens to occur on a non-deterministic basis? IMVHO, a
> > read/write mutex should "allow" readers in during the bursts of writer
> > activity.

> I cannot imagine why.

Humm; you got me there. :^)

Well, I can only think of a __very_contrived__ example: What about a system
that has very frequent read activity, and the writes are fairly frequent as
well. Say you have an average of 100,000 reads and around 6,000-15,000
writes per second. Now, some programmer uses a single rw-mutex to protect
the whole thing! Should his program livelock wrt readers, or actually be
allowed to progress?


> I've used reader/writer locks for an awfully
> long time and never wanted anything but strict writer priority.
> However, I have no problem with any implementation that guarantees
> writers aren't starved in the face of large numbers of readers. Phase
> alternation is fine with me, though it seems like needless extra
> effort.

Actually, the implementation I came up with is fairly simple. Have you taken
a look at it? The funny thing is that strict fairness was not a goal back
when I was designing the rw-mutex; it manifested as a result of the
synchronization algorithm I settled on, and I did not consider it to be a
flaw. I like the algorithm because its implementation contains no loops,
which is pretty cool, IMHO of course!

;^)

David Schwartz

Feb 7, 2010, 8:38:18 AM
On Feb 7, 4:50 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> Well, I can only think of a __very_contrived__ example: What about a system
> that has very frequent read activity, and the writes are fairly frequent as
> well. Say you have an average of 100,000 reads and around 6,000-15,000
> writes per second. Now, some programmer uses a single rw-mutex to protect
> the whole thing! Should his program livelock wrt readers, or actually be
> allowed to progress?

The program will progress as each writer will make progress. If the
writers were not making forward progress, why do they exist? At least
in POSIX-land, the only guarantee any of the typical synchronization
primitives make is that your process as a whole will make forward
progress.

I would argue that if strict writer priority causes you a problem,
then a reader/writer lock is the wrong primitive. I would also argue
that a reader/writer lock is the wrong primitive if you expect the
"read lock" operation to block more than a very small percentage of
the time. The fast paths are the read lock/unlock operations with no
writers.

Perhaps you can make a case for some special-purpose read/write lock
that handles other cases, but this is the typical one. Blocking should
be rare or you're doing it wrong.

DS

Chris M. Thomasson

Feb 7, 2010, 9:35:30 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:305c4fdf-7f5d-4e92...@b9g2000pri.googlegroups.com...

On Feb 7, 4:50 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > Well, I can only think of a __very_contrived__ example: What about a
> > system that has very frequent read activity, and the writes are fairly
> > frequent as well. Say you have an average of 100,000 reads and around
> > 6,000-15,000 writes per second. Now, some programmer uses a single
> > rw-mutex to protect the whole thing! Should his program livelock wrt
> > readers, or actually be allowed to progress?


My answers will be within the context of using a read/write mutex with
strict writer priority in the really contrived example I gave above.


> The program will progress as each writer will make progress. If the
> writers were not making forward progress, why do they exist?

The writers can definitely make forward progress. However, a reader can
stall and not make any progress if the "writer count" inside the rw-mutex
implementation does not drop to zero and stay there "long enough" for any
readers to actually acquire read access. I think you can get starvation if
readers will not even try to enter the critical section while the write
count is non-zero.


> At least
> in POSIX-land, the only guarantee any of the typical synchronization
> primitives make is that your process as a whole will make forward
> progress.

I think that we are both making explicit scheduling decisions at a "higher
level" than the OS. For instance, my rw-scheduler is 100% fair such that it
strictly alternates between reads and writes while your rw-scheduler gives
100% writer priority because it blocks readers if there is an active writer
or any pending writers. For example, if 4 threads are waiting for write
access, all new readers will be blocked until all 4 of those threads are
finished. If new writers come in and wait before those 4 threads are
finished, then the readers will need to wait for them as well. If a waiter
leaves and comes back before the remaining 3 threads finish, then the
readers will have to wait for it all over again.


> I would argue that if strict writer priority causes you a problem,
> then a reader/writer lock is the wrong primitive. i would also argue
> that a reader/writer lock is the wrong primitive if you expect the
> "read lock" operation to block more than a very small percentage of
> the time. The fast paths are the read lock/unlock operation with no
> writers.
>
> Perhaps you can make a case for some special-purpose read/write lock
> that handles other cases, but this is the typical one. Blocking should
> be rare or you're doing it wrong.

Would you say that the load generated by the contrived case is way too much
pressure for a general purpose read/write mutex to handle? IMVHO, I want my
rw-mutex to perform great in "sane" usage cases. However, I also want it to
guarantee forward progress for both reads and writes if a programmer happens
to use it to solve a high contention scenario with a high read-to-write
ratio.

David Schwartz

Feb 7, 2010, 10:10:57 AM
On Feb 7, 6:35 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> Would you say that the load generated by the contrived case is way too much
> pressure for a general purpose read/write mutex to handle?

No, I'd say it's fine. And I'd also say, in that context, that the
writers making forward progress means the process is making forward
progress.

> IMVHO, I want my
> rw-mutex to perform great in "sane" usage cases. However, I also want it to
> guarantee forward progress for both reads and writes if a programmer happens
> to use it to solve a high contention scenario with a high read-to-write
> ratio.

I don't think that's worth slowing the fast paths down. If you can do
it without slowing the fast paths, then go for it. Otherwise, I think
it's a questionable idea. Reader/writer mutexes are appropriate in
cases where reads significantly outnumber writes, overall write demand
is low, and blocking of any thread is the exception rather than the
rule.

DS

Chris M. Thomasson

Feb 7, 2010, 10:37:30 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:2c95e72e-2b4c-49db...@o16g2000prh.googlegroups.com...

On Feb 7, 6:35 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > Would you say that the load generated by the contrived case is way too
> > much pressure for a general purpose read/write mutex to handle?

> No, I'd say it's fine. And I'd also say, in that context, that the
> writers making forward progress means the process is making forward
> progress.

IMHO, I would have to say that the writer threads within the process are all
making forward progress. Therefore, I say that a "portion" of the process is
making forward progress.


> > IMVHO, I want my rw-mutex to perform great in "sane" usage cases.
> > However, I also want it to guarantee forward progress for both reads
> > and writes if a programmer happens to use it to solve a high contention
> > scenario with a high read-to-write ratio.

> I don't think that's worth slowing the fast paths down. If you can do
> it without slowing the fast paths, then go for it.

You should check out the implementation I did:


http://software.intel.com/en-us/forums/showthread.php?t=65822
(read all...)


Here is what the read lock/unlock functions look like:
____________________________________________________________
void rdlock() throw() {
    assert(m_count <= LONG_MAX);
    LONG count = InterlockedDecrement(&m_count);
    if (count < 0) {
        if (WaitForSingleObject(m_rdwset,
                INFINITE) != WAIT_OBJECT_0) {
            RWMUTEX_WIN_UNEXPECTED();
        }
    }
}

void rdunlock() throw() {
    assert(m_count < LONG_MAX);
    LONG count = InterlockedIncrement(&m_count);
    if (count < 1) {
        if (! InterlockedDecrement(&m_rdwake)) {
            if (! SetEvent(m_wrwset)) {
                RWMUTEX_WIN_UNEXPECTED();
            }
        }
    }
}
____________________________________________________________


And here is the writer lock/unlock:
____________________________________________________________
void wrunlock() throw() {
    assert(m_count < 1);
    LONG count = InterlockedExchangeAdd(&m_count, LONG_MAX);
    if (count < 0) {
        if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
            RWMUTEX_WIN_UNEXPECTED();
        }
    }
    if (InterlockedIncrement(&m_mtx) < 1) {
        if (! ReleaseSemaphore(m_mtxwset, 1, NULL)) {
            RWMUTEX_WIN_UNEXPECTED();
        }
    }
}

void wrtord() throw() {
    assert(m_count < 1);
    LONG count = InterlockedExchangeAdd(&m_count, LONG_MAX - 1);
    if (count < 0) {
        if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
            RWMUTEX_WIN_UNEXPECTED();
        }
    }
    if (InterlockedIncrement(&m_mtx) < 1) {
        if (! ReleaseSemaphore(m_mtxwset, 1, NULL)) {
            RWMUTEX_WIN_UNEXPECTED();
        }
    }
}
____________________________________________________________


> Otherwise, I think it's a questionable idea.

I can enforce strict fairness without slowing any fast path. I also avoid
using any loops in the algorithm. IMVHO, that's pretty nice.

David Schwartz

Feb 7, 2010, 11:41:35 AM
On Feb 7, 7:37 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> I can enforce strict fairness without slowing any fast path. I also avoid
> using any loops in the algorithm. IMVHO, that's pretty nice.

You have no loops because you've pushed them down a layer. For
example, your code calls WaitForSingleObject or sem_wait, which contain
loops.

Your slow paths are slower than mine, but it doesn't look like you
particularly tried to optimize them. (Nor did I optimize mine.)

It's interesting how different our approaches are. I like the fact
that you managed to get some reader/writer fairness without slowing
the fast paths.

DS

Chris M. Thomasson

Feb 7, 2010, 12:55:20 PM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:G6Bbn.19306$3n2....@newsfe01.iad...

> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:2c95e72e-2b4c-49db...@o16g2000prh.googlegroups.com...
[...]

>> I don't think that's worth slowing the fast paths down. If you can do
>> it without slowing the fast paths, then go for it.
>
> You should check out the implementation I did:
>
>
> http://software.intel.com/en-us/forums/showthread.php?t=65822
> (read all...)
[...]

>
> And here is the writer lock/unlock:

[...]


// OOPS! Those were the writer unlock and writer-to-reader downgrade
// procedures! Here is the missing writer lock procedure:
____________________________________________________________

void wrlock() throw() {
    if (InterlockedDecrement(&m_mtx) < 0) {
        if (WaitForSingleObject(m_mtxwset,
                INFINITE) != WAIT_OBJECT_0) {
            RWMUTEX_WIN_UNEXPECTED();
        }
    }
    assert(m_count > -1);
    LONG count = InterlockedExchangeAdd(&m_count, -LONG_MAX);
    if (count < LONG_MAX) {
        if (InterlockedExchangeAdd(&m_rdwake,
                LONG_MAX - count) + LONG_MAX - count) {
            if (WaitForSingleObject(m_wrwset,
                    INFINITE) != WAIT_OBJECT_0) {
                RWMUTEX_WIN_UNEXPECTED();
            }
        }
    }
}
____________________________________________________________


> ____________________________________________________________
> void wrunlock() throw() {
>     assert(m_count < 1);
>     LONG count = InterlockedExchangeAdd(&m_count, LONG_MAX);
>     if (count < 0) {
>         if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
>             RWMUTEX_WIN_UNEXPECTED();
>         }
>     }
>     if (InterlockedIncrement(&m_mtx) < 1) {
>         if (! ReleaseSemaphore(m_mtxwset, 1, NULL)) {
>             RWMUTEX_WIN_UNEXPECTED();
>         }
>     }
> }

[...]

> ____________________________________________________________
[...]

Chris M. Thomasson

Feb 7, 2010, 1:26:01 PM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:b3f8bb29-4889-45ab...@b9g2000pri.googlegroups.com...

On Feb 7, 7:37 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> > I can enforce strict fairness without slowing any fast path. I also
> > avoid using any loops in the algorithm. IMVHO, that's pretty nice.

> You have no loops because you've pushed them down a layer. For
> example, your code calls WaitForSingleObject or sem_wait which contain
> loops.

Fair enough.


> Your slow paths are slower than mine, but it doesn't look like you
> particularly tried to optimize them. (Nor did I optimize mine.)

I just defer all of the actual blocking to native semaphores and rely on the
fact that the synchronization algorithm itself provides certain "guarantees"
when a thread returns from a semaphore wait. A thread that returns from a
native wait does not need to read any state because it already "knows" its
state.


Also, keep in mind that the wait operations will only be called a single
time per slow path because there is no looping. So, that "can" be more
"efficient", because if you have to loop around you may end up calling a
native wait operation more than once.


> It's interesting how different our approaches are.

Indeed! :^)


> I like the fact
> that you managed to get some reader/writer fairness without slowing
> the fast paths.

Yeah. The fairness is just sort of an inherent aspect of the algorithm.

Chris M. Thomasson

Feb 11, 2010, 7:52:10 PM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:%hd9n.68184$PI7....@newsfe17.iad...

> You can amortize read access on a read/write mutex by using the following
> fairly simple pattern:
> ______________________________________________________________
[...]
> ______________________________________________________________
[...]

I added the functionality to my rw-mutex implementation. Here is a model of
the algorithm in Relacy:

http://cpt.pastebin.com/f48a22282
