
locking N mutexes simultaneously


Laurent Deniau

Oct 19, 2007, 9:46:33 AM
I am looking for the best way to lock N mutexes (2<=N<=5)
"simultaneously" from the point of view of other threads. The only
assumption made is that the concerned mutexes are always locked/
unlocked through this interface. Starting with the stupid broken
solution for N = 2:

void
broken_lock2(pthread_mutex_t *lck1, pthread_mutex_t *lck2)
{
    pthread_mutex_lock(lck1);
    pthread_mutex_lock(lck2);
}

a few problems arise that lead to deadlock:
- thread A calls broken_lock(lck,lck)
- thread A calls broken_lock(lck1,lck2) and holds lck1 but waits for lck2
  AND
  thread B calls broken_lock(lck2,lck1) and holds lck2 but waits for lck1

The first point can be solved by recursive mutexes and the second point
by using a critical section (i.e. lck0):

static pthread_mutex_t lck0[1] = { PTHREAD_MUTEX_INITIALIZER };

void
safelock2(pthread_mutex_t *lck1, pthread_mutex_t *lck2)
{
    pthread_mutex_lock(lck0);
    pthread_mutex_lock(lck1);
    pthread_mutex_lock(lck2);
    pthread_mutex_unlock(lck0);
}

This solution is simple but may create a bottleneck, and it requires 3
locks and 1 unlock instead of 2 locks. Another solution, which minimizes
the use of the critical section, is:

void
fastlock2(pthread_mutex_t *lck1, pthread_mutex_t *lck2)
{
    if (pthread_mutex_trylock(lck1)) {
        safelock2(lck1,lck2);
        return;
    }
    if (pthread_mutex_trylock(lck2)) {
        if (pthread_mutex_trylock(lck0)) {
            pthread_mutex_unlock(lck1);
            safelock2(lck1,lck2);
        } else {
            pthread_mutex_lock(lck2);
            pthread_mutex_unlock(lck0);
        }
        return;
    }
}

The complete code for N = 2 and N = 3 follows.

QUESTIONS:
- Is there any better solution to this problem? Papers or refs would
be welcome.
- David Butenhof said some time ago (two years or so) that the need
for recursive mutexes highlights design problems. But replacing
recursive mutexes with normal mutexes and mutual address comparison
doesn't scale well as N grows (e.g. for N = 5, this adds 10
comparisons).

thanks.

a+, ld.

/***********************************************************/

#include <pthread.h>

/* --- local variables --- */

static pthread_mutex_t lck0[1] = { PTHREAD_MUTEX_INITIALIZER };

/* --- local functions --- */

static void
lock(pthread_mutex_t *lck1)
{
    pthread_mutex_lock(lck1);
}

static int
trylock(pthread_mutex_t *lck1)
{
    return pthread_mutex_trylock(lck1);
}

static void
lock2(pthread_mutex_t *lck1, pthread_mutex_t *lck2)
{
    pthread_mutex_lock(lck1);
    pthread_mutex_lock(lck2);
}

static void
lock3(pthread_mutex_t *lck1, pthread_mutex_t *lck2, pthread_mutex_t *lck3)
{
    pthread_mutex_lock(lck1);
    pthread_mutex_lock(lck2);
    pthread_mutex_lock(lck3);
}

static void
unlock(pthread_mutex_t *lck1)
{
    pthread_mutex_unlock(lck1);
}

static void
unlock2(pthread_mutex_t *lck1, pthread_mutex_t *lck2)
{
    pthread_mutex_unlock(lck1);
    pthread_mutex_unlock(lck2);
}

static void
unlock3(pthread_mutex_t *lck1, pthread_mutex_t *lck2, pthread_mutex_t *lck3)
{
    pthread_mutex_unlock(lck1);
    pthread_mutex_unlock(lck2);
    pthread_mutex_unlock(lck3);
}

static void
safelock1(pthread_mutex_t *lck1)
{
    lock(lck1);
}

static void
safelock2(pthread_mutex_t *lck1, pthread_mutex_t *lck2)
{
    lock(lck0);
    lock2(lck1,lck2);
    unlock(lck0);
}

static void
safelock3(pthread_mutex_t *lck1, pthread_mutex_t *lck2, pthread_mutex_t *lck3)
{
    lock(lck0);
    lock3(lck1,lck2,lck3);
    unlock(lck0);
}

/* --- global functions --- */

void
fastlock2(pthread_mutex_t *lck1, pthread_mutex_t *lck2)
{
    if (trylock(lck1)) {
        safelock2(lck1,lck2);
        return;
    }
    if (trylock(lck2)) {
        if (trylock(lck0)) {
            unlock(lck1);
            safelock2(lck1,lck2);
        } else {
            lock(lck2);
            unlock(lck0);
        }
        return;
    }
}

void
fastlock3(pthread_mutex_t *lck1, pthread_mutex_t *lck2, pthread_mutex_t *lck3)
{
    if (trylock(lck1)) {
        safelock3(lck1,lck2,lck3);
        return;
    }
    if (trylock(lck2)) {
        if (trylock(lck0)) {
            unlock(lck1);
            safelock3(lck1,lck2,lck3);
        } else {
            lock2(lck2,lck3);
            unlock(lck0);
        }
        return;
    }
    if (trylock(lck3)) {
        if (trylock(lck0)) {
            unlock2(lck1,lck2);
            safelock3(lck1,lck2,lck3);
        } else {
            lock(lck3);
            unlock(lck0);
        }
        return;
    }
}

Chris Friesen

Oct 19, 2007, 12:56:38 PM
Laurent Deniau wrote:
> I am looking for the best way to lock N mutex (2<=N<=5)
> "simultaneously" from the point of view of other threads. The only
> assumption made is that the concerned mutexes are always locked/
> unlocked through this interface.

The simple method is to always lock them in the same order. One common
ordering is to just use increasing mutex address.
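
For instance, a minimal sketch of that idea for two mutexes
(hypothetical helper, not from the post; casting to uintptr_t
sidesteps the formal undefinedness of ordering unrelated pointers,
and the lck1 == lck2 case is taken only once):

#include <stdint.h>
#include <pthread.h>

void ordered_lock2(pthread_mutex_t *lck1, pthread_mutex_t *lck2)
{
    uintptr_t a1 = (uintptr_t)lck1, a2 = (uintptr_t)lck2;

    if (a1 == a2) {
        pthread_mutex_lock(lck1);     /* same mutex: lock it once */
    } else if (a1 < a2) {
        pthread_mutex_lock(lck1);     /* always lower address first */
        pthread_mutex_lock(lck2);
    } else {
        pthread_mutex_lock(lck2);
        pthread_mutex_lock(lck1);
    }
}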

Chris

Howard Hinnant

Oct 19, 2007, 1:43:22 PM
In article <13hhoe7...@corp.supernews.com>,
Chris Friesen <cbf...@mail.usask.ca> wrote:

I have experimentally found the "lock in order" algorithm significantly
slower than the following variation of the "try and back off" algorithm,
especially on multi-core platforms (I only looked at Mac and Windows
OS's):

(error checking omitted for brevity)

void lock2(pthread_mutex_t* m1, pthread_mutex_t* m2)
{
    for (;;)
    {
        pthread_mutex_lock(m1);
        if (pthread_mutex_trylock(m2) == 0)
            return;
        pthread_mutex_unlock(m1);
        sched_yield();
        pthread_mutex_lock(m2);
        if (pthread_mutex_trylock(m1) == 0)
            return;
        pthread_mutex_unlock(m2);
        sched_yield();
    }
}
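
A minimal usage sketch (hypothetical mutexes ma and mb; once lock2
returns, both are held, and the release order does not matter):

pthread_mutex_t ma = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mb = PTHREAD_MUTEX_INITIALIZER;

void transfer(void)
{
    lock2(&ma, &mb);
    /* ... critical section touching both objects ... */
    pthread_mutex_unlock(&ma);   /* any release order is fine */
    pthread_mutex_unlock(&mb);
}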

I have coded a general lock-N variation of the above, but only in C++
using variadic templates.

I found the lock-in-order algorithm slower in terms of overall
application performance and more prone to thread starvation during
stress testing.

-Howard

Chris Thomasson

Oct 19, 2007, 9:56:30 PM

"Laurent Deniau" <Laurent...@gmail.com> wrote in message
news:1192801593.1...@q5g2000prf.googlegroups.com...

>I am looking for the best way to lock N mutex (2<=N<=5)
> "simultaneously" from the point of view of other threads. The only
> assumption made is that the concerned mutexes are always locked/
> unlocked through this interface. Starting by the stupid broken
> solution for N = 2:

Use two-phase locking or:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e0c011baf08844c4

Chris Thomasson

Oct 20, 2007, 7:36:47 PM

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:AOWdnc13Y_yywoTa...@comcast.com...

Remember that you don't have to use try_lock on a sorted order of locks.

Laurent Deniau

Oct 22, 2007, 4:37:58 AM

Ok. But sorting requires roughly n log n comparisons (and swaps),
which means 12 comparisons (and swaps) for N=5, _always_. Does this
behave better (on average) than a sequence of try_locks with rollback
on failure? Thanks.

a+, ld.

Laurent Deniau

Oct 22, 2007, 4:42:40 AM
On 19 oct, 19:43, Howard Hinnant <howard.hinn...@gmail.com> wrote:
> In article <13hhoe7335ar...@corp.supernews.com>,

Interesting; it assumes the use of recursive mutexes, right? Could you
give an example for 3 locks, just to see how it scales to N (do you
try all the combinations)? What do you think about using try_lock
first and switching to the sorting version if any of them fails?

a+, ld.

Laurent Deniau

Oct 22, 2007, 4:44:40 AM
On 21 oct, 01:36, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Chris Thomasson" <cris...@comcast.net> wrote in message
>
> news:AOWdnc13Y_yywoTa...@comcast.com...
>
>
>
> > "Laurent Deniau" <Laurent.Den...@gmail.com> wrote in message

> >news:1192801593.1...@q5g2000prf.googlegroups.com...
> >>I am looking for the best way to lock N mutex (2<=N<=5)
> >> "simultaneously" from the point of view of other threads. The only
> >> assumption made is that the concerned mutexes are always locked/
> >> unlocked through this interface. Starting by the stupid broken
> >> solution for N = 2:
>
> > Use two-phase locking or:

Two-phase locking can deadlock easily in my case. It is not a suitable
solution.

> >http://groups.google.com/group/comp.programming.threads/browse_frm/th...


>
> Remember that you don't have to use try_lock on a sorted order of locks.

Right, this could be a suitable approach (see my other replies).

a+, ld.

Chris Thomasson

Oct 22, 2007, 6:51:49 AM
"Laurent Deniau" <Laurent...@gmail.com> wrote in message
news:1193042680.7...@i13g2000prf.googlegroups.com...

I should point out that applying a sorting algorithm to the list of
mutexes you intend to lock can help eliminate the live-locking that can
exist in most "try_lock" schemes.

Laurent Deniau

Oct 22, 2007, 7:30:15 AM
On 22 oct, 12:51, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Laurent Deniau" <Laurent.Den...@gmail.com> wrote in message
>
> news:1193042680.7...@i13g2000prf.googlegroups.com...
>
>
>
> > On 21 oct, 01:36, "Chris Thomasson" <cris...@comcast.net> wrote:
> >> "Chris Thomasson" <cris...@comcast.net> wrote in message
>
> >>news:AOWdnc13Y_yywoTa...@comcast.com...
>
> >> > "Laurent Deniau" <Laurent.Den...@gmail.com> wrote in message
> >> >news:1192801593.1...@q5g2000prf.googlegroups.com...
> >> >>I am looking for the best way to lock N mutex (2<=N<=5)
> >> >> "simultaneously" from the point of view of other threads. The only
> >> >> assumption made is that the concerned mutexes are always locked/
> >> >> unlocked through this interface. Starting by the stupid broken
> >> >> solution for N = 2:
>
> >> > Use two-phase locking or:
>
> > Two-phase locking can deadlock easily in my case. It is not a suitable
> > solution.
>
> >> >http://groups.google.com/group/comp.programming.threads/browse_frm/th...
>
> >> Remember that you don't have to use try_lock on a sorted order of locks.
>
> > Right, this could be a suitable approach (see my other replies).
>
> I should point out that applying a sorting algorithm to the list of mutex's
> you indeed to lock can help eliminate the live-locking that can exist in
> most "try_lock" schemes.

Do you mean that try_lock based solutions (e.g. H. Hinnant's proposal)
may also deadlock?

Just to summarize: it seems that no better scheme exists than simply
sorting the mutexes and locking them in order. Right?

a+, ld.

Chris Thomasson

Oct 22, 2007, 9:29:27 AM
"Laurent Deniau" <Laurent...@gmail.com> wrote in message
news:1193052615....@i13g2000prf.googlegroups.com...

Not deadlock, livelock.

> Just to summarize, it happens that better scheme than just sort the
> mutexes and lock in order doesn't exist. Right?

A series of try_locks with unlock, backoff and retry on failure is
susceptible to live-lock.

Laurent Deniau

Oct 22, 2007, 10:18:02 AM

Right, but I saw it as a kind of deadlock unless you use something
like sched_yield() as in H. Hinnant's version.

> > Just to summarize, it happens that better scheme than just sort the
> > mutexes and lock in order doesn't exist. Right?
>
> series of try_lock's; unlock, backoff and retry on failure is susceptible to
> live-lock.

Ok. So sorting seems to be the only robust and efficient (on average)
solution unless you touch the thread scheduling.

a+, ld.

Howard Hinnant

Oct 22, 2007, 10:59:31 AM
In article <P7adnV6BetkQOYHa...@comcast.com>,
"Chris Thomasson" <cri...@comcast.net> wrote:

> > Do you mean that try_lock based solutions (e.g. H.Hinnant proposal)
> > may also deadlock?
>
> Not deadlock, livelock.

<shrug> Instead of guessing I tested and measured, then tested and
measured some more.

Without the yield statements I believe I may have observed brief
(unstable) periods of livelock. Even so the yield-less algorithm still
outperformed the lock-in-order algorithm by 35% on dual processor
machines. On single core machines the performance of the two
algorithms was nearly identical.

With the yields included (as I posted), I measured an 88% performance
advantage of the try_lock based solution over the lock in order (on dual
processor only, single processor showed no significant performance
difference). Much of the gains observed were in eliminating "system"
time, and secondarily in eliminating thread starvation.

In the sort-based algorithm the number of threads in the test tended to
taper off more gradually at the end of the test, with the effect that
the entire test took longer while waiting for the once-starved threads
to finish.

In the try_lock based algorithm, all threads tended to end their work at
the same time, keeping both processors busy throughout the test.

This test was with a very high contention for the locks. Such a test
would be the most likely to trigger a stable livelock (not to mention a
deadlock). For low contention contexts, I'm sure it doesn't make any
difference what algorithm you use (as long as you avoid deadlock).

-Howard

Chris Friesen

Oct 22, 2007, 11:06:38 AM
Howard Hinnant wrote:

> void lock2(pthread_mutex_t* m1, pthread_mutex_t* m2)
> {
>     for (;;)
>     {
>         pthread_mutex_lock(m1);
>         if (pthread_mutex_trylock(m2) == 0)
>             return;
>         pthread_mutex_unlock(m1);
>         sched_yield();

Unless you're running in the SCHED_RR or SCHED_FIFO scheduler policy,
the behaviour of sched_yield() is not well defined.

In linux, for instance, it has changed substantially over the years.
Among other things, one implementation yielded to every other process on
the system.
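
A bounded sleep is one portable alternative to sched_yield() for the
backoff step; a sketch (the 1 microsecond figure is an arbitrary
illustration, not a tuned value):

#include <time.h>

static void backoff(void)
{
    struct timespec ts = { 0, 1000 };   /* 0 s, 1000 ns */
    nanosleep(&ts, NULL);               /* policy-independent pause */
}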

Chris

Eric Sosman

Oct 22, 2007, 11:26:15 AM
Laurent Deniau wrote On 10/22/07 04:37,:

There's something wrong with your arithmetic. Five
objects can be permuted in 120 < 2**7 ways, so seven or
possibly eight comparisons should suffice. Knuth shows
a seven-comparison method invented by H.B. DeMuth in 1956,
and shows how to generalize it for larger N (see TAOCP
section 5.3.1).
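
(To spell out the bound: five distinct items can be arranged in
5! = 120 ways, and since 2**6 = 64 < 120 <= 128 = 2**7, any
comparison-based sort needs at least ceil(log2 120) = 7 comparisons
in the worst case.)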

To me, though, the interesting question in all this
is "Why?" Why do you need (apparent) simultaneity of the
lock operations on several locks? If Thread T1 tries to
acquire locks A B C D E while T2 tries for C E F, what
harm will it cause if T2 gets C after T1 gets A?

--
Eric....@sun.com

Howard Hinnant

Oct 22, 2007, 11:29:55 AM
In article <13hpf3v...@corp.supernews.com>,
Chris Friesen <cbf...@mail.usask.ca> wrote:


I tested only on Mac OS X (using pthread_mutex_t and sched_yield) and
Windows (using CRITICAL_SECTION and Sleep). I would welcome test
results from other OS's / API's.

My test rig was a classic "dining philosophers" setup with 5 threads
and 5 locks (each thread needing two locks at a time). I also ran a
version with 8 threads and 12 locks, each thread needing 3 locks at a
time, and got similar results.

-Howard

Laurent Deniau

Oct 22, 2007, 11:38:33 AM
On 22 oct, 16:59, Howard Hinnant <howard.hinn...@gmail.com> wrote:
> In article <P7adnV6BetkQOYHanZ2dnUVZ_uCin...@comcast.com>,

Does this test use normal locks for the sort-based algorithm and
recursive locks for the try_lock version? Could you give an example
for N=3? Thanks.

a+, ld.

Laurent Deniau

Oct 22, 2007, 12:05:39 PM
On 22 oct, 17:26, Eric Sosman <Eric.Sos...@sun.com> wrote:
> Laurent Deniau wrote On 10/22/07 04:37,:
> > On 19 oct, 18:56, Chris Friesen <cbf...@mail.usask.ca> wrote:
>
> >>Laurent Deniau wrote:
> >>>I am looking for the best way to lock N mutex (2<=N<=5)
> >>>"simultaneously" from the point of view of other threads. The only
> >>>assumption made is that the concerned mutexes are always locked/
> >>>unlocked through this interface.
>
> >>The simple method is to always lock them in the same order. One common
> >>ordering is to just use increasing mutex address.
>
> > Ok. But sorting requires somehow n.log n comparisons (and swap) which
> > means 12 comparisons (and swap) for N=5, _always_. Does this behave
> > better (in average) than a sequence of try_lock and rollback on
> > failure? Thanks.
>
> There's something wrong with your arithmetic. Five
> objects can be permuted in 120 < 2**7 ways, so seven or
> possibly eight comparisons should suffice.

I took the basic n log_2 n complexity which for n=5 gives 12 (rounded
up).

> Knuth shows
> a seven-comparison method invented by H.B. DeMuth in 1956,
> and shows how to generalize it for larger N (see TAOCP
> section 5.3.1).

I'll get a look tonight (my TAOCP is at home).

> To me, though, the interesting question in all this
> is "Why?" Why do you need (apparent) simultaneity of the
> lock operations on several locks? If Thread T1 tries to
> acquire locks A B C D E while T2 tries for C E F, what
> harm will it cause if T2 gets C after T1 gets A?

No idea ;-) I am implementing a proxy class called a Locker which
automatically locks an object (its delegate) when entering one of its
methods. Since I support multimethods, I have to consider all the
Lockers of the multimethod simultaneously, otherwise I would create
deadlocks. This is a typical problem with C++ smart pointers used as
lockers, since they lock objects one by one and have no knowledge of
the other lockers. Typical cases are: T locks A and A, or T1 locks A
and B while T2 locks B and A.

a+, ld.

Howard Hinnant

Oct 22, 2007, 12:38:31 PM
In article <1193067513....@z24g2000prh.googlegroups.com>,
Laurent Deniau <Laurent...@gmail.com> wrote:

> Does this test use normal locks for the sort-based algorithm and
> recursive locks for the try_lock version?

No, non-recursive mutexes for both algorithms.

> Could give an example for
> N=3? Thanks.

Sure, be forewarned that I've written this up on the fly, without even
compiling it. I am translating from a C++ implementation which uses
higher level locks and RAII to deal with exception safety, and uses
templates to deal with more generality, and uses variadic templates to
deal with the general N case. So the translation to C is rough (but the
general idea is the same)...

int try_lock2(pthread_mutex_t* m1, pthread_mutex_t* m2)
{
    if (pthread_mutex_trylock(m1) == 0)
    {
        if (pthread_mutex_trylock(m2) == 0)
            return -1;
        pthread_mutex_unlock(m1);
        return 1;
    }
    return 0;
}

void lock3(pthread_mutex_t* m1, pthread_mutex_t* m2, pthread_mutex_t* m3)
{
    pthread_mutex_t* temp;
    for (;;)
    {
        pthread_mutex_lock(m1);
        switch (try_lock2(m2, m3))
        {
        case -1:               /* got m2 and m3: all three are held */
            return;
        case 0:                /* m2 was busy: block on it next time */
            pthread_mutex_unlock(m1);
            temp = m1;
            m1 = m2;
            m2 = m3;
            m3 = temp;
            break;
        case 1:                /* m3 was busy: block on it next time */
            pthread_mutex_unlock(m1);
            temp = m3;
            m3 = m2;
            m2 = m1;
            m1 = temp;
            break;
        }
        sched_yield();
    }
}

-Howard

Eric Sosman

Oct 22, 2007, 4:06:42 PM
Laurent Deniau wrote On 10/22/07 12:05,:

> On 22 oct, 17:26, Eric Sosman <Eric.Sos...@sun.com> wrote:
>
>>Laurent Deniau wrote On 10/22/07 04:37,:
>>
>>>On 19 oct, 18:56, Chris Friesen <cbf...@mail.usask.ca> wrote:
>>
>>>>Laurent Deniau wrote:
>>>>
>>>>>I am looking for the best way to lock N mutex (2<=N<=5)
>>>>>"simultaneously" from the point of view of other threads. The only
>>>>>assumption made is that the concerned mutexes are always locked/
>>>>>unlocked through this interface.
>>
>>>>The simple method is to always lock them in the same order. One common
>>>>ordering is to just use increasing mutex address.
>>
>>>Ok. But sorting requires somehow n.log n comparisons (and swap) which
>>>means 12 comparisons (and swap) for N=5, _always_. Does this behave
>>>better (in average) than a sequence of try_lock and rollback on
>>>failure? Thanks.
>>
>> There's something wrong with your arithmetic. Five
>>objects can be permuted in 120 < 2**7 ways, so seven or
>>possibly eight comparisons should suffice.
>
>
> I took the basic n log_2 n complexity which for n=5 gives 12 (rounded
> up).

Aha! Big-Oh strikes again. Recall that the purpose
of Big-Oh is to conceal trivial matters like low-order
terms -- and like constant factors. The fact that some
operation is O(N log N) does not mean that N * log(N) is
a useful estimate.

>
>>Knuth shows
>>a seven-comparison method invented by H.B. DeMuth in 1956,
>>and shows how to generalize it for larger N (see TAOCP
>>section 5.3.1).
>
>
> I'll get a look tonight (my TAOCP is at home).
>
>
>> To me, though, the interesting question in all this
>>is "Why?" Why do you need (apparent) simultaneity of the
>>lock operations on several locks? If Thread T1 tries to
>>acquire locks A B C D E while T2 tries for C E F, what
>>harm will it cause if T2 gets C after T1 gets A?
>
>
> No idea ;-) I am implementing a proxy class called a Locker which
> automatically locks an object (its delegate) when entering in one of
> its method. Since I support multimethods, I have to consider all the
> Lockers of the multimethod simultaneously otherwise I would create
> deadlocks. This is a typical problem with C++ smart pointers used as
> lockers since they lock objects one by one and have no knowledge about
> the other lockers. Typical cases are T locks A and A or T1 locks A and
> B while T2 locks B and A.

If "T locks A and A" isn't a typo, you need either a
recursive lock or better discipline -- and if you're just
handling lock requests from someone else, it may not be
possible to enforce the latter.

For the other cases, the usual way to avoid deadlock
is to impose a partial ordering on the locks. If A B are
always acquired as "A first, then B" or as "A only" or as
"B only" but never as "B first, then A," there will be no
deadlocks. Again, this may be beyond your power to
enforce if you're just a catspaw for clumsy clients.

Even then, I don't see how (apparent) atomicity helps
anything. If client threads T1 and T2 tell you of their
desires to lock {A B C} and {C B D} you can acquire each
group of three locks as if embedded in the order D,C,B,A
and all will be well; simultaneity doesn't seem necessary.
Nor does simultaneity seem sufficient: If T1 asks for {A}
and then {B} in two transactions while T2 asks for {B} and
then {A}, you're right back where you started. If you can
be sure this doesn't happen -- for example, if your layer
acquires all its locks at one site, operates, then releases
them all and returns to the caller -- then ordering is
enough and simultaneity seems irrelevant.

... unless I'm completely misunderstanding you, which
is always a possibility to be considered ...

--
Eric....@sun.com

Chris Thomasson

Oct 22, 2007, 5:56:31 PM
"Howard Hinnant" <howard....@gmail.com> wrote in message
news:howard.hinnant-9A5...@johnf2.biosci.ohio-state.edu...

> In article <P7adnV6BetkQOYHa...@comcast.com>,
> "Chris Thomasson" <cri...@comcast.net> wrote:
>
>> > Do you mean that try_lock based solutions (e.g. H.Hinnant proposal)
>> > may also deadlock?
>>
>> Not deadlock, livelock.
>
> <shrug> Instead of guessing I tested and measured, then tested and
> measured some more.
> Without the yield statements I believe I may have observed brief
> (unstable) periods of livelock.

It's equal to a giant spinlock with more than one way to get contention.
sched_yield for backoff. Humm. Well, if it performs better in your
implementation of things, so be it. You should probably add a pause
instruction in there on x86.


[...]

>
> With the yields included (as I posted), I measured an 88% performance
> advantage of the try_lock based solution over the lock in order (on dual
> processor only, single processor showed no significant performance
> difference). Much of the gains observed were in eliminating "system"
> time, and secondarily in eliminating thread starvation.

> In the sort-based algorithm the number of threads in the test tended to
> taper off more gradually at the end of the test, with the effect that
> the entire test took longer while waiting for the once-starved threads
> to finish.

You're getting significant thread starvation? How is your test structured?
Are you using a mutex per object for the locking?


> In the try_lock based algorithm, all threads tended to end their work at
> the same time, keeping both processors busy throughout the test.

Well, yeah. Both processors

Chris Thomasson

Oct 22, 2007, 6:25:05 PM

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:SIidnbKL2LADgIDa...@comcast.com...

> "Howard Hinnant" <howard....@gmail.com> wrote in message
> news:howard.hinnant-9A5...@johnf2.biosci.ohio-state.edu...
[...]

>> In the try_lock based algorithm, all threads tended to end their work at
>> the same time, keeping both processors busy throughout the test.
> Well, yeah. Both processors


I wonder how that would scale up to the 64-core processors out there. Any
random periods of live-locking tend to show up when you add more and more
processors. Anyway, were you only using the two-lock version, or were you
using your N-lock one? If so, what was the average locking depth
per-transaction?

Laurent Deniau

Oct 23, 2007, 3:07:27 AM
On 22 oct, 18:38, Howard Hinnant <howard.hinn...@gmail.com> wrote:
> In article <1193067513.715981.32...@z24g2000prh.googlegroups.com>,

> Laurent Deniau <Laurent.Den...@gmail.com> wrote:
>
> > Does this test use normal locks for the sort-based algorithm and
> > recursive locks for the try_lock version?
>
> No, non-recursive mutexes for both algorithms.
>
> > Could give an example for
> > N=3? Thanks.
>
> Sure, be forewarned that I've written this up on the fly, without even
> compiling it. I am translating from a C++ implementation which uses
> higher level locks and RAII to deal with exception safety, and uses
> templates to deal with more generality, and uses variadic templates to
> deal with the general N case. So the translation to C is rough (but the

I am pretty comfortable with C++ ;-) so feel free to post C++ code if
you want.

Ok, so you rotate the list of mutexes in order to start with the
failing mutex first on the next wake-up. Thanks

a+, ld.

Laurent Deniau

Oct 23, 2007, 5:40:35 AM
On 22 oct, 22:06, Eric Sosman <Eric.Sos...@sun.com> wrote:
> Laurent Deniau wrote On 10/22/07 12:05,:
> > On 22 oct, 17:26, Eric Sosman <Eric.Sos...@sun.com> wrote:
> >>Laurent Deniau wrote On 10/22/07 04:37,:
> >>>On 19 oct, 18:56, Chris Friesen <cbf...@mail.usask.ca> wrote:
> >>>>Laurent Deniau wrote:
>
> >>>>>I am looking for the best way to lock N mutex (2<=N<=5)
> >>>>>"simultaneously" from the point of view of other threads. The only
> >>>>>assumption made is that the concerned mutexes are always locked/
> >>>>>unlocked through this interface.
>
> >>>>The simple method is to always lock them in the same order. One common
> >>>>ordering is to just use increasing mutex address.
>
> >>>Ok. But sorting requires somehow n.log n comparisons (and swap) which
> >>>means 12 comparisons (and swap) for N=5, _always_. Does this behave
> >>>better (in average) than a sequence of try_lock and rollback on
> >>>failure? Thanks.
>
> >> There's something wrong with your arithmetic. Five
> >>objects can be permuted in 120 < 2**7 ways, so seven or
> >>possibly eight comparisons should suffice.
>
> > I took the basic n log_2 n complexity which for n=5 gives 12 (rounded
> > up).
>
> Aha! Big-Oh strikes again. Recall that the purpose
> of Big-Oh is to conceal trivial matters like low-order
> terms -- and like constant factors.

Right, this is the asymptotic average behavior of general sorting.

> The fact that some
> operation is O(N log N) does not mean that N * log(N) is
> a useful estimate.

120 < 3^5 permutations with partial order would lead to at least
2*5 = 10 comparisons. Since I don't have the time to find the optimal
scheme, 12 seems to be a good worst-case approximation.

> >>Knuth shows
> >>a seven-comparison method invented by H.B. DeMuth in 1956,
> >>and shows how to generalize it for larger N (see TAOCP
> >>section 5.3.1).
>
> > I'll get a look tonight (my TAOCP is at home).

This algorithm is not suitable for this problem. In fact, all of
section 5.3.1 is about special cases where either a strict order or a
reduced set of values is considered (or both).

In my case, I need to map N mutexes to a sorted array of M mutexes
with 0<M<=N, D = N-M being the number of duplicated mutexes which must
be handled explicitly, since I would like to avoid the use of
recursive mutexes (and with recursive mutexes you would get 7+5
comparisons anyway).

> >> To me, though, the interesting question in all this
> >>is "Why?" Why do you need (apparent) simultaneity of the
> >>lock operations on several locks? If Thread T1 tries to
> >>acquire locks A B C D E while T2 tries for C E F, what
> >>harm will it cause if T2 gets C after T1 gets A?
>
> > No idea ;-) I am implementing a proxy class called a Locker which
> > automatically locks an object (its delegate) when entering in one of
> > its method. Since I support multimethods, I have to consider all the
> > Lockers of the multimethod simultaneously otherwise I would create
> > deadlocks. This is a typical problem with C++ smart pointers used as
> > lockers since they lock objects one by one and have no knowledge about
> > the other lockers. Typical cases are T locks A and A or T1 locks A and
> > B while T2 locks B and A.
>
> If "T locks A and A" isn't a typo, you need either a

It isn't.

> recursive lock or better discipline -- and if you're just
> handling lock requests from someone else, it may not be
> possible to enforce the latter.

Sorting and removing the duplicates should be enough.

> For the other cases, the usual way to avoid deadlock
> is to impose a partial ordering on the locks. If A B are

I cannot impose anything, I can just do the right thing.

> always acquired as "A first, then B" or as "A only" or as
> "B only" but never as "B first, then A," there will be no
> deadlocks. Again, this may be beyond your power to
> enforce if you're just a catspaw for clumsy clients.
>
> Even then, I don't see how (apparent) atomicity helps
> anything. If client threads T1 and T2 tell you of their
> desires to lock {A B C} and {C B D} you can acquire each
> group of three locks as if embedded in the order D,C,B,A
> and all will be well; simultaneity doesn't seem necessary.

But T1 and T2 are not aware of each other, and T1 may not even know of
the existence of D or T2.

> Nor does simultaneity seem sufficient: If T1 asks for {A}
> and then {B} in two transactions while T2 asks for {B} and
> then {A}, you're right back where you started. If you can
> be sure this doesn't happen -- for example, if your layer
> acquires all its locks at one site, operates, then releases
> them all and returns to the caller -- then ordering is
> enough and simultaneity seems irrelevant.

Right. Simultaneity is only apparent between threads, mutexes are
acquired one by one within a thread.

> ... unless I'm completely misunderstanding you, which
> is always a possibility to be considered ...

I don't know. I don't exactly see where you are going ;-)

Let me try to formulate the problem with a trivial _static_ example (the
real problem is _dynamic_ and uses proxies):

// global
OBJ queue1 = new(Queue);
OBJ queue2 = new(Queue);

OBJ safe_concat_queue(OBJ q1, OBJ q2)
{
    OBJ cq;
    lock2(q1,q2);
    cq = concat_queue(q1,q2); // defined somewhere else, returns a new queue
    unlock2(q1,q2);
    return cq;
}

assuming that T1 does (for its own purpose):

OBJ q1q2(void)
{
    return safe_concat_queue(queue1,queue2);
}

and T2 does (for its own purpose):

OBJ q2q1(void) // reverse q1 vs q2 priority
{
    return safe_concat_queue(queue2,queue1);
}

and T3 does (for its own purpose):

OBJ q1q1(void) // duplicate q1 elements
{
    return safe_concat_queue(queue1,queue1);
}

This is the kind of situation which may happen within Locker methods.

a+, ld.

Howard Hinnant

Oct 23, 2007, 10:29:50 AM
In article <SIidnbKL2LADgIDa...@comcast.com>,
"Chris Thomasson" <cri...@comcast.net> wrote:

> Your getting significant thread starvation? How is your test structured? Are
> you using mutex per-object for the locking?

This is one of my tests that experiences partial thread starvation when
using the lock-in-order algorithm:

http://en.wikipedia.org/wiki/Dining_philosophers_problem

-Howard

Howard Hinnant

Oct 23, 2007, 10:43:18 AM
In article <3eudnZ0Q5-aIv4Da...@comcast.com>,
"Chris Thomasson" <cri...@comcast.net> wrote:

I ran experiments with a 3-lock version as well. I'm not positive what
you mean by "locking depth". If you are referring to recursive locking,
there was none.

I too would like to see tests run with more processors than two.

-Howard

Howard Hinnant

Oct 23, 2007, 10:50:24 AM
In article <1193123247....@y27g2000pre.googlegroups.com>,
Laurent Deniau <Laurent...@gmail.com> wrote:

> I am pretty confortable with C++ ;-) so feel free to post C++ code if
> you want.

// returns index of failed lock or -1 on success
template <class L1, class L2>
int
try_lock(L1& l1, L2& l2)
{
    unique_lock<L1> u(l1, try_to_lock);
    if (u.owns_lock())
    {
        if (l2.try_lock())
        {
            u.release();
            return -1;
        }
        return 1;
    }
    return 0;
}

template <class L1, class L2, class... L3>
int
try_lock(L1& l1, L2& l2, L3&... l3)
{
    int r = 0;
    unique_lock<L1> u(l1, try_to_lock);
    if (u.owns_lock())
    {
        r = try_lock(l2, l3...);
        if (r == -1)
            u.release();
        else
            ++r;
    }
    return r;
}

template <class L1, class L2>
void
lock(L1& l1, L2& l2)
{
    while (true)
    {
        {
            unique_lock<L1> u1(l1);
            if (l2.try_lock())
            {
                u1.release();
                break;
            }
        }
        this_thread::yield();
        {
            unique_lock<L2> u2(l2);
            if (l1.try_lock())
            {
                u2.release();
                break;
            }
        }
        this_thread::yield();
    }
}

template <class L1, class L2, class... L3>
void
__lock_first(int i, L1& l1, L2& l2, L3&... l3)
{
    while (true)
    {
        switch (i)
        {
        case 0:
            {
                unique_lock<L1> u1(l1);
                i = try_lock(l2, l3...);
                if (i == -1)
                {
                    u1.release();
                    return;
                }
            }
            ++i;
            this_thread::yield();
            break;
        case 1:
            {
                unique_lock<L2> u2(l2);
                i = try_lock(l3..., l1);
                if (i == -1)
                {
                    u2.release();
                    return;
                }
            }
            if (i == sizeof...(L3))
                i = 0;
            else
                i += 2;
            this_thread::yield();
            break;
        default:
            __lock_first(i - 2, l3..., l1, l2);
            return;
        }
    }
}

template <class L1, class L2, class... L3>
inline
void
lock(L1& l1, L2& l2, L3&... l3)
{
    __lock_first(0, l1, l2, l3...);
}

The above is based on the lock design outlined here:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2406.html#mutex_synopsis

-Howard

Eric Sosman

Oct 23, 2007, 12:03:05 PM
Laurent Deniau wrote On 10/23/07 05:40,:
> [...]

>
> Let me try to formulate the problem by a trivial _static_ example (the
> real problem is _dynamic_ and use proxies):
>
> // global
> OBJ queue1 = new(Queue);
> OBJ queue2 = new(Queue);
>
> OBJ safe_concat_queue(OBJ q1, OBJ q2)
> {
> OBJ cq;
> lock2(q1,q2);
> cq = concat_queue(q1,q2); // defined somewhere else, returns a new
> queue
> unlock2(q1,q2);
> return cq;
> }
>
> assuming that T1 does (for its own purpose):
>
> OBJ q1q2(void)
> {
> return safe_concat_queue(queue1,queue2)
> }
>
> and T2 does (for its own purpose):
>
> OBJ q2q1(void) // reverse q1 vs q2 priority
> {
> return safe_concat_queue(queue2,queue1)
> }
>
> and T3 does (for its own purpose):
>
> OBJ q1q1(void) // duplicate q1 elements
> {
> return safe_concat_queue(queue1,queue1)
> }
>
> This is the kind of situation which may happen within Locker methods.

This seems easily solved by the standard approach
of imposing an order on the locks. None of the threads
need to be aware of this order (in the scenario you've
shown, where locks are released before returning); all
you need is that the ordering must unambiguously declare
that "L1 precedes L2" or "L2 precedes L1" for all pairs
of locks L1,L2 and that the result is the same for as
long as both locks exist. Memory addresses will do, if
objects don't get shuffled by a garbage collector or
some such.

It's then easy to write lock2:

void lock2(lock l1, lock l2) {
    if (l1 "<" l2) {
        lock1(l1); lock1(l2);
    }
    else if (l2 "<" l1) {
        lock1(l2); lock1(l1);
    }
    else {
        lock1(l1); /* l1 "==" l2 */
    }
}

lockN isn't much harder:

void lockN(lock[N] lockarray) {
    sort(lockarray, "<");
    lock1(lockarray[0]);
    for (int i = 1; i < N; ++i) {
        if (lockarray[i-1] "<" lockarray[i])
            lock1(lockarray[i]);
    }
}

(This is schematic; you probably don't want to sort
lockarray itself, but an array of pointers to the locks
it contains. Or maybe lockarray is already such an
array of pointers, but you need to clone it to leave
the original undisturbed, or ... Fiddly details of
that nature just obscure the main idea.)
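
A concrete C rendering of the schematic along those lines (a sketch,
not from the post: it sorts a private copy of the pointer array with
qsort, comparing addresses via uintptr_t, and skips duplicates; the
fixed-size array assumes the small N discussed here):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

static int mutex_ptr_cmp(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)*(pthread_mutex_t *const *)a;
    uintptr_t y = (uintptr_t)*(pthread_mutex_t *const *)b;
    return (x > y) - (x < y);
}

void lockN(pthread_mutex_t *locks[], int n)   /* assumes n <= 8 */
{
    pthread_mutex_t *sorted[8];
    int i;

    memcpy(sorted, locks, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, mutex_ptr_cmp);

    pthread_mutex_lock(sorted[0]);
    for (i = 1; i < n; ++i)
        if (sorted[i] != sorted[i-1])         /* skip duplicates */
            pthread_mutex_lock(sorted[i]);
}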

The sort probably shouldn't worry you as much as
it appears to. N should be smallish (a program that
acquires hundreds of locks at the same time in a single
thread has a disease you'll be unable to cure anyhow),
so even a simple insertion sort won't take long. If
you like, you could write straight-line code for a
few small fixed values of N and resort to a general-
purpose sort only for N > 3, say.

This all seems so basic, though. Maybe I've fixated
too much on "simultaneously" and have managed to overlook
some more important feature of your problem -- but IMHO,
"simultaneously" is a red herring, not worthy of your
attention.

--
Eric....@sun.com

Laurent Deniau

Oct 23, 2007, 2:43:18 PM
On 23 oct, 18:03, Eric Sosman <Eric.Sos...@sun.com> wrote:
> This seems easily solved by the standard approach
> of imposing an order on the locks. None of the threads
> need to be aware of this order (in the scenario you've
> shown, where locks are released before returning); all
> you need is that the ordering must unambiguously declare
> that "L1 precedes L2" or "L2 precedes L1" for all pairs
> of locks L1,L2 and that the result is the same for as
> long as both locks exist. Memory addresses will do, if
> objects don't get shuffled by a garbage collector or
> some such.

Absolutely right (no GC). I didn't say that it's complex, but I was
looking for the most efficient way to do it.

> It's then easy to write lock2:
>
> void lock2(lock l1, lock l2) {
> if (l1 "<" l2) {
> lock1(l1); lock1(l2);
> }
> else if (l2 "<" l1) {
> lock1(l2); lock1(l1);
> }
> else {
> lock1(l1); /* l1 "==" l2 */
> }
> }

I agree that lock2 is obvious, but lock3 is not so obvious and the
tree-based approach is not efficient:

void lock3(struct Locker *l1, struct Locker *l2, struct Locker *l3)
{
    if (l1 < l2) {
        if (l2 < l3)
            lock(l1), lock(l2), lock(l3);       // 123
        else if (l2 == l3)
            lock(l1), lock(l2);                 // 12
        else {
            if (l1 < l3)
                lock(l1), lock(l3), lock(l2);   // 132
            else if (l1 == l3)
                lock(l1), lock(l2);             // 12
            else
                lock(l3), lock(l1), lock(l2);   // 312
        }
    } else if (l1 == l2) {
        if (l2 < l3)
            lock(l2), lock(l3);                 // 23
        else if (l2 == l3)
            lock(l1);                           // 1
        else
            lock(l3), lock(l1);                 // 31
    } else {
        if (l3 < l2)
            lock(l3), lock(l2), lock(l1);       // 321
        else if (l2 == l3)
            lock(l3), lock(l1);                 // 31
        else {
            if (l1 < l3)
                lock(l2), lock(l1), lock(l3);   // 213
            else if (l1 == l3)
                lock(l2), lock(l3);             // 23
            else
                lock(l2), lock(l3), lock(l1);   // 231
        }
    }
}

in that case I prefer to switch to the equivalent sorting network
(TAOCP 5.3.4):

#define SORT(l1,l2) if (l1>l2) (tmp=l1,l1=l2,l2=tmp)

void lock3(struct Locker *l1, struct Locker *l2, struct Locker *l3)
{
    struct Locker *tmp;

    SORT(l1,l2);
    SORT(l1,l3);
    SORT(l2,l3);

    lock(l1);
    if (l1 != l2) lock(l2);
    if (l2 != l3) lock(l3);
}
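
The matching unlock has to detect the same duplicates, hence a second
sort; a sketch (assuming the same SORT macro and an unlock() wrapper
at the Locker level):

void unlock3(struct Locker *l1, struct Locker *l2, struct Locker *l3)
{
    struct Locker *tmp;

    SORT(l1,l2);
    SORT(l1,l3);
    SORT(l2,l3);

    unlock(l1);
    if (l1 != l2) unlock(l2);
    if (l2 != l3) unlock(l3);
}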

but again, this approach for lock4 is not efficient since it requires
sorting the data twice (lock and unlock). So sharing the sort is better
for N>3, which means a separate function for sorting, removing the
duplicates and storing the result in an array:

static int
sort4(struct Locker *l[],
      struct Locker *l1, struct Locker *l2, struct Locker *l3,
      struct Locker *l4)
{
    struct Locker *tmp;
    int n = 0;

    SORT(l1,l2); SORT(l3,l4);
    SORT(l1,l3); SORT(l2,l4);
    SORT(l2,l3);

    l[n++] = l1;
    if (l1 != l2) l[n++] = l2;
    if (l2 != l3) l[n++] = l3;
    if (l3 != l4) l[n++] = l4;

    return n;
}

Then for N>4, it's nice to take into account the ALU/Pipeline
parallelism:

static int
sort5(struct Locker *l[],
      struct Locker *l1, struct Locker *l2, struct Locker *l3,
      struct Locker *l4, struct Locker *l5)
{
    struct Locker *tmp;
    int n = 0;

    SORT(l1,l2); SORT(l4,l5);   // optimal sorting network for N=5
    SORT(l1,l3);
    SORT(l2,l3); SORT(l1,l4);
    SORT(l3,l4); SORT(l2,l5);
    SORT(l2,l3); SORT(l4,l5);

    l[n++] = l1;
    if (l1 != l2) l[n++] = l2;
    if (l2 != l3) l[n++] = l3;
    if (l3 != l4) l[n++] = l4;
    if (l4 != l5) l[n++] = l5;

    return n;
}
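
Locking then just walks the deduplicated array; a sketch using the
lock() wrapper from before:

void lock5(struct Locker *l1, struct Locker *l2, struct Locker *l3,
           struct Locker *l4, struct Locker *l5)
{
    struct Locker *l[5];
    int i, n = sort5(l, l1, l2, l3, l4, l5);

    for (i = 0; i < n; i++)
        lock(l[i]);
}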

> lockN isn't much harder:
>
> void lockN(lock[N] lockarray) {
> sort(lockarray, "<");
> lock1(lockarray[0]);
> for (int i = 1; i < N; ++i) {
> if (lockarray[i-1] "<" lockarray[i])
> lock1(lockarray[i]);
> }
> }
>
> (This is schematic; you probably don't want to sort
> lockarray itself, but an array of pointers to the locks
> it contains. Or maybe lockarray is already such an
> array of pointers, but you need to clone it to leave
> the original undisturbed, or ... Fiddly details of
> that nature just obscure the main idea.)

I agree that the main idea is simple. I was looking for the most
efficient way to do it since locking is the bottleneck of this proxy
class, which provides the smallest locking granularity (highest
concurrency) to the user. But sometimes this may not be optimal if
operations on the delegate are very short (e.g. incrementing a
counter).

> The sort probably shouldn't worry you as much as
> it appears to. N should be smallish (a program that
> acquires hundreds of locks at the same time in a single
> thread has a disease you'll be unable to cure anyhow),

I have N<=5 (mentioned in the first post).

> so even a simple insertion sort won't take long. If
> you like, you could write straight-line code for a
> few small fixed values of N and resort to a general-
> purpose sort only for N > 3, say.

sorting networks are nice for 3<=N<=5 and even above.

> This all seems so basic, though.

It is the first time I have tried multi-locking ;-)

> Maybe I've fixated
> too much on "simultaneously" and have managed to overlook
> some more important feature of your problem -- but IMHO,
> "simultaneously" is a red herring, not worthy of your
> attention.

thanks for your time.

a+, ld.

Chris Thomasson

Oct 23, 2007, 3:41:44 PM
"Howard Hinnant" <howard....@gmail.com> wrote in message
news:howard.hinnant-7C8...@johnf2.biosci.ohio-state.edu...

> In article <3eudnZ0Q5-aIv4Da...@comcast.com>,
> "Chris Thomasson" <cri...@comcast.net> wrote:
>
>> "Chris Thomasson" <cri...@comcast.net> wrote in message
>> news:SIidnbKL2LADgIDa...@comcast.com...
>> > "Howard Hinnant" <howard....@gmail.com> wrote in message
>> > news:howard.hinnant-9A5...@johnf2.biosci.ohio-state.edu...
>> [...]
>> >> In the try_lock based algorithm, all threads tended to end their work
>> >> at
>> >> the same time, keeping both processors busy throughout the test.
>> > Well, yeah. Both processors
>>
>>
>> I wonder how that would scale up to the 64-core processors out there. Any
>> random periods of live-locking tends to show up when you add more and
>> more
>> processors. Anyway, were you only using the two-lock version, or were you
>> using your N-lock one? If so, what was the average locking depth
>> per-transaction?
>
> I ran experiments with a 3-lock version as well. I'm not positive what
> you mean by "locking depth".

If you use an N-lock version, what is the greatest number N got to
during a run of your test-bed?

> If you are referring to recursive locking,
> there was none.
>
> I too would like to see tests run with more processors than two.
>

Yup.

Chris Thomasson

Oct 23, 2007, 3:42:28 PM

"Howard Hinnant" <howard....@gmail.com> wrote in message
news:howard.hinnant-528...@johnf2.biosci.ohio-state.edu...

I was more curious about your implementation details wrt when, and how
often, your threads take those locks...

Chris Thomasson

Oct 23, 2007, 3:56:15 PM

"Howard Hinnant" <howard....@gmail.com> wrote in message
news:howard.hinnant-528...@johnf2.biosci.ohio-state.edu...

That problem seems to have an implicit starvation issue that you have to
explicitly work around...

Are you using the two locks to acquire two forks in a single atomic
operation? I think you can do this with a thread per diner... The diners
are not allowed to communicate with each other, but isn't the act of
releasing a fork a communication anyway? They don't need to speak to each
other...


;^)

Chris Thomasson

Oct 23, 2007, 4:22:27 PM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:P6ydnUsARfkvzYPa...@comcast.com...

something like:

diner(int order) {
    fork myforks[2];
    getforks_ordered(myforks, order);

    for (int i = 1;; ++i) {
        if (!(i % 2)) {
            printf("(%p)-diner::thinking\n", (void*)this);
            Sleep(650);
        } else {
            lock2(myforks);
            printf("(%p)-diner::eating\n", (void*)this);
            Sleep(450);
            unlock2(myforks);
        }
    }
}

Howard Hinnant

Oct 23, 2007, 5:07:35 PM
In article <9aidnURPWYDx04Pa...@comcast.com>,
"Chris Thomasson" <cri...@comcast.net> wrote:

> "Chris Thomasson" <cri...@comcast.net> wrote in message
> news:lc-dnaxQwOXr0IPa...@comcast.com...
> >
> > "Howard Hinnant" <howard....@gmail.com> wrote in message
> > news:howard.hinnant-528...@johnf2.biosci.ohio-state.edu...
> >> In article <SIidnbKL2LADgIDa...@comcast.com>,
> >> "Chris Thomasson" <cri...@comcast.net> wrote:
> >>
> >>> Your getting significant thread starvation? How is your test structured?
> >>> Are
> >>> you using mutex per-object for the locking?
> >>
> >> This is one of my tests that experiences partial thread starvation when
> >> using the lock-in-order algorithm:
> >>
> >> http://en.wikipedia.org/wiki/Dining_philosophers_problem
> >
> > I was more curious as to your implementation details as to when, and how
> > often, your threads take those locks...
>
>
> I was more curious as to your implementation details wrt when, and how
> often, your threads take those locks...

Very often. It is all they do:

#include <thread>
#include <mutex>
#include <iostream>
#include <functional>
#include <time.h>

using namespace std;

unsigned random_eat(unsigned& random_next)
{
    random_next = random_next * 1103515245 + 12345;
    return (random_next >> 16) & 0x7FFF;
}

struct Chopstick
{
    Chopstick() : random_next_(1) {}
    mutex mut_;
    unsigned random_next_;
};

unsigned eat(unsigned i)
{
    while (i)
        --i;
    return i;
}

void eater(Chopstick& left, Chopstick& right)
{
    unsigned i = 10000000;
    unsigned random_next = 1;
    while (--i)
    {
        unique_lock<mutex> l(left.mut_, defer_lock);
        unique_lock<mutex> r(right.mut_, defer_lock);
        lock(l, r); // The point of the test here!
        eat(random_eat(left.random_next_) +
            random_eat(right.random_next_));
    }
}

int main()
{
    time_t t1 = time(0);
    Chopstick A, B, C, D, E;
    thread T1(eater, ref(A), ref(B));
    thread T2(eater, ref(B), ref(C));
    thread T3(eater, ref(C), ref(D));
    thread T4(eater, ref(D), ref(E));
    thread T5(eater, ref(E), ref(A));
    T1.join();
    T2.join();
    T3.join();
    T4.join();
    T5.join();
    time_t t2 = time(0);
    cout << " time = " << difftime(t2, t1) << '\n';
}

-Howard

Chris Thomasson

Oct 23, 2007, 6:34:43 PM

"Howard Hinnant" <howard....@gmail.com> wrote in message
news:howard.hinnant-528...@johnf2.biosci.ohio-state.edu...

Here is a simple solution I just whipped up:

http://appcore.home.comcast.net/~appcore/misc/diner_c.html

Any thoughts?

Chris Thomasson

Oct 23, 2007, 6:47:17 PM

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:O6Odnd26MNVI6IPa...@comcast.com...

Okay. I think I can see some of the starvation you were talking about. Humm,
let me keep testing and see if my results are consistent with yours...

;^)

Howard Hinnant

Oct 23, 2007, 7:39:55 PM
In article <WridnQFRYt5b5YPa...@comcast.com>,
"Chris Thomasson" <cri...@comcast.net> wrote:

> "Chris Thomasson" <cri...@comcast.net> wrote in message
> news:O6Odnd26MNVI6IPa...@comcast.com...
> >
> > "Howard Hinnant" <howard....@gmail.com> wrote in message
> > news:howard.hinnant-528...@johnf2.biosci.ohio-state.edu...
> >> In article <SIidnbKL2LADgIDa...@comcast.com>,
> >> "Chris Thomasson" <cri...@comcast.net> wrote:
> >>
> >>> Your getting significant thread starvation? How is your test structured?
> >>> Are
> >>> you using mutex per-object for the locking?
> >>
> >> This is one of my tests that experiences partial thread starvation when
> >> using the lock-in-order algorithm:
> >>
> >> http://en.wikipedia.org/wiki/Dining_philosophers_problem
> >
> > Here is a simple solution I just whipped up:
> >
> > http://appcore.home.comcast.net/~appcore/misc/diner_c.html
> >
> > Any thoughts?
>
> Okay. I think I can see some of the starvation you were talking about. Humm,
> let me keep testing and see if my results are consistent with yours...
>
> ;^)

If I understand your test harness, I think you may want to put:

printf("diner_eat_trylock\n");

in diner_eat_trylock, and:

printf("diner_eat_sortlock\n");

in diner_eat_sortlock, and observe the results. Also if
diner_eat_trylock is meant to model the algorithm I posted here:

(oops google seems to be behind on caching posts)

well, that I posted earlier :-), I don't think it is quite there. The
idea is lock the "first" lock, then try_lock the second. If that fails
then unlock the first, yield, and then do the same thing in the reverse
order. Lather, rinse, repeat until both locks are locked.

Thanks for coding up a test!

-Howard

Chris Thomasson

Oct 23, 2007, 8:05:36 PM
"Howard Hinnant" <howard....@gmail.com> wrote in message
news:howard.hinnant-8F8...@johnf2.biosci.ohio-state.edu...

Okay.

> (oops google seems to be behind on caching posts)

I noticed that too.

> well, that I posted earlier :-), I don't think it is quite there. The
> idea is lock the "first" lock, then try_lock the second. If that fails
> then unlock the first, yield, and then do the same thing in the reverse
> order. Lather, rinse, repeat until both locks are locked.

Here is some updated code:

http://appcore.home.comcast.net/~appcore/misc/diner_c.html
_______________________________________
void
diner_eat_trylock(
    diner* const _this
) {
    int status, i = 1, old;
    /* lock in reverse-order on purpose... */
    while (! (status =
        pthread_mutex_lock(&_this->forks[i]->lock))) {
        old = i;
        i = (old) ? 0 : 1;
        if (! (status =
            pthread_mutex_trylock(&_this->forks[i]->lock))) {
            break;
        }
        assert(old != i);
        if (status != EBUSY) {
            break;
        }
        if ((status =
            pthread_mutex_unlock(&_this->forks[old]->lock))) {
            break;
        }
        sched_yield();
    }
    if (status) { assert(! status); abort(); }
}
_______________________________________


How about that?

> Thanks for coding up a test!

No problem :^)

David Schwartz

Oct 24, 2007, 12:03:11 AM
On Oct 22, 8:06 am, Chris Friesen <cbf...@mail.usask.ca> wrote:

> In linux, for instance, it has changed substantially over the years.
> Among other things, one implementation yielded to every other process on
> the system.

So what? I have never in my life understood this objection. Is the
assumption that threads are at war with each other and yielding is
tantamount to surrender?

This thread is probably the one thread in the system that is ready-to-
run but cannot usefully make forward progress. Every other thread in
the system can likely use CPU time more efficiently than this one can.

So the best thing for the system as a whole is to let every other
process/thread run, and then try this one again. That is what you
should want in this case.

DS

Chris Friesen

Oct 24, 2007, 12:24:12 PM
David Schwartz wrote:
> On Oct 22, 8:06 am, Chris Friesen <cbf...@mail.usask.ca> wrote:

>>In linux, for instance, it has changed substantially over the years.
>>Among other things, one implementation yielded to every other process on
>>the system.

> So what? I have never in my life understood this objection. Is the
> assumption that threads are at war with each other and yielding is
> tantamount to surrender?

Basically, yes. Yielding to the world causes potentially large
latencies. sched_yield() is like telling the scheduler, "I'm waiting
for something, but I'm not telling you what...nyah, nyah."

Consider a nice -19 task that calls sched_yield(). What it really wants
is for the other nice -19 task (that is currently holding the resource)
to run, free up the resource, and give the first guy a chance to run
again right after that. Instead, potentially every task on the system
gets a chance to run, including the seti client, the search engine
indexer, and all the other background tasks.

Of course the proper solution is to wait for something specific (semas,
mutexes, completions, etc.) so that the scheduler knows what's going on
and can make decisions accordingly.

Unfortunately, given the number of legacy apps out there using
sched_yield(), it's not realistic to fix all of them. (Hence the
sched_yield() tunable in the current scheduler.)
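
As a sketch of waiting for something specific (hypothetical busy-flag
resource, not from the thread): with a condition variable the
scheduler knows exactly which waiter to wake, so no yielding is
needed:

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int busy = 0;

void acquire(void)
{
    pthread_mutex_lock(&m);
    while (busy)
        pthread_cond_wait(&c, &m);   /* sleep until release() signals */
    busy = 1;
    pthread_mutex_unlock(&m);
}

void release(void)
{
    pthread_mutex_lock(&m);
    busy = 0;
    pthread_cond_signal(&c);         /* wake one blocked waiter */
    pthread_mutex_unlock(&m);
}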


Chris
