pthread_cond_wait is unfair (POSIX threads)

Frank Mueller

unread,

Jun 3, 1996, 3:00:00 AM6/3/96

to

Hi there!

It recently occurred to me that pthread_cond_wait is inherently unfair
and cannot avoid starvation by itself. If this comes as no surprise to
you, skip this msg. Otherwise, read on and you may find that even the
outline on how to implement semaphores on top of condition variables
in the Annex B.11.2.9.2 of POSIX 1003.1c (Pthreads) is broken.

Description of the problem: If two threads P1,P2 of the same priority
are waiting on the same condition variable and the condition variable
is signaled ONCE, then both of them may wake up due to the cond_wait()
semantics (see comments about multiprocessor implementations and
spurious wakeups in the standard). Assuming that P2 suspended before
P1, P2 may now wake up before P1 and re-lock the mutex. P2 then finds
the loop condition false, exits the loop and changes some global state (typically the
loop condition of the cond_wait loop) before releasing the mutex. When
P1 gets the mutex, it finds the loop condition true and goes back to
sleep.

Thus, P2 may always race by P1 and P1 can starve.

This can be fixed by explicitly passing mutexes upon signaling
condition variables but requires that the queues of the condition
variables are explicitly programmed by the user. In more detail, P1
enters it's thread ID into the user queue before suspending on the
cond_wait(). So does P2. Upon signaling, some global variable
"id_signaled" is set to the thread ID of the queue head. P1 and P2
re-check the loop condition around the cond_wait, which now checks if
"id_signaled" matches the current thread ID. If not, go back to sleep
(P2); if it does, continue (P1).

This works but requires that queues are kept within the Pthreads
implementation and at the user level. This seems like a waste.

I would propose a new policy FAIR for condition variables and
routines pthread_condattr_set/getprotocol that handle
thread-directed signaling implicitly (w/o spurious wakeups,
no loops around cond_wait required).

You may tell me that mutexes are low-level and I just showed how I can
build the wanted semantic on top. But would claim:

(a) People write programs that would not with spurious wakeups.
(b) You were not aware of the problem either.

If you don't agree with (b), consider this:

The Annex B.11.2.9.2 of POSIX 1003.1c (Pthreads) is partially
broken. It gives an example on how to define semaphores via mutex and
cond. vars without passing ownership. The semantics of POSIX
semaphores does not define a wakeup order in the absence of prio
scheduling, i.e. semaphores are unfair and may cause
starvation. However, when prio scheduling is defined, the semantics
says that the highest prio thread that has been waiting on a semaphore
for the longest time shall be unblocked. In this case, the example in
the annex would be wrong.

Here is a short version of semaphores implemented via mutexes and
cond. vars, certainly NOT a fair one that could avoid starvation:

void sem_p(s)
sem_t *s;
{
pthread_mutex_lock(&s->lock);
while (s->count == 0)
pthread_cond_wait(&s->cond, &s->lock);
s->count--;
pthread_mutex_unlock(&s->lock);
}

void sem_v(s)
sem_t *s;
{
int contention;

pthread_mutex_lock(&s->lock);
contention = (s->count++ == 0);
pthread_mutex_unlock(&s->lock);
if (contention)
pthread_cond_signal(&s->cond);
}
--
Frank Mueller E-Mail: mue...@informatik.hu-berlin.de WWW: http://www.informatik.hu-berlin.de/~mueller

Steve Emmerson

unread,

Jun 5, 1996, 3:00:00 AM6/5/96

to

mue...@informatik.hu-berlin.de (Frank Mueller) writes:

> Thus, P2 may always race by P1 and P1 can starve.

Isn't this behavior consistent with the semantics of counting
semaphores? Namely, that the number of threads that are waiting on the
semaphore that are allowed to continue equals the number of times the
semaphore is posted (after it reaches zero, of course). The solution
to the above problem would be to ensure (via the design of the program)
that the semaphore is posted at least twice.

Have I missed something?
--

Steve Emmerson st...@unidata.ucar.edu ...!ncar!unidata!steve

Frank Mueller

unread,

Jun 6, 1996, 3:00:00 AM6/6/96

to

In article <steve.834000977@gummo>, st...@unidata.ucar.edu (Steve Emmerson) writes:
> mue...@informatik.hu-berlin.de (Frank Mueller) writes:
>
> > Thus, P2 may always race by P1 and P1 can starve.
>
> Isn't this behavior consistent with the semantics of counting
> semaphores? Namely, that the number of threads that are waiting on the
> semaphore that are allowed to continue equals the number of times the
> semaphore is posted (after it reaches zero, of course). The solution
> to the above problem would be to ensure (via the design of the program)
> that the semaphore is posted at least twice.

Depends on your semaphore semantics. The original msg pointed out that
non-prio POSIX semaphores don't require fairness (total ordering on wakeup)
whereas prio POSIX semaphores require this fairness.

When you look at common textbooks (Ben-Ari, Andrews, Tannenbaum) they assume
fair semaphores. Otherwise you cannot easily prevent starvation. You had to
build prio queues on the user level. Yet, these queues already exist at the
implementation lavel (kernel/library).

My points were:
(a) Be aware of the lack of fairness of condition variables.
(b) POSIX should think about providing a fair protocol to avoid replicated queues
on multiple levels.

Bil Lewis

unread,

Jun 6, 1996, 3:00:00 AM6/6/96

to

: When you look at common textbooks (Ben-Ari, Andrews, Tannenbaum) they assume

: fair semaphores. Otherwise you cannot easily prevent starvation. You had to
: build prio queues on the user level. Yet, these queues already exist at the
: implementation lavel (kernel/library).

: My points were:
: (a) Be aware of the lack of fairness of condition variables.
: (b) POSIX should think about providing a fair protocol to avoid replicated queues
: on multiple levels.

This is not as obvious a thing as it might appear at first. The beautiful,
mathematically perfect and fair lock (or whatever) verges on the unattainable.
Maybe it IS unattainable!

I'm pretty sure I have this correct... (double check my figuring!)

What you WANT is to guarantee that lock ownership is transferred in order of
request -- a FIFO mutex. WITH BOUNDED TRANSFER TIME.

The unbounded case is easy. It'll be about 2x slower than the non-guaranteed,
unbounded case, but it's simple (as long as I get to define order of request to
mean order inserted into sleep queue).

The trick is that the sleep queue for a mutex (or whatever) is a shared data
object. Which must be protected by a mutex. Which will have a sleep queue. Which
must be protected by a mutex. etc. For an unbounded number of threads, the delay
time to inserting yourself onto the sleep queue is unbounded. :-(

Now, if you're willing to do some fancy hardware hacking, you can make it all
happen in exactly the order you desire. I'll bet it wouldn't have to be all that
horribly ugly either.

However, I think it would be highly unpractical. An excessive amount of effort
for no improvement in any real program.

I have seen programs where the unbounded FIFO mutex is valuable and works fine.
I don't know of any examples that require the unbounded case.

-Bil
--
=====================
USE THIS ADDRESS --> B...@LambdaCS.com

http://www.LambdaCS.com
Lambda Computer Science
555 Bryant St. #194
Palo Alto, CA,
94301

Phone/FAX: (415) 328-8952

Frank Mueller

unread,

Jun 7, 1996, 3:00:00 AM6/7/96

to

In article <4p740a$9...@easy2.mediacity.com>, b...@easy1.mediacity.com (Bil Lewis) writes:
> This is not as obvious a thing as it might appear at first. The beautiful,
> mathematically perfect and fair lock (or whatever) verges on the unattainable.
> Maybe it IS unattainable!

[]

> What you WANT is to guarantee that lock ownership is transferred in order of
> request -- a FIFO mutex. WITH BOUNDED TRANSFER TIME.

bounded is not impossible (see real-time papers on bounded-time mutual exclusion)
but unbounded is fine for me.

[]

> Now, if you're willing to do some fancy hardware hacking, you can make it all
> happen in exactly the order you desire. I'll bet it wouldn't have to be all that
> horribly ugly either.

No hardware hacking required, all user-level programming. I pointed out that keeping
FIFO queues explicitly at the user level is sufficient in my 1st msg. What I don't
like about it is that the threads implementation already uses the same queues but does
not let me access them. (Even worse, if priority inheritance/ceiling are being used,
it's impossible to get the effective prio of a thread, i.e. to get the FIFO queuing
right at the user level.)

Dave Butenhof

unread,

Jun 7, 1996, 3:00:00 AM6/7/96

to

Frank Mueller wrote:
>
> Hi there!
>
> It recently occurred to me that pthread_cond_wait is inherently unfair
> and cannot avoid starvation by itself. If this comes as no surprise to
> you, skip this msg. Otherwise, read on and you may find that even the
> outline on how to implement semaphores on top of condition variables
> in the Annex B.11.2.9.2 of POSIX 1003.1c (Pthreads) is broken.

First and foremost, POSIX.1*'s Annex B is "rationale", which is nothing more
than commentary upon the standard, supplied by the working group. Some of
it is in hopes of avoiding misunderstanding. Some was to pave over political
differences. Some was just there because someone felt like writing it. NONE
of it is part of the standard. (One consequence of this is that the working
group as a whole had little control over rationale -- balloters could not
formally object to any aspect of the rationale and the technical editors
were under no obligation to make any "requested" changes.)

The fact that an example in POSIX.1c uses the term "semaphore" does not
require that the example is intended to model or conform to any aspect
of POSIX.1b. It is merely a "simple counting semaphore" demonstrating
a use of mutexes and condition variables to solve a specific hypothetical
synchronization problem.

> Description of the problem: If two threads P1,P2 of the same priority
> are waiting on the same condition variable and the condition variable
> is signaled ONCE, then both of them may wake up due to the cond_wait()
> semantics (see comments about multiprocessor implementations and
> spurious wakeups in the standard). Assuming that P2 suspended before
> P1, P2 may now wake up before P1 and re-lock the mutex. P2 then finds
> the loop condition false, exits the loop and changes some global state
> (typically the
> loop condition of the cond_wait loop) before releasing the mutex. When
> P1 gets the mutex, it finds the loop condition true and goes back to
> sleep.
>

> Thus, P2 may always race by P1 and P1 can starve.

Even "worse". BOTH P1 and P2 may starve if there are other threads that use
the same predicate. Signalling a condition variable very deliberately does
not guarantee anything about which thread will first see the predicate in
question. Signalling merely guarantees that at least one thread "interested
in" the predicate WILL BE SCHEDULED to ensure that, for example, an available
resource can be utilized.

Realtime priority condition variable waiters (threads with SCHED_FIFO or
SCHED_RR policy) must be priority sorted, and must be awakened in priority
order. However, the awakened thread must acquire the associated mutex
before it can re-evaluate the predicate and determine whether to continue.

Another simultaneously executing thread may already own the mutex, and may
be evaluating the same predicate prior to determining whether to wait.
Parallel throughput is far better served by allowing that thread to
continue without waiting, than to pay for (at least one) context switch
to allow the awakened thread to continue.

Spurious wakeups are a separate issue. In most cases, "spurious wakeups" are
just a way of ensuring that the application predicate loop is carefully
coded, and not an artifact of the threads implementation. If, for example,
your P2 thread was waiting in pthread_cond_timedwait, it might chance to
time out around when the predicate became true. While many condition wait
loops don't bother to evaluate the predicate when the wait returns ETIMEDOUT,
it would be legal (and in many cases important) to do so. Because P2 may
have been unblocked by timeout before P1 was unblocked by pthread_cond_signal,
and because mutexes do not have "ownership passoff" semantics, it would be
legitimate for P2 to detect the changed predicate and perform whatever
action might be required. P1 now detects a "spurious wakeup" (so far as it
can determine).

Spurious wakeups are, for the most part, just one way of stating that a
condition variable is not a "monitor". A thread unblocked from a monitor
is guaranteed that it was awakened through a grand design and is inevitably
destined to become the grand poobah of whatever resource was controlled
by the monitor. Condition variables, in contrast, are humble signalling
mechanisms -- the grand design and promotion to grand poobah are entirely
the responsibility of the caller.

"Real" spurious wakeups, caused by allowable timing races or other
simplifications within the "Pthread implementation" are uncommon border
cases. The standard allows them so that the implementation may avoid
complicated and inefficient (and in most cases useless) "sequentialization"
within the condition variable implementation, especially on multiprocessors.
While your scenerio may occur once in a while, it should not have a
substantial effect on the behavior of your application.

You will get "most predictable" behavior if

1) All threads possibly interested in some predicate (that might wait upon
a given condition variable) run in SCHED_FIFO policy, with the same
priority
2) The POSIX.1c "allocation domain" is 1 (essentially, a uniprocessor) and
3) All code paths that signal or broadcast the condition variable do so while
owning the associated mutex.

This means that all unblocked threads will wait on the mutex sorted by time,
and a "non-waiting" thread cannot enter the mutex wait queue (or, therefore,
lock the mutex) ahead of any waiter unless the non-waiting thread has higher
priority. I know of no good reason for a "Pthread implementation" spurious
wakeup to occur on a uniprocessor, regardless of the fact that they are not
disallowed.

> This can be fixed by explicitly passing mutexes upon signaling
> condition variables but requires that the queues of the condition
> variables are explicitly programmed by the user. In more detail, P1
> enters it's thread ID into the user queue before suspending on the
> cond_wait(). So does P2. Upon signaling, some global variable
> "id_signaled" is set to the thread ID of the queue head. P1 and P2
> re-check the loop condition around the cond_wait, which now checks if
> "id_signaled" matches the current thread ID. If not, go back to sleep
> (P2); if it does, continue (P1).
>
> This works but requires that queues are kept within the Pthreads
> implementation and at the user level. This seems like a waste.
>
> I would propose a new policy FAIR for condition variables and
> routines pthread_condattr_set/getprotocol that handle
> thread-directed signaling implicitly (w/o spurious wakeups,
> no loops around cond_wait required).

I've considered some sort of "fairness" protocol -- though I'm not sure
whether it ought to be an attribute of the synchronization objects, or
of the threads. (There would be advantages and disadvantages to either.)
I believe there would need to be a lot of prototyping and experimentation
to get it "right".

> You may tell me that mutexes are low-level and I just showed how I can
> build the wanted semantic on top. But would claim:
>
> (a) People write programs that would not with spurious wakeups.
> (b) You were not aware of the problem either.

Always use condition waits in predicate-testing loops, and forget about
spurious wakeups. If you find that "Pthread implementation" spurious
wakeups are really a statistically significant factor in your application's
behavior, then talk to your vendor... either the Pthread implementation has
a problem, or you are using some features in a manner they hadn't anticipated.

> If you don't agree with (b), consider this:
>
> The Annex B.11.2.9.2 of POSIX 1003.1c (Pthreads) is partially
> broken. It gives an example on how to define semaphores via mutex and
> cond. vars without passing ownership. The semantics of POSIX
> semaphores does not define a wakeup order in the absence of prio
> scheduling, i.e. semaphores are unfair and may cause
> starvation. However, when prio scheduling is defined, the semantics
> says that the highest prio thread that has been waiting on a semaphore
> for the longest time shall be unblocked. In this case, the example in
> the annex would be wrong.

> --
> Frank Mueller E-Mail: mue...@informatik.hu-berlin.de WWW: http://www.informatik.hu-berlin.de/~mueller

--
+-------------------------------------------------------------------+
| Dave Butenhof Digital Equipment Corporation |
| bute...@zko.dec.com 110 Spit Brook Rd, ZKO2-3/Q18 |
| PH 603.881.2218, FAX 603.881.0120 Nashua, NH 03062-2711 |
+----------+ "Better Living Through Concurrency" +----------+
+---------------------------------------------+

David Brownell

unread,

Jun 8, 1996, 3:00:00 AM6/8/96

to

Frank Mueller wrote:
>
> Description of the problem: If two threads P1,P2 of the same priority
> are waiting on the same condition variable

That is, both are in standard loops that look like

pthread_mutex_lock (mutex);
...
while (requirement_is_unsatisfied) // different tests are OK
pthread_cond_wait (cond, mutex);
...
pthread_mutex_unlock (mutex);

> and the condition variable
> is signaled ONCE, then both of them may wake up due to the cond_wait()
> semantics (see comments about multiprocessor implementations and
> spurious wakeups in the standard). Assuming that P2 suspended before
> P1, P2 may now wake up before P1 and re-lock the mutex. P2 then finds
> the loop condition false, exits the loop

That is, you're assuming buggy programs. The rules about the legal
returns from pthread_cond_wait() encourage the "while" above, ensuring that
when the condition isn't satisfied P2 won't "exit the loop" but rather
that it goes back to waiting. I've yet to see a program where it was a
good idea to omit the "while" loop. Omitting it just causes flakiness
in some cases, and can't (portably) improve anything.

> and changes some global state (typically the
> loop condition of the cond_wait loop) before releasing the mutex.

This will normally be a second bug in your program. Not attributable
to POSIX threads at all!

> When
> P1 gets the mutex, it finds the loop condition true and goes back to
> sleep.

This really doesn't match any pthread_cond_wait() idiom I've ever seen
used in a correct program. I attribute the flakey behaviour you describe
to buggy application code. POSIX can't prevent buggy applications.

- Dave

Frank Mueller

unread,

Jun 10, 1996, 3:00:00 AM6/10/96

to

In article <31B80B...@zko.dec.com>, Dave Butenhof <bute...@zko.dec.com> writes:

Thanks, Dave. This should clarify how to use cond_wait correctly for anyone.
(I just brought it up because I am the author of a free POSIX Threads package
yet must admit that I had not realized the unfairness of the model. This lead
me to believe that others may not have realized it either.)

However, I do disagree on the following:

> You will get "most predictable" behavior if
>
> 1) All threads possibly interested in some predicate (that might wait upon
> a given condition variable) run in SCHED_FIFO policy, with the same
> priority
> 2) The POSIX.1c "allocation domain" is 1 (essentially, a uniprocessor) and
> 3) All code paths that signal or broadcast the condition variable do so while
> owning the associated mutex.
>
> This means that all unblocked threads will wait on the mutex sorted by time,
> and a "non-waiting" thread cannot enter the mutex wait queue (or, therefore,
> lock the mutex) ahead of any waiter unless the non-waiting thread has higher
> priority. I know of no good reason for a "Pthread implementation" spurious
> wakeup to occur on a uniprocessor, regardless of the fact that they are not
> disallowed.

(a) If the standard allows spurious wakeups and your programming model does not
consider it, your program becomes non-portable. Why use POSIX in first place?
(b) A uni-processor Pthreads implementation may allow suprious wakeups upon receiving
a signal during a cond_wait (upon return from the signal handler). You can check
errno but the result may even have been overwritten by a timeout (cond_timedwait).

> Always use condition waits in predicate-testing loops, and forget about
> spurious wakeups. If you find that "Pthread implementation" spurious
> wakeups are really a statistically significant factor in your application's
> behavior, then talk to your vendor... either the Pthread implementation has
> a problem, or you are using some features in a manner they hadn't anticipated.

While they should not be "statistically significant" (in order to not hamper performance),
they may occur infrequently and one should be aware of the possibility of starvation (i.e.
unfairness, although maybe statistically not important for many applications).

Frank Mueller

unread,

Jun 10, 1996, 3:00:00 AM6/10/96

to

In article <31B9AA...@ix.netcom.com>, David Brownell <brow...@ix.netcom.com> writes:
> That is, you're assuming buggy programs. The rules about the legal
> returns from pthread_cond_wait() encourage the "while" above, ensuring that
> when the condition isn't satisfied P2 won't "exit the loop" but rather
> that it goes back to waiting. I've yet to see a program where it was a
> good idea to omit the "while" loop. Omitting it just causes flakiness
> in some cases, and can't (portably) improve anything.

Read the original post again. pthread_cond_wait should ALWAYS be in a loop.
Even though, it may cause starvation the way most people use it.
(And notice that the loop condition will have to test a global variable
that is shared between threads and protected by the mutex used in the cond_wait!
You did not seem to have understood my statement about this either.)

> > When
> > P1 gets the mutex, it finds the loop condition true and goes back to
> > sleep.
>
> This really doesn't match any pthread_cond_wait() idiom I've ever seen
> used in a correct program. I attribute the flakey behaviour you describe
> to buggy application code. POSIX can't prevent buggy applications.

Read the POSIX 1003.1c standard. This can actually happen (and is allowed)!
BTW, I did not observe it in any of my code since I'm using my (the FSU PThreads)
implementation that passed mutex ownership on cond_signal (if possible),
so this problem does not occur at all.

However, I was told that there are implementations with spurious wakeups.

Tom Payne

unread,

Jun 10, 1996, 3:00:00 AM6/10/96

to

Frank Mueller (mue...@informatik.hu-berlin.de) wrote:
:
: In article <31B80B...@zko.dec.com>, Dave Butenhof <bute...@zko.dec.com> writes:
[...]
:
: > priority. I know of no good reason for a "Pthread implementation" spurious

: > wakeup to occur on a uniprocessor, regardless of the fact that they are not
: > disallowed.

[...]
: (b) A uni-processor Pthreads implementation may allow suprious wakeups upon receiving

: a signal during a cond_wait (upon return from the signal handler). You can check
: errno but the result may even have been overwritten by a timeout (cond_timedwait).

:
: > Always use condition waits in predicate-testing loops, and forget about

: > spurious wakeups. If you find that "Pthread implementation" spurious
: > wakeups are really a statistically significant factor in your application's
: > behavior, then talk to your vendor... either the Pthread implementation has
: > a problem, or you are using some features in a manner they hadn't anticipated.

I'd be interested to know the reason that the standard permits
spurious wakeups. Occasionally, they require a workaround that
increases code complexity somewhat.

Tom Payne (t...@cs.ucr.edu)

David Brownell

unread,

Jun 11, 1996, 3:00:00 AM6/11/96

to

Frank Mueller wrote:
>
> (a) If the standard allows spurious wakeups and your programming model does not
> consider it, your program becomes non-portable. Why use POSIX in first place?

The essence of my post was that this is a bug in your APPLICATION. It was relying
on behaviour that is not guaranteed. In fact, the degree of "fair scheduling" you
seem to expect is rare ... I've never worked on a system which provided it, and have
not suffered from the lack.

> (b) A uni-processor Pthreads implementation may allow suprious wakeups upon receiving
> a signal during a cond_wait (upon return from the signal handler). You can check
> errno but the result may even have been overwritten by a timeout (cond_timedwait).

In POSIX Threads, "errno" is not used with pthread_cond_wait() ... it's a return
value, not a global, so nothing will have been overwritten. Moreover, even if you
chose to store the result of pthread_cond_wait() in "errno", that value is specific
to yoru thread ... so it won't have been overwritten in that case either!

> > Always use condition waits in predicate-testing loops, and forget about
> > spurious wakeups. If you find that "Pthread implementation" spurious
> > wakeups are really a statistically significant factor in your application's
> > behavior, then talk to your vendor... either the Pthread implementation has
> > a problem, or you are using some features in a manner they hadn't anticipated.
>

> While they should not be "statistically significant" (in order to not hamper performance),
> they may occur infrequently and one should be aware of the possibility of starvation (i.e.
> unfairness, although maybe statistically not important for many applications).

I'll just say that with lots of experience on the Solaris threading implementation,
the existence of "spurious" wakeups on uniprocessors or multiprocessors has never
caused a problem in the code I've written or used. I never saw starvation, and never
saw programs impacted by unfair scheduling at that level.

- Dave Brownell
http://www.netcom.com/~brownell

David Brownell

unread,

Jun 11, 1996, 3:00:00 AM6/11/96

to

Frank Mueller wrote:
>
> David Brownell <brow...@ix.netcom.com> writes:
> > That is, you're assuming buggy programs. The rules about the legal
> > returns from pthread_cond_wait() encourage the "while" above, ensuring that
> > when the condition isn't satisfied P2 won't "exit the loop" but rather
> > that it goes back to waiting. I've yet to see a program where it was a
> > good idea to omit the "while" loop. Omitting it just causes flakiness
> > in some cases, and can't (portably) improve anything.
>
> Read the original post again.

It was kind of long, and the initial explanation (to which I referred) seemed
to be claiming the standard behaviour is a bug. The rest of the post depended
on that assertion. In the context of POSIX threads, the bug was in application
code which depended on nonportable scheduling assumptions. (Which I guess was
the main point you were explaining!)

> BTW, I did not observe it in any of my code since I'm using my (the FSU PThreads)
> implementation that passed mutex ownership on cond_signal (if possible),
> so this problem does not occur at all.

It'd occur on a pthread_cond_broadcast() though, wouldn't it? Or in (exotic)
cases where a single condition variable was used with different mutexes, when
only one of the mutexes was unlocked?

Remember that pthread_cond_signal() is an optimization, and it's reasonable
in some cases (and legal in all cases!) to replace it with a broadcast().
If you keep that in mind, your application's assumptions will be better.

Think of how this would work on a multiprocessor, with scheduling affinity
such that threads run preferentially on the same CPU they last ran on (the
one that has hardware caches preloaded for that thread's state). Isn't it
better to wake up the threads that'd run on idle processors, rather than to
deschedule running threads in the name of "fairness"? A scheduler that aims
to maximize throughput won't always be "fair".

- Dave

Frank Mueller

unread,

Jun 13, 1996, 3:00:00 AM6/13/96

to

In article <31BDA6...@ix.netcom.com>, David Brownell <brow...@ix.netcom.com> writes:
> > BTW, I did not observe it in any of my code since I'm using my (the FSU PThreads)
> > implementation that passed mutex ownership on cond_signal (if possible),
> > so this problem does not occur at all.
>
> It'd occur on a pthread_cond_broadcast() though, wouldn't it? Or in (exotic)
> cases where a single condition variable was used with different mutexes, when
> only one of the mutexes was unlocked?

You are absolutely right. Again, unfairness may theoretically cause starvation.
You may never observe it. It's more of a theoretical issue anyways. (In practice,
rela-time applications may want liveness guarantees, not just absence of starvation --
such as "will be served in less than x amount of time". Since Pthreads is part of
the POSIX real-time extension, I thought this issue was relevant. It certainly is to me.)

Dave Butenhof

unread,

Jun 13, 1996, 3:00:00 AM6/13/96

to

Frank Mueller wrote:

>
> In article <31B9AA...@ix.netcom.com>, David Brownell <brow...@ix.netcom.com> writes:
> > That is, you're assuming buggy programs. The rules about the legal
> > returns from pthread_cond_wait() encourage the "while" above, ensuring that
> > when the condition isn't satisfied P2 won't "exit the loop" but rather
> > that it goes back to waiting. I've yet to see a program where it was a
> > good idea to omit the "while" loop. Omitting it just causes flakiness
> > in some cases, and can't (portably) improve anything.
>

> Read the original post again. pthread_cond_wait should ALWAYS be in a loop.
> Even though, it may cause starvation the way most people use it.
> (And notice that the loop condition will have to test a global variable
> that is shared between threads and protected by the mutex used in the cond_wait!
> You did not seem to have understood my statement about this either.)

"Starvation" is in the eye of the beholder. The "disconnect" is that, while you (and
others) are interested in "thread fairness", the designers were more interested in
what I guess I'll call "predicate fairness" (I just made that up... do you like it?
NO? Oh well, we can also call it "throughput" :-) ). That is, if a predicate has been
satisfied, IT DOESN'T MATTER WHICH THREAD "ACCEPTS" THE PREDICATE -- ONLY THAT ONE
DOES SO AS SOON AS POSSIBLE.

In other words, if you have three threads waiting for a predicate, and another one
that's about to look at the predicate before waiting, you don't care how long the
waiters have been there. You want the thread that's "about to look" to accept the
predicate because it can do so SOONEST, without requiring an expensive context
switch.

In the POSIX model, threads are partners working to fulfill a contract, not
adversaries. That is, aside from priority scheduling rules intended to be sure
threads with the most important subcontracts get preference, it makes no difference
which thread performs a given step first. Unless the contract states that electrical
hookup is more important than plumbing, who really cares whether the electrician or
the plumber get through the door first when the framer says the frame is done? And,
assuming the entire house is controlled by one mutex, does it really matter if the
electrician reaches EACH room first instead of alternating rooms with the plumber?
Why would you want the electrician to have to go away after wiring the bedroom, and
wait for the plumber to drive out to the site, just to ensure the plumber can have
her fair chance to plumb the bathroom next rather than allow the electrician to
"unfairly" wire two rooms in a row?

If two threads are "interested in" the same predicate, you want the predicate to be
handled by whichever thread can handle it soonest, or with least overhead.

The problem with this is when resources are multiplexed between threads acting as
subcontractors for different jobs. Sometimes you DO care about "thread fairness". In
a multithreaded server, for example, you might control access to a "communications
resource" using a condition variable, for example, a network connector. You may have
three threads representing different users that all need the same network connector,
and thus need to wait for the same predicate. But it really does matter that access
to the connector is somewhat fair, because "server throughput" is no longer the only
criteron for success -- each CLIENT must be convinced that its own throughput is
acceptable.

This model was not considered in the POSIX architecture, or in the architecture of
any other mainstream thread package. I've come to believe that was an oversight, but
as I mentioned in my previous response I haven't convinced myself of the proper
solution yet. I don't think it's as simple as "fair condition variables" or "fair
mutexes", though it may be. (Or at least, that may be enough.)

> > > When
> > > P1 gets the mutex, it finds the loop condition true and goes back to
> > > sleep.
> >
> > This really doesn't match any pthread_cond_wait() idiom I've ever seen
> > used in a correct program. I attribute the flakey behaviour you describe
> > to buggy application code. POSIX can't prevent buggy applications.
>
> Read the POSIX 1003.1c standard. This can actually happen (and is allowed)!

> BTW, I did not observe it in any of my code since I'm using my (the FSU PThreads)
> implementation that passed mutex ownership on cond_signal (if possible),
> so this problem does not occur at all.

This may be a fairness optimization, but it's a throughput pessimization because you
guarantee that at least one context switch is required to handle the predicate. It's
legal, but since vastly more threaded programs are concerned with throughput than
with fairness, I do not believe it is desirable default behavior unless you're sure
that the majority of the users of your package want fairness instead of throughput.
It'd be a good candidate for a condition variable attribute, though.

> However, I was told that there are implementations with spurious wakeups.

Certainly there are. And even on an implementation that doesn't provide spurious
wakeups, the caller should code for them. That gives robust code, including loops to
test predicates, that help insure more reliable execution. Without the loop, for
example, an overly optimistic cond_signal (or any cond_signal made without the mutex
locked) may end up being a "spurious wakeup" since the predicate may no longer be
true once the awakened thread runs.

> --
> Frank Mueller E-Mail: mue...@informatik.hu-berlin.de WWW: http://www.informatik.hu-berlin.de/~mueller

+-------------------------------------------------------------------+

Tom Payne

unread,

Jun 13, 1996, 3:00:00 AM6/13/96

to

Dave Butenhof (bute...@zko.dec.com) wrote:

: Frank Mueller wrote:
: >
: > In article <31B9AA...@ix.netcom.com>, David Brownell <brow...@ix.netcom.com> writes:
: > > That is, you're assuming buggy programs. The rules about the legal
: > > returns from pthread_cond_wait() encourage the "while" above, ensuring that
: > > when the condition isn't satisfied P2 won't "exit the loop" but rather
: > > that it goes back to waiting. I've yet to see a program where it was a
: > > good idea to omit the "while" loop. Omitting it just causes flakiness
: > > in some cases, and can't (portably) improve anything.
: >
: > Read the original post again. pthread_cond_wait should ALWAYS be in a loop.
: > Even though, it may cause starvation the way most people use it.
: > (And notice that the loop condition will have to test a global variable
: > that is shared between threads and protected by the mutex used in the cond_wait!
: > You did not seem to have understood my statement about this either.)
:
: "Starvation" is in the eye of the beholder. The "disconnect" is that, while you (and
: others) are interested in "thread fairness", the designers were more interested in
: what I guess I'll call "predicate fairness" (I just made that up... do you like it?
: NO? Oh well, we can also call it "throughput" :-) ). That is, if a predicate has been
: satisfied, IT DOESN'T MATTER WHICH THREAD "ACCEPTS" THE PREDICATE -- ONLY THAT ONE
: DOES SO AS SOON AS POSSIBLE.

Also, there are almost always other *possibilities* for starvation
(e.g. spinlocks) in the implementation of conditions and mutexes.
Genearlly, fairness should be achieved through scheduling protocols,
implemented via prioritized wait operations (where a thread's
resumption priority is specified by a parameter to wait). It's not
immediately clear to me how to do this efficiently with a predicate.

Tom Payne (t...@cs.ucr.edu)

Frank Mueller

unread,

Jun 14, 1996, 3:00:00 AM6/14/96

to

In article <31C004...@zko.dec.com>, Dave Butenhof <bute...@zko.dec.com> writes:
> This may be a fairness optimization, but it's a throughput pessimization because you
> guarantee that at least one context switch is required to handle the predicate. It's
> legal, but since vastly more threaded programs are concerned with throughput than
> with fairness, I do not believe it is desirable default behavior unless you're sure
> that the majority of the users of your package want fairness instead of throughput.
> It'd be a good candidate for a condition variable attribute, though.

Exactly! I never proposed to change the current behavoir of the POSIX model. I just
pointed out one of it's shortcomings and I agree that this "shortcoming" allows a
much more efficient implementation.

I did propose, however, to introduce a new attribute (for cond vars) since fairness
on the user level would replicate Pthreads kernel/lib data structures, which seems
like unnecessary overhead.

That's all.

Topher Eliot

unread,

Jun 14, 1996, 3:00:00 AM6/14/96

to

[Discussion of fairness related to cond_wait]

You are absolutely right. Again, unfairness may theoretically cause starvation.
You may never observe it. It's more of a theoretical issue anyways. (In practice,
rela-time applications may want liveness guarantees, not just absence of starvation --
such as "will be served in less than x amount of time". Since Pthreads is part of
the POSIX real-time extension, I thought this issue was relevant. It certainly is to me.)

This is not correct. POSIX 1003.1 ("real time", aka 1003.4) and POSIX 1003.1C
("threads", aka 1003.4A) are separate. An implementation may provide neither,
either one, or both.

Topher Eliot Data General Unix Core Development
(919) 248-6371 eliot at dg-rtp.dg.com
Obviously, I speak for myself, not for DG.
misc.consumers.house archivist. Send empty mail to house-...@dg-rtp.dg.com
Sigh. You'll have to type in my email address by hand, because I don't
want it to be available to junk-mail software.