pthread_cond_wait waking up spuriously?

Vincent Predoehl

ungelesen,

01.05.2001, 10:57:0801.05.01

an

I saw this in the man pages for pthread_cond_wait:

When using condition variables there is always a boolean predicate
involving shared variables associated with each condition wait that is
true if the thread should proceed. Spurious wakeups from the
pthread_cond_wait() or pthread_cond_timedwait() functions may occur.
Since the return from pthread_cond_wait() or pthread_cond_timedwait()
does not imply anything about the value of this predicate, the predicate
should be re-evaluated upon such return.

Can anyone explain to me why this is so?

Thanks,
--
Vincent

Michael Fuhr

ungelesen,

01.05.2001, 13:52:2101.05.01

an

Vincent Predoehl <vpre...@mac.com> writes:

The FAQ contains some info on spurious wakeups -- search for "spurious":

http://www.LambdaCS.com/cpt/FAQ.html

Butenhof's book _Programming with POSIX Threads_ says that "Spurious
wakeups may sound strange, but on some multiprocessor systems, making
condition wakeup completely predictable might substantially slow all
condition variable operations." (p. 80)

Nichols, Buttlar, and Farrell's book _Pthreads Programming_ says that
"Perhaps a signaling thread has, in error or due to an unexpected
condition, awakened our waiting thread when the expected condition has
not in fact been met. In addition, the Pthreads library allows an
underlying threads library to issue spurious wakeups to a waiting
thread without violating the standard. We need to guard against this
possibility as well." (p. 83)

Although Butenhof notes that "The race conditions that cause spurious
wakeups should be considered rare," the defensive programmer should
always check for them. Murphy's law guarantees that "rare" occurrences
will happen more often that you think, and always at a bad time.

Hope this helps.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Jokinen Jyke

ungelesen,

01.05.2001, 14:28:2001.05.01

an

mf...@dimensional.com (Michael Fuhr) writes:

> Butenhof's book _Programming with POSIX Threads_ says that "Spurious
> wakeups may sound strange, but on some multiprocessor systems, making
> condition wakeup completely predictable might substantially slow all
> condition variable operations." (p. 80)

This seems the most common explanation for spurious wakeups, but I have never
seen any concrete examples of this. Could someone show a more detailed example
of a (hardware) system which makes predictable atomic operations a practical
impossibility? (Most moders architectures have test-and-set or similar
operation which is quaranteed to be atomic even between processors. I've
always thought this is enough for a mutex/condition variable implementation.)

- Jyke Jokinen

Eric Sosman

ungelesen,

01.05.2001, 15:58:0401.05.01

an

Even without non-signalled wakeups, correct code must still
re-test the predicate after a wakeup[*]. The programmer writes
absolutely no extra code to cope with possible spurious wakeups;
the code which deals with them is the code that would be required
anyhow.

All of which means that there's not much reason other than
morbid curiousity to ask why spurious wakeups might occur. You'd
write exactly the same code whether they did or did not, so any
answer to the question would have precisely zero influence on your
code.

[*] Why re-test the predicate if wakeups only occur in response
to signals and broadcasts? Consider the following INCORRECT code
for a producer-consumer arrangement:

/* Producer */
pthread_mutex_lock(&mutex);
add_something_to_queue();
pthread_mutex_unlock(&mutex);
pthread_cond_signal(&condvar);

/* INCORRECT Consumer */
pthread_mutex_lock(&mutex);
if (queue_is_empty())
pthread_cond_wait(&condvar, &mutex);
remove_something_from_queue();
pthread_mutex_unlock();

Scenario: Thread C1 (a consumer) locks the mutex, finds the
queue empty, and goes to sleep waiting on the condition variable.
Thread P1 comes along, adds an element to the queue, and signals
the condition variable to awaken a sleeper. At just that moment,
Thread C2 grabs the lock, discovers that there's something in
the queue (P1 just put it there), removes the queued item, and
proceeds on its merry way. Now C1 awakens and tries to remove
something from the queue -- but the queue is empty, even though
there was nothing in the least bit "spurious" about the wakeup.

Lots of other scenarios can be formulated; this is not a
special case. A helpful way to think about condition variables
is to keep in mind that the C.V. itself has no idea what kind of
a predicate is being waited for; when some thread signals a C.V.
it isn't saying "the predicate is now true," it's saying "some
component of the predicate *may* have been changed." It's up
to the waiter to decide whether the criteria for proceeding have
been satisfied or not.

--
Eric....@east.sun.com

Kaz Kylheku

ungelesen,

01.05.2001, 22:57:0001.05.01

an

It is so because implementations can sometimes not avoid inserting
these spurious wakeups; it might be costly to prevent them.

Perhaps more importantly, your own program's logic can introduce spurious
wakeups which cannot be eliminated. This can start happening as soon as there
are more than two threads.

You see, a condition waiting thread which has been signaled may have to compete
with another thread in order to re-acquire the mutex. If that other thread
gets the mutex first, it can change the predicate, so that when finally the
original thread acquires it, the predicate is false.

This is also a spurious wakeup, for all purposes. To make this form of
spurious wakeup go away, the semantics of condition variables would have to
change in troublesome ways, back to the original monitors and conditions
concept introduced by Quicksort father C. A. R. Hoare. Under Hoare's monitors
and conditions signaling a condition would atomically transfer the monitor to
the first task waiting on the condition, so that woken task could just assume
that the predicate is true: if (!predicate()) wait(&condition); /* okay */

The very useful broadcast operation does not quite fit into Hoare's model, for
obvious reasons; the signaler can choose only one task to become the
next monitor owner.

Also, such atomic transfers of lock ownership are wasteful, especially on a
multiprocessor; the ownership transfer spans an entire context switch from one
task to another, during which that lock is not available to other tasks.
The switch can take thousands of cycles, inflating the length of a small
critical region hundreds of times!

Lastly, a problem with Hoare's approach is that a ``clique'' of tasks can form
which bounce ownership of the monitor among themselves, not allowing any other
task entry into the monitor. No reliable provision can be made for
priority-based entry into the monitor, because the signal operation implicitly
ingores such priority; at best it can choose the highest priority thread that
is waiting on the condition, which ignores tasks that are waiting to get into
the monitor. In the POSIX model, a condition variable signal merely wakes up a
thread, making it runnable. The scheduling policy will effectively decide
fairness, by selecting who gets to run from among runnable threads. Waking up
of threads waiting on monitors and conditions is done in priority order also,
depending on the scheduling policy.

Tom Payne

ungelesen,

02.05.2001, 08:33:0402.05.01

an

Kaz Kylheku <k...@ashi.footprints.net> wrote:

: On Tue, 01 May 2001 14:57:08 GMT, Vincent Predoehl <vpre...@mac.com> wrote:
:>I saw this in the man pages for pthread_cond_wait:
:>
:>When using condition variables there is always a boolean predicate
:>involving shared variables associated with each condition wait that is
:>true if the thread should proceed. Spurious wakeups from the
:>pthread_cond_wait() or pthread_cond_timedwait() functions may occur.
:>Since the return from pthread_cond_wait() or pthread_cond_timedwait()
:>does not imply anything about the value of this predicate, the predicate
:>should be re-evaluated upon such return.
:>
:>Can anyone explain to me why this is so?

: It is so because implementations can sometimes not avoid inserting
: these spurious wakeups; it might be costly to prevent them.

But why? Why is this so difficult? For example, are we talking about
situations where a wait times out just as a signal arrives?

: Perhaps more importantly, your own program's logic can introduce spurious

: wakeups which cannot be eliminated. This can start happening as soon as there
: are more than two threads.

Right, but that's a different matter.

: You see, a condition waiting thread which has been signaled may have to compete

: with another thread in order to re-acquire the mutex. If that other thread
: gets the mutex first, it can change the predicate, so that when finally the
: original thread acquires it, the predicate is false.
: This is also a spurious wakeup, for all purposes. To make this form of
: spurious wakeup go away, the semantics of condition variables would have to
: change in troublesome ways, back to the original monitors and conditions
: concept introduced by Quicksort father C. A. R. Hoare. Under Hoare's monitors
: and conditions signaling a condition would atomically transfer the monitor to
: the first task waiting on the condition, so that woken task could just assume
: that the predicate is true: if (!predicate()) wait(&condition); /* okay */

: The very useful broadcast operation does not quite fit into Hoare's model, for
: obvious reasons; the signaler can choose only one task to become the
: next monitor owner.

Hoare fakes broadcasts with daisy chains of signals: the application
is written so that each signallee signals the next. However, in such
cases it is impossible to wake up just one waitor.

: Also, such atomic transfers of lock ownership are wasteful, especially on a

: multiprocessor; the ownership transfer spans an entire context switch from one
: task to another, during which that lock is not available to other tasks.
: The switch can take thousands of cycles, inflating the length of a small
: critical region hundreds of times!

In my limited experience, things are't that bad if the signaller
passes its CPU along with the lock. Things fall apart, however, if
the signaller simply makes the signallee "runnable".

: Lastly, a problem with Hoare's approach is that a ``clique'' of tasks can form

: which bounce ownership of the monitor among themselves, not allowing any other
: task entry into the monitor.

Any time you have a low-level lock that is almost always held, there
is a deeper problem than the semantics of lock passing. Either the
applications are trying to do too much while holding the lock or the
implementation of lock passing is messed up. As I mentioned above,
the natural mistake in implementing Hoare semantics is for the
signaller to simply put the signallee into runnable state and then
defer, without directly passing control to the signallee.

: No reliable provision can be made for

: priority-based entry into the monitor, because the signal operation implicitly
: ingores such priority; at best it can choose the highest priority thread that
: is waiting on the condition, which ignores tasks that are waiting to get into
: the monitor.

Another way to look at it is that under Hoare semantics, the signallee
has absolutely top priority for the lock, even over the signaller, who
has top priority after the signallee. In general, however, locks are
not the place to do long-term waiting or implement scheduling policies
-- that's what conditions are for.

: In the POSIX model, a condition variable signal merely wakes up a

: thread, making it runnable. The scheduling policy will effectively decide
: fairness, by selecting who gets to run from among runnable threads. Waking up
: of threads waiting on monitors and conditions is done in priority order also,
: depending on the scheduling policy.

In POSIX signalling, to what extent, if any, does the awakened
signallee have priority over other contenders for the lock? (I've
gotten conflicting impressions at various times.)

Indeterminacy regarding who gets the lock after a signal introduces
more possibilities for the programmer to worry about, more
opportunities for mistakes, and more difficulties in avoiding
opportunities for indefinite postponement (which are of concern in
"hard" real-time systems).

The real cost of Hoare's approach is two context switches, rather than
one, each time a signal occurs --- the signaller passes control to the
signallee who passes control back to the signaller upon exiting the
monitor. That overhead can be avoided, however, if either the signal
or the wait is the last operation before releasing the lock (i.e.,
nearly all cases).

Tom Payne

Joe Seigh

ungelesen,

02.05.2001, 09:55:2502.05.01

an

Basically, to allow flexibility in the implementation. I don't know what
specifics they had in mind when they cast the spec but I can suggest hypothetical
situations.

You might have a really efficient signaling mechanism that very occasionally generates
false positives due to a race condition. Overall system performance might be dramatically
improved if using the efficient signaling mechanism with the spurious wakeups was allowed.

You could have a situation where a pthread_cond_signal could only be "presented" to a
wait set asynchronously. If you couldn't determine which threads were in the wait set
at the time the signal was done, you have a problem determining which threads were
eligible to be woken up. Spurious wake ups allow you to implement pthread_cond_signal
as pthread_cond_broadcast. Just wake all the theads up. (there's possibly problems or
side effects associated with doing things this way, but we won't go into them here).

Or suppose you have system with unlimited processors and really efficient shared
memory. You could do a pthreads implementation where every thread got their own
processor, the pthread mutexs would be spin locks, pthread_cond_signal and
pthread_cond_broadcast would be noops, and pthread_cond_wait would be an unlock/lock
sequence more or less.

Joe Seigh

Patrick Doyle

ungelesen,

03.05.2001, 09:24:5403.05.01

an

In article <9couq0$eo5$1...@glue.ucr.edu>, Tom Payne <t...@hill.cs.ucr.edu> wrote:

>Kaz Kylheku <k...@ashi.footprints.net> wrote:
>
>: It is so because implementations can sometimes not avoid inserting
>: these spurious wakeups; it might be costly to prevent them.
>
>But why? Why is this so difficult? For example, are we talking about
>situations where a wait times out just as a signal arrives?

You know, I wonder if the designers of pthreads used logic like this:
users of condition variables have to check the condition on exit anyway,
so we will not be placing any additional burden on them if we allow
spurious wakeups; and since it is conceivable that allowing spurious
wakeups could make an implementation faster, it can only help if we
allow them.

They may not have had any particular implementation in mind.

--
--
Patrick Doyle
doy...@eecg.toronto.edu

Dave Butenhof

ungelesen,

04.05.2001, 09:27:4004.05.01

an

Patrick Doyle wrote:

You're actually not far off at all, except you didn't push it far enough.

The intent was to force correct/robust code by requiring predicate loops. This was
driven by the provably correct academic contingent among the "core threadies" in
the working group, though I don't think anyone really disagreed with the intent
once they understood what it meant.

We followed that intent with several levels of justification. The first was that
"religiously" using a loop protects the application against its own imperfect
coding practices. The second was that it wasn't difficult to abstractly imagine
machines and implementation code that could exploit this requirement to improve
the performance of average condition wait operations through optimizing the
synchronization mechanisms.

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

Tom Payne

ungelesen,

06.05.2001, 01:55:4306.05.01

an

Dave Butenhof <David.B...@compaq.com> wrote:
: Patrick Doyle wrote:

:> In article <9couq0$eo5$1...@glue.ucr.edu>, Tom Payne <t...@hill.cs.ucr.edu> wrote:
:> >Kaz Kylheku <k...@ashi.footprints.net> wrote:
:> >
:> >: It is so because implementations can sometimes not avoid inserting
:> >: these spurious wakeups; it might be costly to prevent them.
:> >
:> >But why? Why is this so difficult? For example, are we talking about
:> >situations where a wait times out just as a signal arrives?
:>
:> You know, I wonder if the designers of pthreads used logic like this:
:> users of condition variables have to check the condition on exit anyway,
:> so we will not be placing any additional burden on them if we allow
:> spurious wakeups; and since it is conceivable that allowing spurious
:> wakeups could make an implementation faster, it can only help if we
:> allow them.
:>
:> They may not have had any particular implementation in mind.

: You're actually not far off at all, except you didn't push it far enough.

: The intent was to force correct/robust code by requiring predicate loops. This was
: driven by the provably correct academic contingent among the "core threadies" in
: the working group, though I don't think anyone really disagreed with the intent
: once they understood what it meant.

: We followed that intent with several levels of justification. The first was that
: "religiously" using a loop protects the application against its own imperfect
: coding practices.

So, "spurious signals" started out as a boogey-man story to frighten
impressionable programmers into minding ther Ps and Qs. ;-)

: The second was that it wasn't difficult to abstractly imagine

: machines and implementation code that could exploit this requirement to improve
: the performance of average condition wait operations through optimizing the
: synchronization mechanisms.

This is where it gets interesting. My naive impression is that an
event can always be turned into a condition (in the sense of
"predicate") by setting an event-occurred flag. When one thread
issues a deliberate signal to a condition variable, it can set a flag
indicating that a non-spurious signal has occurred, and the signallee
can check that flag. It would seem that such a flag could be set and
checked by the implementation as easily as by the programmer, thereby
preventing the overhead of spurious context switches and perhaps some
unnecessary predicate testing.

Tom Payne

Dave Butenhof

ungelesen,

07.05.2001, 08:43:4007.05.01

an

Tom Payne wrote:

> Dave Butenhof <David.B...@compaq.com> wrote:
> : Patrick Doyle wrote:
>
> : The intent was to force correct/robust code by requiring predicate loops. This was
> : driven by the provably correct academic contingent among the "core threadies" in
> : the working group, though I don't think anyone really disagreed with the intent
> : once they understood what it meant.
>

> So, "spurious signals" started out as a boogey-man story to frighten
> impressionable programmers into minding ther Ps and Qs. ;-)

More or less. Good rule to remember. If you can't educate 'em, scare the heck out of
'em. The results are almost as good, and much easier to achieve. ;-)

> : The second was that it wasn't difficult to abstractly imagine
> : machines and implementation code that could exploit this requirement to improve
> : the performance of average condition wait operations through optimizing the
> : synchronization mechanisms.
>
> This is where it gets interesting. My naive impression is that an
> event can always be turned into a condition (in the sense of
> "predicate") by setting an event-occurred flag. When one thread
> issues a deliberate signal to a condition variable, it can set a flag
> indicating that a non-spurious signal has occurred, and the signallee
> can check that flag. It would seem that such a flag could be set and
> checked by the implementation as easily as by the programmer, thereby
> preventing the overhead of spurious context switches and perhaps some
> unnecessary predicate testing.

Not quite that easy, since the issue is really "simultaneous" events across processors.
You'd need an atomic count rather than just a flag: "so many signals not yet resulting
in a thread resuming execution". But that, remember, would only account for internal
thread library "spurious wakes", not application spurious wakes. (Which, in practice,
are a more significant issue.) The condition variable is not a monitor: the thread
library doesn't know anything about the application predicate, so pretending to avoid
spurious wakes seems pointless. If you have to leave a hole in the floor, make it
obvious; covering it over with thin plaster just guarantees someone's going to fall in.

Furthermore, an atomic counter isn't free, or even particularly cheap; it's that sort
of cost we were trying to avoid. Especially since the cost would have avoided only part
of the problem. (And would have weakened the "scared straight" approach to teaching
good basic programming practice. ;-) )

aekka...@gmail.com

ungelesen,

18.03.2020, 21:03:1118.03.20

an

aekka...@gmail.com

ungelesen,

18.03.2020, 21:03:4818.03.20

an

Am Dienstag, 1. Mai 2001 21:57:08 UTC+7 schrieb Vincent Predoehl:

Bonita Montero

ungelesen,

19.03.2020, 01:50:2419.03.20

an

Also consider stolen wakeups !