What excatly is a spurious wakeup? (Was: discussion on barriers)

A. Meijster

unread,

Apr 13, 2001, 6:26:57 AM4/13/01

to

Hi,

I read several times in the newsgroup about people
stating things on 'spurious wakeups'. What do you mean
by that? Does it mean a thread got woken up, while there
was no signal/broadcast? In that case, it seems
to me very hard to write correct threaded programs in the
first place. See, e.g. the discussion we had on barrier
implementations. It is now completely unclear to
me which implementation is correct/wrong.

Can anybody shed some light on this?

Cheers,

Arnold (a.mei...@rc.rug.nl)

Dave Butenhof

unread,

Apr 13, 2001, 7:48:27 AM4/13/01

to

"A. Meijster" wrote:

> I read several times in the newsgroup about people
> stating things on 'spurious wakeups'. What do you mean
> by that? Does it mean a thread got woken up, while there
> was no signal/broadcast? In that case, it seems
> to me very hard to write correct threaded programs in the
> first place. See, e.g. the discussion we had on barrier
> implementations. It is now completely unclear to
> me which implementation is correct/wrong.

Spurious wakeups are two things:

* Write your program carefully, and make sure it works even if you
missed something.
* Support efficient SMP implementations

There may be rare cases where an "absolutely, paranoiacally correct"
implementation of condition wakeup, given simultaneous wait and
signal/broadcast on different processors, would require additional
synchronization that would slow down ALL condition variable operations
while providing no benefit in 99.99999% of all calls. Is it worth the
overhead? No way!

But, really, that's an excuse because we wanted to force people to write
safe code. (Yes, that's the truth.)

What if you've got two threads servicing a work queue. A thread queues
an item, and signals the condition variable to awaken a server thread.
Except only one of the two server threads is waiting. The other, having
just completed a work item, swings back around to look for a new one.
Gee, there's a work item just waiting for it. How convenient!

Now... does it make sense for this running, "cache hot" thread to go to
sleep waiting for the NEXT work item, or to simply take the one that's
waiting and go with it? The one you awakened may or may not immediately
find a processor, and will take a bit of time to start up. Forcing the
application to wait for it would be counterproductive. So, yeah, the one
already running should take it. OK, so now the awakened thread spins up,
pulls an item from the queue, and the program dies with a SIGSEGV
because it's empty. What's wrong? The problem is that the thread wasn't
"defensively codedd" to check the bloody queue before removing an item.
If it had checked, it'd have seen that the queue was empty, and then it
could simply go back to sleep until the next item.

THAT's a spurious wakeup, too. (Alternately termed a "stolen wakeup";
but the distinction is irrelevant to the call site.) Spurious wakeups
are good, but for implementation and application performance. And the
solution is trivial; and should be immediately obvious to anyone who's
ever taken a basic programming class. Don't ASSuME... CHECK!

POSIX says you cannot write:

pthread_mutex_lock(&m);
pthread_cond_wait(&c,&m);
pthread_mutex_unlock(&m);

That doesn't tell future maintainers why you're waiting. It doesn't
protect against stolen or spurious wakes. Instead, you write:

pthread_mutex_lock(&m);
while (!predicate)
pthread_cond_wait(&c,&m);
pthread_mutex_unlock(&m);

Now anyone looking at your code knows why you're waiting. And if you, or
the implementation, "screwed up", the situation is self correcting.

Yes, this can be a performance sink if it's done too much. You should
consider implementation spurious wakeups to be extremely rare events,
though. And if you get enough stolen wakes to be significant, then
probably either your "work units" are too small to keep the servers
busy, or you've got too many servers contending for the same queue.

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

Kaz Kylheku

unread,

Apr 13, 2001, 6:18:51 PM4/13/01

to

On Fri, 13 Apr 2001 10:26:57 GMT, A. Meijster <A.Mei...@rc.rug.nl> wrote:
>
>
>Hi,
>
>I read several times in the newsgroup about people
>stating things on 'spurious wakeups'. What do you mean
>by that? Does it mean a thread got woken up, while there
>was no signal/broadcast?

That's exactly what it means.

>In that case, it seems
>to me very hard to write correct threaded programs in the
>first place.

It doesn't make it hard to design correct threaded programs, however it is true
that it opens up the possibility of making a mistake.

>See, e.g. the discussion we had on barrier
>implementations. It is now completely unclear to
>me which implementation is correct/wrong.

A condition variable is simply a place for threads to give up a mutex and enter
into an efficient wait state. That wait state pertains to some logical
predicate---a true or false statement based on the values of shared objects
protected by the mutex and possibly other values as well. A condition variable
wakeup is merely a hint to the thread that the predicate may be true; the
thread must test the predicate.

The basic schema is this:

lock(&shared_data.mutex);

/* need certain predicate to be true */

while (!predicate(&shared_data)) {
/* somehow let other threads execute and acquire the lock,
while sitting in an efficient wait state until someone makes the
predicate true. */
}

The solution to the problem expressed by the large comment is to use
a condition variable, and remember to broadcast that variable elsewhere
whenever the predicate is made true. The loop will not terminate until
the predicate is true; when the loop does terminate, it is guaranteed
that the predicate is true and that the thread owns the lock.

Correct programming is then performed entirely by designing the structure of
the shared data, paying attention to its state transitions and predicates.
Since condition variables have no programmer visible state, they contribute
nothing to the complexity of the program's state space.

Using the barrier as an example, to implement the barrier, we have to identify
what states that object can be in, and to express the predicate which tells a
thread that the barrier has fired. Firstly, to determine when this predicate
needs to be made true, a barrier needs to count how many threads are waiting
for it. So the object must include a counter. When the barrier fires, a state
change must occur which all the waiting threads can test for when they wake up.
Therefore, at the very least, it needs to have a binary flag which toggles on
firing. The predicate which expresses the state change is simply a test that
the new state is different from a previously observed state:

/* store current state in a local variable */
bool observed_state = barrier.state;

while (observed_state == barrier.state) {
/* wait on condition variable */
}

/* barrier has fired! */
assert (observed_state != barrier.state);

A. Meijster

unread,

Apr 14, 2001, 2:39:54 PM4/14/01

to

Thanks Dave and Kaz for replying question.
However, now I am still puzzled. Let's have a look at
the following scenario, with two threads:

thread1: lock(m) ; put something in queue and signal/broadcast ; unlock(m);

thread2: lock(m); while (queue is empty) wait (cv,m); .....; unlock(m);

Now thread2 locks m, finds the queue empty and waits. The while
is because of spurious wakeups, otherwise an if would have been enough.
No let say there's a spurious wakeup, and thread2 becomes active, and finds
the queue empty. Just before it starts waiting thread1 inserts something in the
queue, and signals. This signal is missed by thread2 since it was not
waiting yet. what happens?

Perhaps, this is not possible because of the lock m. Can a spurious wakeup
happen while another process 'owns' the mutex? My feeling about spurious
wakeups is that it was to hard or expensive to implement things right/fully.
How much freedom is allowed though? Is a spurious wakeup not possible under
protection of the mutex m?

Any comment appreciated, since this is really bothering me. I
have working code on two architectures, but am not sure anymore
if they are actually correct.

Cheers,

Arnold

Kenneth Chiu

unread,

Apr 14, 2001, 3:30:36 PM4/14/01

to

In article <3AD89985...@rc.rug.nl>,

A. Meijster <A.Mei...@rc.rug.nl> wrote:
>thread1: lock(m) ; put something in queue and signal/broadcast ; unlock(m);
>
>thread2: lock(m); while (queue is empty) wait (cv,m); .....; unlock(m);
>
>Now thread2 locks m, finds the queue empty and waits. The while is
>because of spurious wakeups, otherwise an if would have been enough.
>No let say there's a spurious wakeup, and thread2 becomes active, and
>finds the queue empty. Just before it starts waiting thread1 inserts
>something in the queue, and signals. This signal is missed by thread2
>since it was not waiting yet. what happens?

This is not possible. The mutex is reacquired by thread2 on a spurious
wakeup. To insert something in the queue, thread1 would have to block
till the mutex was released. Since the release of the mutex and beginning
of the wait is atomic, there is no "window" during which thread1 could
insert into the queue.

Note that the signal can be moved out of the critical region.

Kaz Kylheku

unread,

Apr 15, 2001, 11:18:24 AM4/15/01

to

On Sat, 14 Apr 2001 18:39:54 GMT, A. Meijster <A.Mei...@rc.rug.nl> wrote:
>
>
>Thanks Dave and Kaz for replying question.
>However, now I am still puzzled. Let's have a look at
>the following scenario, with two threads:
>
>
>
>thread1: lock(m) ; put something in queue and signal/broadcast ; unlock(m);

You'd normally want to do the broadcast after giving up the lock to avoid
unnecessarily stretching the critical section over a potentially
expensive function.

>thread2: lock(m); while (queue is empty) wait (cv,m); .....; unlock(m);
>
>Now thread2 locks m, finds the queue empty and waits. The while
>is because of spurious wakeups, otherwise an if would have been enough.

Even if the implementation never generates spurious wakeups, you still
need the while loop if the signal is moved outside of the lock, or
if there are multiple consumers.

If signaling is done outside of the lock:

thread2 can sneak into the lock and consume the ready item without waiting
on the condition. Then, yet before thread1 has signaled, thread2 can
perhaps cycle around and get the mutex again, this time wait on the
condition because the queue is empty. The signaling will then take place,
and thread2 will wake up, yet there is no new item added.

If there are multiple consumers:

the thread woken by the condition signal can contend for the mutex with
some other thread. That other thread can get it first and ``steal'' the
item, thus negating the predicate ``queue has one or more items''.
Essentially, the signal went to one thread, but some other thread got the
goods. So the thread which got the signal is spuriously woken. Move the
signal operation inside the lock does not help.

So you see, it's only under limited circumstances that you can get away with a
simple if---not only does the implementation avoid generating spurious
wakeups, but your thread interaction has to fit into a narrow schema.

>No let say there's a spurious wakeup, and thread2 becomes active, and finds
>the queue empty. Just before it starts waiting thread1 inserts something in the
>queue, and signals. This signal is missed by thread2 since it was not
>waiting yet. what happens?

This is not possible, because the wait operation has special semantics.
It's not possible for thread1 to insert something into the queue before
thread2 starts waiting, because thread2 holds the mutex, and the wait
operation is specially designed so that unlocking the mutex and entering
the wait is atomic with respect to thread1's acquisition of the mutex
and subsequent signaling.

If this were a problem, condition variables would be rather useless!