A word of caution when juggling pthread_cond_signal/pthread_mutex

Bryan Ischo

unread,

Mar 19, 2004, 12:15:14 AM3/19/04

to

Hi all. I read with much interest the threads on this newsgroup which
discuss which is the "better" way to signal a pthreads Condition
Variable; the two alternatives seem to be:

---

pthread_mutex_lock(&mutex);
set value which makes condition true;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);

AND:

pthread_mutex_lock(&mutex);
set value which makes condition true;
pthread_mutex_unlock(&mutex);
pthread_cond_signal(&cond);

---

I originally wrote my code using the first method because it's what is
listed in most pthread code examples and documentation. Then I read
on this board all of the good reasons to signal the condition variable
while not holding the mutex (seems to make sense - you don't need to
hold the mutex to signal the condition variable and you might gain a
very slight performance advantage by signalling outside of the mutex
lock).

So I examined my code, decided that it would be just fine to switch it
to be the latter way, and did so. But I eventually discovered a
problem with this approach.

First allow me to say that there is only a problem in rare
circumstances to signal a condition variable while holding the mutex;
it is quite appropriate and a fine way to do it in most cases. And I
also do not mean to suggest that anyone on this board was wrong in
suggesting that the second way is perfectly fine in most cases, or
that I got any kind of bad advice here; the only reason that I ran
into a problem in doing it the latter way is that *I* did not fully
understand the implications of doing so. But I would like to describe
the problem that I encountered in hopes that it might benefit someone
else who is writing the same kind of code.

Basically, I had some code which looked something like this:

void handle_workers()
{
while (true) {
// Wait for a worker to be on the list
pthread_mutex_lock(&workers_mutex);
while (worker_list_is_empty()) {
pthread_cond_wait(&workers_cond, &workers_mutex);
}

// Get the first worker off of the list
worker *worker = pop_worker_list_head();

// "Trade" mutexes; we were holding the list lock, now we want
the
// individual worker lock, and during the trade, clear the flag
which
// indicates that the worker is on the list
pthread_mutex_lock(&(worker->mutex));
worker->is_on_work_list = false;
pthread_mutex_unlock(&workers_mutex);

// Set it to "is_working"
worker->is_working = true;

// Release its lock while we call its "Work" method
pthread_mutex_unlock(&(worker->mutex));
worker->Work();
pthread_mutex_lock(&(worker->mutex));

// It's not working anymore, so reset its flag, signal its
condition
// variable, unlock its mutex, and go back to the top of the
loop
worker->is_working = false;
pthread_cond_signal(&(worker->cond));
pthread_mutex_unlock(&(worker->mutex));
}
}

OK, so the above just waits for a worker to be put on the list, and
then calls its work method. Note that a worker's "is_on_work_list"
flag is true whenever it is on the work list, and its "is_working" is
true whenever it is off of the list but in its Work() method. Note
also that it signals the worker's condition variable at the very end;
we'll see why here:

void worker_destroy(worker *worker)
{
pthread_mutex_lock(&(worker->mutex));
// If the worker is working, wait for it to be done before
deleting it
while (worker->is_on_work_list || worker->is_working) {
pthread_cond_wait(&(worker->cond), &(worker->mutex));
}
delete worker;
}

The worker_destroy function deletes a worker, but it waits until any
outstanding calls to Work() that it has to make, are done. Note that
external mechanisms must be used to ensure that the worker is not
referenced anywhere else except in the work list when worker_destroy()
is called on it. But I have that covered in my code.

Then I moved the pthread_cond_signal(&(worker->cond)) inside
handle_workers() outside of the mutex lock, so that the end of the
loop block looked like this:

// Release its lock while we call its "Work" method
pthread_mutex_unlock(&(worker->mutex));
worker->Work();
pthread_mutex_lock(&(worker->mutex));
worker->is_working = false;
pthread_mutex_unlock(&(worker->mutex));
// Note the gap between the unlock and the signal
pthread_cond_signal(&(worker->cond));

So the problem I encountered, in *very* rare circumstances (I had to
run the test case through over 1 million iterations through the
handle_workers() loop before I saw this problem manifest itself), is
that the worker_destroy() method might actually acquire the worker's
mutex immediately after it is released but before the condition
variable is signalled, right in the gap between the unlock and the
signal. It would then see that the is_on_work_list and is_working
flags were false and would skip over its condition variable wait, and
go straight to deleting the worker. Then the handle_workers() thread
would resume and try to signal the condition variable, but it was
embedded in the worker which was just destroyed, so it was garbage
memory at this point. Seg fault!

So I would suggest that signalling a condition variable while holding
the lock is definitely the safest thing to do because you will never
be bitten by a race condition like this. Of course, if you are really
careful and you always recognize cases where the signal should happen
inside the mutex because the condition variable itself will
potentially be destroyed by the thread which is waiting for its
signal, then you should be OK too. Myself, I will stick with putting
the condition variable signals inside the mutex locks because I find
that these issues are so subtle that it's worth a little extra
caution.

Of course, the gurus on this board would never make such a mistake as
the one I made so this advice doesn't really apply to them :) ...

Thank you, and best wishes,
Bryan

Loic Domaigne

unread,

Mar 19, 2004, 4:16:14 AM3/19/04

to

Hi Bryan,

Thanks for your post. That's nice that you share your experiences with
others. That's actually the whole point of NGs like c.p.t.

Yes, you have constructed a pratical example where the two forms:

> ---

> pthread_mutex_lock(&mutex);
> set value which makes condition true;
> pthread_cond_signal(&cond);
> pthread_mutex_unlock(&mutex);

> AND:

> pthread_mutex_lock(&mutex);
> set value which makes condition true;
> pthread_mutex_unlock(&mutex);
> pthread_cond_signal(&cond);

aren't equivalent. People could argue that your design is wrong, because
it's your responsability to ensure that you are not freeing the resource
holding the condition variable that you want to signal... And, yes, they
are other alternative that would solve your problem, in which variant (2)
works.

But I believe, the fastest and safest way to accomode this issue was
probably to stick to variant (1) as you did, that is signal under mutex
protection...

Welcome to the Joy of threads programming ;-)

HAND,
Loic.
--
Article posté via l'accès Usenet http://www.mes-news.com
Accès par Nnrp ou Web

Patrick TJ McPhee

unread,

Mar 20, 2004, 4:48:40 PM3/20/04

to

In article <d976b79f.04031...@posting.google.com>,
Bryan Ischo <bji...@ischo.com> wrote:

% Hi all. I read with much interest the threads on this newsgroup which
% discuss which is the "better" way to signal a pthreads Condition
% Variable; the two alternatives seem to be:

Let me start by saying that I used to believe that signalling without
holding the lock was better, but the only reason for that was that
you tended to get fewer replies bringing up this subject when you did
that in this newsgroup. It no longer seems to be the case, and it's
probably best to avoid using pthread_cond_signal() in postings to this
group.

With respect to your example, David Schwartz (I apologise if I've
mis-spelled that) gave a similar example a few weeks ago, and David
Butenhof either said or suggested that this was an inappropriate use
of pthread_cond_signal(). My opinion is that it's a bad idea to use
pthread_cond_signal() if you need to wake a particular thread. You should
use pthread_cond_broadcast() unless you know that exactly one thread
is needed, and it doesn't matter which thread it is.
--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

Loic Domaigne

unread,

Mar 21, 2004, 4:52:14 PM3/21/04

to

> % Hi all. I read with much interest the threads on this newsgroup which
> % discuss which is the "better" way to signal a pthreads Condition
> % Variable; the two alternatives seem to be:

> With respect to your example, David Schwartz (I apologise if I've
> mis-spelled that) gave a similar example a few weeks ago, and David
> Butenhof either said or suggested that this was an inappropriate use
> of pthread_cond_signal().

The example of DS looked to me rather highly specialized (something like
an academic case)... But - I confess up - I don't perhaps took enough time
to study DS example in more extend.

However, I found Bryan's example extremely easy to grab from the first
reading.

Cheers,
Loic.

David Schwartz

unread,

Mar 23, 2004, 10:05:02 PM3/23/04

to

"Loic Domaigne" <loic...@gmx.net> wrote in message
news:3e0374e6.04032...@posting.google.com...

> The example of DS looked to me rather highly specialized (something like
> an academic case)... But - I confess up - I don't perhaps took enough time
> to study DS example in more extend.

You are correct, it was obviously deliberately constructed to create a
problem and bore little to no resemblance to anything a person could
actually code.

> However, I found Bryan's example extremely easy to grab from the first
> reading.

Yes, it's a much better example of a way you can get into trouble.

DS

A word of caution when juggling pthread_cond_signal/pthread_mutex_unlock

Bryan Ischo

Loic Domaigne

Patrick TJ McPhee

Loic Domaigne

David Schwartz