pthread_cond_* implementation questions

tere...@my-deja.com

unread,

Jan 19, 2001, 1:30:04 PM1/19/01

to

Hi,

I have questions about two different
implemenations of
POSIX conditions.

Firstly, my understanding of the subject is as
follows
(please correct me if i am wrong)

for condition wait (calling thread owns the
condition mutex)
==================================================
==========
calling thread atomically releases the mutex and
is added to
the condition wait queue.

for condition signal with condition mutex owned
by some thread
==================================================
============
one thread from the condition wait queue (which
one - depends
on scheduling policy) is atomically moved from
the condition
wait queue to the condition mutex wait queue.

for condition broadcast with condition mutex
owned by some thread
==================================================
===============
all threads from the condition wait queue are
atomicaly moved
from the condition wait queue to the condition
mutex wait queue.

Questions...

Q1: Is the implementation (condition
wait/signal/broadcast calls)
which:

a) provides atomic release of condition mutex and
addition of the
calling threads (wait call) to the condition wait
queue;

b) does not guarantee that thread(s) released
from condition wait
queue by condition signal/broadcast will be added
to condition
mutex wait queue BEFORE thread(s) released by
subsequent call
to signal/broadcast;

conformant with POSIX??

Q2: Is the implementation (condition
wait/signal/broadcast calls)
which:

a) does not provide _atomic_ release of condition
mutex and addition
of the calling threads (wait call) to the
condition wait queue;

b) does guarantee that thread(s) released from
condition wait queue by
condition signal/broadcast will be added to
condition mutex wait queue
BEFORE thread(s) released by subsequent call to
signal/broadcast;

c) postpones the move of later threads from the
condition wait queue
to the condition mutex wait queue until all
earlier released threads
_pass_through_ the condition mutex (mutex is
owned by last earlier
released waiter thread);

conformant with POSIX??

thanks,

regards,
alexander.

Sent via Deja.com
http://www.deja.com/

David Schwartz

unread,

Jan 19, 2001, 3:09:49 PM1/19/01

to

tere...@my-deja.com wrote:

An implementation may delay wakeups as long as it wants. The standard
provides no scheduling guarantees. However the semantics would break
horribly if the 'release mutex and wait for condition' wasn't atomic.

DS

tere...@my-deja.com

unread,

Jan 20, 2001, 4:22:42 PM1/20/01

to

David,

>An implementation may delay wakeups as long as it wants.
>The standard provides no scheduling guarantees.

I understand that. But what I am not sure about is the
sequence of wakeups:

Suppose we have 4 condition waiter threads waiting for
a signal/broadcast. It happens to be a broadcast. All 4
are now supposed to compete for mutex ownership, return
from their wait call (one after another) and do whatever
they want afterwards... Now, waiters 1 and 2 (after return
from their wait calls) call condition wait again...
waiter 3 (after return from his wait) releases the mutex
and calls condition signal. Now assume that waiter 4
(which was unblocked by the broadcast) not only does not
own the mutex yet but did not even _try_ to acquire its
ownership (was not put on mutex wait queue) because his
thread was preempted. Now the question... such an
implementation may allow waiter 1 or 2 (unblocked by
signal) to overtake waiter 4 without even giving him a
chance to compete for mutex ownership (i.e. second wave
waiter(s) may end up owning the mutex not because of
winning (in the mutex wait queue - prty sched) over
first wave waiter thread(s) but simply because first
wave waiter thread(s) happened to be preempted on the
way to mutex.acquire() call after being unblocked.

IS SUCH IMPLEMENTATION CORRECT IN RESPECT TO
SEQUENCE OF WAKEUPS??

(please see the code below)

> However the semantics would break horribly if the
>'release mutex and wait for condition' wasn't atomic.

Well, I suppose that what you mean is that
signals/broadcasts should not be lost between
'release mutex' and 'wait for condition'. But it seems
that it could be done w/o atomic operation...
Some implementations based on semaphores (ACE,
pthreads-win32) do exactly that:

================
waiting threads:
================

{ /** Critical Section

inc cond.waiters_count

}

/*
/* Atomic only if using Win32 SignalObjectAndWait
/*
cond.mtx.release
cond.sem.wait

/*** ^^-- WAITER CAN BE PREEMPTED AFTER BEING UNBLOCKED...

{ /** Critical Section

dec cond.waiters_count

/*** ^^- ...AND BEFORE DECREMENTING THE COUNT (1)

last_waiter = ( cond.was_broadcast &&
cond.waiters_count == 0 )

if ( last_waiter )

cond.was_broadcast = FALSE

endif

}

if ( last_waiter )

/*
/* Atomic only if using Win32 SignalObjectAndWait
/*
cond.auto_reset_event_or_sem.post /* Event for Win32
cond.mtx.acquire

/*** ^^-- ...AND BEFORE CALL TO mtx.acquire (2)

/*** ^^-- NESTED BROADCASTS RESULT IN A DEADLOCK

else

cond.mtx.acquire

/*** ^^-- ...AND BEFORE CALL TO mtx.acquire (3)

endif

===============
signal threads:
===============

{ /** Critical Section

waiters_count = cond.waiters_count

}

if ( waiters_count != 0 )

sem.post 1

/*** ^^-- SPURIOUS WAKEUPS DUE TO (1)

endif

==================
broadcast threads:
==================

{ /** Critical Section

waiters_count = cond.waiters_count

if ( waiters_count != 0 )

cond.was_broadcast = TRUE

endif

}

if ( waiters_count != 0 )

cond.sem.post waiters_count

/*** ^^^^^--- SPURIOUS WAKEUPS DUE TO (1)

cond.auto_reset_event_or_sem.wait /* Event for Win32

/*** ^^^^^--- DEADLOCK FOR FURTHER BROADCASTS IF THEY
HAPPEN TO GO INTO WAIT WHILE PREVIOUS
BROADCAST IS STILL IN PROGRESS/WAITING

endif

It seems that the "problems" with spurious wakeups and
deadlock could be solved in current implementation,
but I am also thinking about a different implementation
(also semaphore based but with multiple semaphores),
which would try to solve that sequencing "problem" too.
But before going forward I am really interested to know
from more experienced people on this forum whether all
these three problems do really exist in the current
implementation... a) at all b) sequencing in respect to
POSIX standard / practical considerations and c) that a
different implementation which

>c) postpones the move of later threads from the
>condition wait queue
>to the condition mutex wait queue until all
>earlier released threads
>_pass_through_ the condition mutex (mutex is
>owned by last earlier
>released waiter thread);

would not be equally incorrect from standard
compliance / practical point of view.

David Schwartz

unread,

Jan 20, 2001, 4:36:16 PM1/20/01

to

tere...@my-deja.com wrote:

> IS SUCH IMPLEMENTATION CORRECT IN RESPECT TO
> SEQUENCE OF WAKEUPS??

Yes. The standard makes no guarantees with respect to timeliness of
wakeups. An implementation could randomly delay wakeups by random
amounts of time and still be fully compliant.

DS

tere...@my-deja.com

unread,

Jan 20, 2001, 6:26:07 PM1/20/01

to

David,

>The standard makes no guarantees with respect to timeliness
>of wakeups. An implementation could randomly delay wakeups
>by random amounts of time and still be fully compliant.

Well, but how would you explain the following:

===========================================================
http://www.opengroup.org/onlinepubs/007908799/xsh/
pthread_cond_signal.html

(The Single UNIX ® Specification, Version 2)

If more than one thread is blocked on a condition variable,
the scheduling policy determines the order in which threads
are unblocked.

The thread(s) that are unblocked contend for the mutex
according to the scheduling policy (if applicable), and as
if each had called pthread_mutex_lock().

==========================================================

My understanding of this text is that the order/sequence of
wakeups is NOT random - it is defined by scheduling policy.

And if I have e.g. 2 threads blocked on the condition
(A:HIGH and B:LOW priority) and I am doing:

a) cond.signal; cond.signal;

- or -

b) cond.broadcast;

the specification guarantees that the result is the same as if

1. Thread A had called mutex_lock() // HI
2. Thread B had called mutex_lock() // LO

And that the final order of wakeups for thread A and B depends
on the scheduling policy too (for condition mutex - another
HIGH priority thread C may delay wakeup of thread A):

1. Thread A had called mutex_lock() // HI; WAS WAITING
2. Thread B had called mutex_lock() // LO; WAS WAITING
.
.
.
3. Thread C had called mutex_lock() // HI; WAS _NOT_ WAITING
4. Mutex owner thread had called mutex_unlock();

.
.
.

1. Thread A returns from _wait; calls mutex_unlock()
2. Thread C returns from mutex_lock(); calls mutex_unlock()
3. Thread B returns from _wait; ...

??

It seems to me that your statement is correct only for
implementations with undefined scheduling policy.

Don't you think that according to the specification,
on the implementations WITH DEFINED scheduling policy
conditions should NOT behave randomly?

David Schwartz

unread,

Jan 20, 2001, 6:46:51 PM1/20/01

to

tere...@my-deja.com wrote:

> Don't you think that according to the specification,
> on the implementations WITH DEFINED scheduling policy
> conditions should NOT behave randomly?

Yes. However, that would be part of the scheduling policy.

DS

Kaz Kylheku

unread,

Jan 20, 2001, 11:57:35 PM1/20/01

to

On Sat, 20 Jan 2001 23:26:07 GMT, tere...@my-deja.com
<tere...@my-deja.com> wrote:
>David,
>
>>The standard makes no guarantees with respect to timeliness
>>of wakeups. An implementation could randomly delay wakeups
>>by random amounts of time and still be fully compliant.
>
>Well, but how would you explain the following:
>
>===========================================================
>http://www.opengroup.org/onlinepubs/007908799/xsh/
>pthread_cond_signal.html
>
>(The Single UNIX ® Specification, Version 2)
>
>If more than one thread is blocked on a condition variable,
>the scheduling policy determines the order in which threads
>are unblocked.

This only means that the thread are woken in some priority order, not
that they will actually execute in that order, and---in
particular---it's not a guarantee that the threads will acquire the
mutex in that order.

>My understanding of this text is that the order/sequence of
>wakeups is NOT random - it is defined by scheduling policy.

But what is a wakeup? In many systems, a wakeup simply means that a
thread is transferred from a suspend queue to a ready queue, not that
it is dispatched!

tere...@my-deja.com

unread,

Jan 22, 2001, 6:00:13 AM1/22/01

to

>>My understanding of this text is that the order/sequence of
>>wakeups is NOT random - it is defined by scheduling policy.

>But what is a wakeup?

Well, let me try that way:

For threads waiting for signals/broadcasts on condition variable,
following will happen (as a result of signal/broadcast):

1. Unblock - "as if each had called pthread_mutex_lock()".
2. Wakeup - thread acquires the mutex and returns from _cond_wait()

>In many systems, a wakeup simply means that a
>thread is transferred from a suspend queue to a ready queue,
>not that it is dispatched!

I am not expecting anything more than that!

But where I see a problem (with the current semaphore based impl) is
the order in which "unblocked" threads appear on that "suspend queue"
(mutex wait queue)... "unblock" means pthread_mutex_lock() call!

It is my understanding that the order of these calls is "guaranteed"
(determined by the scheduling policy _at_the_time_of_calling
signal/broadcast) and SHOULD HAVE NOTHING TO DO WITH PATTERN/TIMINGS
OF FURTHER WAIT/SIGNAL/BROADCAST CALLS (that is what I think _CAN_
happen with the current implementation). It is my understanding that
implementation which does not follow this rule is incorrect with
respect to order/sequence of unblocks as it is defined by specification.

Don't you think so?

In the application which has only condition waiter threads
contending for the mutex it is perfectly legal to expect that the
final order of "wakeups" (who gets the mutex) will be the same as the
order of "unblocks". (After all it is the same scheduling policy,
which determines both!) That is why I was referring to both
as "wakeups".

It seems to me that there is another sequencing problem with the
current implementation (w/o atomic SignalObjectAndWait)..

A signalling thread may end up consuming its own signal if it happens
to go into wait on the same condition while waiter thread(s) are
pre-empted on the way to sem.wait:

================
waiting threads:
================

{ /** Critical Section

inc cond.waiters_count

}

/*
/* Atomic only if using Win32 SignalObjectAndWait
/*
cond.mtx.release

/*** ^^-- A THREAD WHICH DID SIGNAL MAY ACQUIRE THE MUTEX,
/*** GO INTO WAIT ON THE SAME CONDITION AND OVERTAKE
/*** ORIGINAL WAITER(S) CONSUMING ITS OWN SIGNAL!

cond.sem.wait

.
.
.

That will probably result in a spurious wakeup for signalling
thread and complete loss of signal for waiters. Not good as
it seems to me..

Could you please share your view on other perceived "problems"
such as nested broadcast deadlock, spurious wakeups and
(the latest one) lost signals??

David Schwartz

unread,

Jan 22, 2001, 6:24:21 AM1/22/01

to

tere...@my-deja.com wrote:

> In the application which has only condition waiter threads
> contending for the mutex it is perfectly legal to expect that the
> final order of "wakeups" (who gets the mutex) will be the same as the
> order of "unblocks". (After all it is the same scheduling policy,
> which determines both!) That is why I was referring to both
> as "wakeups".

No it's not. A "wakeup" doesn't guarantee any particular scheduling
latency. It's entirely possible for a thread to be "woken up" and then
assigned to a CPU that takes an interrupt. This interrupt could, for
example, stall the thread so long that other threads that are woken up
"later" actually get to run (on another CPU) first. The default
scheduling policy provides no guarantees about when a "woken up" thread
will actually get the CPU.

DS

tere...@my-deja.com

unread,

Jan 22, 2001, 8:15:44 AM1/22/01

to

David,

>No it's not. A "wakeup" doesn't guarantee any particular scheduling
>latency. It's entirely possible for a thread to be "woken up" and then
>assigned to a CPU that takes an interrupt. This interrupt could, for
>example, stall the thread so long that other threads that are woken up
>"later" actually get to run (on another CPU) first. The default
>scheduling policy provides no guarantees about when a "woken up"
>thread will actually get the CPU.

In general, I totally agree with what you are telling me...

But in the context of this discussion

I AM TALKING ABOUT THREADS CONTENDING FOR THE SAME MUTEX
(via phread_cond_wait):

>>...final order of "wakeups" (who gets the mutex)...

*and* under the assumption that there no other (non-waiter) threads
contending for the same mutex in the application at all:

>> In the application which has only condition waiter threads

>> contending for the mutex...

It does not matter how many interrupts or CPUs you get..
All waiting threads should remain blocked until the mutex owner
thread releases the mutex. And the order of who gets it next
is determined by the scheduling policy which in turn -normally-
does not depend on things like interrupts.. (if its is not
a random/undefined scheduling policy)

BTW, could you please also share your view on other perceived

Kaz Kylheku

unread,

Jan 22, 2001, 1:31:58 PM1/22/01

to

On Mon, 22 Jan 2001 11:00:13 GMT, tere...@my-deja.com
<tere...@my-deja.com> wrote:
>
>>>My understanding of this text is that the order/sequence of
>>>wakeups is NOT random - it is defined by scheduling policy.
>
>>But what is a wakeup?
>
>Well, let me try that way:
>
>For threads waiting for signals/broadcasts on condition variable,
>following will happen (as a result of signal/broadcast):
>
>1. Unblock - "as if each had called pthread_mutex_lock()".
>2. Wakeup - thread acquires the mutex and returns from _cond_wait()

That is not a reasonable definition of wakeup. It may take hundreds, or
thousands of machine cycles between the time the thread is actually
dispatched, and the time it actually acquires the mutex. Moreover,
there may be a time window when the thread is made runnable before it
is actually dispatched by the scheduler.

>>In many systems, a wakeup simply means that a
>>thread is transferred from a suspend queue to a ready queue,
>>not that it is dispatched!
>
>I am not expecting anything more than that!

No, you are expecting the mutex to be acquired as part of the wakeup.

David Schwartz

unread,

Jan 22, 2001, 4:31:17 PM1/22/01

to

tere...@my-deja.com wrote:

> It does not matter how many interrupts or CPUs you get..
> All waiting threads should remain blocked until the mutex owner
> thread releases the mutex. And the order of who gets it next
> is determined by the scheduling policy which in turn -normally-
> does not depend on things like interrupts.. (if its is not
> a random/undefined scheduling policy)

The order of who gets it next is dependent upon the scheduling policy.
The standard allows an implementation to fail to define its scheduling
policy. So the implementation imposes no requirements upon who gets it
next unless the implementation chooses to provide them.

> BTW, could you please also share your view on other perceived
> "problems" such as nested broadcast deadlock, spurious wakeups
> and (the latest one) lost signals??

I'm not sure what you mean. The standard allows an implementation to do
almost whatever it likes. In fact, you could implement pthread_cond_wait
by releasing the mutex, sleeping a random amount of time, and then
reacquiring the mutex. Of course, this would be a pretty poor
implementation, but any code that didn't work under that implementation
wouldn't be strictly compliant.

DS

tere...@my-deja.com

unread,

Jan 22, 2001, 6:24:51 PM1/22/01

to

> >>In many systems, a wakeup simply means that a
> >>thread is transferred from a suspend queue to a ready queue,
> >>not that it is dispatched!
> >
> >I am not expecting anything more than that!
>
> No, you are expecting the mutex to be acquired as part of the wakeup.

I do expect the mutex to be acquired as a prerequisite/reason for
wakeup (the same as with plain mutex_lock in place of cond_wait).

Do you suggest that the following statement from the specification:

“If more than one thread is blocked on a condition variable,

the scheduling policy determines the order in which threads

are unblocked.”

does not have any practical meaning because threads may
become pre-empted/interrupted/… => may end up contending
for the mutex in some completely different (undefined) order??

(If so -> my expectation with respect to mutex/wakeups and
waves of waiters/broadcasts is indeed not correct)

Kaz Kylheku

unread,

Jan 22, 2001, 7:18:15 PM1/22/01

to

On Mon, 22 Jan 2001 23:24:51 GMT, tere...@my-deja.com
<tere...@my-deja.com> wrote:
>
>> >>In many systems, a wakeup simply means that a
>> >>thread is transferred from a suspend queue to a ready queue,
>> >>not that it is dispatched!
>> >
>> >I am not expecting anything more than that!
>>
>> No, you are expecting the mutex to be acquired as part of the wakeup.
>
>I do expect the mutex to be acquired as a prerequisite/reason for
>wakeup (the same as with plain mutex_lock in place of cond_wait).
>
>Do you suggest that the following statement from the specification:
>
>“If more than one thread is blocked on a condition variable,
>the scheduling policy determines the order in which threads
>are unblocked.”
>
>does not have any practical meaning because threads may
>become pre-empted/interrupted/… => may end up contending
>for the mutex in some completely different (undefined) order??

No and yes. It still has practical meaning even though the threads may
contend for the mutex in some different order.

What does it mean to contend for a mutex? If there is only one
processor, then only one thread at a time can run. When it wants the
mutex, either the mutex is free in which case the thread gets it, or
the mutex is owned by some thread, which is currently descheduled.
So there is contention; the current thread blocks on the mutex.
POSIX says that when a mutex is released, and two or more threads are
already waiting on that mutex, which one gets the mutex depends on the
scheduling policy. If no thread is waiting on the mutex, it is simply
released, and the first thread to come along will get it.

If a mixture of threads having different priorities are all woken up
(made runnable), then it is unlikely that a low priority thread from
among these will be dispatched before a high priority one; the
scheduler will surely choose a high priority thread to execute.
So some high priority thread will get first crack at the mutex simply
by being eligible to execute sooner. If the mutex is already occupied,
then this thread will block; but when the mutex is subsequently
released, it will go to the highest priority one of the ones that are
already waiting.

So there are really three rules that work together: the order in which
the scheduler dispatches threads, the order in which a mutex is granted
to waiting threads, and the order in which threads wake on a condition
variable. No single one of these rules by itself gives you the behavior
you are looking for.

Dave Butenhof

unread,

Jan 23, 2001, 11:14:53 AM1/23/01

to

tere...@my-deja.com wrote:

> I have questions about two different implemenations of POSIX
> conditions.

There's been a lot of discussion while I was away. I'm going to ignore
it and start afresh. Perhaps that will help.

> Firstly, my understanding of the subject is as follows (please correct
> me if i am wrong)

Certainly.

> for condition wait (calling thread owns the condition mutex)

It's illegal to wait without holding the mutex, so that's a trivial
precondition. Also, there's no such thing as a "condition mutex". There
is a loose and temporary association between a condition variable and
the (single) mutex specified by one or more threads currently waiting on
that condition variable. Many threads may be using the mutex without
using (or even knowing of) the condition variable: the converse is not
true.

> ==================================================
> ==========
> calling thread atomically releases the mutex and is added to the
> condition wait queue.

When the waiter is unblocked, it then locks the associated mutex. Do not
presume that "unblocking" includes mutex ownership. It doesn't.

> for condition signal with condition mutex owned by some thread
> ==================================================
> ============
> one thread from the condition wait queue (which one - depends on
> scheduling policy) is atomically moved from the condition wait queue
> to the condition mutex wait queue.

Here you presume a possible optimization as defined standard behavior.
That's absolutely not the case. Signalling (or broadcasting) a condition
variable need do nothing more than removing the waiter(s) from the block
queue and making them ready to run. Conceptually, they may do so with no
knowledge of the associated mutex. When (and if) they are scheduled to
run at some point in the future, they will resume execution in
pthread_cond_wait(), and attempt to re-acquire the mutex associated with
their call to pthread_cond_wait(). This attempt may in turn cause the
thread to block for the mutex until it is unlocked.

> for condition broadcast with condition mutex owned by some thread
> ==================================================
> ===============
> all threads from the condition wait queue are atomicaly moved from the
> condition wait queue to the condition mutex wait queue.

Again, this is not a reasonable assumption when pursuing a portable
model of POSIX condition wait behavior.

> Questions...
>
> Q1: Is the implementation (condition wait/signal/broadcast calls)
> which:
>
> a) provides atomic release of condition mutex and addition of the
> calling threads (wait call) to the condition wait queue;
>
> b) does not guarantee that thread(s) released from condition wait
> queue by condition signal/broadcast will be added to condition mutex
> wait queue BEFORE thread(s) released by
> subsequent call to signal/broadcast;
>
> conformant with POSIX??

Again, signal/broadcast does nothing more than release the waiter(s)
from the "blocked" state and render them ready to run. If one or more
are able to run immediately, that may or may not prevent some other
thread from issuing a second signal/broadcast before the other unblocked
threads can run.

Therefore, even if the mutex remains locked, the "unblock" order may
have no relation to the final order on the mutex wait queue. This is
especially true on a multiprocessor, where the POSIX priority scheduling
model doesn't work well without excessive synchronization between
processors. (POSIX explictly says that priority scheduling doesn't apply
at all on a multiprocessor, though many implementations attempt to make
it partially apply, to the extent considered practical by the
implementors.)

> Q2: Is the implementation (condition wait/signal/broadcast calls)
> which:
>
> a) does not provide _atomic_ release of condition mutex and addition
> of the calling threads (wait call) to the condition wait queue;

Depends a lot on what you mean by "atomic", a somewhat overloaded term.
If it is possible for another thread to acquire the mutex before the
condition waiter is "on the queue", then the implementation must provide
some transparent mechanism to cause the application illusion of
atomicity. It cannot be possible for a wakeup to be missed because it
happened between the unlock and the block.

> b) does guarantee that thread(s) released from condition wait queue by
> condition signal/broadcast will be added to condition mutex wait queue
> BEFORE thread(s) released by subsequent call to signal/broadcast;

This is irrelevant. Directly moving threads from condition variable wait
queue to mutex waiter queue is a nice optimization, but is not required
by the POSIX model. Any application presuming such behavior is
incorrect, though in practice it would be quite difficult to write an
application that would care.

> c) postpones the move of later threads from the condition wait queue
> to the condition mutex wait queue until all earlier released threads
> _pass_through_ the condition mutex (mutex is owned by last earlier
> released waiter thread);

I think this would be a technical violation of the standard, as it
requires that the condition waiters be "unblocked" (and you're not).
Still, though, it would be difficult for a conforming application to
detect the difference in most cases, and it probably doesn't matter
much. I think it would be a bad idea, in any case, because it would
enforce some "timing windows" that would likely make some badly written
(and nonportable) applications work on that platform, which would
inhibit portability.

> conformant with POSIX??

The magic 8 ball says "Not clear. Try again."

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

tere...@my-deja.com

unread,

Jan 23, 2001, 12:18:55 PM1/23/01

to

>If a mixture of threads having different priorities are all woken up
>(made runnable), then it is unlikely that a low priority thread
>from among these will be dispatched before a high priority one; the
>scheduler will surely choose a high priority thread to execute.

1. The scheduler may choose (I would actually expect it) to
execute BOTH threads (HIGH and LOW) in a situation when
more than 1 CPU available for execution. This would result in
unpredictable behaviour with respect to which thread will
acquire mutex first.

2. The specification does not say "all woken up (made runnable)".

It says:

"If more than one thread is blocked on a condition variable,
the scheduling policy determines the order in which threads are

unblocked. When each thread unblocked as a result of a
pthread_cond_signal() or pthread_cond_broadcast() returns
from its call to pthread_cond_wait() or pthread_cond_timedwait(),
the thread owns the mutex with which it called pthread_cond_wait()
or pthread_cond_timedwait(). The thread(s) that are unblocked

contend for the mutex according to the scheduling policy

(if applicable), and as if each had called pthread_mutex_lock()."

Why does the specification define mutex ownership ("if each had called
pthread_mutex_lock()") being a part of semantics for condition_wait??

According to your interpretation it could have being perfectly
legal (no change in semantics for condition variables) to leave
it up to application programmer... in the same way as I am required
to have the mutex locked before calling _wait I could have
being required to make an explicit call to mutex_lock after
_cond_wait returns (if I am interested in a) re-evaluation of
predicate/detection of spurious wakeups and b) access to shared
variables associated with condition).

Is it really done just in order to save my lines of code??

>So some high priority thread will get first crack at the mutex
>simply by being eligible to execute sooner.

Even with a single CPU you can not claim that (e.g. because of
possible asynchronous processing(s) which may stop a thread
(even HIGH priority thread) on its way to mutex_lock and block
on some other sync. object/put into sleep/…).

According to your interpretation not only the statement from
the specification to which I have referred to earlier does not
have any practical meaning but the following too:

"The pthread_cond_signal() or pthread_cond_broadcast() functions may
be called by a thread whether or not it currently owns the mutex
that threads calling pthread_cond_wait() or pthread_cond_timedwait()
have associated with the condition variable during their waits;
however, if predictable scheduling behaviour is required, then that
mutex is locked by the thread calling pthread_cond_signal() or
pthread_cond_broadcast(). "

It does not matter whether a thread have that mutex locked or not
if implementation does not guarantee (according to your interpretation)
that as a result of signal/broadcast all targeted waiting threads
will end up contending for the mutex in predictable/defined order
"as if each had called pthread_mutex_lock()", "the scheduling policy
determines the order" and that all these threads will still remain
waiting for a wakeup until I release that mutex, again "according
to the scheduling policy" but now on the mutex.

I have no problem with implementation, which does in fact unblock
all waiting threads and let them concurrently run and try to
acquire the mutex resulting in all but one thread blocked again
(waste of CPU cycles for unnecessary scheduling/context switches...)
But such implementation (according to my understanding of
specification) should insure "correct" semantics for condition
variables.

Well, it may be that my understanding of "correct" is wrong
but it is based on specification. And if it is really meant
to be working in the way you suggest.. I would really wish that
specification would not contain statements with no practical
meaning (such as predictable behaviour, scheduling order...).

I would also like to point you to the following paper:

http://www.cs.wustl.edu/~schmidt/win32-cv-1.html

the authors claim that

"Incorrectness....
.
.
.
The following illustrates the sequence of events:

Thread C1 attempts to dequeue and waits on CV non_empty_
Thread C2 attempts to dequeue and waits on CV non_empty_
Thread P1 enqueues 2 messages and broadcasts to CV not_empty_
Thread P1 exits
Thread C1 wakes up from CV not_empty_, dequeues a message and runs
Thread C1 exits
Thread C3 waits on CV not_empty_, immediately dequeues the 2nd message
and runs
Thread C3 exits
Thread C2 is the only thread left and blocks forever since not_empty_
will
never be signaled

In the above case, a thread that was not waiting on the condition
variable when a broadcast occurred was allowed to proceed.
This leads to incorrect semantics for a condition variable. "

A variation on the incorrectness problem described above could
occurs when the same thread P1 does another signal/broadcast. It is
my understanding that if C3 is not entitled (scheduling policy, while
contending for mutex ownership) to overtake C2 in processing
the first signal from P1, implementation should insure the delay of
signal processing on thread C3 until the last thread unblocked by the
first broadcast releases the mutex (make sure that C3 ends up
contending for mutex after C2 and just apply scheduling policy).

Kaz Kylheku

unread,

Jan 23, 2001, 1:02:34 PM1/23/01

to

On Tue, 23 Jan 2001 17:18:55 GMT, tere...@my-deja.com
<tere...@my-deja.com> wrote:
>1. The scheduler may choose (I would actually expect it) to
>execute BOTH threads (HIGH and LOW) in a situation when
>more than 1 CPU available for execution. This would result in
>unpredictable behaviour with respect to which thread will
>acquire mutex first.

That is correct. It is never predictable who will get the mutex first
when it is unlocked. It goes to whichever processor reaches the locking
instruction first. If the critical region protected by the mutex is
short enough, it's possible that both processors will get the mutex in
succession.

>Why does the specification define mutex ownership ("if each had called
>pthread_mutex_lock()") being a part of semantics for condition_wait??

That is how everyone expects mutexes and condition variables to
work. It is the way C. A. R. Hoare invented monitors and conditions
some thirty odd years ago (but with somewhat relaxed semantics).

>According to your interpretation it could have being perfectly
>legal (no change in semantics for condition variables) to leave
>it up to application programmer...

Yes, of course it could have been. Except that would violate just about
everyone's expectation regarding how condition variables work.
But there is no special synchronization property with respect to
acquiring the mutex after waking up. The implementation is not required
to wake up the thread and transfer ownership of the mutex to that
thread in one step. The one crucial synchronization property of
pthread_cond_wait is that in the window between giving up the lock and
blocking, it cannot miss a signal from another thread, provided that
the other thread locks the mutex at least once.

in the same way as I am required
>to have the mutex locked before calling _wait I could have
>being required to make an explicit call to mutex_lock after
>_cond_wait returns (if I am interested in a) re-evaluation of
>predicate/detection of spurious wakeups and b) access to shared
>variables associated with condition).

If you aren't interested in evaluating these conditions, maybe you need
to use some other object instead, like a semaphore? It's rarely correct
or useful to use a condition variable without reevaluating a predicate.

tere...@my-deja.com

unread,

Jan 24, 2001, 8:02:54 AM1/24/01

to

Dave Butenhof <David.B...@compaq.com> wrote:

>There's been a lot of discussion while I was away.
>I'm going to ignore it and start afresh.
>Perhaps that will help.

That certainly did help.

After reading your answers and especially your comments to
"my understanding of the subject is as follows..." and the
latest messages from Kaz Kylheku and David Schwartz
(special thanks to both for your patience / for not giving up
talking to me) now I do realize that my expectation that
"unblocking" includes mutex ownership (move from condition
block queue to mutex waiter queue followed by unblock
from mutex waiter queue) was indeed nothing more than
a variation/possible optimisation of standard behaviour
which require only that one/all waiter threads
a) should become unblocked from waiting for signal/broadcasts and
b) should contend for mutex ownership.

It seems that my mistake was that I started with looking at
the problem from "would be implementor" point of view w/o
having correct understanding of required standard behaviour.
Later (after building an opinion on how "good"
implementation should work) I was trying to justify it
by some statements from specification.

But it would certainly help me to avoid making a mistake
if XSH specification instead of (in addition to) statements
about "order in which threads are unblocked" and
"predictable scheduling behaviour" would just simply
say that the order in which unblocked threads acquire
the mutex is _implementation_specific_ / unpredictable /
my not depend upon "order in which threads are unblocked".

tere...@my-deja.com

unread,

Jan 24, 2001, 12:56:55 PM1/24/01

to

David Schwartz <dav...@webmaster.com> wrote:

> tere...@my-deja.com wrote:
>
>> BTW, could you please also share your view on other perceived
>> "problems" such as nested broadcast deadlock, spurious wakeups
>> and (the latest one) lost signals??
>
>I'm not sure what you mean. The standard allows an implementation
>to do almost whatever it likes. In fact, you could implement
>pthread_cond_wait by releasing the mutex, sleeping a random
>amount of time, and then reacquiring the mutex. Of course,
>this would be a pretty poor implementation, but any code that
>didn't work under that implementation wouldn't be strictly
>compliant.

The implementation you suggested is indeed correct
one (yes, now I see it :). However it requires from
signal/broadcast nothing more than to "{ return 0; }"
That is not the case for pthread-win32 and ACE
implementations. I do think that these implementations
(basically the same implementation) have some serious
problems with wait/signal/broadcast calls. I am looking
for help to clarify whether these problems are real
or not. I think that I can demonstrate what I mean
using one or two small sample programs. But before
doing so I would like to know whether the following
one (about 150 lines) of my sample programs is
"strictly" POSIX compliant and can be used
for meaningful test/demonstration..

regards,
alexander.

=====================================
tennis.c
=====================================
#include <stdio.h>
#include <pthread.h>

enum GAME_STATE {

START_GAME,
PLAYER_A, // Player A plays the ball
PLAYER_B, // Player B plays the ball
GAME_OVER,
ONE_PLAYER_GONE,
BOTH_PLAYERS_GONE

};

enum GAME_STATE eGameState;
pthread_mutex_t mtxGameStateLock;
pthread_cond_t cndGameStateChange;

void*
playerA(
void* pParm
)
{

// For access to game state variable
pthread_mutex_lock( &mtxGameStateLock );

// Game loop
while ( eGameState < GAME_OVER ) {

// Play the ball
printf( "\nPLAYER-A\n" );

// Now it is PLAYER-B's turn
eGameState = PLAYER_B;

// Signal it to PLAYER-B
pthread_cond_signal( &cndGameStateChange );

// Wait until PLAYER-B finishes playing the ball
do {

pthread_cond_wait( &cndGameStateChange,&mtxGameStateLock );

if ( PLAYER_B == eGameState )
printf( "\n----PLAYER-A: SPURIOUS WAKEUP!!!\n" );

} while ( PLAYER_B == eGameState );

}

// PLAYER-A gone
eGameState++;
printf( "\nPLAYER-A GONE\n" );

// No more access to state variable needed
pthread_mutex_unlock( &mtxGameStateLock );

// Signal PLAYER-A gone event
pthread_cond_broadcast( &cndGameStateChange );

return 0;

}

void*
playerB(
void* pParm
)
{

// For access to game state variable
pthread_mutex_lock( &mtxGameStateLock );

// Game loop
while ( eGameState < GAME_OVER ) {

// Play the ball
printf( "\nPLAYER-B\n" );

// Now its PLAYER-A's turn
eGameState = PLAYER_A;

// Signal it to PLAYER-A
pthread_cond_signal( &cndGameStateChange );

// Wait until PLAYER-A finishes playing the ball
do {

pthread_cond_wait( &cndGameStateChange,&mtxGameStateLock );

if ( PLAYER_A == eGameState )
printf( "\n----PLAYER-B: SPURIOUS WAKEUP!!!\n" );

} while ( PLAYER_A == eGameState );

}

// PLAYER-B gone
eGameState++;
printf( "\nPLAYER-B GONE\n" );

// No more access to state variable needed
pthread_mutex_unlock( &mtxGameStateLock );

// Signal PLAYER-B gone event
pthread_cond_broadcast( &cndGameStateChange );

return 0;

}

int
main()
{

pthread_t thid;

pthread_mutex_init( &mtxGameStateLock,0 );
pthread_cond_init( &cndGameStateChange,0 );

// Set initial state
eGameState = START_GAME;

// Create players
pthread_create( &thid,0,playerA,0 );
pthread_create( &thid,0,playerB,0 );

// Give them 5 sec. to play
//Sleep( 5000 ); //Win32
sleep( 5 );

// Set game over state
pthread_mutex_lock( &mtxGameStateLock );
eGameState = GAME_OVER;

// Signal game over
pthread_cond_broadcast( &cndGameStateChange );

// Wait for players to stop
do {

pthread_cond_wait( &cndGameStateChange,&mtxGameStateLock );

} while ( eGameState < BOTH_PLAYERS_GONE );

// Thats it
printf( "\nGAME OVER\n" );
pthread_mutex_unlock( &mtxGameStateLock );
pthread_cond_destroy( &cndGameStateChange );
pthread_mutex_destroy( &mtxGameStateLock );

return 0;

David Schwartz

unread,

Jan 24, 2001, 2:41:16 PM1/24/01

to

tere...@my-deja.com wrote:

> The implementation you suggested is indeed correct
> one (yes, now I see it :). However it requires from
> signal/broadcast nothing more than to "{ return 0; }"
> That is not the case for pthread-win32 and ACE
> implementations. I do think that these implementations
> (basically the same implementation) have some serious
> problems with wait/signal/broadcast calls. I am looking
> for help to clarify whether these problems are real
> or not. I think that I can demonstrate what I mean
> using one or two small sample programs. But before
> doing so I would like to know whether the following
> one (about 150 lines) of my sample programs is
> "strictly" POSIX compliant and can be used
> for meaningful test/demonstration..

It's compliant, but it's buggy. Players A and B use
pthread_cond_signal, but it's possible that they might need to wakeup
two threads (the main thread and the other player). If the
pthread_cond_signal happens to wake the main thread, the other player
will never wake.

DS

tere...@my-deja.com

unread,

Jan 24, 2001, 5:24:56 PM1/24/01

to

David Schwartz <dav...@webmaster.com> wrote:
>
> ...it's buggy.

well, please SHOW me the bug.

> Players A and B use pthread_cond_signal, but it's possible...

It is not fair to claim that it's buggy just saying
"it's possible that they might need to wakeup two threads...".

It would be another program then (with added new features).

The program is intentionally designed to use signals
(in order to demonstrate the problem with _signal call
impl.) in such way that players need ALWAYS AND ONLY
talk to each other or main() <- last player leaving the game,
but NEVER to both (another player and main).

In fact, even final "I am gone" broadcasts can be changed
to signals w/o breaking the program and...
(next level of “optimisation” :)

“eGameState++; unlock(; _signal( “

can be changed to

“if ( BOTH_PLAYERS_GONE == ++eGameState) _signal(; _unlock(”

in order to "optimise" one signal from first player
leaving the game.

Players are not allowed to:

a) leave the game until its over;
b) declare the game to be over.

main() commands (sets the state of game to be over)
players carry out (stop playing).

Broadcast is only needed for final "game over"
signal issued by main().

> It's compliant

That is really good.

Tomorrow (I have to go urgently now) I will try to
demonstrate the lost-signal "problem" of current
pthread-win32 and ACE-(variant w/o SingleObjectAndWait)
implementations: players start suddenly drop their balls :-)
(with no change in source code).

regards,
alexander.

David Schwartz

unread,

Jan 24, 2001, 5:52:22 PM1/24/01

to

tere...@my-deja.com wrote:
>
> David Schwartz <dav...@webmaster.com> wrote:
> >
> > ...it's buggy.
>
> well, please SHOW me the bug.

The bug is that you call 'pthread_cond_signal' which is only allowed if
_any_ woken thread can do the job. In this case, the main thread can't,
so if the 'pthread_cond_signal' wakes the main thread, the program will
stall.

> > Players A and B use pthread_cond_signal, but it's possible...
>
> It is not fair to claim that it's buggy just saying
> "it's possible that they might need to wakeup two threads...".
>
> It would be another program then (with added new features).

Huh?

> The program is intentionally designed to use signals
> (in order to demonstrate the problem with _signal call
> impl.) in such way that players need ALWAYS AND ONLY
> talk to each other or main() <- last player leaving the game,
> but NEVER to both (another player and main).

The problem is that the pthread_cond_signal doesn't know which thread
it 'needs' to wake up. You do, so you need to code that information. The
program is buggy.

> main() commands (sets the state of game to be over)
> players carry out (stop playing).
>
> Broadcast is only needed for final "game over"
> signal issued by main().

Nonesense. You are only guaranteed that pthread_cond_signal will wake
_some_ thread, but not which. If player A's pthread_cond_signal wakes
the main thread, player B will never wakeup.

> > It's compliant
>
> That is really good.

> Tomorrow (I have to go urgently now) I will try to
> demonstrate the lost-signal "problem" of current
> pthread-win32 and ACE-(variant w/o SingleObjectAndWait)
> implementations: players start suddenly drop their balls :-)
> (with no change in source code).

Signals aren't lost, they're going to the main thread, which isn't
coded correctly to handle them. Try this:

// Wait for players to stop
do {

pthread_cond_wait( &cndGameStateChange,&mtxGameStateLock );
printf("Main thread stole a signal\n");

} while ( eGameState < BOTH_PLAYERS_GONE );

I bet everytime you thing a signal is lost, you'll see that printf. The
signal isn't lost, it was stolen by another thread.

DS

tere...@my-deja.com

unread,

Jan 25, 2001, 6:10:45 AM1/25/01

to

David Schwartz <dav...@webmaster.com> wrote:
>tere...@my-deja.com wrote:
>>
>> David Schwartz <dav...@webmaster.com> wrote:
>> >
>> > ...it's buggy.
>>
>> well, please SHOW me the bug.
>
>The bug is that you call 'pthread_cond_signal' which is only
>allowed if _any_ woken thread can do the job. In this case,
>the main thread can't, so if the 'pthread_cond_signal' wakes
>the main thread, the program will stall.

please read the source code again...

main thread can not steal any "ball" signals from players
because main thread is not waiting for any signals while players
are busy playing the game. the main thread is asleep:

.
.
.

// Give them 5 sec. to play

Sleep( 5000 ); //Win32
//sleep( 5 );
.
.
.

>> > Players A and B use pthread_cond_signal, but it's possible...
>>
>> It is not fair to claim that it's buggy just saying
>> "it's possible that they might need to wakeup two threads...".
>>
>> It would be another program then (with added new features).
>
>Huh?

if the program's main thread would need to wait for signals
while players are playing it would be another program.

it could have been made to allow players to leave game
while game is not over, declare game over, replace
old players with new ones,.... these would be added new
features implementation of which would require main thread
to be in listening mode through out the game.

Current program does not support/allow it hence main does not
need to be in wait mode until game is not over (after sleeping
5. sec)

>> The program is intentionally designed to use signals
>> (in order to demonstrate the problem with _signal call
>> impl.) in such way that players need ALWAYS AND ONLY
>> talk to each other or main() <- last player leaving the game,
>> but NEVER to both (another player and main).
>
>The problem is that the pthread_cond_signal doesn't know which thread
>it 'needs' to wake up. You do, so you need to code that information.
>The program is buggy.

well, it might be that the program is in fact buggy.
but you did not show me any bug.

>> main() commands (sets the state of game to be over)
>> players carry out (stop playing).
>>
>> Broadcast is only needed for final "game over"
>> signal issued by main().
>
>Nonesense. You are only guaranteed that pthread_cond_signal
>will wake _some_ thread, but not which. If player
>A's pthread_cond_signal wakes the main thread, player B
>will never wakeup.

in the situation when player A's pthread_cond_signal
wakes the main thread (that can only happen when game is
over), player B will be unblocked by already happened
broadcast from main (which declared the game to be over).

>> > It's compliant
>>
>> That is really good.
>
>> Tomorrow (I have to go urgently now) I will try to
>> demonstrate the lost-signal "problem" of current
>> pthread-win32 and ACE-(variant w/o SingleObjectAndWait)
>> implementations: players start suddenly drop their balls :-)
>> (with no change in source code).
>
>Signals aren't lost, they're going to the main thread,
>which isn't coded correctly to handle them. Try this:
>
> // Wait for players to stop
> do {
>
> pthread_cond_wait( &cndGameStateChange,&mtxGameStateLock );
>printf("Main thread stole a signal\n");
>
> } while ( eGameState < BOTH_PLAYERS_GONE );
>
>I bet everytime you thing a signal is lost, you'll see that printf.
>The signal isn't lost, it was stolen by another thread.

well, you can probably loose your bet.. it was indeed stolen
by "another" thread but not the one you seem to think of.

I think that what actually happens is the following:

H:\SA\UXX\pt\PTHREADS\TESTS>tennis3.exe

PLAYER-A

PLAYER-B

----PLAYER-B: SPURIOUS WAKEUP!!!

PLAYER-A GONE

PLAYER-B GONE

GAME OVER

H:\SA\UXX\pt\PTHREADS\TESTS>

here you can see that PLAYER-B after playing his first
ball (which came via signal from PLAYER-A) just dropped
it down. What happened is that his signal to player A
was consumed as spurious wakeup by himself (player B).

The implementation has a problem:

================
waiting threads:
================

{ /** Critical Section

inc cond.waiters_count

}

/*
/* Atomic only if using Win32 SignalObjectAndWait
/*
cond.mtx.release

/*** ^^-- A THREAD WHICH DID SIGNAL MAY ACQUIRE THE MUTEX,
/*** GO INTO WAIT ON THE SAME CONDITION AND OVERTAKE
/*** ORIGINAL WAITER(S) CONSUMING ITS OWN SIGNAL!

cond.sem.wait

Player-A after playing game's initial ball went into
wait (called _wait) but was pre-empted before reaching
wait semaphore. He was counted as waiter but was not
actually waiting/blocked yet.

===============
signal threads:
===============

{ /** Critical Section

waiters_count = cond.waiters_count

}

if ( waiters_count != 0 )

sem.post 1

endif

Player-B after he received signal/ball from Player A
called _signal. The _signal did see that there was
one waiter blocked on the condition (Player-A) and
released the semaphore.. (but it did not unblock
Player-A because he was not actually blocked).
Player-B thread continued its execution, called _wait,
was counted as second waiter BUT was allowed to slip
through opened semaphore gate (which was opened for
Player-B) and received his own signal. Player B remained
blocked followed by Player A. Deadlock happened which
lasted until main thread came in and said game over.

It seems to me that the implementation fails to
correctly implement the following statement
from specification:

http://www.opengroup.org/
onlinepubs/007908799/xsh/pthread_cond_wait.html

"These functions atomically release mutex and cause
the calling thread to block on the condition variable
cond; atomically here means "atomically with respect
to access by another thread to the mutex and then the
condition variable". That is, if another thread is
able to acquire the mutex after the about-to-block
thread has released it, then a subsequent call to
pthread_cond_signal() or pthread_cond_broadcast()
in that thread behaves as if it were issued after
the about-to-block thread has blocked."

Question: Am I right?

(I produced the program output above by simply
adding “Sleep( 1 )”:

================
waiting threads:
================

{ /** Critical Section

inc cond.waiters_count

}

/*
/* Atomic only if using Win32 SignalObjectAndWait
/*
cond.mtx.release

Sleep( 1 ); // Win32

/*** ^^-- A THREAD WHICH DID SIGNAL MAY ACQUIRE THE MUTEX,
/*** GO INTO WAIT ON THE SAME CONDITION AND OVERTAKE
/*** ORIGINAL WAITER(S) CONSUMING ITS OWN SIGNAL!

cond.sem.wait

to the source code of pthread-win32 implementation:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/pthreads/
condvar.c?rev=1.36&content-type=text/
x-cvsweb-markup&cvsroot=pthreads-win32

/*
* We keep the lock held just long enough to increment the count of
* waiters by one (above).
* Note that we can't keep it held across the
* call to sem_wait since that will deadlock other calls
* to pthread_cond_signal
*/
cleanup_args.mutexPtr = mutex;
cleanup_args.cv = cv;
cleanup_args.resultPtr = &result;

pthread_cleanup_push (ptw32_cond_wait_cleanup, (void *)
&cleanup_args);

if ((result = pthread_mutex_unlock (mutex)) == 0)
{
Sleep( 1 ); // @AT

/*
* Wait to be awakened by
* pthread_cond_signal, or
* pthread_cond_broadcast, or
* a timeout
*
* Note:
* ptw32_sem_timedwait is a cancelation point,
* hence providing the
* mechanism for making pthread_cond_wait a cancelation
* point. We use the cleanup mechanism to ensure we
* re-lock the mutex and decrement the waiters count
* if we are canceled.
*/
if (ptw32_sem_timedwait (&(cv->sema), abstime) == -1)
{
result = errno;
}
}

pthread_cleanup_pop (1); /* Always cleanup */

BTW, on my system (2 CPUs) I can manage to get
signals lost even without any source code modification
if I run the tennis program many times in different
shell sessions.

Dave Butenhof

unread,

Jan 25, 2001, 8:26:07 AM1/25/01

to

tere...@my-deja.com wrote:

> But it would certainly help me to avoid making a mistake
> if XSH specification instead of (in addition to) statements
> about "order in which threads are unblocked" and
> "predictable scheduling behaviour" would just simply
> say that the order in which unblocked threads acquire
> the mutex is _implementation_specific_ / unpredictable /
> my not depend upon "order in which threads are unblocked".

POSIX and XSH say that "The thread(s) that are unblocked shall contend
for the mutex according to the scheduling policy (if applicable) and as
if each had called pthread_mutex_lock()." Obviously, that means they may
need to run (and "call pthread_mutex_lock()") in order to contend, which
means they contend not only with other unblocked condition waiters but
with any other threads trying to lock the mutex.

I understand why you might think this is unclear, but part of the cause
lies in the origins of 1003.1c-1995 (the amendment that put threads into
1003.1-1996). It was developed under the sponsorship of the REALTIME
working group. The focus was on the predictable behavior of SCHED_FIFO
and SCHED_RR threads on a uniprocessor system. Under such constraints,
"wakeup order" and "mutex acquisition order" are predictable from the
relative priorities of the threads. If you unblock a thread of higher
priority, it will immediately preempt the running thread. If it contends
for the mutex (because the signal/broadcast was done while holding the
mutex), the higher priority thread will block until the signaller
unlocks the mutex, at which time the highest priority waiter will
preempt it and immediately acquire the mutex. None of this is
predictable without realtime scheduling policies, but the standard
deliberately made only the blanket statement that scheduling order under
SCHED_OTHER is unspecified. (What else COULD it say?) Under
multiprocessor scheduling, you can't say anything useful except the
order in which threads are unblocked.

The point is that "implementation specific/unpredictable" would not be
correct. It is specified by the standard and absolutely predictable...
but only when you're using SCHED_FIFO or SCHED_RR on a uniprocessor.
Technically, it means when you're using it on a system with a POSIX
"allocation domain" of 1. While that should be, and was intended to be,
essentially the same as "uniprocessor", there are some systems like the
Solaris 2-level scheduler that don't support an allocation domain of 1
even on a uniprocessor. (The allocation domain in that case is the
number of LWPs active in the process, which Solaris may increase or
decrease at any time without your knowledge or control.)

David Schwartz

unread,

Jan 25, 2001, 11:37:59 AM1/25/01

to

tere...@my-deja.com wrote:

> well, it might be that the program is in fact buggy.
> but you did not show me any bug.

You're right. I was close but not dead on. I was correct, however, that
the code is buggy because it uses 'pthread_cond_signal' even though not
any thread waiting on the condition variable can do the job. I was wrong
in which thread could be waiting on the cv but unable to do the job.

DS

tere...@my-deja.com

unread,

Jan 25, 2001, 1:45:09 PM1/25/01

to

Dave Butenhof <David.B...@compaq.com> wrote:
>If you unblock a thread of higher priority,
>it will immediately preempt the running thread.

I really did not know that. I was thinking that
higher priority thread will simply resume
execution next - after the current thread gets
pre-empted "sometime" (next re-sched. interval,
its time slice elapses,...). I really did not know
that "sometime" == immediately.

Not knowing it, I was thinking that signalling
thread (w/o optimisation) could be able to "unblock"
multiple waiter threads and even release the mutex
before end of its execution time leaving free mutex
to all previously unblocked waiters ready to start their
race for the mutex... That statement
about predictable scheduling with mutex locked was
the most mysterious for me. Having that optimisation
of standard behaviour in mind, I was able to explain
it to myself as "they will all together/in order
nicely move to mutex and the process starts when
mutex released". But after your explanation of what
standard behaviour really meant to be and still
not knowing about "sometime" I was struggling
with that statement again (for uniprocessors)
and wanted to ask it here. Well, the answer from
you came even before the question was asked
(it seems that my question about prediction was
nicely predicted by you :) That is really
impressive. Thanks!

tere...@my-deja.com

unread,

Jan 26, 2001, 6:40:11 AM1/26/01

to

Okay, lets change 'pthread_cond_signal' to 'pthread_cond_broadcast'
but also add some noise from main() right before declaring the game
to be over (I need it in order to demonstrate another problem of
pthread-win32/ACE implementations - broadcast deadlock)...

=====================================
tennisb.c

enum GAME_STATE {

};

pthread_cond_broadcast( &cndGameStateChange );

// Wait until PLAYER-B finishes playing the ball
do {

pthread_cond_wait( &cndGameStateChange,&mtxGameStateLock );

}

return 0;

}

pthread_cond_broadcast( &cndGameStateChange );

// Wait until PLAYER-A finishes playing the ball
do {

pthread_cond_wait( &cndGameStateChange,&mtxGameStateLock );

if ( PLAYER_A == eGameState )
printf( "\n----PLAYER-B: SPURIOUS WAKEUP!!!\n" );

} while ( PLAYER_A == eGameState );

}

// PLAYER-B gone
eGameState++;
printf( "\nPLAYER-B GONE\n" );

// No more access to state variable needed
pthread_mutex_unlock( &mtxGameStateLock );

// Signal PLAYER-B gone event
pthread_cond_broadcast( &cndGameStateChange );

return 0;

}

int
main()
{

int i;

pthread_t thid;

pthread_mutex_init( &mtxGameStateLock,0 );
pthread_cond_init( &cndGameStateChange,0 );

// Set initial state
eGameState = START_GAME;

// Create players
pthread_create( &thid,0,playerA,0 );
pthread_create( &thid,0,playerB,0 );

// Give them 1 sec. to play
Sleep( 1000 ); //Win32
//sleep( 1 );

// Make some noise
pthread_mutex_lock( &mtxGameStateLock );
printf( "\n---Noise ON...\n" );
pthread_mutex_unlock( &mtxGameStateLock );
for ( i = 0; i < 100000; i++ )
pthread_cond_broadcast( &cndGameStateChange );
printf( "\n---Noise OFF\n" );

// Set game over state
pthread_mutex_lock( &mtxGameStateLock );
eGameState = GAME_OVER;

printf( "\n---Stopping the game...\n" );

// Signal game over
pthread_cond_broadcast( &cndGameStateChange );

// Wait for players to stop
do {

pthread_cond_wait( &cndGameStateChange,&mtxGameStateLock );

} while ( eGameState < BOTH_PLAYERS_GONE );

// Thats it

printf( "\nGAME OVER\n" );
pthread_mutex_unlock( &mtxGameStateLock );
pthread_cond_destroy( &cndGameStateChange );
pthread_mutex_destroy( &mtxGameStateLock );

return 0;

}

=====================================
end of tennisb.c
=====================================

It is my understanding of POSIX conditions,
that on correct implementation added noise
in form of unnecessary broadcasts from main,
should not break the tennis program. The
only 'side effect' of added noise on correct
implementation would be 'spurious wakeups' of
players (in fact they are not spurious,
players just see them as spurious) unblocked,
not by another player but by main before
another player had a chance to acquire the
mutex and change the game state variable:
.
.
.

PLAYER-B

PLAYER-A

---Noise ON...

PLAYER-B

PLAYER-A

.
.
.

PLAYER-B

PLAYER-A

----PLAYER-A: SPURIOUS WAKEUP!!!

PLAYER-B

PLAYER-A

---Noise OFF

PLAYER-B

---Stopping the game...

PLAYER-A GONE

PLAYER-B GONE

GAME OVER

H:\SA\UXX\pt\PTHREADS\TESTS>

On pthread-win32/ACE implementations the
program could stall:

.
.
.

PLAYER-A

PLAYER-B

PLAYER-A

PLAYER-B

PLAYER-A

PLAYER-B

PLAYER-A

PLAYER-B

---Noise ON...

PLAYER-A

---Noise OFF
^C

H:\SA\UXX\pt\PTHREADS\TESTS>

The implementation has problems:

================
waiting threads:
================

{ /** Critical Section

inc cond.waiters_count

}

/*
/* Atomic only if using Win32 SignalObjectAndWait
/*
cond.mtx.release

cond.sem.wait

/*** ^^-- WAITER CAN BE PREEMPTED AFTER BEING UNBLOCKED...

{ /** Critical Section

dec cond.waiters_count

/*** ^^- ...AND BEFORE DECREMENTING THE COUNT (1)

last_waiter = ( cond.was_broadcast &&
cond.waiters_count == 0 )

if ( last_waiter )

cond.was_broadcast = FALSE

endif

}

if ( last_waiter )

/*

/* Atomic only if using Win32 SignalObjectAndWait
/*

cond.auto_reset_event_or_sem.post /* Event for Win32
cond.mtx.acquire

/*** ^^-- ...AND BEFORE CALL TO mtx.acquire (2)

/*** ^^-- NESTED BROADCASTS RESULT IN A DEADLOCK

else

cond.mtx.acquire

/*** ^^-- ...AND BEFORE CALL TO mtx.acquire (3)

endif

==================
broadcast threads:
==================

{ /** Critical Section

waiters_count = cond.waiters_count

if ( waiters_count != 0 )

cond.was_broadcast = TRUE

endif

}

if ( waiters_count != 0 )

cond.sem.post waiters_count

/*** ^^^^^--- SPURIOUS WAKEUPS DUE TO (1)

cond.auto_reset_event_or_sem.wait /* Event for Win32

/*** ^^^^^--- DEADLOCK FOR FURTHER BROADCASTS IF THEY
HAPPEN TO GO INTO WAIT WHILE PREVIOUS
BROADCAST IS STILL IN PROGRESS/WAITING

endif

a) cond.waiters_count does not accurately reflect
number of waiters blocked on semaphore - that could
result (in the time window when counter is not accurate)
in spurios wakeups organised by subsequent _signals
and _broadcasts. From standard compliance point of view
that is OK but that could be a real problem from
performance/efficiency point of view.

b) If subsequent broadcast happen to go into wait on
cond.auto_reset_event_or_sem before previous
broadcast was unblocked from cond.auto_reset_event_or_sem
by its last waiter, one of two blocked threads will
remain blocked because last_waiter processing code
fails to unblock both threads.

In the situation with tennisb.c the Player-B was put
in a deadlock by noise (broadcast) coming from main
thread. And since Player-B holds the game state
mutex when it calls broadcast, the whole program
stalled: Player-A was deadlocked on mutex and
main thread after finishing with producing the noise
was deadlocked on mutex too (needed to declare the
game over)

(I produced the program output above by simply
adding “Sleep( 1 )”:

==================
broadcast threads:
==================

{ /** Critical Section

waiters_count = cond.waiters_count

if ( waiters_count != 0 )

cond.was_broadcast = TRUE

endif

}

if ( waiters_count != 0 )

Sleep( 1 ); //Win32

cond.sem.post waiters_count

/*** ^^^^^--- SPURIOUS WAKEUPS DUE TO (1)

cond.auto_reset_event_or_sem.wait /* Event for Win32

/*** ^^^^^--- DEADLOCK FOR FURTHER BROADCASTS IF THEY
HAPPEN TO GO INTO WAIT WHILE PREVIOUS
BROADCAST IS STILL IN PROGRESS/WAITING

endif

to the source code of pthread-win32 implementation:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/pthreads/
condvar.c?rev=1.36&content-type=text/
x-cvsweb-markup&cvsroot=pthreads-win32

if (wereWaiters)
{
/*
* Wake up all waiters
*/

Sleep( 1 ); //@AT

#ifdef NEED_SEM

result = (ptw32_increase_semaphore( &cv->sema, cv->waiters )
? 0
: EINVAL);

#else /* NEED_SEM */

result = (ReleaseSemaphore( cv->sema, cv->waiters, NULL )
? 0
: EINVAL);

#endif /* NEED_SEM */

}

(void) pthread_mutex_unlock(&(cv->waitersLock));

if (wereWaiters && result == 0)
{
/*
* Wait for all the awakened threads to acquire their part of
* the counting semaphore
*/

if (WaitForSingleObject (cv->waitersDone, INFINITE)
== WAIT_OBJECT_0)
{
result = 0;
}
else
{
result = EINVAL;
}

}

return (result);

}

BTW, on my system (2 CPUs) I can manage to get

the program stalled even without any source code
modification if I run the tennisb program many

times in different shell sessions.

Now (after Dave Butenhof, David Schwartz, and
Kaz Kylheku helped me to understand the standard
behaviour), it seems to me that the authors of
"Strategies for Implementing POSIX Condition
Variables on Win32"

http://www.cs.wustl.edu/~schmidt/win32-cv-1.html

are right in claiming that SetEvent strategy is
correct but unfair:

"Although the Pthreads specification does not mandate
this degree of fairness,...", "Of course, application
developers should not rely on the fairness semantics of
pthread_cond_broadcast..."

but unfortunately do not mention the real cause of that
problem: implementation may result in spurious
wakeup(s) for some waiter(s) unblocked multiple times
after a single broadcast with no subsequent signals or
broadcasts.

Right?

It is also somewhat unclear to me which fairness problem
was meant to be fixed by blocking the broadcasting
thread until the unblock of last waiter takes place
but that 'feature' itself does unfortunately create
another problem (deadlock).

Does anybody know some other implementations of
POSIX conditions based on semaphores?

I need it for the project I am currently involved
(PROCESS_SHARED conditions which would work on
Win32 and Linux - I have some conditions based code
such as specialized readwrite locks, queues, other peer
protocols... which would be difficult to rewrite for
IPC/Win32 semaphores).

tere...@my-deja.com

unread,

Jan 30, 2001, 5:40:52 PM1/30/01

to

Dave Butenhof <David.B...@compaq.com> wrote:

> tere...@my-deja.com wrote:
>> Q2: Is the implementation (condition wait/signal/broadcast calls)
>> which:
>>
>> a) does not provide _atomic_ release of condition mutex and addition
>> of the calling threads (wait call) to the condition wait queue;
>
>Depends a lot on what you mean by "atomic", a somewhat overloaded
>term. If it is possible for another thread to acquire the mutex
>before the condition waiter is "on the queue", then the
>implementation must provide some transparent mechanism to cause
>the application illusion of atomicity. It cannot be possible for
>a wakeup to be missed because it happened between the unlock and
>the block.

What I was trying to say was that the implementation does not
use 'atomic' Win32 SignalObjectAndWait but even so attempts
to insure correct POSIX 'atomic' semantics with respect to other
threads and subsequent signal/broadcast/wait. Please see the code
below (I am trying to solve the problem by making signal/broadcast
behave as an atomic operation with respect to other threads and
subsequent signal/broadcast/wait).

Is it ('atomic' behaviour as provided by the implementation) OK?

>> b) does guarantee that thread(s) released from condition wait
>> queue by condition signal/broadcast will be added to condition
>> mutex wait queue BEFORE thread(s) released by subsequent call
>> to signal/broadcast;
>
>This is irrelevant. Directly moving threads from condition
>variable wait queue to mutex waiter queue is a >nice optimization,
>but is not required by the POSIX model. Any application presuming
>such behavior is incorrect, though in practice it would be quite
>difficult to write an application that would care.
>
>> c) postpones the move of later threads from the condition wait queue
>> to the condition mutex wait queue until all earlier released threads
>> _pass_through_ the condition mutex (mutex is owned by last earlier
>> released waiter thread);
>
>I think this would be a technical violation of the standard, as it
>requires that the condition waiters be "unblocked" (and you're not).
>Still, though, it would be difficult for a conforming application to
>detect the difference in most cases, and it probably doesn't matter
>much. I think it would be a bad idea, in any case, because it would
>enforce some "timing windows" that would likely make some badly
>written (and nonportable) applications work on that platform, which
>would inhibit portability.

the only reason I was thinking about c) was a possible way how to
insure b) (pool of semaphores - sort of multiple wait barriers for
subsequent waves of unblocked waiters). But since b) is not required
by the POSIX model, there is absolutely no need for c). As for
technical violation.. well, after looking at the implementation
suggested by David Schwartz (wait: m.unlock();sleep(random());m.lock())
I am not so sure at all that anything can violate the standard unless
it results in lost signals.

>> conformant with POSIX??
>
>The magic 8 ball says "Not clear. Try again."

Ok, I am trying again...

here is the implementation (copy&paste from the pthread-win32 patch
I am making/playing with now). This code, as I hope, correctly
implements POSIX conditions using 3 counts, 2 semaphores and (only
needed for _timedwait and thread cancellation) 1 mutex (or a third
IPC sem). I need semaphore based implementation for the project,
which requires PROCESS_SHARED conditions for both Win32 and Linux.
For that project (no timedwaits and no cancellation) I plan to use
only a subset of synchronisations shown below (Sync.LEVEL-2 will be
completely omitted).

Q1: Is it correct or incorrect/buggy implementation?
Q2: Is there anything to improve or make completely different, why?

thanks a lot in advance.

regards,
alexander.
=======================================================================

struct pthread_cond_t_ {
long nWaitersBlocked; /* Number of threads
blocked */
long nWaitersUnblocked; /* Number of threads
unblocked */
long nWaitersToUnblock; /* Number of threads to
unblock */
sem_t semBlockQueue; /* Queue up threads waiting for
the */
/* condition to become
signalled */
sem_t semBlockLock; /* Semaphore that guards access
to */
/* | waiters blocked count/block
queue */
/* +-> Mandatory Sync.LEVEL-
1 */
pthread_mutex_t mtxUnblockLock; /* Mutex that guards access
to */
/* | waiters (to)unblock(ed)
counts */
/* +-> Optional* Sync.LEVEL-
2 */
}; /* Opt*) for _timedwait and
cancellation*/

int
pthread_cond_init (pthread_cond_t * cond, const pthread_condattr_t *
attr)
{
.
.
.
cv->nWaitersBlocked = 0;
cv->nWaitersUnblocked = 0;
cv->nWaitersToUnblock = 0;

if (sem_init (&(cv->semBlockLock), 0, 1) != 0)
{
goto FAIL0;
}

if (sem_init (&(cv->semBlockQueue), 0, 0) != 0)
{
goto FAIL1;
}

if (pthread_mutex_init (&(cv->mtxUnblockLock), 0) != 0)
{
goto FAIL2;
}

result = 0;

goto DONE;
.
.
.
}

int
pthread_cond_destroy (pthread_cond_t * cond)
{
.
.
.
/*
* Synchronize access to waiters blocked count (LEVEL-1)
*/
if (sem_wait(&(cv->semBlockLock)) != 0)
{
return errno;
}

/*
* Synchronize access to waiters (to)unblock(ed) counts (LEVEL-2)
*/
if ((result = pthread_mutex_lock(&(cv->mtxUnblockLock))) != 0)
{
(void) sem_post(&(cv->semBlockLock));
return result;
}

/*
* Check whether cv is still busy (still has waiters blocked)
*/
if (cv->nWaitersBlocked - cv->nWaitersUnblocked > 0)
{
(void) sem_post(&(cv->semBlockLock));
(void) pthread_mutex_unlock(&(cv->mtxUnblockLock));
return EBUSY;
}

/*
* Now it is safe to destroy
*/
(void) sem_destroy (&(cv->semBlockLock));
(void) sem_destroy (&(cv->semBlockQueue));
(void) pthread_mutex_unlock (&(cv->mtxUnblockLock));
(void) pthread_mutex_destroy (&(cv->mtxUnblockLock));
.
.
.
}

/*
* Arguments for cond_wait_cleanup, since we can only pass a
* single void * to it.
*/
typedef struct {
pthread_mutex_t * mutexPtr;
pthread_cond_t cv;
int * resultPtr;
} ptw32_cond_wait_cleanup_args_t;

static void
ptw32_cond_wait_cleanup(void * args)
{
ptw32_cond_wait_cleanup_args_t * cleanup_args =
(ptw32_cond_wait_cleanup_args_t *) args;
pthread_cond_t cv = cleanup_args->cv;
int * resultPtr = cleanup_args->resultPtr;
int eLastSignal; /* enum: 1=yes 0=no -1=cancelled/timedout w/o signal
(s) */
int result;

/*
* Whether we got here as a result of signal/broadcast or because of
* timeout on wait or thread cancellation we indicate that we are no
* longer waiting. The waiter is responsible for adjusting waiters
* (to)unblock(ed) counts (protected by unblock lock).
* Unblock lock/Sync.LEVEL-2 supports _timedwait and cancellation.
*/
if ((result = pthread_mutex_lock(&(cv->mtxUnblockLock))) != 0)
{
*resultPtr = result;
return;
}

cv->nWaitersUnblocked++;

eLastSignal = (cv->nWaitersToUnblock == 0) ?
-1 : (--cv->nWaitersToUnblock == 0);

/*
* No more LEVEL-2 access to waiters (to)unblock(ed) counts needed
*/
if ((result = pthread_mutex_unlock(&(cv->mtxUnblockLock))) != 0)
{
*resultPtr = result;
return;
}

/*
* If last signal...
*/
if (eLastSignal == 1)
{
/*
* ...it means that we have end of 'atomic' signal/broadcast
*/
if (sem_post(&(cv->semBlockLock)) != 0)
{
*resultPtr = errno;
return;
}
}
/*
* If not last signal and not timed out/cancelled wait w/o signal...
*/
else if (eLastSignal == 0)
{
/*
* ...it means that next waiter can go through semaphore
*/
if (sem_post(&(cv->semBlockQueue)) != 0)
{
*resultPtr = errno;
return;
}
}

/*
* XSH: Upon successful return, the mutex has been locked and is
owned
* by the calling thread
*/
if ((result = pthread_mutex_lock(cleanup_args->mutexPtr)) != 0)
{
*resultPtr = result;
}

} /* ptw32_cond_wait_cleanup */

static int
ptw32_cond_timedwait (pthread_cond_t * cond,
pthread_mutex_t * mutex,
const struct timespec *abstime)
{
int result = 0;
pthread_cond_t cv;
ptw32_cond_wait_cleanup_args_t cleanup_args;
.
.
.
/*
* Synchronize access to waiters blocked count (LEVEL-1)
*/
if (sem_wait(&(cv->semBlockLock)) != 0)
{
return errno;
}

cv->nWaitersBlocked++;

/*
* Thats it. Counted means waiting, no more access needed
*/
if (sem_post(&(cv->semBlockLock)) != 0)
{
return errno;
}

/*
* Setup this waiter cleanup handler

*/
cleanup_args.mutexPtr = mutex;
cleanup_args.cv = cv;
cleanup_args.resultPtr = &result;

pthread_cleanup_push (ptw32_cond_wait_cleanup, (void *)
&cleanup_args);

/*
* Now we can release 'mutex' and...
*/

if ((result = pthread_mutex_unlock (mutex)) == 0)
{

/*
* ...wait to be awakened by

* pthread_cond_signal, or
* pthread_cond_broadcast, or

* timeout, or
* thread cancellation
*
* Note:
*
* ptw32_sem_timedwait is a cancellation point,
* hence providing the mechanism for making
* pthread_cond_wait a cancellation point.
* We use the cleanup mechanism to ensure we
* re-lock the mutex and adjust (to)unblock(ed) waiters
* counts if we are cancelled, timed out or signalled.
*/
if (ptw32_sem_timedwait (&(cv->semBlockQueue), abstime) != 0)
{
result = errno;
}
}

/*
* Always cleanup
*/
pthread_cleanup_pop (1);

/*
* "result" can be modified by the cleanup handler.
*/
return (result);

} /* ptw32_cond_timedwait */

static int
ptw32_cond_unblock (pthread_cond_t * cond,
int unblockAll)
{
int result;
pthread_cond_t cv;
.
.
.
/*
* Synchronize access to waiters blocked count (LEVEL-1)
*/
if (sem_wait(&(cv->semBlockLock)) != 0)
{
return errno;
}

/*
* Synchronize access to waiters (to)unblock(ed) counts (LEVEL-2)
* This sync.level supports _timedwait and cancellation
*/
if ((result = pthread_mutex_lock(&(cv->mtxUnblockLock))) != 0)
{
return result;
}

/*
* Adjust waiters blocked and unblocked counts (collect garbage)
*/
if (cv->nWaitersUnblocked != 0)
{
cv->nWaitersBlocked -= cv->nWaitersUnblocked;
cv->nWaitersUnblocked = 0;
}

/*
* If (after adjustment) there are still some waiters blocked
counted...
*/
if ( cv->nWaitersBlocked > 0)
{
/*
* We will unblock first waiter and leave semBlockLock/LEVEL-1
locked
* LEVEL-1 access is left disabled until last signal/unblock
completes
*/
cv->nWaitersToUnblock = (unblockAll) ? cv->nWaitersBlocked : 1;

/*
* No more LEVEL-2 access to waiters (to)unblock(ed) counts needed
* This sync.level supports _timedwait and cancellation
*/
if ((result = pthread_mutex_unlock(&(cv->mtxUnblockLock))) != 0)
{
return result;
}

/*
* Now, with LEVEL-2 lock released let first waiter go through
semaphore
*/
if (sem_post(&(cv->semBlockQueue)) != 0)
{
return errno;
}
}

/*
* No waiter blocked - no more LEVEL-1 access to blocked count
needed...
*/
else if (sem_post(&(cv->semBlockLock)) != 0)
{
return errno;
}
/*
* ...and no more LEVEL-2 access to waiters (to)unblock(ed) counts
needed too
* This sync.level supports _timedwait and cancellation
*/
else
{
result = pthread_mutex_unlock(&(cv->mtxUnblockLock));
}

return(result);

} /* ptw32_cond_unblock */

int
pthread_cond_wait (pthread_cond_t * cond,
pthread_mutex_t * mutex)
{
/* The NULL abstime arg means INFINITE waiting. */
return(ptw32_cond_timedwait(cond, mutex, NULL));
} /* pthread_cond_wait */

int
pthread_cond_timedwait (pthread_cond_t * cond,
pthread_mutex_t * mutex,
const struct timespec *abstime)
{
if (abstime == NULL)
{
return EINVAL;
}

return(ptw32_cond_timedwait(cond, mutex, abstime));
} /* pthread_cond_timedwait */

int
pthread_cond_signal (pthread_cond_t * cond)
{
/* The '0'(FALSE) unblockAll arg means unblock ONE waiter. */
return(ptw32_cond_unblock(cond, 0));
} /* pthread_cond_signal */

int
pthread_cond_broadcast (pthread_cond_t * cond)
{
/* The '1'(TRUE) unblockAll arg means unblock ALL waiters. */
return(ptw32_cond_unblock(cond, 1));
} /* pthread_cond_broadcast */

Dave Butenhof

unread,

Feb 1, 2001, 9:35:14 AM2/1/01

to

tere...@my-deja.com wrote:

> Dave Butenhof <David.B...@compaq.com> wrote:
> > tere...@my-deja.com wrote:
> >> Q2: Is the implementation (condition wait/signal/broadcast calls)
> >> which:
> >>
> >> a) does not provide _atomic_ release of condition mutex and addition
> >> of the calling threads (wait call) to the condition wait queue;
> >
> >Depends a lot on what you mean by "atomic", a somewhat overloaded
> >term. If it is possible for another thread to acquire the mutex
> >before the condition waiter is "on the queue", then the
> >implementation must provide some transparent mechanism to cause
> >the application illusion of atomicity. It cannot be possible for
> >a wakeup to be missed because it happened between the unlock and
> >the block.
>
> What I was trying to say was that the implementation does not
> use 'atomic' Win32 SignalObjectAndWait but even so attempts
> to insure correct POSIX 'atomic' semantics with respect to other
> threads and subsequent signal/broadcast/wait. Please see the code
> below (I am trying to solve the problem by making signal/broadcast
> behave as an atomic operation with respect to other threads and
> subsequent signal/broadcast/wait).

That sounds like a reasonable approach, if your implementation matches your
intent.

> Is it ('atomic' behaviour as provided by the implementation) OK?

I'm afraid I really can't spare time to review and analyze your code in
detail. Sorry, but I do have a job, and I spend more than enough time in
this newsgroup already just reading and writing text. It's not that I'm
unwilling, or even uninterested... I just have too much else to do.

> the only reason I was thinking about c) was a possible way how to
> insure b) (pool of semaphores - sort of multiple wait barriers for
> subsequent waves of unblocked waiters). But since b) is not required
> by the POSIX model, there is absolutely no need for c). As for
> technical violation.. well, after looking at the implementation
> suggested by David Schwartz (wait: m.unlock();sleep(random());m.lock())
> I am not so sure at all that anything can violate the standard unless
> it results in lost signals.

There are all sorts of bad and undesirable behaviors aside from lost
signals, but you're not far off the mark. Condition variables are designed
to be lightweight and simple. While an implementation that wakes up too
often isn't good, it can be functionally correct.

Alexander Terekhov

unread,

Feb 20, 2001, 8:12:10 AM2/20/01

to

tere...@my-deja.com wrote:
>...Ok, I am trying again...
>
> here is the implementation...
>...

> Q1: Is it correct or incorrect/buggy implementation?
> Q2: Is there anything to improve or make completely different, why?

Louis Thomas <lth...@arbitrade.com> have provided me with
his analysis of different implementations of posix condition
variables using semaphores (events) including my own
implementation and we had a nice discussion on the subject..
I would like to clarify the following question here:

Q) Is the race condition in the algorithm(s) below really "harmless"?
(with respect to waiter's visibility, which i think it is)

thanks,

regards,
alexander.

---------- Algorithm 7a / IMPL_SEM,UNBLOCK_STRATEGY == UNBLOCK_ALL ------
given:
semBlockLock - bin.semaphore
semBlockQueue - semaphore
mtxExternal - mutex or CS
mtxUnblockLock - mutex or CS
nWaitersGone - int
nWaitersBlocked - int
nWaitersToUnblock - int

wait( timeout ) {

[auto: register int result ] // error checking omitted
[auto: register int nSignalsWasLeft ]

sem_wait( semBlockLock );
nWaitersBlocked++;
sem_post( semBlockLock );

unlock( mtxExternal );
bTimedOut = sem_wait( semBlockQueue,timeout );

lock( mtxUnblockLock );
if ( 0 != (nSignalsWasLeft = nWaitersToUnblock) ) {
if ( bTimeout ) { // timeout (or canceled)
if ( 0 != nWaitersBlocked ) {
nWaitersBlocked--;
}
else {
sem_wait( semBlockQueue ); // better now than spurious
later
}
}
if ( 0 == --nWaitersToUnblock &&
0 != nWaitersBlocked ) {
sem_post( semBlockLock ); // open the gate
nSignalsWasLeft = 0; // do not open the gate below
again
}
}
else if ( INT_MAX/2 == ++nWaitersGone ) { // timeout/canceled or spurious
semaphore :-)
sem_wait( semBlockLock );
nWaitersBlocked -= nWaitersGone; // something is going on here -
test of timeouts? :-)
sem_post( semBlockLock );
nWaitersGone = 0;
}
unlock( mtxUnblockLock );

if ( 1 == nSignalsWasLeft ) {
sem_post( semBlockLock ); // open the gate
}

lock( mtxExternal );

return ( bTimedOut ) ? ETIMEOUT : 0;
}

signal(bAll) {

[auto: register int result ]
[auto: register int nSignalsToIssue]

lock( mtxUnblockLock );

if ( 0 != nWaitersToUnblock ) { // the gate is closed!!!
if ( 0 == nWaitersBlocked ) { // NO-OP
return unlock( mtxUnblockLock );
}
if (bAll) {
nWaitersToUnblock += nSignalsToIssue=nWaitersBlocked;
nWaitersBlocked = 0;
}
else {
nSignalsToIssue = 1;
nWaitersToUnblock++;
nWaitersBlocked--;
}
}
else if ( nWaitersBlocked > nWaitersGone ) { // HARMLESS RACE CONDITION!
sem_wait( semBlockLock ); // close the gate
if ( 0 != nWaitersGone ) {
nWaitersBlocked -= nWaitersGone;
nWaitersGone = 0;
}
if (bAll) {
nSignalsToIssue = nWaitersToUnblock = nWaitersBlocked;
nWaitersBlocked = 0;
}
else {
nSignalsToIssue = nWaitersToUnblock = 1;
nWaitersBlocked--;
}
}
else { // NO-OP
return unlock( mtxUnblockLock );
}

unlock( mtxUnblockLock );
sem_post( semBlockQueue,nSignalsToIssue );
return result;
}

---------- Algorithm 7b / IMPL_SEM,UNBLOCK_STRATEGY == UNBLOCK_ONEBYONE
------
given:
semBlockLock - bin.semaphore
semBlockQueue - bin.semaphore
mtxExternal - mutex or CS
mtxUnblockLock - mutex or CS
nWaitersGone - int
nWaitersBlocked - int
nWaitersToUnblock - int

wait( timeout ) {

[auto: register int result ] // error checking omitted
[auto: register int nSignalsWasLeft ]

sem_wait( semBlockLock );
nWaitersBlocked++;
sem_post( semBlockLock );

unlock( mtxExternal );
bTimedOut = sem_wait( semBlockQueue,timeout );

lock( mtxUnblockLock );
if ( 0 != (nSignalsWasLeft = nWaitersToUnblock) ) {
if ( bTimeout ) { // timeout (or canceled)
if ( 0 != nWaitersBlocked ) {
nWaitersBlocked--;
nSignalsWasLeft = 0; // do not unblock next waiter
below (already unblocked)
}
else {
sem_wait( semBlockQueue ); // better now than spurious
later
}
}
if ( 0 == --nWaitersToUnblock &&
0 != nWaitersBlocked ) {
sem_post( semBlockLock ); // open the gate
nSignalsWasLeft = 0; // do not open the gate below
again
}
}
else if ( INT_MAX/2 == ++nWaitersGone ) { // timeout/canceled or spurious
semaphore :-)
sem_wait( semBlockLock );
nWaitersBlocked -= nWaitersGone; // something is going on here -
test of timeouts? :-)
sem_post( semBlockLock );
nWaitersGone = 0;
}
unlock( mtxUnblockLock );

if ( 1 == nSignalsWasLeft ) {
sem_post( semBlockLock ); // open the gate
}
else if ( 0 != nSignalsWasLeft ) {
sem_post( semBlockQueue ); // unblock next waiter
}

lock( mtxExternal );

return ( bTimedOut ) ? ETIMEOUT : 0;
}

signal(bAll) {

[auto: register int result ]

lock( mtxUnblockLock );

if ( 0 != nWaitersToUnblock ) { // the gate is closed!!!
if ( 0 == nWaitersBlocked ) { // NO-OP
return unlock( mtxUnblockLock );
}
if (bAll) {
nWaitersToUnblock += nWaitersBlocked;
nWaitersBlocked = 0;
}
else {
nWaitersToUnblock++;
nWaitersBlocked--;
}
unlock( mtxUnblockLock );
}
else if ( nWaitersBlocked > nWaitersGone ) { // HARMLESS RACE CONDITION!
sem_wait( semBlockLock ); // close the gate
if ( 0 != nWaitersGone ) {
nWaitersBlocked -= nWaitersGone;
nWaitersGone = 0;
}
if (bAll) {
nWaitersToUnblock = nWaitersBlocked;
nWaitersBlocked = 0;
}
else {
nWaitersToUnblock = 1;
nWaitersBlocked--;
}
unlock( mtxUnblockLock );
sem_post( semBlockQueue );
}
else { // NO-OP
unlock( mtxUnblockLock );
}

return result;
}

---------- Algorithm 7c / IMPL_EVENT,UNBLOCK_STRATEGY == UNBLOCK_ONEBYONE
------
given:
hevBlockLock - auto-reset event
hevBlockQueue - auto-reset event
mtxExternal - mutex or CS
mtxUnblockLock - mutex or CS
nWaitersGone - int
nWaitersBlocked - int
nWaitersToUnblock - int

wait( timeout ) {

[auto: register int result ] // error checking omitted
[auto: register int nSignalsWasLeft ]

wait( hevBlockLock,INFINITE );
nWaitersBlocked++;
set_event( hevBlockLock );

unlock( mtxExternal );
bTimedOut = wait( hevBlockQueue,timeout );

lock( mtxUnblockLock );
if ( 0 != (SignalsWasLeft = nWaitersToUnblock) ) {
if ( bTimeout ) { // timeout (or canceled)
if ( 0 != nWaitersBlocked ) {
nWaitersBlocked--;
nSignalsWasLeft = 0; // do not unblock next waiter
below (already unblocked)
}
else {
wait( hevBlockQueue,INFINITE ); // better now than spurious
later
}
}
if ( 0 == --nWaitersToUnblock &&
0 != nWaitersBlocked ) {
set_event( hevBlockLock ); // open the gate
nSignalsWasLeft = 0; // do not open the gate below
again
}
}
else if ( INT_MAX/2 == ++nWaitersGone ) { // timeout/canceled or spurious
semaphore :-)
wait( hevBlockLock,INFINITE );
nWaitersBlocked -= nWaitersGone; // something is going on here -
test of timeouts? :-)
set_event( hevBlockLock );
nWaitersGone = 0;
}
unlock( mtxUnblockLock );

if ( 1 == nSignalsWasLeft ) {
set_event( hevBlockLock ); // open the gate
}
else if ( 0 != nSignalsWasLeft ) {
set_event( hevBlockQueue ); // unblock next waiter
}

lock( mtxExternal );

return ( bTimedOut ) ? ETIMEOUT : 0;
}

signal(bAll) {

[auto: register int result ]

lock( mtxUnblockLock );

if ( 0 != nWaitersToUnblock ) { // the gate is closed!!!
if ( 0 == nWaitersBlocked ) { // NO-OP
return unlock( mtxUnblockLock );
}
if (bAll) {
nWaitersToUnblock += nWaitersBlocked;
nWaitersBlocked = 0;
}
else {
nWaitersToUnblock++;
nWaitersBlocked--;
}
unlock( mtxUnblockLock );
}
else if ( nWaitersBlocked > nWaitersGone ) { // HARMLESS RACE CONDITION!
wait( hevBlockLock,INFINITE ); // close the gate
if ( 0 != nWaitersGone ) {
nWaitersBlocked -= nWaitersGone;
nWaitersGone = 0;
}
if (bAll) {
nWaitersToUnblock = nWaitersBlocked;
nWaitersBlocked = 0;
}
else {
nWaitersToUnblock = 1;
nWaitersBlocked--;
}
unlock( mtxUnblockLock );
set_event( hevBlockQueue );
}
else { // NO-OP
unlock( mtxUnblockLock );
}

return result;
}