
recursive mutexes and conditions


Ben Self

Sep 24, 1997

I have a question regarding recursive mutexes and condition variables.

Given a mutex created with one of the following attributes:

DCE threads
pthread_mutexattr_setkind_np( &attr, MUTEX_RECURSIVE_NP );

X/Open XSH5 (UNIX98)
pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_RECURSIVE );
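
For concreteness, the XSH5 variant would be set up roughly like this (error
checking omitted; the DCE case is analogous with the _np names):

    #include <pthread.h>

    pthread_mutex_t mutex;

    void create_recursive_mutex( void )
    {
        pthread_mutexattr_t attr;

        pthread_mutexattr_init( &attr );
        pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_RECURSIVE );
        pthread_mutex_init( &mutex, &attr );
        pthread_mutexattr_destroy( &attr );
    }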

What exactly is the behavior of a pthread_cond_wait() and what effect do
"nested" locks have on this behavior?

Does mixing recursive locks and condition variables make any sense?

This is largely academic. However, I (like everyone else in known
space/time) maintain an OO abstraction of a portable subset of Pthreads
and would like to know the appropriate semantics.

Since the advent of the framework (about 3 years ago) I have managed to
avoid using recursive mutexes. Unfortunately, my back may be against
the wall on a few new projects and I may be forced to use them. They
seem to be a real pain.

And yes, I promise this concludes my postings on CVs for the foreseeable
future.


thanks,


--ben

Joerg Faschingbauer

Sep 25, 1997

Hi Ben,

just curious: what has happened that you have to use recursive
mutexes? I often thought "shit, now I have to do it", but, after
quarreling with myself a while, everything could be solved by a
different locking strategy. What I mean is (academic): is there any
problem which can be solved *only* by using recursive mutexes?

Joerg

Ben Self

Sep 25, 1997

Joerg Faschingbauer wrote:
>
> Hi Ben,
>
> just curious: what has happened that you have to use recursive
> mutexes? I often thought "shit, now I have to do it", but, after
> quarreling with myself a while, everything could be solved by a
> different locking strategy. What I mean is (academic): is there any
> problem which can be solved *only* by using recursive mutexes?
>
> Joerg
>

No, you are right. There is no problem that can *only* be solved by
using a recursive mutex. Depending on your methodology and degree of
your fanaticism it is like the goto, multiple inheritance, ... Code that
doesn't use them tends to be more understandable and efficient.

There are a couple of situations where they are absolutely convenient.
These situations usually have something to do with legacy code and being
short on time.

Recursive mutexes are often used in a design pattern for callback-driven
C++ frameworks. Also, using OO encapsulation and polymorphism in
a non-trivial way often makes it very difficult to synchronize without
recursive mutexes.

One of the simplest ways to accomplish this is to provide both safe and
unsafe methods and expect the user to know which to call based upon the
state of the lock. In many cases this is not even possible because the
state of the lock cannot be predicted. In these cases the state of the lock
must be determined before deciding which path to take (both of which are
very hairy things to get right).
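
The idiom tends to look something like this hypothetical Queue (names invented
here, error handling omitted): the public method takes the lock, the _unlocked
variant assumes the caller already holds it, and the caller has to know which
one applies.

    #include <pthread.h>
    #include <deque>

    class Queue {
    public:
        Queue() { pthread_mutex_init( &mutex_, 0 ); }

        // "safe": takes the lock itself
        void push( int item )
        {
            pthread_mutex_lock( &mutex_ );
            push_unlocked( item );
            pthread_mutex_unlock( &mutex_ );
        }

        // "unsafe": the caller must already hold the lock
        void push_unlocked( int item ) { items_.push_back( item ); }

    private:
        pthread_mutex_t mutex_;
        std::deque<int> items_;
    };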

All in all, either much of the code has to change or some brilliant design
work has to happen. In the case of incorporating legacy code on a short
timeline, often neither of these is acceptable and people go for the
easy fix.

If anyone has found a third path *please* let me know.


--ben


------------
Ben R. Self
bs...@opentext.com
http://www.opentext.com

Open Text -- home of Livelink Intranet

Dave Butenhof

Sep 26, 1997

Ben Self wrote:
>
> I have a question regarding recursive mutexes and condition variables.
>
> Given a mutex created with one of the following attributes:
>
> DCE threads
> pthread_mutexattr_setkind_np( &attr, MUTEX_RECURSIVE_NP );
>
> X/Open XSH5 (UNIX98)
> pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_RECURSIVE );
>
> What exactly is the behavior of a pthread_cond_wait() and what effect do
> "nested" locks have on this behavior?

The pthread_cond_wait (and pthread_cond_timedwait) have a well-defined
operation on the specified mutex -- and it does exactly the same thing
regardless of the type (kind) of the mutex. That is, it performs an
implicit unlock, waits, and then performs a lock.

> Does mixing recursive locks and condition variables make any sense?

It is "meaningful", but rarely "useful". Whether that makes any sense,
is up to you.

The operation is well-defined, and it's perfectly legal. The only
problem is that, IF the mutex is locked to a depth greater than 1,
waiting on a condition variable will merely decrease the lock depth. No
other thread will be able to acquire the lock, in order to set the
predicate and decide to signal/broadcast. Thus, you've hung at least the
subsystem involved in that mutex, and therefore possibly the entire
application.
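
In code, the hang looks roughly like this (the mutex is assumed to have been
created recursive, as in the original question; the condition variable and
predicate are hypothetical):

    #include <pthread.h>

    pthread_mutex_t mutex;   /* assume initialized with PTHREAD_MUTEX_RECURSIVE */
    pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    int             predicate = 0;        /* hypothetical shared state */

    void waiter( void )
    {
        pthread_mutex_lock( &mutex );            /* depth 1                         */
        pthread_mutex_lock( &mutex );            /* depth 2, e.g. via a nested call */
        while (!predicate)
            pthread_cond_wait( &cond, &mutex );  /* releases ONE level: depth drops
                                                    to 1, the mutex stays held, and
                                                    no other thread can ever lock it
                                                    to set 'predicate' and signal   */
        pthread_mutex_unlock( &mutex );
        pthread_mutex_unlock( &mutex );
    }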

Although I don't know of any that do, an implementation could decide to
report this condition as an error -- say, EDEADLK. (Although it's a
somewhat unusual interpretation of the error code, I can't think of
another that's a better fit.) Of course, only a pthread_cond_wait would
result in this error, since a pthread_cond_timedwait will eventually
time out and allow the application a chance to recover. (It probably
won't, but since it COULD, an error is probably inappropriate.)

> This is largely academic. However, I (like everyone else in known
> space/time) maintain an OO abstraction of a portable subset of Pthreads
> and would like to know the appropriate semantics.
>
> Since the advent of the framework (about 3 years ago) I have managed to
> avoid using recursive mutexes. Unfortunately, my back may be against
> the wall on a few new projects and I may be forced to use them. They
> seem to be a real pain.

Recursive mutexes are slower, and, because they have more internal
state, they're more complicated to use correctly. (As borne out by the
interaction with condition waits.)

For DCE threads, we decided to create the "global mutex" -- a single,
universally available, mutex that could be used to safely protect all
non-threaded code. (Back in the days when very little code in a typical
UNIX system was thread-safe.) Everyone had to use a single mutex, or
else the whole system wouldn't work. (I lock mutex A to call malloc, you
lock mutex B to call malloc, and someone else calls sbrk, which malloc
uses, and which isn't protected by any mutex at all.) Well, I thought it
would be a neat hack to create the global mutex by implementing the
recursive mutex type. We'd talked a lot about the idea of mutex types,
and ways to avoid "fast path" overhead to support them -- and this was
my chance to demonstrate.

Unfortunately, recursive mutexes are a bad idea. They're expedient, and
it is, sometimes, useful to have them available so you don't need to
make code thread-safe. But there's ALWAYS a better, more maintainable,
and far more efficient alternative that uses normal mutexes. For a long
time, I regretted having introduced recursive mutexes into DCE (which is
what caused them to become part of XSH5). I've since reached a state of
relative inner peace, however. They CAN BE convenient. If used properly
(that is, with careful analysis and design), recursive mutexes can be an
effective way to use code that's never performance-critical in threaded
programs. (But if you've got code that's ever on a hot path, avoid
recursive mutexes like the plague that they are!)

> And yes, I promise this concludes my postings on CVs for the foreseeable
> future.

Of course, unless you're far better at prognostication than most of us,
we've now exceeded the limited segment of the future that you were able
to foresee when you wrote that. Yes? ;-)

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation bute...@zko.dec.com |
| 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698 http://www.awl.com/cp/butenhof/posix.html |
\-----------------[ Better Living Through Concurrency ]----------------/

David Holmes

Sep 30, 1997

Dave Butenhof <bute...@zko.dec.com> wrote in article
<342B97B0...@zko.dec.com>...

> Unfortunately, recursive mutexes are a bad idea. They're expedient, and
> it is, sometimes, useful to have them available so you don't need to
> make code thread-safe. But there's ALWAYS a better, more maintainable,
> and far more efficient alternative that uses normal mutexes.

I'd be interested in any discussion showing why recursive mutexes are a bad
idea, in particular in relation to the choice of the Java designers to
support them. Have the Java designers simply catered to the mass of novice
programmers who don't really know how to do concurrent programming?

I'm also curious why, given the support for recursive mutexes that POSIX
provides, the semantics of cond_wait() don't take them into account the way
they are in Java. Is it simply a matter of providing recursive mutexes for
those few occasions when you consider them suitable but not really wanting
to advocate their use in general?

Cheers,
David Holmes


Dave Butenhof

Oct 3, 1997

David Holmes wrote:
>
> Dave Butenhof <bute...@zko.dec.com> wrote in article
> <342B97B0...@zko.dec.com>...
> > Unfortunately, recursive mutexes are a bad idea. They're expedient, and
> > it is, sometimes, useful to have them available so you don't need to
> > make code thread-safe. But there's ALWAYS a better, more maintainable,
> > and far more efficient alternative that uses normal mutexes.
>
> I'd be interested in any discussion showing why recursive mutexes are a bad
> idea, in particular in relation to the choice of the Java designers to
> support them. Have the Java designers simply catered to the mass of novice
> programmers who don't really know how to do concurrent programming?

I won't make any comments regarding the knowledge or intent of the Java
designers. (Well, OK, maybe I will, but they'll be a little less direct
than your intriguing suggestion, and I'll save them for later.) However,
Java is not POSIX threads. The scope and goals are independent.

> I'm also curious why, given the support for recursive mutexes that POSIX
> provides, the semantics of cond_wait() don't take them into account the way
> they are in Java. Is it simply a matter of providing recursive mutexes for
> those few occasions when you consider them suitable but not really wanting
> to advocate their use in general?

First, POSIX does *not* support recursive mutexes. Recursive mutexes are
an extension originally provided by DCE threads, and recently adopted by
The Open Group as part of the Single UNIX Specification, version 2. (Or,
less conversationally but more correctly, XSH5.)

As you say, Java's implicit locks operate as if they were recursive.
(Java doesn't have mutexes, it has synchronized methods and blocks. The
most "obvious" implementation might be to use a recursive mutex, but
that's not necessarily the most appropriate implementation.) Also, as
you imply, Java's wait "spins out" this implicit (and hypothetical)
recursive mutex to level 0, "releasing" it. (This can't be done using
XSH5 recursive mutexes, unless the JVM independently counts how many
times it has locked each mutex, in which case the "recursiveness" of the
mutex isn't doing much for you.)
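
To illustrate the bookkeeping involved, a hypothetical helper (not anything
Java or XSH5 actually provides) that "spins out" a recursive mutex for a wait
might look like this; the caller has to supply its own lock depth, which is
exactly the point:

    #include <pthread.h>

    /* Hypothetical helper, for illustration only: the caller must track its
       own lock depth ('count').  It unwinds the recursive mutex to depth 1,
       lets the wait release the last level, and restores the original depth
       afterwards. */
    void wait_unwound( pthread_cond_t *cv, pthread_mutex_t *m, int count )
    {
        int i;

        for (i = 1; i < count; i++)
            pthread_mutex_unlock( m );      /* unwind to depth 1              */
        pthread_cond_wait( cv, m );         /* the wait releases the last one */
        for (i = 1; i < count; i++)
            pthread_mutex_lock( m );        /* restore the original depth     */
    }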

In general, it's not a good idea to call anything with a mutex locked
(except, of course, to wait on a condition associated with the mutex, or
to unlock the mutex ;-) ), because doing so reduces concurrency. Mutexes
should be held for a short period of time so other threads don't have to
waste time waiting. Java code probably does this a lot, though,
unintentionally, because it's so much "easier" to synchronize a method
than to synchronize the actual regions of code where synchronization is
needed. (While it's disastrous to concurrency and efficiency, it's
convenient -- convenience isn't always bad, but you should always be
wary of it.)

Nevertheless, when a thread does call another routine while holding a
mutex, (with the exception of unlock and wait!) the called code MUST
assume that the mutex is locked because one or more program invariants
are broken. That's WHY you lock mutexes, and when the invariant is no
longer broken, you should UNLOCK them. Again, Java programs probably
don't do this, because the METHOD is usually synchronized, not the
actual critical section. This means that many calls from synchronized
Java methods will be made from "non-critical" code, while invariants are
stable. However, the called code, and, more importantly, the JVM, has NO
WAY TO KNOW THIS. If you "spin out" the recursive lock to free it, you
are breaking the calling code by exposing the supposedly protected
(broken) invariants to other threads, unprepared to handle this.

Perhaps Java designers did this assuming that recursive locks are always
spurious (the outer code was a synchronized method with no broken
invariants at the time of the call). (Maybe they're even right -- but if
so, it's pure luck. There are many little disasters just waiting for
a chance to happen.) Maybe they just assumed that nobody would ever be
foolish enough to make a call with a broken invariant. (Ha. Ha. And
again, Ha.) Perhaps they just didn't think about it. Whatever the
reasons, it's broken. As implemented, Java *requires* that programmers
call synchronized routines, from within synchronized routines or blocks,
*only* when all invariants are stable. Making a call, with a broken
invariant, to some other synchronized function that eventually results
in a wait, will BREAK the applet/application in strange ways that are
difficult to debug. This is nothing less than a form of data corruption,
perpetrated transparently by the runtime (presumably in the name of
"convenience").

The one omission in XSH5 recursive mutex support was a standard error
code for the programming ERROR of trying to wait on a condition with a
recursively locked mutex. If it's safe to release the recursive locks,
the code should have done so. If it's not safe to release, then the
runtime MUST NOT do so.

Robert White

Oct 7, 1997


Dave Butenhof wrote:

> David Holmes wrote:
> >
> > Dave Butenhof <bute...@zko.dec.com> wrote in article
> > <342B97B0...@zko.dec.com>...
> > > Unfortunately, recursive mutexes are a bad idea. They're expedient, and
> > > it is, sometimes, useful to have them available so you don't need to
> > > make code thread-safe. But there's ALWAYS a better, more maintainable,
> > > and far more efficient alternative that uses normal mutexes.
> >
> > I'd be interested in any discussion showing why recursive mutexes are a bad
> > idea, in particular in relation to the choice of the Java designers to
> > support them. Have the Java designers simply catered to the mass of novice
> > programmers who don't really know how to do concurrent programming?
>
> I won't make any comments regarding the knowledge or intent of the Java
> designers. (Well, OK, maybe I will, but they'll be a little less direct
> than your intriguing suggestion, and I'll save them for later.) However,
> Java is not POSIX threads. The scope and goals are independent.

I STRONGLY DISAGREE with the idea that recursive mutexes "are a bad idea".

I have made and use a recursive mutex class in several key C++ endeavors. As a
low-level tool recursive mutexes are "bad" in that they tend to lead the sloppy
down dangerous roads. Conversely, in experienced hands a recursive mutex is a
tool of simple elegance. The core thing, as always, is "knowing what you are
doing".

As an example I have a small group of classes that deal with message passing and
handling between threads. In this model the same message may be destined for many
threads. Further, the messaging system needs to be "reasonably fast" and
memory-tight. To reduce/remove a slew of unnecessary copying I use handles and
such a hell of a lot.

Queues, Handle-targets (e.g. the handled things), Messages and several other
classes all have the RecursiveMutex class as a virtual base class. By producing
fully stable (with respect to locking and MT issues) Queues etc. and having the
base r-mutex class virtual I dispense with needing to have a public (locked) and
a protected (presumed locked) interface to the classes which must/may be
connected by inheritance relationships. A prime example is messages and handles
to messages.

A message is read in a way that may lead to the message spanning several
buffers. If that message finds its way to certain facilities its text will need
to be "burped" into one buffer to reduce many operations (think strlen, memcpy
and so on) into single-call transactions. If the message need only be regurgitated,
writev(2) can be used to write the separate buffers with one call without ever
needing to burp them. (Burping is expensive and to be avoided if possible 8-)
Since the same message (e.g. not a copy) may be going many places you need to
lock it if you want to burp it, or write it.

Using handles to objects and reference counting is fun and effective, but the
target needs to be locked during reference incrementing and decrementing if the
reference count is to be safe across several threads.

Making the mutex for both a handle-target and a message virtual I can use the
same mutex for both operations (cutting the mutex create/destroy/etc in half
right away) and by making it recursive I cut my code volume in half by not
needing to have every operation that may need to be exclusive have a
lock-me-public and already-locked-private call. I also don't need to ever think
about these duplicate calls when doing derivation.
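
Very roughly, the shape is something like this (names invented for illustration,
and I've substituted an XSH5 recursive mutex for my own class just to keep the
sketch short):

    #include <pthread.h>

    class RecursiveMutex {
    public:
        RecursiveMutex()
        {
            pthread_mutexattr_t attr;
            pthread_mutexattr_init( &attr );
            pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_RECURSIVE );
            pthread_mutex_init( &mutex_, &attr );
            pthread_mutexattr_destroy( &attr );
        }
        virtual ~RecursiveMutex() { pthread_mutex_destroy( &mutex_ ); }

        void Lock()   { pthread_mutex_lock( &mutex_ ); }
        void Unlock() { pthread_mutex_unlock( &mutex_ ); }

    private:
        pthread_mutex_t mutex_;
    };

    // Virtual base: a Message that is also a handle-target carries exactly ONE lock.
    class HandleTarget : public virtual RecursiveMutex {
    public:
        HandleTarget() : refs_(0) {}
        void AddRef()  { Lock(); ++refs_; Unlock(); }
        void Release() { Lock(); --refs_; Unlock(); }
    private:
        int refs_;
    };

    class Message : public HandleTarget {
    public:
        // Safe whether or not the caller already holds the lock.
        void Burp() { Lock(); /* coalesce the buffers into one */ Unlock(); }
    };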

This is a tad simplified (e.g. there are no clear cases in this description where
the lock will necessarily be set recursively, which is actually good 8-) but in
the real body of logic things got twisted (message parking and disk-persistence)
and it was nice to know that when I wrote a Lock() member that called an
ancestor's Lock() directly or indirectly, I could get all the effects I wanted
without a maze of protected calls and without the first possibility of
self-deadlock.

So yes, if you have a spade, a snow-covered walkway, and a brain you don't *need*
a snow-shovel. That is *not*, however, an a priori argument against having a
snow-shovel. (and if you've ever tried to shovel a snow-covered walkway with a
spade you know *EXACTLY* what I mean 8-)

Rob.


Dave Butenhof

Oct 9, 1997

Robert White wrote:
>
> I STRONGLY DISAGREE with the idea that recursive mutexes "are a bad idea".
>
> I have made and use a recursive mutex class in several key C++ endeavors. As a
> low-level tool recursive mutexes are "bad" in that they tend to lead the sloppy
> down dangerous roads. Conversely, in experienced hands a recursive mutex is a
> tool of simple elegance. The core thing, as always, is "knowing what you are
> doing".

Hey, look, recursive mutexes aren't illegal, they're not "morally
perverse", and with XSH5 (UNIX98) they're even standard and portable.
So, fine -- if you like them, you use them. Use them as much as you
like, and in any way you like.

But remember that they're ALWAYS more expensive than "normal" mutexes
(unless your normal mutexes are more expensive than they need to be for
the platform!). And remember that WAITING on a condition variable using
a recursively locked mutex simply won't work. So, if you're using a
condition variable to manage your queue states, you need to at least
analyze your lock usage sufficiently to ensure that the wait will work.
And, once you've done that, it's a simple step to dropping back to a
normal mutex.

There are definitely cases where the expense is acceptable, especially
when modifying existing code -- for example, to create a thread-safe
stdio package. The performance isn't "extremely critical", and you don't
need to worry about condition wait deadlocks (there's no reason to use
them in stdio). Sorting out all of the interactions between the parts of
the package is difficult, and requires a lot of new coding and
reorganization -- and implementing some of the correct semantics gets
really tricky.

Don't waste time optimizing code that's not on the critical path. If
you've got code that's on your critical path, and uses recursive
mutexes, then it's NOT optimized. If you care, you should remove the
recursive mutexes. If you don't care, fine. If the use of recursive
mutexes in non-critical-path code doesn't put it on the critical path,
there's no reason to worry about them.

Still, I, personally, would use a recursive mutex in new code only with
extreme reluctance and substantial consideration of the alternatives.

Robert White

Oct 10, 1997
to Dave Butenhof


Dave Butenhof wrote:

> Robert White wrote:
> >
> > I STRONGLY DISAGREE with the idea that recursive mutexes "are a bad idea".

> But remember that they're ALWAYS more expensive than "normal" mutexes
> (unless your normal mutexes are more expensive than they need to be for
> the platform!).

The universal sub-context of adding a word to a feature (e.g. the "recursive" in
front of the "mutex") is "it will probably cost a little more somewhere". So, ahem,
"no duh". 8-)

> And remember that WAITING on a condition variable using
> a recursively locked mutex simply won't work. So, if you're using a
> condition variable to manage your queue states, you need to at least
> analyze your lock usage sufficiently to ensure that the wait will work.

My recursive mutexes handle "WAITING on a condition variable" quite perfectly, thank
you. The wait works every time with absolutely perfect correctness. (This may *NOT*
be true of the POSIX recursive mutex... I have no information there... but...) Since,
if you read my description of same, I built a class around a simple mutex that
handles the recursive part, the condition variable logic only sees/receives a
"mutex". It unlocks it once to wait. Subsequent users get an unlocked mutex which
they lock and recursively increment in the class, do their signalling, and then
decrement until they actually "unlock", at which point the original holder (may) get
his lock back, but when he does the class perceives it as locked recursively to depth
N and so on.

I am not an idiot, I wrote it to work....

> And, once you've done that, it's a simple step to dropping back to a
> normal mutex.

Didn't have to drop back, I simply implemented it correctly.

> There are definitely cases where the expense is acceptable, especially
> when modifying existing code -- for example, to create a thread-safe
> stdio package. The performance isn't "extremely critical", and you don't
> need to worry about condition wait deadlocks (there's no reason to use
> them in stdio). Sorting out all of the interactions between the parts of
> the package is difficult, and requires a lot of new coding and
> reorganization -- and implementing some of the correct semantics gets
> really tricky.

You forget the intrinsic nature of having deterministic interfaces. The total cost
of a recursive mutex implemented as a class is the cost of one per-thread-data fetch
per lock. Not free by any means, but since this is (usually) a function call
followed by a subscript operation (after the first time, where you have to add a
"new/malloc" of some sort and a second function call) the cost is vanishingly small.

> Don't waste time optimizing code that's not on the critical path. If
> you've got code that's on your critical path, and uses recursive
> mutexes, then it's NOT optimized. If you care, you should remove the
> recursive mutexes. If you don't care, fine. If the use of recursive
> mutexes in non-critical-path code doesn't put it on the critical path,
> there's no reason to worry about them.

Don't throw babies out with bathwater. If you have to do scads of shit to back out
of an elegant and simple paradigm, and you have to break your balls looking through
code every time you enhance a feature, you are almost certain to end up with
something whose cost is comparable to the so-called waste of the direct solution.

> Still, I, personally, would use a recursive mutex in new code only with
> extreme reluctance and substantial consideration of the alternatives.

That view may be warranted with a POSIX recursive mutex, I didn't have one of *them*,
but an elementary recursive mutex paradigm (e.g. mine) is graceful and efficacious
(and just fucking useful) enough to be my "basic chosen mutex" in my class
hierarchy.

You really should not make a habit of equating implementation to concept. It makes
for a short lifetime of unconstrained thought followed by a long lifetime of
preconceptions and diatribe.

"Just Some Guy"
Rob.


Robert White

Oct 10, 1997
to Dave Butenhof

Dave Butenhof

unread,
Oct 10, 1997, 3:00:00 AM10/10/97
to

Robert White wrote:

(all sorts of mildly abusive stuff about recursive mutexes.)

Guess I touched a nerve, huh?

I'm giving out INFORMATION, and ADVICE... not making rules. If you
choose to ignore my advice, go ahead. That's what I already told you.
You really don't need to moan about it in public, do you? (And with two
identical postings, yet.)

Some parting advice: You should watch your "communication" style, and
you should check your assumptions and your spelling.

Brian Silver

Oct 10, 1997

Robert White wrote:
>
[Snip]

> I am not an idiot, I wrote it to work....
[Snip]

The two are not mutually exclusive.

Brian.
--

Replace ! with @ for email address: silver!zko.dec.com

Robert White

Oct 10, 1997

Sorry, I made a boo-boo; there is an error in the preceding message. The text should
read:

Put the pthread_mutex and the per-thread class in another class called Mutex. When
Mutex::Lock() is called it gets the per-thread integer. If the integer *IS* zero,
call the primitive lock function for the mutex and increment the integer; otherwise
just increment the integer. When Mutex::Unlock() is called, decrement the integer if
it is greater than one; if it is one, call the raw unlock and when it finishes
decrement the integer to zero.

I was typing too fast and got an "is not" where there should have been an "is"...

Happens to the best of us 8-)
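
In rough code, the scheme reads something like this (a sketch only:
pthread_getspecific stands in for the per-thread class, error handling is
omitted, and a real version wouldn't burn one thread-specific key per Mutex):

    #include <pthread.h>

    class Mutex {
    public:
        Mutex()
        {
            pthread_mutex_init( &mutex_, 0 );
            pthread_key_create( &depth_key_, 0 );    // per-thread lock depth
        }

        void Lock()
        {
            long depth = (long) pthread_getspecific( depth_key_ );
            if (depth == 0)
                pthread_mutex_lock( &mutex_ );       // first level: raw lock
            pthread_setspecific( depth_key_, (void *)(depth + 1) );
        }

        void Unlock()
        {
            long depth = (long) pthread_getspecific( depth_key_ );
            if (depth == 1)
                pthread_mutex_unlock( &mutex_ );     // last level: raw unlock
            pthread_setspecific( depth_key_, (void *)(depth - 1) );
        }

    private:
        pthread_mutex_t mutex_;
        pthread_key_t   depth_key_;
    };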

Patrick TJ McPhee

Oct 11, 1997

In article <343DC336...@pobox.com>,
Robert White <rwh...@pobox.com> wrote:

% My recursive mutexes handle "WAITING on a condition variable" quite perfectly, thank
% you.

It sounds like they release the mutex during the wait, though. That's
certainly not correct behaviour.
--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

David Holmes

Oct 14, 1997

Sorry for the tardiness of this reply but Dave's response didn't show up at
my local news host until after I left for OOPSLA.

Dave Butenhof <bute...@zko.dec.com> wrote in article

<3434E6...@zko.dec.com>...


> David Holmes wrote:
> > I'd be interested in any discussion showing why recursive mutexes are a bad
> > idea, in particular in relation to the choice of the Java designers to
> > support them.
>

> First, POSIX does *not* support recursive mutexes. Recursive mutexes are
> an extension originally provided by DCE threads, and recently adapted by
> The Open Group as part of the Single UNIX Specification, version 2. (Or,
> less conversationally but more correctly, XSH5.)

Okay thanks for the clarification and thanks for the detailed response.



> In general, it's not a good idea to call anything with a mutex locked
> (except, of course, to wait on a condition associated with the mutex, or
> to unlock the mutex ;-) ), because doing so reduces concurrency. Mutexes
> should be held for a short period of time so other threads don't have to
> waste time waiting. Java code probably does this a lot, though,
> unintentially, because it's so much "easier" to synchronize a method
> than to synchronize the actual regions of code where synchronization is
> needed.

With OO programming the critical section is often the entire method (or 90%
of it if you take into account argument checks and the like) and thus
enforcing locks at the method level is often appropriate. Further
method-level locking is the only effective level of locking for providing
for re-use and extensibility.

I generally agree with your comments, though I think there is a difference
between system-level programming and higher-level application level
programming, especially in an OO context. Java provides (and is greatly
simplified by doing so) recursive locks for each object; however that in
itself does not require recursive mutexes or imply that recursive mutexes
are a good or necessary thing. It was incorrect of me to state that "Java
provides recursive mutexes". Instead of debating apples and oranges I'd
like to make a few points as to why recursive locks are good and necessary
(even if only for convenience) in an OO environment.

A concurrent OO language basically has three options when it comes to
providing support for locking.
1. intra-method locking
This is where locking code occurs in-line within a method anytime it is
required. This is the familiar "critical section" style of programming and
is most often used in OO languages that access OS or middleware threading
API's for concurrency control. Java also allows this style of programming
but it is frowned upon by the Java designers/documenters.
2. method-level locking
This is where locking code is implicitly (or sometimes explicitly)
inserted at the entry and exit points of designated methods (eg.
synchronized methods in Java).
3. object-level locking
This is where there is no intra-object concurrency at all, but rather an
"active object" model whereby only a single thread can be active within an
object at a time (similar to a strict monitor - which Java's 'monitor'
concept is not)

Out of these options 2 and 3 are the most common. 3 is probably the most
common because it greatly simplifies things. Option 2 is the next most
common because the method is the unit of re-use in OO and so specifying
locking at the method level is both natural for the OO programmer and is
more amenable to inheritance and re-use issues. Note that if every method
requires an exclusive lock before it can be executed then 3 is a special
case of 2. Option 1 is actually quite common in older concurrent OO
languages where inheritance and re-use issues have not been considered.

The need for recursive locks in OO programming comes from the style in
which OO programs achieve their goals. In OO languages actions are achieved
by sending messages to objects to execute methods. Thus the actions in an
OO "critical section" are very likely to be calls to other objects methods
(in 'pure' OO languages the only possible actions are calls to object
methods). So the notion that it "is not generally a good idea to call
anything when a mutex is locked" simply does not apply to OO: you will
invariably call when a lock is held. Any invariants of the current object
may be irrelevant to the called method, as may be the fact that locks are
or are not held. The situations where the current object's invariant is
relevant are where the call is either a self-call (direct or to 'super'
methods) or a call which will result in a direct call-back to the current
object. Ignoring concurrency issues, the programmer is always responsible
for ensuring the relevant invariants are maintained before making such
calls. When these situations arise in a concurrent context then the
existence of recursive locks is essential so that common OO coding idioms
do not result in self-deadlock.

The most common example is the simple case of a derived class that has a
method which extends a base method by invoking the base method and then
performing some additional actions. If classic "critical section" style
locking is used in the base method then the derived method may be unable to
ensure mutual exclusion over the base method and the new actions because
either:
i) the internal locking policy and lock object(s) used by the base method
have not been exposed and thus mutual exclusion over all the necessary data
can not be achieved, or;
ii) the internal lock(s) has been exposed but any attempt to use the lock
to bracket the super call and the new actions will result in self-deadlock
unless the locks are recursive.

Similarly method-level locking will result in self-deadlock unless the lock
is recursive. Only object level locking doesn't need recursive locks
because only a single thread can ever be active.
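
A small, hypothetical C++ sketch of case ii) makes the self-deadlock concrete:
the derived method brackets a call to the base method with the same lock, so
with a normal mutex the nested Lock() never returns, while with a recursive
lock it simply nests.

    #include <pthread.h>

    class Base {
    public:
        Base() { pthread_mutex_init( &mutex_, 0 ); }    // normal, non-recursive mutex
        virtual ~Base() {}

        virtual void update()
        {
            Lock();
            // ... modify base-class state ...
            Unlock();
        }

    protected:
        void Lock()   { pthread_mutex_lock( &mutex_ ); }
        void Unlock() { pthread_mutex_unlock( &mutex_ ); }

    private:
        pthread_mutex_t mutex_;
    };

    class Derived : public Base {
    public:
        void update()               // extends the base method
        {
            Lock();                 // hold the lock across both steps...
            Base::update();         // ...but the base method locks again:
                                    // self-deadlock unless the lock is recursive
            // ... additional derived-class actions ...
            Unlock();
        }
    };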

The previous arguments for recursive locks are actually invalid :-), as
some of you will have realised. The need for recursive locks is not simply
a result of OO programming style but a combination of that and the way in
which method level locking is achieved.

In many concurrent OO languages method-level locking is achieved by
inserting the lock acquisition/release code at the entry and exit points of
each method (this is logically what Java does). Thus the locking code
exists in the 'object' code and so an invocation of a super method cannot
avoid re-executing the lock acquisition code. Thus conditional waits need
explicit 'un-rolling' semantics. In a more sophisticated language the
locking code may be applied by logically 'expanding' the call-site to
invoke the locking code then the method body then the unlocking code.
However this would need to be done 'dynamically' to avoid problems with
polymorphic method calls, some of which may need locking and some of which
may not, and this gets quite complicated (especially if you want to support
method-level and intra-method locking as Java does). In short, although
recursive locks may not be essential for concurrent object-oriented
programming, they are far easier to support than alternative techniques.

A footnote concerning "active object" approaches may be of interest to some
readers. In most "active object" models concurrency is achieved through
asynchronous message passing not through direct method invocation. This has
the interesting side-effect that the message acceptance code within the
object does all of the necessary locking and synchronisation and thus no
locking code is actually needed within the methods themselves. Such systems
have the added complexity of being able to implement self-calls as either
direct method calls (with no inherent synchronisation) or as self-sends
whereby a new message is sent and thus re-enters the concurrency control
code (of course in this case recursive locking is again needed).

In conclusion, to get back to the original topic, although recursive
locking may be very convenient for concurrent OO programming this does not
imply that a recursive mutex primitive is necessary, as the book-keeping
data structures used by the language runtime can be built on top of
non-recursive mutexes and condition variables. Having said that, a user of
an OO language which does not support concurrency directly will usually
benefit from either a recursive locking primitive or a recursive lock built
upon non-recursive primitives: thus they are more likely to ask for a
recursive mutex.

Thanks for the discussion.

Cheers,
David Holmes
http://www.mri.mq.edu.au/~dholmes

Dave Butenhof

Oct 22, 1997

"David Holmes" <dho...@mri.mq.edu.au> wrote:
>
> The previous arguments for recursive locks are actually invalid :-), as
> some of you will have realised. The need for recursive locks is not simply
> a result of OO programming style but a combination of that and the way in
> which method level locking is achieved.

I'll certainly agree with the first part! As for the second part, I want
to remove a conditional and make your implication clear. Recursive locks
are not required for OO programming. Period. Yes, they can make it
easier to write a naive threaded OO program, and there's a measurable
benefit in that. The absence of recursive mutexes, however, would force
a clearer (and, yes, more complex) design process, usually resulting in
a cleaner model. In many (though certainly far from all) cases, the
resulting code would be more efficient. Useful, yes. Required,
definitely not.

> In many concurrent OO languages method-level locking is achieved by
> inserting the lock acquisition/release code at the entry and exit points of
> each method (this is logically what Java does). Thus the locking code
> exists in the 'object' code and so an invocation of a super method cannot
> avoid re-executing the lock acquisition code. Thus conditional waits need
> explicit 'un-rolling' semantics.

Once again, this "unrolling" is the worst part. It's not just
inefficient, or "unclean" -- it's WRONG and DANGEROUS. When a component
"unrolls" the recursive lock level to wait, it is taking full
responsibility for the stability of ALL predicates protected by that
lock. The language system cannot possibly accept this responsibility
unless the language syntax makes predicates explicit, and Java does not.
The responsibility is therefore placed on the programmer to avoid
calling wait unless it can be KNOWN that in that context all predicates
controlled by the implicit lock are stable. If it's possible for the
code (or language runtime, for that matter) to know this, it would have
been just as easy to avoid the recursive lock to begin with (increasing
efficiency). In any case, if you can't know that the predicates are all
stable, you CANNOT unlock. Period.

Therefore, any threaded Java program that makes a call with a broken
predicate (to anything that waits, or that may make another call that
waits) is BROKEN. Whether it (or its programmers) know it or not. This
will probably start becoming obvious as more systems implement the Java
1.1 kit (with native thread support), and threaded Java program
concurrency moves beyond the control of the VM. Or maybe nobody's ever
made a call with a broken predicate. Maybe. Stranger things have
happened. But then, "do you know where YOUR predicates are?"

Java should either fix this bug or make it PAINFULLY clear to
programmers that they SHALL NOT make calls with broken predicates. This
is always a good rule to follow anyway, so the restriction wouldn't
bother me, and shouldn't prove particularly inconvenient to anyone.
Unfortunately, if the VM can't enforce the rule (and it can't, because
it can't identify predicates), lots of people will get themselves in
trouble, and spend an awesome amount of time figuring out what went
wrong. Debugging a problem like this is tough, because, to start with,
it's a timing problem that can be really hard to reproduce. And even
once you've reproduced it, you've got the victim of a problem caused by
another thread that's moved on, and you've got to painfully reconstruct
the accident scene to figure out what happened.

So, the advice. If you write a concurrent Java program/applet, any place
you've used the "synchronized" keyword, treat EACH AND EVERY method
invocation as if it will unlock and then relock that hidden object mutex
(or class mutex, for static methods). Don't, for example, "do something"
with the first element of a queue, that involves a call, and then remove
it from the queue header -- because if something you call ends up
waiting, another thread can lock the object/class and start working with
the same queue element.

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation bute...@zko.dec.com |
| 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ |

David Holmes

Oct 27, 1997

Again my apologies for the lag in this discussion, I'm getting news
articles about 3-4 days after they are posted.

Dave Butenhof <bute...@zko.dec.com> wrote in article

> Once again, this "unrolling" is the worst part. It's not just
> inefficient, or "unclean" -- it's WRONG and DANGEROUS.

I disagree. It *may* be wrong and dangerous but that doesn't mean it *is*
wrong and dangerous. I don't disagree with what you say about predicates
needing to hold but I don't think that this is a problem.

In a classical concurrent OO environment the predicates concerned are the
invariants of the object and the preconditions of the method that is being
invoked. Of course you have to ensure the precondition of any subsequent
method you invoke on that object (eg. via self or super) but that's the way
OO programming is done. The locks and CV's are encapsulated within the
object that is being accessed - thus ensuring predicates for
synchronisation purposes is no different to ensuring predicates for any
method call.

Ideally we would be able to program all objects to individually handle
their own synchronisation issues, but in practice that is not always
appropriate when it comes to efficiency. To increase efficiency we may
group objects and associate a lock per group. In these circumstances
handling CV's is certainly more complex and requires greater care. But when
we do this we have to design these objects to work together within the
particular synchronisation framework that we want to construct. After all a
method can only invoke wait() if the object has been designed for use in a
concurrent environment: thus the design of that object must be taken into
account when trying to use it in a particular synchronisation framework.

> If you write a concurrent Java program/applet, any place
> you've used the "synchronized" keyword, treat EACH AND EVERY method
> invocation as if it will unlock and then relock that hidden object mutex

This is excessive advice. For a start only methods that use the same object
for synchronisation can potentially unlock the mutex. Secondly unlocking
will only occur if a wait() is performed and this is not something that
should be done arbitrarily or without the programmer's knowledge. Effective
documentation of the synchronisation properties of methods is essential to
enable correct programming of concurrent OO programs (or any concurrent
program): and the fact that a method invokes wait() is one of the most
crucial parts of that documentation. Yes you *have* to know if a method
will potentially block due to wait(). If a method can block and you don't
know that (nor can you reasonably infer it from the semantic description of
the methods) then you don't know whether this object can actually be used
by only a single thread or whether multiple threads are required.
Documentation of such issues is an essential extension to the documentation
you would normally expect for a class/method. Armed with this required
information it is much simpler to ensure that necessary predicates hold
before/across potentially blocking calls.

Dave Butenhof

Oct 28, 1997

David Holmes wrote:
>
> Dave Butenhof <bute...@zko.dec.com> wrote in article
> > Once again, this "unrolling" is the worst part. It's not just
> > inefficient, or "unclean" -- it's WRONG and DANGEROUS.
>
> I disagree. It *may* be wrong and dangerous but that doesn't mean it *is*
> wrong and dangerous. I don't disagree with what you say about predicates
> needing to hold but I don't think that this is a problem.

You're right. But the fact that it's possible for a diligent programmer
to jump through the right hoops and ensure that it's safe to "unroll"
does NOT justify a language system assuming that this is the case,
automatically and without control. Furthermore, as I've said before, if
you can know that it's safe to "unroll", then you have no need for
recursive mutexes in the first place, and certainly no need to "unroll"
them. It means that you understand the full call tree, and the state of
all shared data. Of course, everyone always knows all of this, and the
developer and all future maintainers all keep this information handy and
refer to it continuously, right?

Right. And I've got this big mutex over a river in NYC... would you like
to buy it?

> In a classical concurrent OO environment the predicates concerned are the
> invariants of the object and the preconditions of the method that is being
> invoked. Of course you have to ensure the precondition of any subsequent
> method you invoke on that object (eg. via self or super) but that's the way
> OO programming is done. The locks and CV's are encapsulated within the
> object that is being accessed - thus ensuring predicates for
> synchronisation purposes is no different to ensuring predicates for any
> method call.

Again, this is true only IF the programmer has been extremely careful,
and understands exactly what the program is doing at all times, AND
understands exactly how the language system's synchronization works. I
don't believe any of this for an instant. Do you, really?

I think the risk is far higher in OO programming than in traditional
functional programming. The design of an OO language makes it far easier
to uproot the system's internals "transparently" by rewriting a few
methods. When the language system can't ensure correctness, or detect
incorrectness, you're asking for big trouble.

Sure, nobody would ever make a call with a broken invariant. Of course
not. Nobody would ever pass the wrong datatype, either -- but for some
bizarre reason people actually think that these awkward and confining
"typesafe" languages are a good idea. I wonder why? Perhaps because
programmers aren't perfect? Perhaps because letting "dumb mistakes"
survive past the coding phase of a project makes them far more
difficult, and expensive, to find and fix?

And we're not talking about a "segv without warning when you run the
first test" sort of problem. We're talking about a complicated data
corruption that heavily depends on the timing of various threads. You
may never see it until a big customer runs a particularly demanding
workload. It'll be almost impossible to reproduce. Perhaps you enjoy
that sort of challenge.

> Ideally we would be able to program all objects to individually handle
> their own synchronisation issues, but in practice that is not always
> appropriate when it comes to efficiency. To increase efficiency we may
> group objects and associate a lock per group. In these circumstances
> handling CV's is certainly more complex and requires greater care. But when
> we do this we have to design these objects to work together within the
> particular synchronisation framework that we want to construct. After all a
> method can only invoke wait() if the object has been designed for use in a
> concurrent environment: thus the design of that object must be taken into
> account when trying to use it in a particular synchronisation framework.

Uh huh. Luckily, every programmer, no matter how inexperienced, designs
perfect concurrent code, the first time and every time, and of course
everyone tests every possible combination of concurrent code paths,
exhaustively, before shipping the code. Therefore, the fact that the
Java language system encourages dangerous programming practices is
completely irrelevant. I'm glad we've settled that. The language needn't
be safe because the programmer is perfect. Cool. Is this requirement
stated in the documentation?

> > If you write a concurrent Java program/applet, any place
> > you've used the "synchronized" keyword, treat EACH AND EVERY method
> > invocation as if it will unlock and then relock that hidden object mutex
>
> This is excessive advice. For a start only methods that use the same object
> for synchronisation can potentially unlock the mutex. Secondly unlocking
> will only occur if a wait() is performed and this is not something that
> should be done arbitrarily or without the programmers knowledge. Effective
> documentation of the synchronisation properties of methods is essential to
> enable correct programming of concurrent OO programs (or any concurrent
> program): and the fact that a method invokes wait() is one of the most
> crucial parts of that documentation. Yes you *have* to know if a method
> will potentially block due to wait(). If a method can block and you don't
> know that (nor can you reasonably infer it from the semantic description of
> the methods) then you don't know whether this object can actually be used
> by only a single thread or whether multiple threads are required.
> Documentation of such issues is an essential extension to the documentation
> you would normally expect for a class/method. Armed with this required
> information it is much simpler to ensure that necessary predicates hold
> before/across potentially blocking calls.

Yeah. Simple. Nice to hear that. Although perhaps a tad insulting to all
the experienced folks who've actually had trouble designing perfect
concurrent applications all this time. Gee, I wonder what's wrong with
them?

/---------------------------[ Dave Butenhof ]--------------------------\
| Digital Equipment Corporation bute...@zko.dec.com |
| 110 Spit Brook Rd ZKO2-3/Q18 http://members.aol.com/drbutenhof |
| Nashua NH 03062-2698 http://www.awl.com/cseng/titles/0-201-63392-2/ |

Patrick TJ McPhee

Nov 2, 1997

In article <01bce31d$66ec2500$1bf56f89@dholmes>,
David Holmes <dho...@mri.mq.edu.au> wrote:

% Dave Butenhof <bute...@zko.dec.com> wrote in article
% > Once again, this "unrolling" is the worst part. It's not just
% > inefficient, or "unclean" -- it's WRONG and DANGEROUS.
%
% I disagree. It *may* be wrong and dangerous but that doesn't mean it *is*
% wrong and dangerous.

I'd like to ring in on David's side here. It *is* wrong and dangerous.

% In a classical concurrent OO environment the predicates concerned are the
% invariants of the object and the preconditions of the method that is being
% invoked.

I'm not actually sure what this sentence means, but it seems to me that
the principal goal of OO, concurrent or not, is abstraction. If you have to
know all about the implementation of an object before you can use it, that
strikes me as a problem (and as going against the whole object of the
methodology).

% > If you write a concurrent Java program/applet, any place
% > you've used the "synchronized" keyword, treat EACH AND EVERY method
% > invocation as if it will unlock and then relock that hidden object mutex
%
% This is excessive advice. For a start only methods that use the same object
% for synchronisation can potentially unlock the mutex. Secondly unlocking
% will only occur if a wait() is performed and this is not something that
% should be done arbitrarily or without the programmers knowledge.

Why not? And how do you protect from someone doing this? What if the
implementation of one piece changes? It seems to me you're saying that
any time you use a mutex, you have to examine the implementation of
every object that's used within the mutex section, and any time you do
a wait, you have to examine the implementation of every object that's
calling that method. _This_ sounds excessive to me.

I don't have a problem with recursive mutexes per se, but if the first
lock isn't held until its matching unlock, then the mutex isn't
implemented properly. I don't see how you can defend that kind of behaviour.

David Holmes

Nov 4, 1997

Patrick TJ McPhee <pt...@interlog.com> wrote in article
<63ihbk$slg$1...@news.interlog.com>...

>
> I'd like to ring in on David's side here. It *is* wrong and dangerous.

Let's be clear - you rang in on Dave Butenhof's side :)



> % In a classical concurrent OO environment the predicates concerned are the
> % invariants of the object and the preconditions of the method that is
> % being invoked.
>
> I'm not actually sure what this sentence means, but it seems to me that
> the principal goal of OO, concurrent or not, is abstraction. If you have to
> know all about the implementation of an object before you can use it, that
> strikes me as a problem (and as going against the whole object of the
> methodology).

Synchronisation is not just an implementation detail. Certainly straightforward
thread-safety, i.e. locking with no conditional blocking, can be done
without the client even knowing they are using a thread-safe object.
However, if a method blocks a thread on a condition variable this is a very
important synchronisation property of the object which is visible to the
client and thus should be documented as part of the semantics of the
interface to that object. I.e. synchronisation forms part of the abstraction
presented by a concurrent object.

> % will only occur if a wait() is performed and this is not something that
> % should be done arbitrarily or without the programmers knowledge.
>
> Why not?

Because blocking a thread is part of the caller visible behaviour that
needs to be documented. Such an object requires that it be used by multiple
threads in the correct manner. An object with blocking semantics on its
guards is different to a balking one. The client needs to know what kind of
object they are dealing with.

> And how do you protect from someone doing this?

You can't. Just as you can't protect anyone from writing 'bad' code. There
are coding styles, idioms and patterns for doing reasonable concurrent
programming (OO or not).

> What if the
> implementation of one piece changes? It seems to me you're saying that
> any time you use a mutex, you have to examine the implementation of
> every object that's used within the mutex section,

No not the implementation - the semantics of the interface. Synchronisation
ala guarded waits is a visible property of an object - or should be.

> and any time you do a wait, you have to examine the implementation of every
> object that's calling that method. _This_ sounds excessive to me.

??? Object methods don't just arbitrarily do a wait. They do a wait in
response to a condition not being met and under the understanding that
another thread will at some time invoke another method which will cause the
right state and release the waiting thread. When an object is designed to
perform such blocking methods then the condition variable used for such
things is either internal to the object because the object totally
encapsulates the action that is being carried out, or the object must have
been designed to interact with a passed-on lock/CV. When an object's method
does a wait() it doesn't need to know about its clients but it does need to
know what sort of concurrent protocol it is participating in, just as the
client needs to know what sort of concurrent protocol it supports.

> I don't have a problem with recursive mutexes per se, but if the first
> lock isn't held until its matching unlock, then the mutex isn't
> implemented properly. I don't see how you can defend that kind of
> behaviour.

Because it is appropriate behaviour in the context under discussion: Java
locks and wait-sets. The use of a recursive lock makes implementation much
simpler. Within the expected usage it is necessary for the wait() mechanism
to unwind the lock. This is purely an implementation consideration. If
Java's synchronized statements did not transform into inline locking code,
but instead allowed locking/unlocking to be done once regardless of
super/self calls, then a recursive lock would not be needed and wait()
would simply unlock the mutex, not unwind it.

David

Dave Butenhof

Nov 6, 1997

David Holmes wrote:

It is (or should be) an externally documented behavior ONLY if the behavior is
relevant to the caller. That is, if the caller will be required to unblock it,
then, sure, it needs to be documented. To be pedantic, it would be wrong for
POSIX to fail to document that pthread_cond_wait will block the thread until
another thread signals or broadcasts the condition variable. The blocking is
an external behavior of the interface.

But any thread-safe method that uses a mutex may block. That's implicit, and
it does not affect the external behavior of the method -- nor does the caller
need to "do something" to wake it. It's blocked because it requires a resource
(the mutex) currently held by some other thread, because that thread is using
the same critical section or shared data. It'll be unblocked when that other
thread is done, with no action required of the caller. The resource in
question (the mutex) isn't even visible (or accessible) to the caller. It's
NOT external behavior.

The same is true when a condition variable is used to signal between threads.
For example, the method you called may be a client, communicating with a
server thread. The fact that it blocks on an (internal) condition variable
while the server works is irrelevant to the external behavior of the method.
(There's nothing the caller could do about it anyway, since the condition
variable is internal to the client/server package, and isn't visible to the
caller.)

> > % will only occur if a wait() is performed and this is not something that
> > % should be done arbitrarily or without the programmers knowledge.
> >
> > Why not?
>
> Because blocking a thread is part of the caller visible behaviour that
> needs to be documented. Such an object requires that it be used by multiple
> threads in the correct manner. An object with blocking semantics on its
> guards is different to a balking one. The client needs to know what kind of
> object they are dealing with.

No, it's not caller visible, unless the interface MAKES it caller visible.
Which, in general, is a bad idea. The interface should export an abstraction
("get the balance of my checking account"), not an implementation ("lock mutex
foo, create an instance of the check_request structure and add it to the
request queue, signal condition variable request, and wait on condition
variable confirm, remove check_request from the confirmation queue and return
the balance field"). Exporting the implementation makes for all sorts of
maintainability problems, and it's a particularly bad idea for OO designs
where the abstraction you're blowing is the basic power of the design!
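
To make the point concrete, here is a rough C sketch of that checking-account
abstraction (hypothetical names; it assumes a separate server thread that
services the request and fills in the reply). The mutex and condition
variables are entirely internal, so the caller of get_balance() sees only
"get the balance":

#include <pthread.h>

/* Private to the client/server package; starting the server thread is
 * omitted.  None of this is visible to the caller. */
static pthread_mutex_t lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  request = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  confirm = PTHREAD_COND_INITIALIZER;
static int  request_pending = 0;   /* set by client, cleared by server */
static int  reply_ready     = 0;   /* set by server, cleared by client */
static long balance_reply;

/* External interface: "get the balance of my checking account". */
long get_balance(void)
{
    long result;

    pthread_mutex_lock(&lock);
    request_pending = 1;
    pthread_cond_signal(&request);            /* wake the server thread */
    while (!reply_ready)
        pthread_cond_wait(&confirm, &lock);   /* internal wait -- invisible */
    reply_ready = 0;
    result = balance_reply;
    pthread_mutex_unlock(&lock);
    return result;
}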

> > And how do you protect from someone doing this?
>
> You can't. Just as you can't protect anyone from writing 'bad' code. There
> are coding styles, idioms and patterns for doing reasonable concurrent
> programming (OO or not).
>
> > What if the
> > implementation of one piece changes? It seems to me you're saying that
> > any time you use a mutex, you have to examine the implementation of
> > every object that's used within the mutex section,
>
> No, not the implementation - the semantics of the interface. Synchronisation
> a la guarded waits is a visible property of an object - or should be.

So you're saying that concurrent design shouldn't be OO, and shouldn't take
advantage of the power of abstraction. Interesting. Of course, I don't agree.

> > and any time you do a wait, you have to examine the implementation of every
> > object that's calling that method. _This_ sounds excessive to me.
>
> ??? Object methods don't just arbitrarily do a wait. They do a wait in
> response to a condition not being met and under the understanding that
> another thread will at some time invoke another method which will cause the
> right state and release the waiting thread. When an object is designed to
> perform such blocking methods then the condition variable used for such
> things is either internal to the object because the object totally
> encapsulates the action that is being carried out, or the object must have
> been designed to interact with a passed-on lock/CV. When an object's method
> does a wait() it doesn't need to know about its clients but it does need to
> know what sort of concurrent protocol it is participating in, just as the
> client needs to know what sort of concurrent protocol it supports.

IF the protocol is external to the method doing the wait, then, sure. This is
certainly true for the implementation of wait(). Just as for
pthread_cond_wait, the wait IS the external semantic, and it would be foolish
and pointless to try to hide that. This is rarely the case, however. In most
cases, the wait is an internal detail in the method's implementation:
transparent and irrelevant to the caller. In a modular programming environment
supporting even the most primitive sort of abstraction, it MUST be
transparent.

> > I don't have a problem with recursive mutexes per se, but if the first
> > lock isn't held until its matching unlock, then the mutex isn't
> > implemented properly. I don't see how you can defend that kind of
> > behaviour.
>
> Because it is appropriate behaviour in the context under discussion: Java
> locks and wait-sets. The use of a recursive lock makes implementation much
> simpler. Within the expected usage it is necessary for the wait() mechanism
> to unwind the lock. This is purely an implementation consideration. If
> Java's synchronized statements did not transform into inline locking code,
> but instead allowed locking/unlocking to be done once regardless of
> super/self calls, then a recursive lock would not be needed and wait()
> would simply unlock the mutex, not unwind it.

As I've argued before, the lock unwind is "necessary" only because it
implements the broken wait semantics arbitrarily assigned by Java. It's wrong
and dangerous. It completely breaks any concept of modular programming or
abstraction in concurrent Java applications, because the caller MUST know
whether a method it uses might wait, and specifically prepare for that. The
only way this can be done safely, without causing major maintainability
nightmares, is for EVERY call made within a synchronized region to assume that
the lock may be unwound before that call returns. Yes, you may sometimes be
able to optimize this by applying a static dependency analysis -- and not
worry if you know that the call you're making can't possibly result in
unwinding the lock of the current object. But what if that changes in the
future? How often can you be SURE (100%, bet-your-life sure) that it can't
happen?

And if Java avoided using recursive mutexes, the problem wouldn't go away. It
doesn't matter whether you're unlocking a nonrecursive mutex or unwinding a
recursive mutex. The point is that the caller(s) of the waiting method may
have broken invariants (shared data in an inconsistent state) that are
protected by the mutex being unlocked. The program is now BROKEN. Unless the
language can avoid any possibility of broken invariants, or the programmer
successfully avoids the trap, you're in big trouble. Java cannot avoid the
possibility, or even detect it, and does not currently make any attempt to
warn the programmer that there even IS a trap, much less how they can wriggle
through the spike-studded hoops to avoid it. That's wrong. And dangerous. And
no amount of rationalization will make the problem go away. Period.

Patrick TJ McPhee

unread,
Nov 9, 1997, 3:00:00 AM11/9/97
to

In article <01bce8cf$94d1c120$1bf56f89@dholmes>,
David Holmes <dho...@mri.mq.edu.au> wrote:
% Patrick TJ McPhee <pt...@interlog.com> wrote in article
% <63ihbk$slg$1...@news.interlog.com>...

% > I'd like to ring in on David's side here. It *is* wrong and dangerous.
%
% Let's be clear - you rang in on Dave Butenhofs side :)

Oops. Sorry.

% Synchronisation is not just an implementation detail.

Yes it is. More precisely, the way in which synchronisation is achieved
is just an implementation detail. Even the fact of synchronisation itself
could be an implementation detail. For all you know, the method you're
calling could kick off another thread to do some work in parallel. You
shouldn't have to know. In fact the more complicated it gets, the more
you shouldn't have to know.

% > % will only occur if a wait() is performed and this is not something that
% > % should be done arbitrarily or without the programmers knowledge.
% >
% > Why not?
%
% Because blocking a thread is part of the caller visible behaviour that
% needs to be documented.

It depends on how it blocks it. If there's the potential for deadlock, then
the caller has to know how to avoid the deadlock. Otherwise, I would argue
that blocking isn't caller-visible. If there's some precondition to avoid
deadlock, that has to be documented, but I argue that should be of the form
`server X has to be running or this method will hang', rather than
`such and such a mutex will be unrolled if you lock it'.

% The client needs to know what kind of object they are dealing with.

No, it doesn't. The client needs to know what exceptions it will raise
and what it does on success.

% > And how do you protect from someone doing this?
%
% You can't. Just as you can't protect anyone from writing 'bad' code.

But there's a difference between bad code and good code which produces
unreasonable results.

% > and any time you do a wait, you have to examine the implementation of
% every
% > object that's calling that method. _This_ sounds excessive to me.
%
% ??? Object methods don't just arbitrarily do a wait.

!!! Details of implementations change over time. The beautiful thing
about new languages is that there's no legacy code, so the effect of
implementation changes isn't constantly jumping up and rapping everyone
in the nose, but give it time. At some point, somebody will decide to
do some work in parallel, as I suggested above, or will put some new
synchronisation super-structure on top of some existing code in order to
do something better or something new. If making that sort of not
arbitrary, but carefully planned change will require complete
examination of the code base to be sure there aren't any places where a
synchronised method will be broken, that's excessive.

% > I don't see how you can defend that kind of behaviour.
%
% Because it is appropriate behaviour in the context under discussion: Java
% locks and wait-sets.

So, it's defensible because it's designed that way?

% Within the expected usage it is necessary for the wait() mechanism
% to unwind the lock. This is purely an implementation consideration.

Now it's my turn to say `that's not an implementation detail'. In my opinion,
wait should raise a would-have-to-unroll exception if it would have to unroll
a lock. This is a design consideration. This way, you would know that the
wait didn't succeed and you can take some action to work around it.

David Holmes

unread,
Nov 11, 1997, 3:00:00 AM11/11/97
to

Dave Butenhof <bute...@zko.dec.com> wrote in article
<3461E22F...@zko.dec.com>...

> It is (or should be) an externally documented behavior ONLY if the behavior is
> relevant to the caller.
[snip]

> The same is true when a condition variable is used to signal between threads.

For the simple examples of thread-safety I totally agree - the concurrency
behaviour is totally transparent to the client. The difference between
locking for thread safety versus using condition variables and wait() is
that the former doesn't require multiple threads to execute correctly while
the latter does.

Taking your example of signalling between threads. If the method blocks on
an internal condition while waiting for the server and the server is
guaranteed to respond at some point, then I agree that the blocking does
not need to be documented and does not form part of the semantics of the
interface. However, if the server is not guaranteed to respond, because it
depends on other threads/activities doing something then this may need to
be something that is visible to the client. The most obvious form of
visibility that comes to mind is the ability to specify a time-out. This is
a common practice when dealing with distribution and is not irrelevant when
dealing with (possibly) simpler thread communication.

> No, it's not caller visible, unless the interface MAKES it caller visible.
> Which, in general, is a bad idea. The interface should export an abstraction

It must be visible if the client:
a) needs to ensure some other action to make the first action possible; or
b) if the client may wish to affect the way in which the method is executed
(eg. specifying a time-out or even balking instead of blocking)

In this context the client is not the thread using the object but the
programmer writing the code that the threads will execute.

If synchronisation semantics can be totally transparent to the user of the
interface then I agree with you. However in the situations I am envisaging
those synchronisation semantics are an important part of that interface
because they directly affect the way the interface needs to be used.

> So you're saying that concurrent design shouldn't be OO, and shouldn't take
> advantage of the power of abstraction. Interesting. Of course, I don't agree.

Dave, you seem to have a knack of taking a statement and pushing it to the
extreme limits of interpretation. Of course concurrent design should be OO
and of course you don't throw away the power of abstraction. But as I stated
before synchronisation semantics can form part of the abstraction. As an
example don't you think that specifying that a read() operation blocks
until input is available is an important part of the semantics of the
operation?



> In most cases, the wait is an internal detail in the method's implementation:
> transparent and irrelevant to the caller. In a modular programming environment
> supporting even the most primitive sort of abstraction, it MUST be
> transparent.

Well this is where we disagree. I don't think that in most cases the wait
is an internal detail that is irrelevant to the caller. Obviously we are
both looking at different usage contexts. And it should only be transparent
if it *is* transparent - my point is that you don't hide a wait() that is
significant to the client and the way the client uses the object. That sort
of behaviour is one of the key properties of that object.

> As I've argued before, the lock unwind is "necessary" only because it
> implements the broken wait semantics arbitrarily assigned by Java. It's wrong
> and dangerous. It completely breaks any concept of modular programming or
> abstraction in concurrent Java applications, because the caller MUST know
> whether a method it uses might wait, and specifically prepare for that.

We come back to the same argument and I don't buy it at all as a general
statement. Obviously we are envisaging completely different uses of Java's
synchronisation mechanism and the way they manifest themselves as part of
an object's behaviour.

But I again re-iterate the point that if you call a method which *might*
wait(), and if that wait() is entirely transparent to the client, then such
a wait will be internal to the client and will use an object that is
internal is well. In which case any locks the client is currently holding
are totally unaffected. It is only if you use global objects for all
locking/waiting that you will run into the problem that you describe. Is
that what you are doing and are concerned about? If so then by definition
the common usage of the same locking object can not be transparent to the
client!

Let's consider an example scenario.
Client thread A in object O1 holds lock L1 and the invariants of O1 are
currently invalid when it makes a call into O2.foo(). Inside foo() is a
condition and condition variable which is protected via L1.
So what happens in Java:
Thread A holds the lock L1 and invokes foo() which locks L1 again and does
a wait(). L1 is now unlocked and all hell can break loose inside O1 if
another thread has access to it.
So what happens in pthreads?
Thread A holds the lock L1 and invokes foo() which tries to lock L1 again,
which can't happen so the thread A deadlocks with itself.

In both cases the programs are broken.
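
In C, the pthreads half of that scenario looks roughly like this (placeholder
names, default mutex type):

#include <pthread.h>

static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;  /* default, non-recursive */

void foo(void)
{
    pthread_mutex_lock(&L1);    /* thread A already holds L1, so with a
                                   default mutex this call never returns:
                                   the thread deadlocks with itself */
    /* ... wait on a condition variable protected by L1 ... */
    pthread_mutex_unlock(&L1);
}

void o1_method(void)
{
    pthread_mutex_lock(&L1);
    /* O1's invariants may be broken here */
    foo();                      /* hangs forever */
    pthread_mutex_unlock(&L1);
}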

I contend that if O2.foo() uses a global lock L1 then that is an important
piece of information that the programmer needs to be aware of and is not
something that should be hidden or done arbitrarily. If the information is
not hidden then in neither environment would the incorrect design have been
used.

If foo() uses an internal lock L2 with the condition then in both java and
pthreads the lock L1 will still hold (and protect O1) while L2 is released
during the wait. Of course depending on how O2 and O1 are related that
could lead to a nested monitor problem - but that occurs in both
environments and is irrelevant to this discussion.

Regards,
David

David Holmes

unread,
Nov 11, 1997, 3:00:00 AM11/11/97
to

Patrick TJ McPhee <pt...@interlog.com> wrote in article
<644tpu$6q7$1...@news.interlog.com>...

> % Synchronisation is not just an implementation detail.
>
> Yes it is. More precisely, the way in which synchronisation is achieved
> is just an implementation detail.

Please see my other post responding to Dave Butenhof's latest post (dated
7/11). Synchronisation is only an implementation detail if it is totally
transparent to the client. Certainly simple thread-safety fits in this
category which is what I stated before.

> If there's some precondition to avoid
> deadlock, that has to be documented, but I argue that should be of the form
> `server X has to be running or this method will hang', rather than
> `such and such a mutex will be unrolled if you lock it'.

I agree in general. But if the reason for this potential deadlock is
because the object uses an external global mutex/condition then that has to
be documented - because it's not transparent!



> % The client needs to know what kind of object they are dealing with.
>
> No, it doesn't. The client needs to know what exceptions it will raise
> and what it does on success.

The client needs to know the client visible synchronisation properties of
the object. When I say client I mean the programmer trying to use the
object in question not the specific call-time client.



> If making that sort of not arbitrary, but carefully planned change will
> require complete examination of the code base to be sure there aren't any
> places where a synchronised method will be broken, that's excessive.

It certainly would be. But if your locking strategy emphasises modularity and
encapsulation then it won't be such a problem. You'll have to be aware of
nested monitors and other potential for deadlock of course. If you use a
'global' locking strategy and then change a method to use that lock and do
a wait, then in Java you'd get the undesirable unroll and in pthreads you'd
get a self-deadlock when the thread tries to acquire the lock twice. In
both cases you'd have to check at all interactions between threads/objects
that use the common lock.



> So, it's defensible because it's designed that way?

I don't consider that it needs defending.



> Now it's my turn to say `that's not an implementation detail'. In my opinion,
> wait should raise a would-have-to-unroll exception if it would have to unroll
> a lock.

But that wouldn't fit in with OO concurrent programming in this context at
all. I'm trying to point out that the unrolling semantics are necessary to
make common OO usage scenarios work due to the way locking code is
effectively in-lined in a method. Consider this example:

class Base {
    boolean condition = false;
    synchronized void waitFor() throws InterruptedException {
        while (!condition) wait();
    }
    synchronized void signal(boolean c) {
        condition = c;
        notifyAll();
    }
}
This captures the simplest and most common approach to guarded methods in
Java. Now consider a derived class which wants to introduce some extra
functionality that logs waitFor operations:

class LoggingBase extends Base {
    LogFile log = new LogFile("\\temp\\log.txt");
    synchronized void waitFor() throws InterruptedException {
        super.waitFor();
        log.append(Thread.currentThread() + " completed wait");
    }
}

Note that the synchronised derived waitFor() invokes the synchronized super
class waitFor(). Without the un-rolling semantics of wait() this simple and
common construct would not work. Because the locking code is in-lined in
the super method the lock gets locked twice. If a different technique was
used for performing locking then the recursive lock and the un-rolling wait
would not be needed. That's why I say it's an implementation detail.

I'm curious to see how you would program this example in pthreads?

I can see how the problem with unrolling is viewed but in my view it only
comes into play when common lock objects are used all over the place and
without being documented. And it's not that the situation in which this
occurs is broken where a pthread implementation would be correct - it seems
to be that it's the way the situation is broken that is more of a concern.

David Holmes

Dave Butenhof

unread,
Nov 11, 1997, 3:00:00 AM11/11/97
to

David Holmes wrote:

> class Base {
>     boolean condition = false;
>     synchronized void waitFor() throws InterruptedException {
>         while (!condition) wait();
>     }
>     synchronized void signal(boolean c) {
>         condition = c;
>         notifyAll();
>     }
> }
> This captures the simplest and most common approach to guarded methods in
> Java. Now consider a derived class which wants to introduce some extra
> functionality that logs waitFor operations:
>
> class LoggingBase extends Base {
>     LogFile log = new LogFile("\\temp\\log.txt");
>     synchronized void waitFor() throws InterruptedException {
>         super.waitFor();
>         log.append(Thread.currentThread() + " completed wait");
>     }
> }
>
> Note that the synchronised derived waitFor() invokes the synchronized super
> class waitFor(). Without the un-rolling semantics of wait() this simple and
> common construct would not work. Because the locking code is in-lined in the
> super method the lock gets locked twice. If a different technique was used
> for performing locking then the recursive lock and the un-rolling wait would
> not be needed. That's why I say it's an implementation detail.

In the simple case you give, unrolling won't break anything. It's still wrong,
because nobody but the programmer can verify that it will work, and, in
general, it'll be very difficult for the programmer to know there's a problem.
That is, unless there's a general rule that "thou shalt not call with broken
invariants".

For example, what if your derived class was more like:

class LoggingBase extends Base {
    int predicate;
    LogFile log = new LogFile("\\temp\\log.txt");
    synchronized void waitFor() throws InterruptedException {
        super.waitFor();
        predicate = 1;
        log.append(Thread.currentThread() + " completed wait");
    }
}

The call to super.waitFor() is made with a broken (uninitialized) invariant
(predicate) in the derived class. This doesn't invalidate the call to
super.waitFor(), because the invariant doesn't exist in the superclass. In
this case, of course, it's probably fairly obvious to most programmers that
"waitFor" might do a wait, and if they know enough about Java, they'd know
that the wait will unroll their mutex and expose the broken LoggingBase object
invariant to other potential users of the object. Most of the time, it will
not be (and shouldn't be!) obvious that the call might wait. The
implementation of a method (even a superclass method) should be transparent.

Patrick TJ McPhee

unread,
Nov 12, 1997, 3:00:00 AM11/12/97
to

In article <01bcee59$b708d280$1bf56f89@dholmes>,
David Holmes <dho...@mri.mq.edu.au> wrote:
% Patrick TJ McPhee <pt...@interlog.com> wrote in article [...]

% But that wouldn't fit in with OO concurrent programming in this context at
% all. I'm trying to point out that the unrolling semantics are necessary to
% make common OO usage scenarios work due to the way locking code is
% effectively in-lined in a method. Consider this example:

Well...what you've done here is to implement a routine which just waits.
Obviously, if the routine just waits, the caller needs to know that's what
it does. Suppose you were to take the next step and write a routine which
does something, and along the way it needs to wait.

To me, an example of a use for recursive mutexes is in a getval/setval
situation. getval does something like this

getval:
lock a mutex
perform a million steps, possibly starting off with a wait
release the mutex

setval does something like this

setval:
lock a mutex
perform a million steps, possibly including getval more than once
set the value
release the mutex

If getval does a wait, and `a mutex' is released, then the calculations
performed in setval will be wrong, and there's no way for you to know
until you notice that you're getting the wrong values. The problem won't
be easy to reproduce, because it depends on timing. Maybe you won't notice
it until you get past your live tests and implement the solution across
a world-wide network. You don't know, and you can't know -- this is why
it's wrong and dangerous.

Now, you can argue that another approach to waiting might lead to a hung
thread, and you're probably right, but that's better than nothing. I'd
rather come in one morning and be told that users are reporting deadlocks
all over the world than to come in the first of the month and be told that
there are numbers wrong all over the place, we've got no idea how or why
and there's no confidence in anything that's been done since the system
went live.
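
For readers following the original recursive-mutex question, a rough C
rendering of the hazard (hypothetical getval/setval, an XSH5 recursive mutex;
the standard itself warns that the implicit unlock done by pthread_cond_wait()
may not actually release a recursive mutex that has been locked more than
once):

#include <pthread.h>

static pthread_mutex_t m;                  /* made recursive in init_lock() */
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static int  ready = 0;
static long value;

void init_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);   /* XSH5 */
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
}

long getval(void)
{
    long v;
    pthread_mutex_lock(&m);            /* second lock when called from setval */
    while (!ready)
        pthread_cond_wait(&cv, &m);    /* may release only one level of the
                                          recursive lock, so no other thread
                                          can ever acquire m and signal us */
    v = value;
    pthread_mutex_unlock(&m);
    return v;
}

void setval(long v)
{
    pthread_mutex_lock(&m);            /* first lock */
    /* ... many steps; invariants temporarily broken ... */
    (void)getval();                    /* relocks m, then waits */
    value = v;
    pthread_mutex_unlock(&m);
}

Under Java's fully-unwinding wait the same call sequence would instead release
the lock completely, which is the silent-corruption case described above.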

% I'm curious to see how you would program this example in pthreads?

If I were implementing a class which does a wait, it would probably have
a method which does a lock, and a method which does an unlock, and a
method which does a wait. How it worked would depend on what I was trying
to achieve.
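
For what it's worth, one plausible pthreads rendering of the earlier Base
example (a sketch only; a derived "logging" variant would call base_wait_for()
and then log, without taking the mutex itself, so no recursive locking
arises):

#include <pthread.h>

struct base {
    pthread_mutex_t m;
    pthread_cond_t  cv;
    int             condition;
};

void base_init(struct base *b)
{
    pthread_mutex_init(&b->m, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->condition = 0;
}

/* Equivalent of the synchronized waitFor(): lock once, wait on the predicate. */
void base_wait_for(struct base *b)
{
    pthread_mutex_lock(&b->m);
    while (!b->condition)
        pthread_cond_wait(&b->cv, &b->m);
    pthread_mutex_unlock(&b->m);
}

/* Equivalent of the synchronized signal(boolean c). */
void base_signal(struct base *b, int c)
{
    pthread_mutex_lock(&b->m);
    b->condition = c;
    pthread_cond_broadcast(&b->cv);    /* notifyAll() equivalent */
    pthread_mutex_unlock(&b->m);
}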

% I can see how the problem with unrolling is viewed but in my view it only
% comes into play when common locks objects are used all over the place and
% without being documented.

i.e., When the application isn't trivial.

David Holmes

unread,
Nov 12, 1997, 3:00:00 AM11/12/97
to

Patrick TJ McPhee <pt...@interlog.com> wrote in article
<64bagj$md5$1...@news.interlog.com>...

> Well...what you've done here is to implement a routine which just waits.
> Obviously, if the routine just waits, the caller needs to know that's what
> it does. Suppose you were to take the next step and write a routine which
> does something, and along the way it needs to wait.

The example captures the core logic. What it does apart from the locking
and waiting/signalling isn't relevant to the point being illustrated - why
Java's synchronisation semantics are desirable.



> If getval does a wait, and `a mutex' is released, then the calculations
> performed in setval will be wrong,

I'll ignore the fact that we don't know what getVal() is waiting for and
assume that it's something independent of the use of setVal(). In which case
yes the calculations will be wrong. If you are going to invoke a method
that releases *your* lock then you need to know that so you can design
accordingly. The fact that setVal() uses getVal() and that getVal() does a
wait() implies setVal() needs to know about the wait(). If setVal() is
written in the same class as getVal() then you *will* know about the wait.
If setVal() is in a derived class then you'll need knowledge of, and access
to the locking policy of the base class - otherwise you can't end up using
the same mutex and thus the problem doesn't arise.

The described situation is a programming error and in my view one that is
fairly obvious. The only time it is not immediately obvious is if the
method that does the wait is in another class. But in that case the lock
sharing has to be set-up and thus the locking/waiting semantics of the
method have to be known. I can't see how anyone can claim that use of
wait() in methods involving the lock of the caller is, or should be,
transparent to the caller!

> Maybe you won't notice it until you get past your live tests and implement the
> solution across a world-wide network. You don't know, and you can't know --
> this is why it's wrong and dangerous.

What you have described is an extreme situation (unless you commonly employ
ad-hoc waiting using a common lock and without documenting this fact). I
agree that if this error gets through to the 'real world' then it's a
problem. But I don't believe that this is a common or likely scenario.
There are far more likely problems to bite you in concurrent programming
than the scenario outlined.

The Java waiting mechanism has exactly the same contract as a pthreads
condition variable: when you do a wait() the associated lock/mutex is
released - end of story. You can't share locks by accident - only by
design.

Consider this slight variation on the scenario. The methods getVal() and
setVal() both require to be passed a mutex used for locking. Now these
methods know that they are using a mutex that 'belongs' to someone else -
thus they don't know whether or not the mutex is locked when it's passed to
them, only that it must be locked before they can proceed. So instead of doing
a lock() they do a try_lock(). Now there is no potential for deadlock. When
getVal() does the wait() setVal() is exposed to corruption just as before,
except this time there are no recursive locks or unrolling involved. Does
that mean that try_lock() is "wrong and dangerous" because by using it we
were able to avoid the situation where the error would have been detected
through deadlock rather than corruption?



> % I can see how the problem with unrolling is viewed but in my view it only
> % comes into play when common lock objects are used all over the place and
> % without being documented.
>
> i.e., When the application isn't trivial.

So you're implying that non-trivial applications aren't or shouldn't be
documented? ;-) Surely not.

At the end it's a matter of degree. You and Dave think it's a serious problem
(others may agree), while I think it is not a serious problem that is
likely to arise in a reasonable situation.

David

David Holmes

unread,
Nov 12, 1997, 3:00:00 AM11/12/97
to

This is a correction and a question.

David Holmes <dho...@mri.mq.edu.au> wrote in article
<01bcefb0$82c6a3f0$1bf56f89@dholmes>...


> Consider this slight variation on the scenario. The methods getVal() and
> setVal() both require to be passed a mutex used for locking. Now these
> methods know that they are using a mutex that 'belongs' to someone else -
> thus they don't know whether or not the mutex is locked when it's passed to
> them, only that it must be locked before they can proceed. So instead of doing
> a lock() they do a try_lock(). Now there is no potential for deadlock. When
> getVal() does the wait() setVal() is exposed to corruption just as before,
> except this time there are no recursive locks or unrolling involved. Does
> that mean that try_lock() is "wrong and dangerous" because by using it we
> were able to avoid the situation where the error would have been detected
> through deadlock rather than corruption?

Sorry I was thinking that try_lock() allows you to detect whether or not
the calling thread was already the owner of the lock, but it doesn't.

Can a thread detect that it already owns a mutex? The Kleiman, Shah and
Smaalders book (sorry Dave I'm still waiting for yours to come in stock -
they tell us December :( ) says that " if the current owner of the mutex
tries to relock the mutex; it will result in deadlock" but under errors its
states "pthread_mutex_lock() fails and returns the corresponding value if
any of the following conditions are detected: EDEADLK - the mutex was
already locked and is owned by the calling thread". So does it deadlock or
return EDEADLK?

If you can detect mutex self-ownership then getVal() could use lock() and
check for EDEADLK to see if it was already the owner. But in that case
simply ignoring the return code would provide the effect of a recursive
lock which means that a subsequent wait() would still unlock the mutex and
thus provide exactly the same exposure as the Java mechanism!

If you can't detect mutex ownership (anyone know of any systems where you
can?) then ignore my alternative scenario.

Cheers,
David

Patrick TJ McPhee

unread,
Nov 13, 1997, 3:00:00 AM11/13/97
to

In article <01bcefb5$a8941f90$1bf56f89@dholmes>,
David Holmes <dho...@mri.mq.edu.au> wrote:

% Can a thread detect that it already owns a mutex? The Kleiman, Shah and
% Smaalders book (sorry Dave I'm still waiting for yours to come in stock -
% they tell us December :( ) says that "if the current owner of the mutex
% tries to relock the mutex; it will result in deadlock" but under errors it
% states "pthread_mutex_lock() fails and returns the corresponding value if
% any of the following conditions are detected: EDEADLK - the mutex was
% already locked and is owned by the calling thread". So does it deadlock or
% return EDEADLK?

My experience :) is that it deadlocks. I think that this was not the case
throughout the lifetime of the standard.

% If you can detect mutex self-ownership then getVal() could use lock() and
% check for EDEADLK to see if it was already the owner. But in that case
% simply ignoring the return code would provide the effect of a recursive
% lock which means that a subsequent wait() would still unlock the mutex and
% thus provide exactly the same exposure as the Java mechanism!

I would argue that this is a programming error. So, trylock() isn't
wrong, because it has the useful side-effect of not blocking, but
proceeding as though it had succeeded when it actually failed would be
wrong, and possibly dangerous.

% If you can't detect mutex ownership (anyone know of any systems where you
% can?) then ignore my alternative scenario.

You can always create a global variable under the protection of another
mutex. This would require a wrapper for the lock function if you
do locking in a lot of places.
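
A sketch of that sort of wrapper (names hypothetical): the ownership record
lives under a second mutex, so any thread can ask whether it already holds the
real lock without racing on the bookkeeping fields:

#include <pthread.h>

struct owned_mutex {
    pthread_mutex_t lock;   /* the real mutex */
    pthread_mutex_t meta;   /* protects owner/held below */
    pthread_t       owner;
    int             held;
};
/* Both mutexes would be set up with pthread_mutex_init (or static
 * initializers) before use. */

void owned_lock(struct owned_mutex *om)
{
    pthread_mutex_lock(&om->lock);
    pthread_mutex_lock(&om->meta);
    om->owner = pthread_self();
    om->held  = 1;
    pthread_mutex_unlock(&om->meta);
}

void owned_unlock(struct owned_mutex *om)
{
    pthread_mutex_lock(&om->meta);
    om->held = 0;
    pthread_mutex_unlock(&om->meta);
    pthread_mutex_unlock(&om->lock);
}

/* Returns nonzero if the calling thread currently holds om->lock. */
int owned_by_me(struct owned_mutex *om)
{
    int mine;
    pthread_mutex_lock(&om->meta);
    mine = om->held && pthread_equal(om->owner, pthread_self());
    pthread_mutex_unlock(&om->meta);
    return mine;
}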

Dave Butenhof

unread,
Nov 13, 1997, 3:00:00 AM11/13/97
to

David Holmes wrote:

> Dave Butenhof <bute...@zko.dec.com> wrote in article
> <3468801C...@zko.dec.com>...

> > That is, unless there's a general rule that "thou shalt not call with
> > broken invariants".
>

> You make it sound like the converse is the general rule: "thou shalt always
> call with broken invariants". I don't think it's a black and white issue. In
> some cases you will need to do something when invariants are broken and in
> other cases you won't. If you do need to do something when invariants are
> broken then you need to ensure that things can work correctly.

The only way to "ensure that things can work correctly" is to avoid calling
anything that can possibly result in a wait on a property of an object with a
broken invariant. Or, conversely, to avoid ever waiting on a property of an
object with a broken invariant.

The real point, and the severe danger of the Java design, is that this is an
IMPLICIT requirement placed on each Java programmer, with no documentation or
support. You have to know that your invariant is broken ("what's an
invariant?"), and you have to know that what you're calling may or may not
ever result in a wait on that object (you have to know a lot about the object
to be sure of that -- and, especially, that it can never change in the
future).

The latter part is the real danger. That you know, either from documentation
or inspection, that your call (say, to a superclass method) cannot wait. You
make the call with a broken invariant because it's convenient -- or just
because you don't know any better. It works. Later, someone (maybe you) changes
the implementation of the superclass (it's a clean OO, neatly encapsulated,
data-hidden design, right? It's fine to rip out the guts and do it over!).
Now, suddenly, the application doesn't work. Why? Does the Java compiler tell
you that you called this changed method with a broken invariant? What tool,
what methodology, do you apply to detect the root problem? (Hark, is that the
tinkle of electronic Internet laughter in the distance?)

If you can't plan to fix it, don't do it. You CANNOT plan to fix this bug. No
matter how good you think that you, or your successor, might be, forget about
it. Finding the problem will be a matter of lots of hard work and perspiration
-- and even then you'll need a healthy dose of plain old luck. The first rule
is to design for maintainability. This construct is NOT maintainable. Not even
remotely.

> > For example, what if your derived class was more like:
> >
> > class LoggingBase extends Base {
> >     int predicate;
> >     LogFile log = new LogFile("\\temp\\log.txt");
> >     synchronized void waitFor() throws InterruptedException {
> >         super.waitFor();
> >         predicate = 1;
> >         log.append(Thread.currentThread() + " completed wait");
> >     }
> > }
> >

> > This doesn't invalidate the call to
> > super.waitFor(), because the invariant doesn't exist in the superclass. In
> > this case, of course, it's probably fairly obvious to most programmers that
> > "waitFor" might do a wait, and if they know enough about Java, they'd know
> > that the wait will unroll their mutex and expose the broken LoggingBase object
> > invariant to other potential users of the object.
>

> Firstly it's not just obvious that waitFor() will do a wait(), it would be
> explicitly documented as part of the synchronisation semantics of this
> object. Second we are dealing with a derived class which has to have
> additional information on the synchronisation policy and mechanisms used by
> the base class. In this case the derived has to know that the base class
> uses 'this' as the synchronisation object. So there's nothing accidental
> about knowing that waitFor() will do a wait. Thirdly if they have bothered
> to learn anything about Java's support for synchronisation they will know
> that the wait() releases the lock on the derived object. You seem to
> imply that this is privileged information that programmers don't have
> access to.

Sure, the example's a trivial and obvious pathological case. So what? That
just means the real problem will be much more subtle.

If you're really talking about clean OO, the derived class shouldn't need to
know a damn thing about the IMPLEMENTATION of the superclass methods. It
should know about the INTERFACE to the (public) methods and properties.

Sure, I took your example of "waitFor", which is clearly a wait. OK, bad
example. I was too lazy to change it, thinking that it shouldn't be necessary
to help someone understand the point. The "waitFor" function is NOT documented
to wait on "this", because that's an incidental aspect of its implementation,
not part of its interface. It just happens to need some resource, say from a
server somewhere on the net, that may, sometimes, result in a requirement that
it wait. Do you really need to nitpick this any further?

> The example as you have modified it is a pathological case - and such
> things can always be constructed. What is the predicate supposed to
> represent and how do other methods in this derived object interact with the
> predicate? The programmer has chosen to wait when the predicate is not
> valid - the programmer must ensure that this is the appropriate thing to
> do.

No. THIS programmer has chosen to make a call while the predicate was invalid.
THAT programmer has chosen to make the call perform a wait. The
implementations are independent and opaque to each other. They don't match.
There are no tools to tell either of them what the problem is. It just "don't
work". At least, it doesn't work SOMETIMES -- when the server is under load so
you happen to have multiple threads involved in the object's broken
invariants.

> Do I expect that programmers will always do this? No, programmers make
> mistakes.
> Do I think many people will make this mistake? No, not in general. People
> are far more likely to construct nested monitors and write code that misses
> notifications than to expose a broken object this way. However it is just
> as easy for people to break invariants in a base method then do a wait()
> and have the broken object exposed without even considering recursive locks
> or wait-unrolling.

If you break yourself, you blew it. Sorry, Charlie. That's not what we're
talking about here. The design of the language broke you -- transparently and
beyond your control, because it requires two separate methods, which SHOULDN'T
need to know anything about each other beyond the interface, to have detailed
knowledge of each others' IMPLEMENTATION to function correctly, and for the
programmer to make proper use of that information. That means neither
implementation can be changed independently. That's just not OO.

I think I see that the root of the problem is that Java tried to integrate
threads into the language, but only went part of the way. The lock is
integrated -- but the shared data and signalling (of changes in the data) are
still expressed by explicit application code, above the level of language
rules and knowledge. Java should have gone the extra step to true monitors,
where the language system controls the data and signalling of changes to the
data. It would know when invariants were broken, it would know when a wait
might occur, and it could do the right thing. Having the language control the
lock while the application controls the data and signalling is a mistake. It's
an inherently unstable half-way point.

Dave Butenhof

unread,
Nov 13, 1997, 3:00:00 AM11/13/97
to

David Holmes wrote:

> This is a correction and a question.
>
> David Holmes <dho...@mri.mq.edu.au> wrote in article
> <01bcefb0$82c6a3f0$1bf56f89@dholmes>...
> > Consider this slight variation on the scenario. The methods getVal() and
> > setVal() both require to be passed a mutex used for locking. Now these
> > methods know that they are using a mutex that 'belongs' to someone else -
> > thus they don't know whether or not the mutex is locked when it's passed to
> > them, only that it must be locked before they can proceed. So instead of doing
> > a lock() they do a try_lock(). Now there is no potential for deadlock. When
> > getVal() does the wait() setVal() is exposed to corruption just as before,
> > except this time there are no recursive locks or unrolling involved. Does
> > that mean that try_lock() is "wrong and dangerous" because by using it we
> > were able to avoid the situation where the error would have been detected
> > through deadlock rather than corruption?
>
> Sorry I was thinking that try_lock() allows you to detect whether or not
> the calling thread was already the owner of the lock, but it doesn't.
>

> Can a thread detect that it already owns a mutex? The Kleiman, Shah and
> Smaalders book (sorry Dave I'm still waiting for yours to come in stock -
> they tell us December :( ) says that "if the current owner of the mutex
> tries to relock the mutex; it will result in deadlock" but under errors it
> states "pthread_mutex_lock() fails and returns the corresponding value if
> any of the following conditions are detected: EDEADLK - the mutex was
> already locked and is owned by the calling thread". So does it deadlock or
> return EDEADLK?

WHO says "December"? That's absurd. Is this Barnes & Noble? I've heard a
number of complaints about them being unable (or unwilling) to keep it in
stock. Try Quantum, or another real computer bookstore. Or use your computer
as a bookstore and go to Amazon or BookPool -- they're usually shipping within
a day or two.

Anyway... the confusion about pthread_mutex_lock is that EDEADLK, like many
POSIX error returns, is an optional error. If you read the POSIX standard,
this is very clear (once you know the standardese, anyway). Errors are either
"if occurs", or "if detected". The latter, like this EDEADLK, are "if
detected". IF the implementation you're using can and chooses to detect the
error, it must report that error using EDEADLK. But it needn't detect the
error -- and, if it doesn't, the attempt will deadlock. In other words, a
"conforming application" must accept either behavior.

pthread_mutex_trylock, on the other hand, always returns EBUSY if the
mutex "was already locked". This terminology applies to the standard POSIX
mutex type. It's "warped" a little by the addition of recursive mutexes in
XSH5 (UNIX98 or Single UNIX Specification, Version 2). A pthread_mutex_trylock
on a recursive mutex will succeed if the calling thread already owns the
mutex, because it's safe and legal to relock that mutex. Thus, for recursive
mutexes, it really means if the mutex "cannot be locked by the current thread
without blocking".

> If you can detect mutex self-ownership then getVal() could use lock() and
> check for EDEADLK to see if it was already the owner. But in that case
> simply ignoring the return code would provide the effect of a recursive
> lock which means that a subsequent wait() would still unlock the mutex and
> thus provide exactly the same exposure as the Java mechanism!
>
> If you can't detect mutex ownership (anyone know of any systems where you
> can?) then ignore my alternative scenario.

Neither POSIX nor XSH5 provides any way to inquire about the current ownership
of a mutex. You can tell whether YOU own it, by locking it -- either by
pthread_mutex_lock or pthread_mutex_trylock. Once either returns successfully,
you know the owner (you). If you don't own the mutex, or if
pthread_mutex_trylock returns with EBUSY, you cannot tell who owns it -- or
indeed if anyone does. (Only that, at the time you called
pthread_mutex_trylock, it had been owned by someone else.) You can't
(portably) depend on an EDEADLK from pthread_mutex_lock, either. Of course, if
you restrict yourselves to implementations that always detect self-deadlock,
that's fine. Many implementations won't do that with "default" mutexes,
because keeping track of ownership is usually more expensive -- and the
default mutex type is designed to be as fast as possible.

David Holmes

unread,
Nov 13, 1997, 3:00:00 AM11/13/97
to

Patrick TJ McPhee <pt...@interlog.com> wrote in article
<64e005$dcj$1...@news.interlog.com>...

> In article <01bcefb5$a8941f90$1bf56f89@dholmes>,
> David Holmes <dho...@mri.mq.edu.au> wrote:
> I would argue that this is a programming error. So, trylock() isn't
> wrong, because it has the useful side-effect of not blocking, but
> proceeding as though it had succeeded when it actually failed would be
> wrong, and possibly dangerous.

I would also say it's a programming error. But I maintain that calling a
method that does a wait() on your lock while your invariants are broken is
also a programming error. The mechanisms permit the errors to be made - so
if one mechanism is "wrong and dangerous" shouldn't that also apply to the
other? ;-)

David

David Holmes

unread,
Nov 13, 1997, 3:00:00 AM11/13/97
to

Dave Butenhof <bute...@zko.dec.com> wrote in article
<346B00F0...@zko.dec.com>...

> Anyway... the confusion about pthread_mutex_lock is that EDEADLK, like many
> POSIX error returns, is an optional error. If you read the POSIX standard,
> this is very clear (once you know the standardese, anyway). Errors are either
> "if occurs", or "if detected". The latter, like this EDEADLK, are "if
> detected". IF the implementation you're using can and chooses to detect the
> error, it must report that error using EDEADLK. But it needn't detect the
> error -- and, if it doesn't, the attempt will deadlock. In other words, a
> "conforming application" must accept either behavior.

Interesting. Thanks for the clarification.

David

David Holmes

unread,
Nov 13, 1997, 3:00:00 AM11/13/97
to

Dave Butenhof <bute...@zko.dec.com> wrote in article
<346B06BA...@zko.dec.com>...

> Sure, I took your example of "waitFor", which is clearly a wait. OK, bad
> example. I was too lazy to change it, thinking that it shouldn't be necessary
> to help someone understand the point. The "waitFor" function is NOT documented
> to wait on "this", because that's an incidental aspect of its implementation,
> not part of its interface. It just happens to need some resource, say from a
> server somewhere on the net, that may, sometimes, result in a requirement that
> it wait. Do you really need to nitpick this any further?

Okay, enough point and counterpoint as it's getting hard to follow the
fragments of argument in this topic and becoming counterproductive. Let me
summarise what I see as the problem and the conditions under which it is a
problem.

Your stance is that a method employing a lock has to ensure that all
invariants hold before calling any other method because the other method
may do a wait, which will release the lock thus exposing an inconsistent
object. I hope this is a fair description.

The first and most important point is that the above situation can *only*
happen if the called method uses the same object to wait on that the
original method used for locking. If they are different then there is no
problem. I hope you agree.

Your stance is that the caller should not need to know that the called
method does a wait because it is an implementation detail.

This I have to qualify. The use of internal locking, waits and notifies
*may* be an implementation detail. However if the waits/locking/notifies
use the same object as the caller uses for locking then it is *not* an
implementation detail because it is a tight coupling between the two
methods. I would hope you agree that when objects are tightly coupled their
dependencies should be documented. Of course such coupling should be
avoided if possible - in which case we wouldn't have the exposed invariant
problem.

So the problem can only exist if the two methods share locks. A situation
which represents a coupling dependency that should be an important part of
the documentation of such methods/classes. If such interactions are not
documented then it's not just a problem for the Java synchronisation
mechanism but any (perhaps all?) mechanisms because you'll never know how
the other method is using your lock.


So ends the summary. Now if you'd like to know more ... read on

Now lets consider the possible relationships between these two methods:

Case 1: They are directly part of the same class

In this case it is highly likely that the same object is used for
locking/waiting and to outside clients such usage may well be considered an
implementation detail. But for the two methods concerned it is a very
important part of how they achieve their goals. The programmer has to know
what one method is doing with respect to the other to get the method to do
anything. This applies to simple functional tasks let alone
synchronisation. If the programmer doesn't know the synchronisation policy
used within the class for whom he/she is writing the method then they are
likely to make a huge range of errors each of which would make the object
totally broken eg. using a different object to synchronise access to the
shared object state.

Case 2: One is a method of the super/sub class of the other

In this case I (and I'm not alone here by any means) maintain that for a
subclass programmer to integrate over-ridden or new methods with the base
class then they *have* to know the synchronisation *policy* used by the
base class *and* they have to have sufficient *access* to the
synchronisation mechanism. If they do not have this then they can not know
that they have correctly implemented the new methods to work in a
concurrent environment. This is one example where ignorance most certainly
is not bliss. Now a well designed class may export this information through
an internal set of methods (interface) for controlling locking and waiting
such that the derived class can't break anything. This would be nice. The
reality is that many simple classes will simply use 'this' as the
locking/waiting object and so will the derived class - of course this must
be documented. Unfortunately even a well documented and accessible base
class policy can be inappropriate for a derived class: these are the
so-called "inheritance anomalies".

Case 3: The two method exist in totally separate classes

In this case there are a number of variations, the two main ones being:

1. The first method controls locking via a private internal lock object. In
which case the only way the second method can get the lock is if the first
method passes it to the second (either directly or via other methods of the
other object). This can hardly be done accidentally. If you pass your lock
to another method in a different class then you need to know what it will
do with that lock. If you don't know then you will at least ensure that
your invariants hold before calling that method. Note that this is not
specific to the Java mechanism; if you pass your mutex to a strange object
it can decide to pass it to a CV and thus unlock it.

2. The first and second objects have been constructed to use lock/waiting
objects that exist externally to them. In which case these objects have
presumably been designed to interact together using the lock sharing
policy. And of course if you are writing one of these objects then you must
know about this policy to use a lock in the first place. Again this is not
specific to Java's mechanisms because your shared mutex may be used with
the CV in the other method.

In either case there must exist a documented policy that defines the
requirements and responsibilities of each class/method that participates in
the lock sharing.

Given all of this I maintain that for the problem to arise requires a
combination of error and ignorance that will make it very unlikely to be a
real problem.

As for the fragile base class problem mentioned in the other post: well,
that's irrelevant to the current argument because I could just as easily
redefine a base class to use a different synchronisation object which would
also break a derived object. Fragile base classes are the price OO pays for
inheritance.

David

David Holmes

unread,
Nov 18, 1997, 3:00:00 AM11/18/97
to

Dave Butenhof <bute...@zko.dec.com> wrote in article
<34704FCF...@zko.dec.com>...

> David Holmes wrote:
>
> > Given all of this I maintain that for the problem to arise requires a
> > combination of error and ignorance that will make it very unlikely to be a
> > real problem.
> >
> You're dreaming.

And you're having nightmares. :)

> With one thing I will agree: There's no point to continuing this. We're
> not making any progress, and I suspect all the points have been made.

Agreed. Hopefully it's given readers a few points to consider one way or
another. Time will be the judge as to the true seriousness of this apparent
problem.

Cheers,
David
