Second, sorry for starting the original conversation and then not
following through. I have many excuses (new baby, and moving across
the state among them), but that is neither here nor there.
Bill seems to be going full-steam ahead with a implementation on the
Google Code site, and concurrency annotations are missing there. So,
enough beating around the bush: I endorse the @GuardedBy, @ThreadSafe,
and @NotThreadSafe annotations proposed by Brian, pretty much in their
current form. I have some comments below.
I'm still uncomfortable with the @Immutable annotation. I understand
the need for such a thing, but my problem with it, as I stated
earlier, is that there are a number of different definitions of
immutable floating around in the community. If we pick the wrong
definition to get behind, it will not be good. My colleague John
Boyland at Universty of Wisconsin-Milwaukee has very strong feelings
on this subject. Perhaps he can yet be drawn into this discussion.
> > @ThreadSafe is not very helpful to implementors of the class because
> > it doesn't say anything about how the thread safety is accomplished,
> > and how it is to be maintained as the class is extended and evolved.
>
> Agreed, but by design. I believe that such information should be done
> by other means, including but by no means limited to @GuardedBy.
> Annotations are a pretty clunky mechanism; the more we try to push them,
> the more unwieldy they get, and the less willing people are to use them.
> "Marker" annotations like @ThreadSafe are probably the sweet spot in
> terms of expressiveness.
>
> Furthermore, I think there is not a sufficient literature of "thread
> safety patterns" to make more detailed annotations (or variants of
> @ThreadSafe, like @ThreadSafe(SelfGuardedMonitor) to be flexible enough
> to cover the majority of code. But I'm willing to be convinced!
>
> My overriding concern is that if we make the annotation set too
> difficult to use, most users will not use it. And I think there's more
> benefit in getting most people to use a simple but weak set of
> annotations than in getting the experts to use a powerful but
> complicated set of annotations.
I agree. It must be stressed, that just because a class contains a
@GuardedBy annotation, it does not imply the class is @ThreadSafe,
because it might not guarantee safety for all the state in the class.
So there is still a continuum of safety. Any class that is not
@ThreadSafe is @NotThreadSafe.
> One place where @ThreadSafe is too weak is that it only applies to
> classes, not instances. But some Lists are thread-safe and some are
> not. It would be nice to be able to say something like
>
> @ThreadSafe Set s = ...
>
> or
>
> @Factory(@ThreadSafe.class)
> public static Set synchronizedSet(Set s) {...}
>
> This has more in common with the "type qualifier" sort of annotation,
> like @NonNull, @Tainted, and friends.
Agreed. This is definitely something interesting worth exploring.
> > (Although I'm
> > not entirely sure what Brian intends by the @GuardedBy("itself")
> > annotation.)
>
> This was one that I think I withdrew from my proposal before the book
> was published, but may have leaked through into the code posted on the
> web. The idea there was an state variable that was an object that is
> guarded by its own intrinsic lock. For example:
>
> @GuardedBy("itself") List list = new ArrayList();
>
> ...
>
> synchronized (list) {
> // do stuff with list
>
> }
>
> This is a pattern that shows up often enough, but @GuardedBy("itself")
> is a bad name for it, because all the other arguments to @GuardedBy are
> valid Java expressions that might appear as the lock of a synchronized
> block.
>
I see. I agree that "itself" is not the greatest name to use here.
Also, this is something that needs to be used with care because the
assumption seems to be that the object being referenced is not of a
class that already has locking rules associated with it. Also, this
implies an ownership relationship, because you don't want the
following situation:
class A {
@GuardedBy("itself") List list = new ArrayList();
...
}
class B {
@GuardedBy("itself") List list = null;
public B(List l) {
list = l;
}
...
}
A a = new A();
B b = new B(a.list);
(I admit this example is highly contrived for brevity, but assume
people here are savvy enough to see what I'm getting at.)
Now that A object referenced by a and the B object referenced by b are
using different locks to protect the same ArrayList object. This is
no good for anyone.
> > locks. Lock preconditions on methods are lesser used, but also quite
> > helpful. We have seen a real need for them in production code. Lock
> > getter methods I have found to be less useful in practice.
>
> It shows up in real code, though arguably it is overused. It tends to
> show up in abstract classes where the base class wants to let the
> subclass choose the locking protocol.
Actually, your indirect technique for declaring lock-returning methods
is more powerful than the technique we have been using in Fluid. Our
technique does not allow deferring the decision of the lock
implementation. That said, I am not 100% comfortable with the fact
that a @GuardedBy annotation on a field is also acting as a
requirement on a method implementation (that is likely to be "far
away" from the field declaration in the code. This is possibly
surprising and, although easily understood by programmers, is likely
to be overlooked a lot. I would be more comfortable with a distinct
annotation for marking methods that return locks. Something like
class C {
private int protectMe;
...
@ReturnsLockFor("protectMe")
protected Object getLock() { ... }
}
Of course, this has the opposite problem: the fact that field
protectMe requires a lock is only annotated on the getLock() method.
Fluid avoids this problem by explicitly annotating the existence of a
named lock on the class, something like
@Lock("MyLock is this protects protectMe")
class C {
private int protectMe;
...
@ReturnsLock("MyLock")
protected Object getLock() { returns this; }
}
But in this approach it is not possible to defer the identity of the
lock reference ("this" in this case). So we would have to allow
something like
@Lock("abstract MyLock protects protectMe")
abstract class C {
private int protectMe;
...
@ReturnsLock("MyLock")
protected abstract Object getLock();
}
@Lock("MyLock is lockField protecs protectMe")
abstract class D extends C {
private final Object lockField = new Object();
// inherit the @ReturnsLock annotation
protected Object getLock() { return lockField; }
}
I don't know if we want to head down this road in complexity. One way
around it (although in reality it just moves the complexity somewhere
else), is to support both @Lock and @GuardedBy, whereby @GuardedBy is
really syntactic sugar to introduce @Lock annotations with implicitly
named locks. For example,
class X {
@GuardedBy("this")
private int f;
}
is really identical to
@Lock("fLock is this protects f")
class X {
private int f;
}
(In this case the naming convention is to append "Lock" to the name of
the field being guarded.)
This is an awful lot of changes just to get around the fact that I
don't like an annotation on a field affecting an unannotated method so
dramatically, so I don't know if it is worth it. But it is something
to think about.
--Aaron
Of course, @Lock is lousy name for an annotation because it conflicts
with java.util.concurrent.locks.Lock. It should be something else,
like @Guards, @StateLock, @Mutex, etc.
--Aaron
If there is more than one useful definition, there is probably more
than one useful annotation. Can you provide examples of classes for
each different definition that are either commonly used or in the
JRE--preferably both? Would any of the definitions not be guaranteed
by the 5 rules of making a class immutable given in "Effective Java"
item #13?
http://safari.oreilly.com/0201310058/ch04lev1sec2
The link elides the actual five rules to use, but I'll suppose
they're
unsurprising. There's some worry about serialization and transient
field,
but I can ignore that for now.
Class annotations of @Immutable are not as ambiguous as @Readonly.
I still don't think @Readonly is ready for prime time (see
http://www.jot.fm/issues/issue_2006_06/article1),
but @Immutable is fine. (Ernst and others muddied the water a bit by
calling their readonly proposal(s) "reference immutability".)
On the other hand, Ernst and his colleagues did do a large case study
and found that the number of truly immutable classes was small.
So it seems @Immutable would not be quite as useful as a class
annotation
as one might believe.
On the other @Immutable would be very useful as a TYPE annotation, and
(assuming JSR 308 is not here yet) as an annotation on parameters,
fields
or methods (return values). Here @Immutable would mean the same thing
as on a class: no state in the referenced object will be changed by
anyone
anywhere anytime (after). (It's orthogonal to "final".)
Allowing @Immutable in these various places makes it possible to
have immutable lists, maps, etc. A little-known trick for lock-less
synchronization
uses volatile fields pointing to immutable objects. This could be
shown thread-safe
by using immutability.
An immutable object can be achieved by having an unliased object,
or by using the constructor for an immutable class. In either case,
one has to avoid leaky constructors, but that can be checked.
"There's some worry about serialization and transient field, but I can
ignore that for now."
Please expand more, when you get the chance.
On Aug 13, 3:12 pm, Aaron Greenhouse <aar...@cs.cmu.edu> wrote:
> I see. I agree that "itself" is not the greatest name to use here.
> Also, this is something that needs to be used with care because the
> assumption seems to be that the object being referenced is not of a
> class that already has locking rules associated with it. Also, this
> implies an ownership relationship, because you don't want the
> following situation:
>
> class A {
> @GuardedBy("itself") List list = new ArrayList();
> ...
> }
>
> class B {
> @GuardedBy("itself") List list = null;
>
> public B(List l) {
> list = l;
> }
> ...
> }
>
> A a = new A();
> B b = new B(a.list);
>
> (I admit this example is highly contrived for brevity, but assume
> people here are savvy enough to see what I'm getting at.)
>
> Now that A object referenced by a and the B object referenced by b are
> using different locks to protect the same ArrayList object. This is
> no good for anyone.
Whoops! John Boyland privately pointed out to me that there is no
ownership requirement here, and of course he is correct. I managed to
confused myself already...
I think the problem I actually meant to point out is this one:
class Inner {
public final Object lock = new Object();
@GuardedBy("lock")
public final int value;
}
class Outer {
@GuardedBy("itself")
public Inner inner;
...
}
Here, the class Outer is imposing a locking policy on instances of
class Inner, but Inner already has its own policy. This is
counterproductive and interferes with guaranteeing mutual exclusion to
the 'value' field.
I suspect this takes a very narrow definition of immutable.
For example:
class X { }
is not immutable because it could be extended:
class Y extends X { public int mutableField; }
But @Immutable, and @ThreadSafe, should not be inherited, since it is
clear that a subclass of a thread-safe class is not necessarily intended
to be thread-safe.
This class is not immutable because of data races:
class StringHolder {
private String s;
StringHolder(String s) { this.s = s; }
public String get() { return s; }
}
Without the s field being final, a StringHolder created in one thread
and not properly published and accessed from another thread could return
null on one invocation of get() and the provided String on another.
This class is also not technically immutable, because it uses a common
technique for lazily computing properties:
class StringHolder {
private final String s;
private int length;
StringHolder(String s) { this.s = s; }
public int getLength() {
if (length == 0)
length = s.length();
return length;
}
}
(In fact, String uses this for hashCode.) But while the class may have
a mutable representation, from the perspective of the client, it has
no mutable state, since there are no actions the client can take that
would cause any of its methods to change their behavior.
I prefer a behavioral definition of immutability: an object is immutable
if no action that the client takes after construction can cause any of
its methods to change their behavior. Under that definition, @Immutable
describer a much broader range of classes.
> > On the other hand, Ernst and his colleagues did do a large case study
> > and found that the number of truly immutable classes was small.
> > So it seems @Immutable would not be quite as useful as a class
> > annotation as one might believe.
>
> I suspect this takes a very narrow definition of immutable.
> ...
Feel free to read the paper rather than speculating. The paper with the
original study is
"A practical type system and language for reference immutability"
by Adrian Birka and Michael D. Ernst.
In Object-Oriented Programming Systems, Languages, and Applications
(OOPSLA 2004), (Vancouver, BC, Canada), October 26-28, 2004, pp. 35-49.
http://pag.csail.mit.edu/~mernst/pubs/ref-immutability-oopsla2004.pdf
though the type system described in that paper has been superseded; see
http://pag.csail.mit.edu/~mernst/pubs/ref-immutability-oopsla2005-abstract.html
and
http://pag.csail.mit.edu/~mernst/pubs/immutability-generics-fse2007-abstract.html
The latter paper contains new case studies that found somewhat more use of
immutability than the OOPSLA 2004 paper did.
> I prefer a behavioral definition of immutability: an object is immutable
> if no action that the client takes after construction can cause any of
> its methods to change their behavior.
All of the referenced papers use this behavioral definition. We call it
immutability of the abstract state, as opposed to immutability of the
concrete state. An example of a difference (which someone on this mailing
list previously requested) is that filling in a field that serves as a
cache should not count as a mutation of the object; more examples appear in
the papers.
Different programmers use different styles, and I suspect that a programmer
who designs classes with immutability in mind, and who is using a tool that
checks the code, can produce programs that use more immutable classes.
We continue to experiment with two type systems, named Javari and IGJ, in
order to learn more about the usability and usefulness of immutability in
practice. You can participate by downloading the type-checkers for these
type systems from the JSR 308 checkers distribution; see
http://pag.csail.mit.edu/jsr308/#Download .
-Mike
On Aug 14, 6:08 pm, Brian Goetz <br...@quiotix.com> wrote:
>
> But @Immutable, and @ThreadSafe, should not be inherited, since it is
> clear that a subclass of a thread-safe class is not necessarily intended
> to be thread-safe.
Doesn't this negate much of the utility of annotating a class as
@ThreadSafe or @Immutable, unless that class is also final? Otherwise
even if class C is @ThreadSafe, and you have a reference of type C,
you still don't know anything about the thread safety of the object
you are pointing to. So, I'm not a fan of this idea. I think
@ThreadSafe and @Immutable should "poison" (so to speak) the branch of
the class hierarchy.
Another example, one that doesn't come up often because most immutable
classes are final: if X is immutable, it is possible for Y to extend X
such that Y is mutable, but it's "X nature" is not, thereby not
violating the substitution principle. This is a weaker requirement than
requiring any class that extends an immutable/thread-safe class to be
immutable/thread-safe itself.
In short, inheriting these characterstics scares me.
Now, @ThreadSafe seems kind of squirrelly in terms of semantics--we
really should pin these down (has anyone tried?) But at minimum, I
imagine it should mean that if I perform operations on the object from
multiple threads, the result will be the same as doing the same
operations from a single thread in some order. That's a contract,
therefore @ThreadSafe should be inherited.
As a pragmatic matter, Brian is right, sometimes it may be useful to
inherit from a @ThreadSafe class in a way that is not @ThreadSafe,
making sure that the subclass is only used in a non-concurrent
context. But we should recognize that this is a fundamentally
dangerous thing, and that the entire application becomes fragile. If
we were to introduce threads anywhere in the application, we'd have to
check carefully to ensure that the subclass is not used in a
multithreaded way. This analysis would be very tricky, because the
subclass might be hidden behind variables of the (@ThreadSafe)
superclass type.
Imagine we had a static or dynamic analysis tool that could tell us
whether a class was "really" thread-safe. We'd clearly want that tool
to treat @ThreadSafe as inherited, because we'd want to know if we
were doing something dangerous like having a non-@ThreadSafe subclass
of a @ThreadSafe class. If we determine that it doesn't matter in our
context, we can always turn off the warning.
I think the "inheritedness" of @ThreadSafe should reflect OO
development principles like substitutability and OO best practices,
rather than questionable practices that are sometimes expedient but
dangerous.
Jonathan Aldrich
- David
I think you're overstating the implications of the substitution principle.
Now, I'm not recommending people code this way, but...
Let's say A is immutable:
@Immutable public class A {
private final int x = 3;
public get() { return x; }
}
Now, B extends A:
class B extends A {
private int counter;
public get() {
++counter;
return super.get();
}
public int getCount() { return counter; }
}
When a B is used as an A, its behavior is indistinguishable from an A.
So the substitution principle holds -- it's "A nature" is still immutable.
> Now, @ThreadSafe seems kind of squirrelly in terms of semantics--we
> really should pin these down (has anyone tried?) But at minimum, I
> imagine it should mean that if I perform operations on the object from
> multiple threads, the result will be the same as doing the same
> operations from a single thread in some order. That's a contract,
> therefore @ThreadSafe should be inherited.
Again, a strict reading of the substitution principle would apply only
to the methods of the base class.
I bring these arguments up because it took us a good long time to
understand the limitations of inheritance in the first place, so adding
more inheritance to the language might be making the same mistake twice.
On Aug 15, 11:43 am, Brian Goetz <br...@quiotix.com> wrote:
> For immutability, it is commonly recommended that they should be final
> for this reason. And it is often difficult to extend a thread-safe
> class while preserving thread-safety -- that requires the base class to
> be designed for extension, which includes publishing, and committing to,
> its synchronization and notification protocols, as well as exposing any
> state required to do so.
But isn't the point of this proposal to give programmers the means to
publish these protocols?
> (For example, its hard to extend
> ConcurrentHashMap because the locks are private, not protected, but CHM
> is not final.) I don't have specific examples, but I worry that
> requiring thread-safety to be inherited in this situation might cause
> problems. Is it not imaginable that you might want to extend a class
> that happens to be thread-safe, but thread-safety is not necessary for
> your use?
I would suggest then that ConcurrentHashMap should not be extended; it
probably should have been made final. The fact is that not all
classes are meant to be or should be subclassed. There's a whole
shelf full of literature from the past 30 years about how hard it is
to extend abstract data types and classes when they support concurrent
access. (See in particular the literature on "Concurrent Object-
Oriented Programming" from the late '80s and early '90s, and the so-
called "inheritance anomaly.") We aren't going to solve that problem
here.
I have often wanted to extend a pre-existing class, but have been
prevented from doing so for one reason or another. This isn't unique
to concurrency concerns. Classes need to be designed for
subclassing. In my opinion, Java made a mistake by making classes non-
final by default.
> Another example, one that doesn't come up often because most immutable
> classes are final: if X is immutable, it is possible for Y to extend X
> such that Y is mutable, but it's "X nature" is not, thereby not
> violating the substitution principle. This is a weaker requirement than
> requiring any class that extends an immutable/thread-safe class to be
> immutable/thread-safe itself.
I don't find this argument convincing. Yes, I have often created toy
examples that fit this scenario, but I have never encountered such a
situation in the wild. If you have, please share. But, I still
question the wisdom of doing such a thing.
I don't think we should try to accommodate every programming practice
that is out there. If a practice is bad and potentially dangerous,
the programmer should be discouraged from doing it. Making
assurance tools hostile to that practice is one way of discouraging
it. I realize that if someone really wants to shoot themselves in the
foot we can never stop them from doing it, but we shouldn't encourage
them to do it.
So what to do? I think we should consider the effect of our choices
on programmers. Most programmers, unfortunately, won't have a thread-
safety verification tool in their hands immediately (although we can
all hope that such things arrive soon). I'm concerned that if we give
@ThreadSafe and @Immutable the more flexible semantics, and make the
annotation not inherited, that many people may assume that since it's
not inherited, they can make non-thread-safe overriding methods.
Ouch!
I would never make this argument if I thought the flexibility of the
non-inherited solution was essential, because I don't believe
languages should hobble experts for the sake of novices. But we don't
have any truly compelling examples of where the inherited semantics
would cause problems, only toy examples. Furthermore, I hear
convincing arguments (e.g. from Aaron) that any such use would be bad
style and likely to cause bugs anyway, and I don't hear anyone
disagreeing. Does anyone take the opposing view?
So I'm leaning towards Aaron's view that the simpler semantics for
@ThreadSafe and @Immutable--with annotations that are inherited and
with any new methods in subclasses required to also obey the
constraints--is likely to discourage errors, encourage good style, and
cost little or no useful expressiveness.
Jonathan
I'm a little chary of the idea that these things will be always
inherited, without exception. The substitution principle that is
mentioned above is an important one for proper design, but not one
that is actually enforced by the language. Imagine an annotation that
reads "@EnforcesTheContractForEquality", which you tag your class
with, and means that you implemented the equals() method correctly.
You could easily tag the Object class with this annotation, but it
would be wrong to say that its subclasses all implement equals()
correctly. It's a design problem, but one that is pervasive in Java
(and most OO languages). Similar mis-design can be seen in the whole
"Stack extends Vector" debacle.
The problem is compounded for thread safety, because Java has
occasionally encouraged its users to mix thread safety properties in
with the logic of a class (like Vector and Hashtable) when it is
really an orthogonal constraint. So you get users extending a class
without even really caring about the thread safety property. Alas,
this happens all too often in practice! Verification of this is as
easy as doing a Google Code Search for "extends Vector".
Another problem, previously mentioned: someone creating a subclass
won't necessarily agree with the definition of "ThreadSafe" or
"Immutable" used by the designer of the superclass.
It would be terrific if it could be inherited, but my feeling is that
rather than discouraging errors, it would simply make people ignore
the annotation.
Having said that...
It is fairly clear (to me, at any rate; I've been known to be wrong
before) that without a rigorous definition for "@Immutable" or
"@ThreadSafe", there is no way to ensure these properties
automatically, and therefore you can never guarantee whether or not
subclasses will adhere to them.
An interesting alternative would be, in some cases, to make a
rigorous, checkable definition of ThreadSafe that encompasses a subset
of what you actually mean by thread safety (the dumbest one I can
think of would be "all method invocations are synchronized on the this
object", but I can think of better and more subtle ones), and force
the annotations to be inherited if you could prove the property, and
not inherited if you can't. Then, when subclasses are written, they
can be checked to the same standard as the superclass, and if they
violate the contract, then the verifier gives them the boot.
Jeremy
I think we can come to a pretty simple clean definition of these
properties:
(1) ThreadSafe means all methods of the class
acquire necessary locks before accessing mutable state,
except through parameters. Volatile fields don't need locks to
access.
(2) Immutable means methods of the class access NO mutable
state except through parameters. (Obviously, this implies
ThreadSafe.)
In this formulation, these class annotations are a way of broadcasting
default annotations to methods. In other words, we can easily
(without requiring loopholes) permit subclasses to define new methods
that don't follow these guidelines, but not to override methods unless
they still follow the contract.
The reason for the exception of parameters is pretty clear: a copyInto
method for an immutable list makes perfect sense, as does a appendCopy
method that takes a potentially mutable list. Similarly for
threadsafe. The caller must assure that the access to the parameter
state is properly protected.
I can guess people will immediately want exceptions to the two rules I
posted above: "can't our immutable class keep usage statistics? ..."
and there is the compelling argument for caching the hashcode, so we
may need a loophole annotation. I'm a little worried about these
because technically they do violate synchronization rules. But an
annotated loophole is fine as long as programmers realize that it
means safety guarantees are now the burden of the programmer, not the
annotation checking tool. I can imagine a development process that
subjects such loopholes to more stringent review.
John
I think this is at the heart of my concerns as well. These properties
_should_ be inherited, but theory and practice, well, they're very
similar in theory...
> An interesting alternative would be, in some cases, to make a
> rigorous, checkable definition of ThreadSafe that encompasses a subset
> of what you actually mean by thread safety (the dumbest one I can
> think of would be "all method invocations are synchronized on the this
> object", but I can think of better and more subtle ones), and force
> the annotations to be inherited if you could prove the property, and
> not inherited if you can't. Then, when subclasses are written, they
> can be checked to the same standard as the superclass, and if they
> violate the contract, then the verifier gives them the boot.
Data-race-free seems like it should be checkable, but that's a weaker
notion of thread-safety than I'd like to promote. Data-race-free still
allows room for atomicity failures, like in the following class:
public BogusVectorExtension<T> extends Vector<T> {
public void putIfAbsent(T t) {
if (!contains(t))
add(t);
}
}
While the above is data-race free (all state is properly guarded by a
lock), you could still end up with two copies of the specified element
if two threads race to putIfAbsent the same element.
> > An interesting alternative would be, in some cases, to make a
> > rigorous, checkable definition of ThreadSafe that encompasses a subset
> > of what you actually mean by thread safety (the dumbest one I can
> > think of would be "all method invocations are synchronized on the this
> > object", but I can think of better and more subtle ones), and force
> > the annotations to be inherited if you could prove the property, and
> > not inherited if you can't. Then, when subclasses are written, they
> > can be checked to the same standard as the superclass, and if they
> > violate the contract, then the verifier gives them the boot.
>
> Data-race-free seems like it should be checkable, but that's a weaker
> notion of thread-safety than I'd like to promote. Data-race-free still
> allows room for atomicity failures, like in the following class:
>
> public BogusVectorExtension<T> extends Vector<T> {
> public void putIfAbsent(T t) {
> if (!contains(t))
> add(t);
> }
> }
>
> While the above is data-race free (all state is properly guarded by a
> lock), you could still end up with two copies of the specified element
> if two threads race to putIfAbsent the same element.
The weakenings I had in mind were something like:
a) every mutable field is tagged with the locks that protect it, and
there is a per-class ordering over those locks
b) each method is tagged with the locks that it obtains. A method
that calls into such a method absorbs these tags.
c) your method holds the appropriate locks from the first time it
accesses a field / method which is tagged with it to the last time it
accesses a field / method that is tagged with it.
d) locks are never acquired in an order that violates the per-class order.
The "everything is synchronized" approach meets this definition
trivially. Volatiles are basically treated as a special case where if
you only access them once-per-method, then you don't have to acquire
the lock.
That's without having thought about these rules terribly carefully --
I suspect that there are things wrong with them.
Jeremy
It isn't just a question of volatiles not needing locks, because
sometimes that isn't true:
volatile int v;
public void incrementMyVolatile() {
v++;
}
... is not thread safe.
In ConcurrentHashMap, volatiles are accessed both with and without
locks, where appropriate. I very much doubt that the state of the art
definitions could tell the difference between CHM and a thread unsafe
class without declaring too many things thread safe.
The loophole annotation isn't a bad idea (basically, this makes the
inheritance opt-out instead of what I am suggesting, which is opt-in),
but you do still lose the Liskov substitution property that worries
many.
As I said after the "Having said that..." in my message, I think that
we can make use of such definitions, and enforce where possible.
Perhaps a @ThreadSafe and a @RigorouslyInheritableThreadSafe would
make sense (with a better name for the second).
(For immutable: you can say that you forbid access to mutable objects,
but in a language like Java, it is often the case that people don't
know what is going on in their libraries, so you have to be a little
careful about that definition. I would say that an object is
immutable if it only exposes things that are instances of classes that
are tagged immutable, or are final scalar values, and are not arrays).
Jeremy
On Aug 17, 12:05 pm, "Jeremy Manson" <jeremy.man...@gmail.com> wrote:
> It isn't just a question of volatiles not needing locks, because
> sometimes that isn't true <...>
Yes. I completely neglected the concept of atomicity which is beyond
the idea of "properly synchronized" in the Java sense. Are there
compelling cases where one only wants the former and not the latter?
If not, I won't object to requiring both proper synchronization and
atomicity for "thread safe."
> (For immutable: you can say that you forbid access to mutable objects,
> but in a language like Java, it is often the case that people don't
> know what is going on in their libraries, so you have to be a little
> careful about that definition. I would say that an object is
> immutable if it only exposes things that are instances of classes that
> are tagged immutable, or are final scalar values, and are not arrays).
I don't think I agree here (I'm not completely sure I understand):
1. As I said before, immutable instances of not universally immutable
classes are useful. In particular, I have found immutable arrays and
immutable lists useful. I know that Java won't (1) prevent an array
from being mutated or (2) requires a wrapper to prevent mutation of
lists. But I don't think the purpose here is building bulletproof
libraries, but rather to define annotations to describe design intent
BEYOND that which Java can help us with. In other words, we should
describe the semantics of the annotations and let various bug finders
etc decide whether they want to (a) verify conservative patterns or
(b) detect flagrant violations, or both. There's no need to tie the
semantics to a particular set of checkable idioms.
(I do find such idioms useful, and for this JSR, we should provide
some examples of things that fit the semantics, and even some
SUFFICIENT rules for ensuring correctness, but I think it's a mistake
to REQUIRE a particular decidable system.)
2. An immutable class may return mutable state from parameters or from
the environment assuming the state is protected in other ways. For
the former case, consider the copyInto method I mentioned before. For
the latter, consider an immutable map to self-protected (threadsafe)
objects. Essentially, an immutable class (or an immutable instance)
talks about immutability of the parts it *owns*, but not about other
parts, and the method can only access state it owns (which must be
immutable) or is passed to it. By using the rule I posted earlier, we
can avoid specifying what state is owned, and just talk about state
accessed (implicitly the owned state). We also avoid having to talk
abvout how other state is protected.
John
I don't think there's much of a choice. Most useful definitions of
thread safety involve some form of "works the same when invoked from
multiple threads as when invoked from one" (modulo classes like
Exchanger that are only useful with multiple threads.) That implies
both the absence of data races and the absence of atomicity anomalies like:
class Foo {
private volatile v;
public int increment() { return ++v; }
}
For the purposes of this JSR, this is problematic because, while
data-race-free might be practical to check automatically, the absence of
atomicity failures is impossible to verify without the programmer
specifying atomicity boundaries.
> For the purposes of this JSR, this is problematic because, while
> data-race-free might be practical to check automatically, the absence of
> atomicity failures is impossible to verify without the programmer
> specifying atomicity boundaries.
>
Right. How about just calling the annotation @DataRaceFree?
While it might be less familiar, it vastly reduces arguments
and misunderstandings about what it means.
A similar scale-down of intent might apply to @Immutable.
-Doug
I was going to propose:
> I think if you want @ThreadSafe to be useful, I think it should mean:
>
> If a class C is marked as @ThreadSafe, it means that if several
> threads invoke methods on an instance X of C, then the overall
> behavior is the same as some sequential ordering of the method
> invocations consist with the original of the invocations in their
> original threads. Note that this property must hold even if the
> threads have no synchronization external to the C class, and are
> making what could be simultaneous or overlapping calls.
But that doesn't work for classes such as an Exchanger. Does anyone
have a extension of sequential consistency that extends to such
methods (chording, maybe?).
Bill
The problem may be the vague label "@ThreadSafe". I think we need
more precise annotations. (We could use "ThreadSafe" but only if we
can agree on a precise semantics.)
Doug proposed @DataRaceFree. I like that because it is (more)
precise. It would fit what I propose for ThreadSafe above: all
methods of the class properly synchronize before accessing mutable
state except through parameters. (This ignores deadlock also.)
Another possible name is @ProperlySynchronized.
(BTW: I don't think we need a weakened version of Immutable that
reflects the difference between atomic and properly synchronized
because immutable classes shouldn't be mucking about in mutable state
at all. In the past people have wanted Immutable classes to do dirty
things under the covers, but I think the best thing here is force
people to use explicit loopholes.)
We could have @Sequentialized (need a better name) for classes whose
methods all enforce sequential consistency: ignoring state passed in
by parameter, the effects of the method call reflects some sequential
ordering of method calls. I think this will be an extremely difficult
annotation to verify, but at least it has a well-defined semantics.
(@Atomic probably should be reserved for classes whose methods, if
they throw a runtime or checked exception, undo any side-effects.
That may be too much for us now.)
The Exchanger class would be @ProperlySynchronized (one hopes) but not
@Sequentialized.
People will notice my definitions except state accessed through
parameters. This state should be protected using techniques described
by design intent on the method (effects, broadly construed). This
brings in ownership, uniqueness and readonly. I have my own opinions
on this, but we probably don't want to start a big discussion on
effects right now.
Again, I think this line of argument is putting the needs of the tool
ahead of the needs of the user. Which I expect to hear in the course of
tool design, of course, but I hope we'll rise above it.
I think the number one goal of the JSR 305 effort should be to produce
annotations a wide range of developers WILL ACTUALLY USE. These
annotations will be useless if no one annotates their code.
@ThreadSafe is a statement of design intent. Many developers think they
understand what thread-safe means; the fact that most of them don't
doesn't prevent them from having more or less the right intent -- that
their code be safe to call from multiple threads. On the other hand,
most developers don't know what data-race-free means. (In fact, it was
Bill who pointed out to me, on reviewing an early draft of my book, that
I didn't quite understand what data-race-free meant, either. Now, I
don't want to make this about me, but if its a concurrency concept I got
wrong initially, I'm guessing others will too.)
Just because we can't prove thread-safety doesn't mean we should force
the programmer to use terms that don't really mean what they intend.
(In reality, I think they won't use it at all. If you don't have an
idea what data-race-free means, you're unlikely to use the annotation.)
Of course, this is a point that is open to debate, so lets debate it,
since it lies at the heart of this and other decisions.
Given a choice between:
- Producing easy-to-use annotations that support imperfect analysis,
but which will raise the quality of _typical_ code, and
- Producing more specific annotations that support perfect analysis,
but which are hard enough to use that that will only raise the quality
of code produced by experts,
my vote is nearly always going to be "screw the experts".
I feel strongly that annotations that are hard to understand, or which
require the mastery of subtle topics to use correctly, will either go
unused, or will be misused. For better or worse, I think @DataRaceFree
may well be over that line.
The members of this group who come from an analysis tools background are
likely to hate this position, because it amounts to saying "yes, I know
we can solve this problem perfectly, but let's not because I'd rather
live in a world where more people can benefit from an imperfect solution."
On Aug 20, 5:03 pm, Brian Goetz <br...@quiotix.com> wrote:
> > I think the number one goal of the JSR 305 effort should be to produce
> > annotations a wide range of developers WILL ACTUALLY USE.
>
> Of course, this is a point that is open to debate, so lets debate it,
> since it lies at the heart of this and other decisions.
>
> Given a choice between:
>
> - Producing easy-to-use annotations that support imperfect analysis,
> but which will raise the quality of _typical_ code, and
>
> - Producing more specific annotations that support perfect analysis,
> but which are hard enough to use that that will only raise the quality
> of code produced by experts,
>
> my vote is nearly always going to be "screw the experts".
I sympathize with what you are saying, but as a tool designer, I am
still tempted to make the argument that there is little point in
annotating design intent that cannot be at least partially checked.
Is it better to annotate design intent than not? Yes. But if you
cannot figure out if the intent fulfilled, you as the programmer can
still do whatever you want. There is plenty of evidence of this with
respect to concurrency.
However, I think we are losing sight of (another) bigger picture: How
much do we really want people to use @ThreadSafe? It would be much
better if they made use of @GuardedBy and @Immutable instead. These
annotations provide much more future guidance about how the class
should be implemented.
> I feel strongly that annotations that are hard to understand, or which
> require the mastery of subtle topics to use correctly, will either go
> unused, or will be misused. For better or worse, I think @DataRaceFree
> may well be over that line.
Point taken. In people's experience, are we likely to wreak havoc if
we provide multiple annotations, some of which are easier to
understand than others? My guess is that the risk is that the ones
the programmers don't understand will simply go unused. Or am I being
naive?
> The members of this group who come from an analysis tools background are
> likely to hate this position, because it amounts to saying "yes, I know
> we can solve this problem perfectly, but let's not because I'd rather
> live in a world where more people can benefit from an imperfect solution."
I'm not coming from the point of saying let's enable a perfect
analysis. I'm simply saying that annotations that are unanalyzable
are not very useful, although I won't say they are useless.
I know this particular message has been followed up on many times, but
I haven't seen anyone make the point I want to make. Both John and
Jeremy have proposed definitions for ThreadSafe that mandate
synchronization, wouldn't this rule out using an Immutable class
implemented using entirely final fields as a ThreadSafe class?
ThreadSafe needs to be defined based on observable behavior not by
implementation details.
All the issues we've discussed can be partially checked -- the
discussion that led to this was more along the lines of "we shouldn't
define annotations that can't be completely checked" (e.g.,
@DataRaceFree instead of @ThreadSafe), which I disagree with 100%.
While tools are important, annotations can have significant
documentation value in and of themselves -- for both the consumer of a
class and for future maintainers -- even if their tool-value is zero.
Of course, tool value is good too! But tools are only one constituent
of annotations, and clients and maintainers are equally, if not more,
important. Annotations can help clients use APIs correctly, they can
help code reviewers spot problems, they can help maintainers do the
right thing.
> Is it better to annotate design intent than not? Yes. But if you
> cannot figure out if the intent fulfilled, you as the programmer can
> still do whatever you want.
We can't catch those mistakes with tools, but we can catch them in code
reviews. That's good too!
> There is plenty of evidence of this with
> respect to concurrency.
No argument.
> However, I think we are losing sight of (another) bigger picture: How
> much do we really want people to use @ThreadSafe? It would be much
> better if they made use of @GuardedBy and @Immutable instead. These
> annotations provide much more future guidance about how the class
> should be implemented.
Agreed.
>> I feel strongly that annotations that are hard to understand, or which
>> require the mastery of subtle topics to use correctly, will either go
>> unused, or will be misused. For better or worse, I think @DataRaceFree
>> may well be over that line.
>
> Point taken. In people's experience, are we likely to wreak havoc if
> we provide multiple annotations, some of which are easier to
> understand than others? My guess is that the risk is that the ones
> the programmers don't understand will simply go unused. Or am I being
> naive?
I wouldn't say naive, but I would say optimistic. I think people's
willingness to use a set of annotations drops pretty dramatically as the
size of the set increases. It's increasing the surface area of an API
-- it raises the bar for using it at all.
All three are useful. The developer can always suppress warnings to
deal with anything but threadsafe, but the usual goal would be to
modify the code in order to convince the tool that the annotation is
correct. It almost doesn't matter how weak the validation powers of
the tool are.
Another option to consider is the value of a working prototype. If
nobody can decide how and when @ThreadSafe should be used with the
JRE, won't be widely adopted outside of it. If the JRE has plenty of
examples of what it means to be @ThreadSafe, it might be widely
adopted.
I have no objection to annotations that express something that's we
(currently) can't check. My objection is coming up with an annotation
without semantics. Undecidability is one thing. Incoherency is
another.
"This code can be safely called from multiple threads" is too vague.
Might as well have an annotation @WorksFine on code that works fine.
On the other hand, if we decide "@ThreadSafe" means what might be
called "sequentializability" by some self-selected group of experts,
that's fine. And then Exchanger wouldn't be @ThreadSafe. If you
don't like that, then maybe that means the word "ThreadSafe" is a bad
choice, and we should choose "@ThreadTransparent" or even
"@Sequentializabile".
Or maybe "@ThreadSafe" means "properly synchronized" in which case
code that relies of benign race conditions wouldn't be "thread safe"
unless there are declared loopholes. Again that's OK.
But if not, then maybe someone here can define "benign race condition"
and then permit them in a broader definition of @ThreadSafe.
(And for people who are afraid that our definition will be too tight:
if you look at the cookbook idioms for synchronization (i.e.
"synchronize all methods") in the popular literature, they are all
MUCH more rigid than any definition we are likely to make.)
But I must concede that any definition is likely to leave out
someone's pet synchronization idiom or permit code that someone else
doesn't like, or both. That's why we are having these discussion on
"Concurrency Annotations" (people, please leave the title as it was
started) so we can come up with a DEFINITION that will
(1) be simple enough to explain to regular programmers
(2) that covers most important GOOD things and disallows most
important BAD things,
(3) that will be simple enough to formalize
(4) that can at least be partially checked (positively and negatively)
I think we are getting close to a definition for @Immutable that is
pretty close to meeting
these goals. @GuardedBy seems pretty well defined already.
@ThreadSafe is a way off.
John
Not at all. Notice my definition of Immutable IMPLIES thread-safe (as
defined there). If you don't access mutable state (as an immutable
object shouldn't be doing) then no synchronization is needed (or
allowed -- accessing an immutable object shouldn't cause deadlock).
> ThreadSafe needs to be defined based on observable behavior not by
> implementation details.
That's a reasonable position, but synchronization is usually
observable on the outside. I'm willing to consider other definitions:
sequentializable is a nice definition that restricts itself to
observable state.
John
+1
I think you are missing the point.
I think it is very important that people use @ThreadSafe when they
are designing a class that is intended to be thread safe.
Not because it allows me to prove other code is correct.
But because marking a class a @ThreadSafe _incurs_ a verification/
specification/correctness obligation. If a class is marked as
@ThreadSafe, then static analysis doesn't have to try to find paths
from application entry points in which methods are invoked on a
shared instance from separate threads. Instead, the static analysis
can assume that it is fair game for multiple threads to bang on an
instance without any external synchronization. If the analysis finds
any mutable fields not protected by some lock, bang, you've found a bug.
One of the biggest problems I've found with trying to report
concurrency errors using static analysis is trying to figure out
which classes and methods are designed to be thread safe. @ThreadSafe
allows tools to know what is intended to be thread safe.
Bill
If your entire class is immutable, you don't have mutable fields, and
therefore don't have to worry about either John or my definition of
ThreadSafe. :)
Jeremy
In the interests of brevity and irrational optimism @WorksFine has a
default value of true. It's usually better to fix the code in
question than add @WorksFine(false).
+1
Dan
--Aaron