What's the best practice for requiring lots of dependencies that
spread across packages? Should an implementation within a package
depend directly on interfaces from other packages? Or should there be
some kind of facade or service (for easier testing within the package
itself) and the impl exists outside and glues in the other packages?
The main reason I'm asking is because now that we've converted to
using Guice & constructor-based injection, it becomes a real PITA with
subclasses and huge dependency lists. Consider an AbstractFoo that
requires a lot of other factories and services to run, and many
different implementations. There's two problems: writing tests
requires passing a lot of different dependencies directly (or setting
them via Guice), and if the dependencies of AbstractFoo ever change,
all the subclasses need their constructors altered.
This leads me towards wanting a single catch-all FooFacade or
FooService that interface that AbstractFoo uses, but it leaves a bad
taste in my mouth... Is it really a good idea to pretend that
AbstractFoo doesn't depend on these services (by indirectly using them
through the FooFacade)? How far should this practice go? If I end up
also needing to use a BarFactory, does a createBar method get added to
FooFacade instead of passing it into AbstractFoo (and requiring all
the subclasses change their constructors)? What about if AbstractFoo
needs a lot of dependencies from within the package? Is it a good
idea to put that into the FooFacade too?
These are the things that keep me awake at night!
Thanks,
Sam
Bob
Dhanji.
Bob
For anybody following this thread and didn't think to look up "why
extends is evil" thinking it was just DJ expressing himself, the
article in question can be found here:
http://www.javaworld.com/javaworld/jw-08-2003/jw-0801-toolbox.html
It gives a few great examples as well.
-Josh
--
"This is crazy! Why are we talking about going to bed with Wilma
Flintstone... She'll never leave Fred and we know it. "
:: Josh 'G-Funk' McDonald
:: 0437 221 380 :: jo...@gfunk007.com
I think we're going to shy away from using setter injection (as
opposed to constructor injection) because that'll require all the
variables to be volatile (since the object is heavily multi-threaded).
And unfortunately we don't have the time right now to redo the object
hierarchy to not use class inheritence... so I think we're just going
to have to live with difficult-to-construct objects for now.
Sam
I'm not a zen master in this, but I believe that logic is a
misunderstanding of the threading issues at hand. The objects will be
created (thus constructor injected) and setter injected fully before you
get them back from the Injector, hence there is no need for the fields
to be volatile. The object isn't as such /published/ before both these
things have happened (unless you yourself do any listener registration
or whatever in the constructor or setters), and the setters may thus be
viewed as an extension of the constructor (only of course the fields may
not be declared final, which is annoying).
I read your logic as basically everything in a java program would have
to be volatile if it uses more than one thread. This isn't true, AFAIK.
As long as a thread accesses an object "for the first time", it will
have to fetch all values from main memory, so to speak. It is only if
two or more threads communicate back-and-forth through some data that
these accesses needs to be synched, or volatile - this since each thread
is free to cache the memory as hard as it wants if this isn't the case.
This is at least my understanding, and it has worked out nice so far..! :-)
Well.. On third thought - if some of these objects gets some other
singleton objects injected, with which they communicate, then these
singleton objects will have to handle synch issues. But not the newly
created objects. And in particular not if the whole application object
graph is constructed in one go, where one doesn't use the Injector as a
generic factory throughout the code (I don't, so at least one people use
it that way!).
If I've gotten this wrong, then PLEASE correct me!
Kind regards,
Endre.
Synchronization is a whole other matter because it deals will
concurrent threads that modify state (as opposed to safe publication).
I have heard views that all fields ought to be either final or
volatile (except in rare cases). I do not subscribe to this view
because it doesn't address any problem as such, in the abstract. While
it is good practice to declare all members final where possible, it
does not *ensure* safe publication as is often wrongly claimed (for
instance, if you allow the "this" reference to escape during
construction you can make no publication guarantees). So think about
your use case and concurrency environment before mandating finals,
volatiles or synchronizations. =)
Dhanji.
I concur. I hadn't even heard about this view..! :)
> While
> it is good practice to declare all members final where possible, it
> does not *ensure* safe publication as is often wrongly claimed (for
> instance, if you allow the "this" reference to escape during
> construction you can make no publication guarantees).
Concur again - why I use final is actually more to get compile time
errors (nice squiggly red lines!) if I fail to set them in the
constructor. It makes the field "guaranteed to be set"-safe, which I
think is more important than the "cannot be set again" effect. That is
why I complain about setter injection, since it obviously isn't possible
to declare a field final then.
(I think everybody interested in these things (which everybody should
be!) should get a copy of "Java Concurrency in Practice" by Brian Goetz
et. al. I thought I had most of the stuff nailed, but there are a quite
a few things to think about. The publication-problem hadn't occurred to
me, which I think is a big point for fixing issue 62
(http://code.google.com/p/google-guice/issues/detail?id=62) - "
Lifecycle support", at least a simple @Init annotation or something.)
Endre.
I was always under the impression that variables set within
constructors are safe to be non-final (because publishing occurs after
the constructor returns), but final is the best option. However,
non-volatile variables set after the constructor returns weren't
guaranteed to be published & visible to other threads. This is
especially a problem in the common case of thread pools, where
although it may be easy to guarantee a Runnable is processed after a
certain point, it isn't easy to guarantee that runnable is processed
in a brand new Thread.
I don't quite understand the distinction that "if it's used after the
injection finishes, you're OK". To me, that seems like an arbitrary
point in time. (And it becomes hazier when non-eager instantiation is
done, or if things are eagerly instantiated singletons and do
initialization code [even worse if that code spawns threads] when
constructed.)
With non-eager instantiation, it's almost certainly the case that
threads will exist prior to the object and its dependency-graph being
constructed, and if those pre-existing threads want to use the new
object, it isn't guaranteed that they'll see the correct values (if
those values were set after the constructor finished). This is based
on my understanding that publishing occurs when the constructor
returns, as opposed the understanding that a thread seeing a new
object for the first time will read all of its variables as if they
were published. That understanding could certainly be wrong.
Sam
Seems like you've gotten that right:
http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.4.5
Check "completely initialized" definition, and "Discussion":
http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.5
So. Seemingly, the reference to an object may be set/published/made
available/visible to other threads before all values touched in the
constructor are committed to "main memory", except for final fields,
which are guaranteed to be set and "flushed" when the constructor
returns and the instance becomes available. Is this a correct
interpretation?
I've now chalked that up as another little thing to be concerned
about in regards to multithreading. I was hoping at some point to become
fully educated about these issues, and always believe that day is today,
but then some new nuance suddenly come popping up! :)
My statement two posts back about a thread accessing an object for the
first time will make it see the values from main memory, is very
misleading, at best - since other threads might have changed these
values plenty in their local cached view, and not flushed the changes
yet unless a happens-before relationship is established between the two
threads, one will possibly get stale data, up to un-initialized data.
It should preferably be up to Guice to establish this specific
happens-before relationship, making sure that the entire object graph is
fully published upon return of the Injector's (Provider's)
getInstance(...) methods. How is this achieved? I tried looking into the
code, but the obvious synchronized on /this/ in the get methods wasn't
there..! (and would that have been enough?)
From the JLS/JMM: "Without correct synchronization, very strange,
confusing and counterintuitive behaviors are possible.".. Right.
Endre.
Strange--I got Tim's email but haven't gotten Endre's email which Tim quoted.
When you call getInstance(), Guice constructs the object graph in the
same thread, so there's no need for synchronization there.
If you pass the object Guice returned to you off to another thread,
it's up to you to do so in a thread safe manner, in which case you
would safely publish it.
Bob
On Nov 16, 11:21 am, Endre Stølsvik <stols...@gmail.com> wrote:Seemingly, the reference to an object may be set/published/made available/visible to other threads before all values touched in the constructor are committed to "main memory", except for final fields, which are guaranteed to be set and "flushed" when the constructor returns and the instance becomes available. Is this a correct interpretation?Yes, though you have to be careful with terms like "main memory" and "flushed" -- your use of quotes suggests that you already are, but other readers might not be -- those terms are not part of the JMM, and no one should attempt to reason about thread-safety using them.
And I got Tim's reply to you before yours. Wonder if this is a spam
slowdown, the JCP lists are also gloriously backed up.
I guess he was in the Cc:, and the list in the To:, so he got it
directly? I also didn't get my message back until just recently, but
that have happened several times lately, even though the message was
present in the archives (the group) - I was thinking maybe googlegroups
was over-smart and excluded the sender from the recipients, or I have
some mess up with my procmail.
>
> When you call getInstance(), Guice constructs the object graph in the
> same thread, so there's no need for synchronization there.
What about singletons? Synched Maps?
>
> If you pass the object Guice returned to you off to another thread,
> it's up to you to do so in a thread safe manner, in which case you
> would safely publish it.
I think there are too many opinions here now - or everybody is saying
the same thing, differently.. Since you, Bob, is the principal author of
this thing, you're probably right!
I read you as since one will have to do some synchronization, or use a
volatile variable, to actually pass the object in question safely off to
another thread, that would constitute the happens-before edge, and hence
everything would be good?
And just to get my head straight here: this one happens-before edge
between the thread setting the object, and the thread getting the
object, will also ensure a metaphorical "deep synch" of the entire
object, making all fields (final and non, from the constructor, up and
including every change up to that edge) be visible to the other thread,
also including the entire graph from that object of?
.. one happens-before edge will ensure all that?
Endre.
> It might seem unlikely that a container could be thread-safe but not
> provide safe handoff, but consider session scope implemented with
> HttpSession attributes. While it is safe to store and retrieve objects
> from different threads (it has to be, otherwise the facility would be
> useless), there is no happens-before edge between a set and a get
Could you give an example of such a container? I can't seem to
understand how the set and a subsequent get would not have to be a
happens-before relationship, given that as you say, it is safe to store
and retrieve from different threads.
Thanks,
Endre.
Like Tim said, scopes (like singleton) are responsible for their own
thread safety. The singleton scope, for example, uses double-checked
locking. Theoretically, I could write special code for eager
singletons which wouldn't require any synchronization because they
would all be constructed in the same thread as the injector, and then,
the injector itself would be handed off in a thread-safe way.
> I think there are too many opinions here now - or everybody is saying
> the same thing, differently.
I think we're all saying the same thing. The important thing is, the
Injector itself is thread safe, but you need to make sure that your
code is, too. Scopes definitely need to be thread safe. Objects which
our in scopes probably need to be thread safe. For example, multiple
threads can access a singleton or session-scoped object concurrent.
> And just to get my head straight here: this one happens-before edge
> between the thread setting the object, and the thread getting the
> object, will also ensure a metaphorical "deep synch" of the entire
> object, making all fields (final and non, from the constructor, up and
> including every change up to that edge) be visible to the other thread,
> also including the entire graph from that object of?
>
> .. one happens-before edge will ensure all that?
That's a question for Tim. :)
Bob
An object is guaranteed to be fully *initialized* before its reference
is published. It is misleading to think of the constructor as having
"finished" in the case of multiple threads because non-final members
are not necessarily memory-coherent (i.e. visible to all threads). The
constructor could "finish", a reference to the object be published,
and yet threads may not see the same values for non-final fields of
the object (even those that are set in the ctor).
>
> If the constructor is not guaranteed to finish executing before the
> field value is changed then how can you possibly safely publish? Would
> the following code work?
>
> MyObject foo = new MyObject();
> myField = foo;
This is guaranteed by lexical ordering. A single thread going thru
this code will fully initialize and return from MyObject() before
assigning the value to myField.
The issue being debated here is publication to myField in the
constructed object (i.e. the one myField is a member of). If myField
is declared final, and your code above is in its ctor (and is the
*only* code in the ctor), then it is safe to assume foo is published
to myField when the ctor completes, i.e. the reference to the
enclosing object is published and visible to ALL threads.
For counter example,
class Enclosing {
private MyObject myField;
private final MyObject myFinalField;
Enclosing() {
myField = new MyObject();
myFinalField = myField;
}
}
Two threads receiving a reference to the same instance of Enclosing
may see different values of myField, even though its value is assigned
in the ctor, i.e. there is no guarantee made for non-final fields. But
ALL threads will see the correct value for myFinalField from a valid
reference to Enclosing.
Dhanji.
You're really helping me out here!
> But yes, the JMM says (for example) that if you write a value to a non-
> final, non-volatile field inside some data structure, then write a
> reference to the root of that data structure to a volatile, then in
> another thread read that volatile and then read the original field's
> value ... if you do that, you'll get back the value you wrote at the
> beginning, assuming no one else writes that field along the way.
>
> class C {
> int f;
> static volatile theC;
> }
>
> In Thread 1:
> C c = new C();
> c.f = 42; // 1. plain write
> C.theC = c; // 2. volatile write
>
> In Thread 2:
> C c = C.theC; // 3. volatile read
> int i = c.f; // 4. plain read
> // i is 42 assuming no other writes
>
> There are happens-before edges between 1 and 2 (program order), 2 and
> 3 (volatile write with subsquent read in a different thread), and 3
> and 4 (program order), and happens-before is transitive, so there's an
> edge between 1 and 4.
Okay, this was a nice, clear explanation!
Now, for the final little honing: what about unrelated, so to speak,
writes and reads across a happens-before edge? See, I realize I might
have over-complicated the problem in the question you answered: What
about all other variables in heap-memory, in regard to a happens-before
edge, not within the object in question (nor in its descendant graph)?
I'll try to exemplify, hoping that I don't actually introduce more
complications than the question entails:
class C {
int i;
}
Class Sc {
static C c;
}
Class Svol {
static volatile int vol;
}
Thread 1:
Sc.c = new C();
Sc.c.i = 1; // 1. plain write to h-b "unrelated" field
Svol.vol = 1; // 2. volatile write
Thread 2:
int ivol, ic;
ivol = Svol.vol; // 3. volatile read
ic = Sc.c.i; // 4. plain read of the h-b unrelated field
Basically, does the transitiveness of the happens-before carry across
_all_ heap memory in the system? Will, as long as two threads create any
single happens-before relationship (more precisely, a
/synchronized-with/ edge?), any writes to whatever data before the
synchronized-with edge, in source thread 1, be visible to all reads
after the synchronized-with edge, in destination thread 2?
From the earlier JMM, this I actually how I understood the text, but
I've never gotten to ask the elite about it.
I understand that the JMM part of JLS 2 wasn't precise enough, but I
found the concepts there way easier to grasp, even with its six
different terms for 'read' and 'write'!
Chap 17, Threads and Locks:
http://java.sun.com/docs/books/jls/second_edition/html/memory.doc.html#30206
" Each thread has a working memory, in which it may keep copies of
the values of variables from the main memory that is shared between all
threads. To access a shared variable, a thread usually first obtains a
lock and flushes its working memory. This guarantees that shared values
will thereafter be loaded from the shared main memory to the threads
working memory. When a thread unlocks a lock it guarantees the values it
holds in its working memory will be written back to the main memory".
Chap 17.6, Rules about the Interaction of Locks and Variables:
http://java.sun.com/docs/books/jls/second_edition/html/memory.doc.html#28325
" Let T be any thread, let V be any variable, and let L be any lock. ...
* Between an assign action by T on V and a subsequent unlock action
by T on L, a store action by T on V must intervene; moreover, the write
action corresponding to that store must precede the unlock action, as
seen by main memory. (Less formally: if a thread is to perform an unlock
action on any lock, it must first copy all assigned values in its
working memory back out to main memory.)
* Between a lock action by T on L and a subsequent use or store
action by T on a variable V, an assign or load action on V must
intervene; moreover, if it is a load action, then the read action
corresponding to that load must follow the lock action, as seen by main
memory. (Less formally: a lock action acts as if it flushes all
variables from the thread's working memory; before use they must be
assigned or loaded from main memory.) "
Ah, I just love those "less formally"s!
So, it seems to me that _whatever_ happens in one thread A before a
synchronizes-with relationship with thread B, will be visible to B after
the synchronized-with?
Why I've never been quite sure, is that I find this so .. wasteful? It
seems to me that it would have been much better, performance and
optimalization wise, to require the programmer to declare every single
field that one possibly wanted to be synchronized when in a
synchronized-with relationship. For example by tagging the field with
the 'volatile' modifier (This is what it means in C, isn't it?). Then
the much more elaborate volatile-modifier as we know it in java would
instead be called 'synched-volatile' or something. In that way, the JVM
could have cached way more aggressively - or rather, not having to
bother about all the flushing of caches on every single synch-edge. This
might be where the famed "escape analysis" come in (?), but there must
be many situations where it will be infeasible for the JVM to look
endlessly far ahead to see whether any thread ever will happen to read a
field in question, thus it will have to flush the value (or not use a
register for its workings) "just in case".
>
> (Note that you have to be careful here if the field is reached by
> following intermediate references; the visibility of each intermediate
> reference has to be considered, too.)
This comment, which I didn't quite get, might be what I'm really asking
about!
Thanks a lot,
Endre.
Thanks again.
I've got the book - I even urge people to buy it in the message in this
thread dated 2007-11-15 16:08!
The problem is that I've read quite a bit - and either I still haven't
comprehended some minute detail, or .. I am right!
>
>
>> Now, for the final little honing: what about unrelated, so to speak,
>> writes and reads across a happens-before edge?
>
> Same thing. I just picked an example to match what your question.
> There need be no special relationship between the field whose
> visibility is being reasoned about and the happens-before edge. But
> see below for a caveat.
I still feel you're stating two different things all the time!! See,
either _all_ writes, anywhere ("... let V be any variable ..."), from a
Thread 1 before a h-b edge will be visible to Thread 2 after the h-b
edge, or only the writes in relation to the specific lock (" ... let L
be any lock ...") that created the edge will be visible to Thread 2
(whatever "in relation to" should mean). And I actually believe the
former, since the latter seems way complicated.
I do not understand your distinction that is discussed in the next
section (with the "field-traversal" problem).
Do note that I acknowledge that it is quite possibly me that's totally
broken here!
>
>
>> class C {
>> int i;
>> }
>>
>> Class Sc {
>> static C c;
>> }
>>
>> Class Svol {
>> static volatile int vol;
>> }
>>
>> Thread 1:
>> Sc.c = new C();
>> Sc.c.i = 1; // 1. plain write to h-b "unrelated" field
>> Svol.vol = 1; // 2. volatile write
>>
>> Thread 2:
>> int ivol, ic;
>> ivol = Svol.vol; // 3. volatile read
>> ic = Sc.c.i; // 4. plain read of the h-b unrelated field
>
> This example is flawed. In the line marked 4, there are two reads,
> Sc.c and (Sc.c).i, and both of them are plain. There is no guarantee
> that Thread 2 sees the same value of Sc.c that Thread 1 is using. If
> you made Sc.c final, say, then yes, if thread 2 in line 3 reads the
> value written in line 2, then it will also read in line 4 the value
> written in line 1.
See, that's why I had this reference - to really make sure that the
synched and unsynced write/reads were totally different - and according
to you, I got to the point.
So if this example really is broken, then my understanding is broken.
WHY is this different? Why, given the program ordering in the two
threads, with the synched-edge in between, can there be a difference in
what Thread 1 and Thread 2 sees as C.c?
>
>
>> Basically, does the transitiveness of the happens-before carry across
>> _all_ heap memory in the system? Will, as long as two threads create any
>> single happens-before relationship (more precisely, a
>> /synchronized-with/ edge?), any writes to whatever data before the
>> synchronized-with edge, in source thread 1, be visible to all reads
>> after the synchronized-with edge, in destination thread 2?
>
> Those seem like different questions, and I'm uncomfortable answering
> either of them with a simple yes or no. I'll repeat that if there is a
> happens-before edge between the write of a field and a subsequent read
> of the field, then the read sees the value written. Just be sure that
> you're reading the same field that you're writing -- in your example,
> you weren't necessarily reading the same int that you wrote.
Those weren't supposed to be different questions! Both are supposed to
be nuances of the same thing: point to the exact problem at hand, which
you do latch on to in the above and next section.
>
>
>>> (Note that you have to be careful here if the field is reached by
>>> following intermediate references; the visibility of each intermediate
>>> reference has to be considered, too.)
>> This comment, which I didn't quite get, might be what I'm really asking about!
>
> Your code fragment above is an example of how the visibility of an
> intermediate reference (Sc.c) needs to be considered.
Exactly. And I'm lost!
>
>
>> From the earlier JMM, this I actually how I understood the text, but
>> I've never gotten to ask the elite about it.
>
> I'm not a member of the elite, and I still get things wrong, but I
> urge you not to put too much stock in the earlier JMM -- it's broken.
It's wasn't _that_ broken, was it? The big problem was the volatile
wording (which made the double checked locking not strictly possible),
and some specifics on the initialization of final fields (which is the
example of "/usr/tmp".substring(4), which may end up as both "/usr" and
"/tmp" in the old model).
The new specification supposedly fixes all that, but I think the new
wordings are extremely technical, and do not make for easy
comprehension! Also, it shies away from using those nice "less formally"
sections - which technically probably is a good thing, but makes it much
more useless for many more developers, I can imagine.
But then again, if my understanding of this actually is broken, as you
basically state above, then it wasn't easy to comprehend before either,
at least not for me!
>
>
>> Why I've never been quite sure, is that I find this so .. wasteful?
>
> You should definitely check out Bill Pugh's pages on this.
I will.
>
> This has strayed way OT, so perhaps we should call it quits on this
> thread.
It has become a bit OT, yes. But maybe others actually are finding it
interesting nonetheless - and maybe later developers will find the
discussion by googling! A discussion often clears up things much better
than "a concise explanation", since it will have to highlight the
problem from many different angles.
It is maybe a bit sad that this discussion didn't come up on the
advanced-java list instead.
Thanks again,
Endre.
-bp
On Nov 19, 10:02 am, Endre Stølsvik <stols...@gmail.com> wrote:Thread 1: Sc.c = new C(); Sc.c.i = 1; // 1. plain write to h-b "unrelated" field Svol.vol = 1; // 2. volatile writeThread 2: int ivol, ic; ivol = Svol.vol; // 3. volatile read ic = Sc.c.i; // 4. plain read of the h-b unrelated fieldThis example is flawed.Whoops -- no, it isn't. Duh. (I told you I still get things wrong!)
There is no thread 3 specified in the example, hence there is no thread 3.
Endre.
Just to argue this to the bitter end: I don't get the point. If I've
made an application where two threads are communicating with each other,
and that's how I made it, no "thread 3" will suddenly come barging in,
changing variables under my nose!
If one had to think like that for everything, one could not get any code
to work: No matter how one organized one's code in regard to
synchronization, there would always be races where your annoying mystery
Thread Three could sneak in, changing everything under one's feet!
But that's luckily not quite how it works. The threads are mine: I spawn
them and set them to do specific tasks. They will obey my orders - or
else I'll debug them!
Endre.