I have a garbage collection problem below. After line 15, why would the
object referenced by a be eligible for garbage collection whereas that
referenced by b is not?
Thanks for the help.
1 class TestA {
2  TestB b;
3    TestA ( ) { b = new TestB (this); }
4 }
5
6 class TestB {
7    TestA a;
8    TestB (TestA a) { this.a = a; }
9 }
10
11 class TestAll {
12   public static void main (String [ ] args) {
13        new TestAll.makeThings ( );
14       // ... code
15    }
16
17   void makeThings ( ) { testA test = new TestA ( ); }
18 }
> I have a garbage collection problem below. After line 15, why would the
> object referenced by a is eligible for garbage collection whereas that
> referenced by b is not?
> 
> Thanks for the help.
If you want homework help then it is to your benefit to show at least 
some evidence of trying to solve the problem yourself.  Some reasoning 
supporting at least a partial position on the question, something to 
show that you have at least read the relevant part of your text or (even 
better) the actual specification -- give us something to work with. 
Neither you nor any of the rest of us is well served if you manage to 
pass your class without actually knowing the material.
John Bollinger
jobo...@indiana.edu
Here is my train of thought:
Line 13 creates an instance of the TestAll class and calls its makeThings
method, without keeping a reference to it.
Line 17 instantiates a TestA object, calling its constructor. In the
process, this calls the constructor of class TestB, which sets the
instance variable a of the TestB object to the TestA object.
This looks like an island of isolation, but not quite; they are instances
of different classes.
So I disagree with the statement 'After line 15, the object referenced by a
is eligible for garbage collection whereas that referenced by b is not',
since both should be eligible (the current thread has no access to either
object).
What is your opinion on that?
You meant: new TestAll().makeThings();
> 14       // ... code
> 15    }
> 16
> 17   void makeThings ( ) { testA test = new TestA ( ); }
You meant: void makeThings() {TestA test = new TestA();}
> 18 }
1. If you must use line numbers, make them /*...*/ comments, so
that others can simply copy&paste your code.
2. Only post code that you know will compile. The above
code will not compile (even after removing the line numbers).
3. If your code has comments, be sure they are /*...*/ comments
instead of // comments, because sometimes the code wraps and
screws up the // comments.
4. The reasoning that "a" is eligible for garbage collection
and "b" is not: "b" is not eligible until all of its strong
references are nullified and "a" still has a strong reference
to "b". "b" is ineligible for GC until after "a" has been
reclaimed. "a" and "b" are not both simultaneously eligible,
but rather incrementally eligible. GC won't know that "b"
is eligible until sometime after it determines that "a"
is eligible. However, depending on the JVM implementation, GC
may defer reclaiming "a" until "b" is also eligible
for reclaim. Eligibility for reclaim and actual reclaiming
are two completely different phases of GC.
It may be easier to understand by examining the "reachability"
of the objects:
At line 13, the current thread can only reach "b" indirectly,
through the strong reference held by the "test" local variable. When the
makeThings() method returns (line 15), its stack frame is popped
(and all local variables on that stack frame are nullified). Thus,
the thread loses its last strong reference to the single instance
of TestA (i.e., the "a" instance). At that time, GC can make a
determination that the TestA instance is eligible for reclaim.
The TestB instance (i.e., the "b" instance) is not yet eligible,
because GC hasn't actually reclaimed the TestA instance (and
nullified its reference fields). Only *after* the TestA instance
becomes eligible for reclaim will GC notice that its TestB instance
field was the last strong reference for the "b" object. The
determination of GC eligibility is an incremental process.
Note, however, that there are theoretical GC models that
can determine precisely when an instance is eligible for
reclaim without a reachability search (via reference tracking
algorithms). Such GC models still require an incremental
approach, but not a search of the heap.
Hope this helps.
> Only *after* the TestA instance
> becomes eligible for reclaim will GC notice that its TestB instance
> field was the last strong reference for the "b" object. The
> determination of GC eligibility is an incremental process.
Are you using the expression "eligible for reclaim" in a technical sense,
defined as part of some Java spec somewhere ?  I can't find anywhere that does
so, but could very easily have missed something.
If so then I can understand how the necessity for precision for describing the
lifetime of an object in the presence of finalisation could lead to terminology
that makes what you say exactly correct.
But if not, then I think it's wrong.  Ignoring finalisation for a moment, in
what I would call "normal" terminology, an object becomes eligible for reclaim
once there is no longer any path leading from a root, such as a thread's stack
frame, to that object.  (I'm also ignoring weak/soft/phantom references here).
Hence, at the moment where any one object becomes eligible, all other objects
that are only reachable via that object also become eligible -- by definition.
What may change is whether and how an actual GC algorithm can *detect* that
eligibility.  Algorithms in the broad category including mark-and-sweep,
copying, etc will in one sense discover that both objects are unreachable at
the same time.  In another sense they never discover that any object is
unreachable -- they are only interested in ones that are reachable, everything
else is just uninitialised RAM.  GC algorithms which use some variant of
reference counting *do* have the incremental nature that you describe -- the GC
actively follows trains of unreachability: "aha this is unreachable.  Good.  So
that means *this* is unreachable too".  And so on...
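
The difference can be sketched with a toy object graph. This is a simplified model only, not how any real JVM collector is implemented; ToyHeap and its method names are made up for illustration. With no roots, the tracing pass finds both objects of the TestA/TestB pair unreachable in one go, while a naive reference count still sees one incoming reference on each because of the cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ToyHeap {
    // object name -> names of the objects it holds strong references to
    final Map<String, List<String>> refs = new HashMap<>();

    void object(String name, String... targets) {
        refs.put(name, List.of(targets));
    }

    // Mark phase of a tracing collector: everything reachable from the roots.
    Set<String> mark(Set<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (marked.add(current)) {
                pending.addAll(refs.getOrDefault(current, List.of()));
            }
        }
        return marked;
    }

    // Whatever was not marked is garbage, all discovered in the same pass.
    Set<String> unreachable(Set<String> roots) {
        Set<String> garbage = new HashSet<>(refs.keySet());
        garbage.removeAll(mark(roots));
        return garbage;
    }

    // Naive reference count: incoming references from roots and heap objects.
    int refCount(String target, Set<String> roots) {
        int count = roots.contains(target) ? 1 : 0;
        for (List<String> targets : refs.values()) {
            for (String t : targets) {
                if (t.equals(target)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        heap.object("testA", "testB"); // TestA.b
        heap.object("testB", "testA"); // TestB.a
        Set<String> noRoots = Set.of(); // makeThings() has returned
        System.out.println("unreachable: " + heap.unreachable(noRoots));
        System.out.println("refCount(testA): " + heap.refCount("testA", noRoots));
    }
}
```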
When you factor finalisation into the picture then it gets more complicated,
and the terminology doesn't seem to be particularly well established.  However,
one way to see it is that finalisation breaks the link between being
"unreachable from a root" and "eligible for reclaim".  Another way of seeing it
(which matches the language of the JLS2 rather better) is that the system
automatically moves finalisable objects which are not otherwise reachable into
a state where they are only reachable by other objects in that state and by the
finalisation process (zero, one, or more threads).  In either case (as I read
the rather opaque text) finalisation does introduce something rather like the
incremental process that you describe in that no object becomes eligible for
reclaim until it, and all chains of references to it, have been finalized
without making it reachable again.
I suspect, though, that you know all this perfectly well, and what we have here
is a difference in terminology rather than a different understanding of how
real GC algorithms work.  Could you clarify please ?
-- chris
My opinion is that you have judged correctly, at least with regard to 
the Java GC model.  In Java an object is eligible for GC if it is not 
reachable from a live thread via a chain of strong references.  In the 
example, every TestA instance is paired with a TestB instance such that 
each holds a strong reference to the other.  Therefore, if one is 
strongly reachable then so is the other, and the two always have the 
same eligibility for GC.  With the precise code above, it would be 
possible to break the relationship after construction of the TestA and 
TestB instances by directly modifying their instance variables, but no 
such thing is actually done.  Both instances created during an 
invocation of TestAll.makeThings() in fact become eligible for GC as 
soon as makeThings() exits.
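
For anyone who wants to poke at this, here is a compilable sketch of the example. TestA and TestB are as posted; GcPairDemo, the weak-reference probes and the formsCycle helper are additions for illustration only. System.gc() is merely a request, so what main prints may differ from one JVM to the next.

```java
import java.lang.ref.WeakReference;

class TestA {
    TestB b;
    TestA() { b = new TestB(this); }
}

class TestB {
    TestA a;
    TestB(TestA a) { this.a = a; }
}

class GcPairDemo {
    // Weak references do not keep their referents alive, so they let us
    // observe the pair without affecting its reachability.
    static WeakReference<TestA> weakA;
    static WeakReference<TestB> weakB;

    static void makeThings() {
        TestA test = new TestA();
        weakA = new WeakReference<>(test);
        weakB = new WeakReference<>(test.b);
    }

    // True if the TestA and its TestB hold strong references to each other.
    static boolean formsCycle(TestA a) {
        return a.b != null && a.b.a == a;
    }

    public static void main(String[] args) {
        makeThings();
        // No strong reference to either object is held by this code any more.
        System.gc(); // a hint only; the JVM is free to ignore it
        System.out.println("TestA cleared: " + (weakA.get() == null));
        System.out.println("TestB cleared: " + (weakB.get() == null));
    }
}
```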
There are some subtleties involved in determining eligibility for GC in 
Java, the most notable being "hidden" local variables.  Hidden local 
variables arise because Java does not actually have nested local variable 
scopes at the VM level -- only at the Java source level.  To the VM all 
local variables are treated equally.  Therefore, a local reference 
variable that goes out of scope in the Java sense sticks around until 
the method in which it is declared terminates, no matter how long that 
may be.  Any object it refers to remains strongly reachable until that time.
That said, there are other GC models where the answer might be a bit 
different, perhaps including some in which the problem is not a trick 
question.  For Java, however, the bottom line answer is "there is no 
such reason."
John Bollinger
jobo...@indiana.edu
That is not quite true. At the bytecode level, local variables are 'slots' in the stack 
frame. Some compilers will reuse the slots allocated to nested local variables thus 
removing the reachability of references in the reused slots.
This only applies to direct interpretation of bytecodes. Runtime compilers are free to 
apply additional optimizations, including reuse of 'unnested' local variables. I have 
seen JVMs that perform even more esoteric optimizations.
> That there are other GC models where the answer might be a bit
> different, perhaps including some in which the problem is not a trick
> question.  For Java, however, the bottom line answer is "there is no
> such reason."
For Java, it is best to assume that (as you said) a local variable remains reachable 
until the method terminates. However, it is not guaranteed.
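
If releasing an object early actually matters, the portable approach is not to lean on scope at all but to overwrite the variable. A small sketch of the two situations; ScopeDemo and its helper methods are invented for illustration, and nothing here proves what a given compiler or JVM does with the slot in the first method.

```java
class ScopeDemo {
    static int work = 0;

    // Stand-in for a call that runs for a long time.
    static void longRunning() {
        work++;
    }

    static int checksum(byte[] data) {
        return data.length;
    }

    // The inner braces end the variable's scope in the source, but the
    // reference may or may not linger in its stack slot during longRunning().
    static int nestedScope() {
        int sum;
        {
            byte[] big = new byte[1024];
            sum = checksum(big);
        }
        longRunning();
        return sum;
    }

    // Overwriting the variable removes this path to the array,
    // however the compiler allocates slots.
    static int explicitNull() {
        byte[] big = new byte[1024];
        int sum = checksum(big);
        big = null;
        longRunning();
        return sum;
    }
}
```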
-- 
Lee Fesperman, FirstSQL, Inc. (http://www.firstsql.com)
[...]
> At the bytecode level, local variables are 'slots' in the stack 
> frame. Some compilers will reuse the slots allocated to nested local variables thus 
> removing the reachability of references in the reused slots.
> 
> This only applies to direct interpretation of bytecodes. Runtime compilers are free to 
> apply additional optimizations, including reuse of 'unnested' local variables. I have 
> seen JVMs that perform even more esoteric optimizations.
[...]
> For Java, it is best to assume that (as you said) a local variable remains reachable 
> until the method terminates. However, it is not guaranteed.
Yes, I suppose I was a bit presumptive.  I should have said that an 
object referred to by a local variable of some method _may_ remain 
reachable via that variable until the completion [abrupt or normal] of 
that method's execution, regardless of whether the variable goes out of 
scope in the Java source sense.  The specs do not require that behavior, 
and reasonable compilers might indeed emit bytecode that does not 
exhibit it.  The specs also do not forbid the behavior, and some 
compiler / VM combinations certainly do exhibit it in at least some cases.
It is a separate question whether a VM might reuse a local variable slot 
that would otherwise go unused for the remainder of the execution of 
some method.  Doing so requires some degree of program flow analysis in 
order to determine in the first place that the slot is available.  I'm 
having trouble imagining a scenario where a compliant VM could 
reasonably elect to do that outside the scope of JIT compilation, but 
once you JIT a piece of bytecode a wide variety of optimizations are 
possible.
John Bollinger
jobo...@indiana.edu
Chris Smith and I had a long discussion on this a while back and could not
come to an agreement. Consider the following case:
public void method()
{
    { Object o = new FooBar(); }
    someMethodThatExecutesALongTime();
}
I think we would agree that it would not be in error for the object to be
eligible for garbage collection during the call to the other method, since
it is no longer accessible and the variable itself has gone out of scope.
What if we remove that scope:
public void method()
{
    Object o = new FooBar();
    someMethodThatExecutesALongTime();
}
The question is whether the VM is allowed to make the object created
eligible for garbage collection before the called method returns. It is no
longer accessed in this method but it is still in scope.
I say that the VM should not be allowed to do this because while the
variable will not be accessed again, technically it is still visible after
the method returns and would be accessible even though it isn't actually
accessed.
But the only way to actually see an effect from this is if the finalizer for
the object had a side effect. This type of pattern is of course used all the
time in C++, but makes less sense in Java, but I'm not sure we can just
throw it out without losing program correctness.
Unfortunately, the spec is not explicitly clear on this point.
The problem is that these two methods generate the exact same byte code.
If you agree with me that in the second case it would be wrong for the
object to be garbage collected before the called method returns, then you
have to conclude that the VM is quite limited in what it can do to optimize
garbage collection and it doesn't matter what program flow analysis that you
do. The VM can only tell if the variable *will* be accessed again, but has
no way to know when it ceases to be "accessible".
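
For code that genuinely depends on the object staying alive across the call, Java 9 and later offer java.lang.ref.Reference.reachabilityFence, which keeps its argument strongly reachable at least until the fence is executed. A minimal sketch, assuming a Java 9+ runtime; FenceDemo is a made-up class, a plain Object stands in for FooBar, and the body of the long-running method is invented.

```java
import java.lang.ref.Reference;

class FenceDemo {
    static int calls = 0;

    // Stand-in for real long-running work.
    static void someMethodThatExecutesALongTime() {
        calls++;
    }

    static void method() {
        Object o = new Object(); // stands in for new FooBar()
        try {
            someMethodThatExecutesALongTime();
        } finally {
            // o is kept strongly reachable at least until this point,
            // whatever liveness analysis the JVM performs.
            Reference.reachabilityFence(o);
        }
    }
}
```

The try/finally shape makes sure the fence is reached even if the call throws.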
--
  Dale King
I'll vote for better optimization. As you say, your approach would limit VM 
optimization. For instance, the runtime compiler could use a register instead of a 
variable. On some machines, "accessibility" could force a register save/restore.
I fully expect VMs to do this optimization and much, much more. It would be best to heed 
my caveat: "it [accessibility/reachability] is not guaranteed."
It is *not in scope*. Read the VM spec for local variable table 
definitions. This defines the locations within the method for which a 
variable is valid.
StartPC and Length - ie, the variable may not be valid until XX bytes 
into the method and then is only valid for YY bytes. Params are of 
course valid from 0 bytes into the method and for Length bytes from 
there.
Given your method, object 'o' would inhabit slot zero and would be valid 
from offset 0 to offset at the start of the line with the call to 
someMethod...
As such, the variable is valid for reclamation, however most GC's will 
take the simple and easy approach and wait for the method to end before 
thinking about such things.
Stephen
-- 
Stephen Kellett
Object Media Limited    http://www.objmedia.demon.co.uk
RSI Information:        http://www.objmedia.demon.co.uk/rsi.html
Following myself up. I feel a qualification of my statement is required.
The local variable table is optional - it does not have to be present 
for the java class to load and execute. Its job is to help debuggers and 
so forth. I've come to this conclusion after writing a prototype java 
tracer that dumped the params and locals to stdout. Many of the Sun 
supplied classes don't have a local variable table - even though they 
clearly do have local variables and method parameters.
To recast my original statement:
If you have a local variable table for the method and the JVM chooses to 
use that information the JVM can determine the variable is out of scope. 
The JVM will most likely not use the local variable table information as 
it is simpler to simply wait until the end of the method.
For the occasions when the local variable table is absent, you have to 
assume the variable is in scope even though you know better. Hence the 
JVM won't attempt garbage collection of such local variables until the 
method terminates.
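
To make the StartPC/Length idea concrete: each entry in the class file's LocalVariableTable attribute carries a start_pc and a length, and the variable has a value for code offsets in the interval from start_pc (inclusive) to start_pc + length (exclusive). The class below is a hypothetical model of one entry, not a JDK type, and the offsets in main are invented.

```java
class LocalVarEntry {
    final int startPc; // first code offset at which the variable has a value
    final int length;  // number of bytes of code for which it stays valid
    final String name;
    final int slot;    // index into the frame's local variable array

    LocalVarEntry(int startPc, int length, String name, int slot) {
        this.startPc = startPc;
        this.length = length;
        this.name = name;
        this.slot = slot;
    }

    // True if the variable is in scope at the given code offset.
    boolean coversPc(int pc) {
        return pc >= startPc && pc < startPc + length;
    }

    public static void main(String[] args) {
        // Invented numbers: 'o' becomes valid at offset 8 and stays valid for 4 bytes.
        LocalVarEntry o = new LocalVarEntry(8, 4, "o", 1);
        System.out.println("in scope at 10: " + o.coversPc(10));
        System.out.println("in scope at 12: " + o.coversPc(12));
    }
}
```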
Cheers
Dale King wrote:
> public void method()
> {
>     Object o = new FooBar();
>     someMethodThatExecutesALongTime();
> }
>
> The question is whether the VM is allowed to make the object created
> eligible for garbage collection before the called method returns. It is no
> longer accessed in this method but it is still in scope.
My apologies if this is a point that you covered in your thread with Chris
Smith, but it seems to me that the JLS2 is pretty clear on this point.  From
12.6.1 "Implementing Finalization":
========
A reachable object is any object that can be accessed in any potential
continuing computation from any live thread. Optimizing transformations of a
program can be designed that reduce the number of objects that are reachable
to be less than those which would naively be considered reachable. For
example, a compiler or code generator may choose to set a variable or
parameter that will no longer be used to null to cause the storage for such
an object to be potentially reclaimable sooner.
========
Granted that that may not be a normative specification of what optimisations
may be performed, but it does seem to be sufficiently explicit about what such
a normative spec would say if there were any such spec.  E.g. I'd expect it to
be legal for the compiler (let alone the JVM) to rewrite the above quoted
snippet to:
    public void method()
    {
        Object o = new FooBar();
        o = null;
        someMethodThatExecutesALongTime();
    }
-- chris
<Inaccurate quote above>
> It is *not in scope*. Read the VM spec for local variable table
> definitions. This defines the locations within the method for which a
> variable is valid.
You misquoted me above. The sentence "It is no longer accessed in this
method but it is still in scope." did not apply to the example you quoted.
It applied to the example with the braces removed. The rest of your logic is
based on that mischaracterization of what I said.
You are correct that the local variable table will tell you scope
information for a variable, but that is optional and therefore cannot be
relied upon.
--
 Dale King
Yes, it was discussed and is not as clear as you seem to think.
> ========
> A reachable object is any object that can be accessed in any potential
> continuing
> computation from any live thread.
Here is one of the unclear parts. What does *can be* accessed mean? Does
*can* be accessed require that it *will* be accessed? Or is *could have*
been sufficient?
> Optimizing transformations of a program can be designed that reduce
> the number of objects that are reachable to be less than those which
> would naively be considered reachable. For example, a compiler or code
> generator may choose to set a variable or parameter that will no longer
> be used to null to cause the storage for such an object to be
> potentially reclaimable sooner.
> ========
>
> Granted that that may not be a normative specification of what
> optimisations may be performed, but it does seem to be sufficiently
> explicit about what such a normative spec would say if there were any
> such spec.  E.g. I'd expect it to be legal for the compiler (let alone
> the JVM) to rewrite the above quoted snippet to:
I agree with the spec that "a compiler or code generator" may do that. I
don't agree that the JVM should be allowed to do that, because it does not
have sufficient information.
FYI, to save recovering the same ground here is that previous discussion:
http://groups.google.com/groups?threadm=5b5776f8.0302041235.242e4f91%40posting.google.com
And I'll vote for correctness of program execution. C++ has the RAII
paradigm where they create an object and they rely on the fact the object
will be destructed when the variable "holding" it goes out of scope. There
are several reasons why RAII does not work well in Java but the primary
reason is that the object may not be finalized until long after the variable
goes out of scope. RAII is not very useful in that case.
But you are talking about allowing the opposite extreme where the object may
be finalized long before (could be hours or days) the variable goes out of
scope. That is just plain wrong.
> I fully expect VMs to do this optimization and much, much more. It would
> be best to heed my caveat: "it [accessibility/reachability] is not
> guaranteed."
Your opinion has little weight. It should be explicitly specified.
--
  Dale King
My apologies. I only saw the article I replied to. A different posting I 
made has corrected my comments in any case.
I agree that the JLS is a bit vague on this point, but I think you are 
taking an extreme position.  To be sure, a VM that complied with your 
assertion of the required behavior certainly would be compliant on this 
point, but I don't interpret the spec as restrictively.  In particular, 
I think it is not required and not useful to interpret "can be accessed 
in any potential continuing computation from any live thread" in any 
other context than that of the classes currently loaded by the VM and 
any that might be loaded in the course of any of the computations described.
For instance, in the example quoted above, the Java compiler or VM could 
determine that there is no read of method()'s local variable "o" in the 
byte code currently loaded into the VM.  How could that not mean that o 
cannot be accessed?  I simply don't accept that the compiler or VM is 
required to take into account the continuum of possible alternative 
implementations of method().  The whole point of optimization is to 
shortcut a program in a way that has no effect on the results except for 
resource consumption and/or running time.  I see absolutely no point to 
interpreting the spec as you describe.  I guess all this puts me in the 
same camp as Chris Smith on the matter.
From where I stand "will be" is equivalent to "can be" plus "cannot be 
otherwise".  It is thus a stronger and inequivalent condition.  It may 
be impossible or impractical to evaluate a "will be" condition, so the 
spec relies on the weaker "can be".  The implementation is not in fact 
required to make the determination precisely, so long as it doesn't 
erroneously mark anything unreachable, because, as you know, it is not 
required to GC those objects that are unreachable on any particular 
schedule.
>> Optimizing transformations of a program can be designed that reduce
>> the number of objects that are reachable to be less than those which
>> would naively be considered reachable. For example, a compiler or code
>> generator may choose to set a variable or parameter that will no longer
>> be used to null to cause the storage for such an object to be
>> potentially reclaimable sooner.
>> ========
>>
>> Granted that that may not be a normative specification of what
>> optimisations may be performed, but it does seem to be sufficiently
>> explicit about what such a normative spec would say if there were any
>> such spec.  E.g. I'd expect it to be legal for the compiler (let alone
>> the JVM) to rewrite the above quoted snippet to:
>
> I agree with the spec that "a compiler or code generator" may do that. I
> don't agree that the JVM should be allowed to do that, because it does not
> have sufficient information.
Hold your horses, there.  What information is the VM missing?  If it had 
that information (or an equivalent) would it be permitted to perform the 
transformation, via JIT or otherwise?
John Bollinger
jobo...@indiana.edu
C++ has no relevance here since it doesn't have GC. However, a VM could use RAII 
(including finalization) as an optimization. A Java programmer just can't take advantage 
of it.
> But you are talking about allowing the opposite extreme where the object may
> be finalized long before (could be hours or days) the variable goes out of
> scope. That is just plain wrong.
Exactly how is it wrong (other than in your opinion)? I reread the exchange from last 
year, again. You may remember that I contributed to the thread. You failed to find any 
authority for your position, and you failed to convince any posters to the thread.
You also didn't deal with the finalization aspect. There is some consensus that 
finalizers should be used carefully and rarely. There are those who recommend not using 
them at all. I won't go that far; I have implemented finalizers for very solid reasons. 
I tend to think that finalizers that are vulnerable in this situation are poorly 
implemented. They really shouldn't have side effects except on external resources under 
their sole control.
> > I fully expect VMs to do this optimization and much, much more. It
> > would be best to heed my caveat: "it [accessibility/reachability]
> > is not guaranteed."
> 
> Your opinion has little weight. It should be explicitly specified.
Your first comment is uncalled-for, though it is typical for you to use bluster and 
insults in technical discussions. I'd be glad to stack my knowledge and experience 
against yours any day.
I develop very complex systems software in Java. It's of great importance to me that JVM 
optimization go as far as it can. I don't want it hamstrung by C++ concepts or poor Java 
programming practices.
> FYI, to save recovering the same ground here is that previous discussion:
>
> http://groups.google.com/groups?threadm=5b5776f8.0302041235.242e4f91%40posting.google.com
Thanks. I've read it now...
But I still think the matter is clear ;-)
> > A reachable object is any object that can be accessed in any potential
> > continuing computation from any live thread.
>
> Here is one of the unclear parts. What does *can be* accessed mean. Does
> *can* be accessed require that it *will* be accessed. Or is *could have*
> been sufficient.
I'm approaching it from a different direction, starting from the next part of
the quote:
> > For
> > example, a compiler or code generator may choose to set a variable or
> > parameter that will no longer be
> > used to null to cause the storage for such an object to be potentially
> > reclaimable sooner.
I cannot see any way that this could be true if it were not (at least) legal
for the compiler to rewrite:
    {
        Object o = new FooBar();
        someMethodThatExecutesALongTime();
    }
to:
    {
        Object o = new FooBar();
        o = null;
        someMethodThatExecutesALongTime();
    }
That's to say, I cannot think of any simpler (i.e. easier to prove) transform
that the quote could be referring to.  Note, I am not considering
"reachability" (or similar) at all, I'm only considering the legality of
transformations of the use of local variables (of course, that *affects*
reachability).  Note that the transform is only using the same (putative)
"licence" as allows the compiler to use just one slot for:
    {
        Object o = new FooBar();
        someMethod();
        Object p = new BarFoo();
        someMethod();
    }
I assert that it is legal for the compiler to reuse the slot on the basis that
for it to be illegal there would have to be a statement to that effect
somewhere, and there ain't one.
> I agree with the spec that "a compiler or code generator" may do that. I
> don't agree that the JVM should be allowed to do that, because it does not
> have sufficient information.
The way I see it is the other way around.  From the earlier point, and from the
fact that (as you noted in the original thread) the bytecodes that reach the
JVM are the same in each instance (before and after the transform), I conclude
it must be legal for the JVM to perform the equivalent of the same
optimisations.  I.e. if the compiler is entitled to emit either sequence of
bytecodes, then the JVM must be entitled to consider both sequences as
equivalent and produce identical runtime behaviour for them.
OK, the "must"s in that paragraph are too strong -- it isn't a logical
entailment -- but I doubt if you could convince any JVM implementer that s/he
didn't have the (almost;-) implied freedom.  And that really is the
bottom-line: I don't think that, pragmatically, it is safe to assume that JVM
implementers haven't interpreted the spec (such as it is) in the same way as
I/Chris Smith/et al.
-- chris
I am not asking it to look at all possible alternative implementations, but
that it should respect my code where I declared the actual scope of the
variable, which says when the variable is accessible. You are saying that
the JVM should be able to circumvent what is declared in the program. If I
declare a scope for the variable I may be depending on the object remaining
alive. It just seems to me that you are being overoptimistic and violating
the correctness of the program. In the end it probably doesn't matter,
because the only way being overoptimistic can matter is if you have side
effects from a finalizer, and even if you did, the garbage collector
usually delays finalization.
> > I agree with the spec that "a compiler or code generator" may do that. I
> > don't agree that the JVM should be allowed to do that, because it does
> > not have sufficient information.
>
> Hold your horses, there.  What information is the VM missing?  If it had
> that information (or an equivalent) would it be permitted to perform the
> transformation, via JIT or otherwise?
Sure if the class file actually has the optional local variable table, then
it is free to reclaim objects referenced from variables that have gone out
of scope.
I wasn't saying that C++ was relevant, but that the RAII paradigm is one
that depends upon objects not being destructed until the variable goes out
of scope. And in this sense C++ does have a GC. The stack variables are
collected automatically when they go out of scope.
C# has a mechanism for RAII with a garbage collector that eliminates the
need for most finally clauses. It uses an IDisposable interface and a using
statement. See Jon Skeet's RFE to add a similar feature to Java:
http://groups.google.com/groups?selm=MPG.1929810f411094bf98c14e%40dnews.peramon.com
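
Java 7 and later have a comparable construct: the try-with-resources statement calls close() on any AutoCloseable when the block exits, normally or by exception, with no dependence on the garbage collector. A minimal sketch; TrackedResource and ScopedCleanupDemo are made-up names, and the log list exists only so the ordering can be observed.

```java
import java.util.ArrayList;
import java.util.List;

class TrackedResource implements AutoCloseable {
    // Records the order of events so the cleanup point is visible.
    static final List<String> log = new ArrayList<>();
    private final String name;

    TrackedResource(String name) {
        this.name = name;
        log.add("open " + name);
    }

    @Override
    public void close() {
        log.add("close " + name);
    }
}

class ScopedCleanupDemo {
    static void run() {
        try (TrackedResource r = new TrackedResource("r1")) {
            TrackedResource.log.add("use r1");
        } // r.close() is called here, when the block exits
    }
}
```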
> However, a VM could use RAII
> (including finalization) as an optimization. A Java programmer just can't
> take advantage of it.
Funny, I don't see that in the spec. I know a Java programmer should not use
RAII because the finalization is not immediate, but this optimization says
that it is fine for the VM to do it prematurely. As long as there is a
finalize method which can have side effects then it seems to me that doing
it prematurely can break code.
> > But you are talking about allowing the opposite extreme where the object
> > may be finalized long before (could be hours or days) the variable goes
> > out of scope. That is just plain wrong.
>
> Exactly how is it wrong (other than in your opinion)? I reread the
> exchange from last year, again. You may remember that I contributed to
> the thread.
Let's say I had code like this:
    new FooBar();
    callSomeMethod();
Would it be OK for the VM to simply omit the creation of the FooBar instance
or perhaps reorder it to do it after the method call? Of course not. The
reason is that the constructor can have side effects that affect the state
of objects outside of itself. So why does the same logic not apply to the
finalize method? It can have side effects that affect the state of objects
outside of itself. Why should the VM be allowed to do it for finalize, but
not for constructors? I personally do not see why the same rules do not
apply.
> You failed to find any authority for your position, and you failed to
> convince any posters to the thread.
I wasn't looking for authority just reasoning.
> You also didn't deal with the finalization aspect. There is some consensus
> that finalizers should be used carefully and rarely. There are those who
> recommend not using them at all. I won't go that far; I have implemented
> finalizers for very solid reasons. I tend to think that finalizers that
> are vulnerable in this situation are poorly implemented. They really
> shouldn't have side effects except on external resources under their sole
> control.
But that is not enforceable. Finalizers can have side effects. I agree it is
not recommended practice. But I don't go as far as to say that the JVM is
free to run them prematurely.
> > > I fully expect VMs to do this optimization and much, much more. It
> > > would be best to heed my caveat: "it [accessibility/reachability]
> > > is not guaranteed."
> >
> > Your opinion has little weight. It should be explicitly specified.
>
> Your first comment is uncalled-for, though it is typical for you to use
> bluster and insults in technical discussions. I'd be glad to stack my
> knowledge and experience against yours any day.
Sorry, that was not meant as an insult at all! Very poor wording on my part.
I didn't mean that your opinion had little weight because it was *your*
opinion, but because it was just an *opinion*. My opinion doesn't have any
weight either.
My point was that our opinions don't count; the only thing that has any
weight in these sorts of matters is the specification. And the spec. needs to
be clarified on this point.
And it is certainly *not* typical for me to use bluster and insults in
technical discussions and I was not doing so here.
> I develop very complex systems software in Java. It's of great importance
> to me that JVM optimization go as far as it can. I don't want it
> hamstrung by C++ concepts or poor Java programming practices.
And I would have no trouble with that if the spec. explicitly said that it
could make such an optimization and that the behavior is not guaranteed.
Until that time, I think a JVM should be conservative on that point and
guarantee it and that developers should also be conservative and follow your
advice and assume that it is not guaranteed.
--
 Dale King
> C# has a mechanism for RAII with a garbage collector that
> eliminates the need for most finally clauses. It uses an
> IDisposable interface and a using statement.
I wouldn't call syntactic sugar for a try-finally statement a garbage 
collector. The only link between a using-statement and the real GC is 
that the contract for Dispose() in IDisposable says that the object 
should be registered with the GC as disposed so the GC doesn't need 
to call its finalizer.
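To make the "syntactic sugar for try-finally" point concrete, here is a minimal Java sketch of what such a construct boils down to. TrackedResource, its dispose() method, and DisposeDemo are hypothetical names invented for this illustration; the garbage collector plays no part in when dispose() runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource type, for illustration only.
class TrackedResource {
    // Records the order of events so the cleanup point is visible.
    static final List<String> LOG = new ArrayList<>();
    private final String name;

    TrackedResource(String name) {
        this.name = name;
        LOG.add("open " + name);
    }

    void use() {
        LOG.add("use " + name);
    }

    // Explicit cleanup, called by the programmer rather than by the GC.
    void dispose() {
        LOG.add("dispose " + name);
    }
}

class DisposeDemo {
    static List<String> run() {
        TrackedResource.LOG.clear();
        TrackedResource r = new TrackedResource("r1");
        try {
            r.use();
        } finally {
            // Runs at a known point in the code, whether or not use() throws.
            r.dispose();
        }
        return new ArrayList<>(TrackedResource.LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The cleanup here is tied to control flow leaving the try block, not to reachability, which is why it says nothing about when a finalizer may run.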
> Funny, I don't see that in the spec. I know a Java programmer
> should not use RAII because the finalization is not immediate, but
> this optimization says that it is fine for the VM to do it
> prematurely. As long as there is a finalize method which can have
> side effects then it seems to me that doing it prematurely can
> break code. 
"Prematurely", "prematurely". Quite emotional words for technical 
discussion.
-- 
No address munging in use. I like the smell of nuked accounts in the 
morning.
Dale,
I'm jumping in a bit late just to clarify something and avoid confusion.  
I'm reading this thread with interest.
There are, as far as I can see, two reasonable choices for interpreting 
the JLS's "can be accessed" in the phrase above.  Certainly, the phrase 
"could have been accessed" is one of them.  However, the phrase "will be 
accessed" is not one of them.  There are any number of situations in 
which the JRE can't reasonably determine, or it's just plain 
indeterminate, whether an object will be accessed; for example, when 
there are still references in scope but future behavior depends on user 
input.  Getting the JRE to determine "will be accessed" is clearly 
beyond the scope of any specification.
The other reasonable interpretation (besides your own) is simply to 
interpret "can be accessed" as referring to the set of possible 
behaviors of the program right now, versus what it could have been 
written to do in the past.  I can't think of a replacement for the "can 
be accessed" phrase that more clearly defines this.  I think "can be 
accessed" says it very clearly already -- but that, of course, is the 
whole point of the disagreement.
-- 
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
Two answers:
1. It's because the JLS, in defining reachability (and thus eligibility 
for finalization), doesn't talk about scope.  There's really no reason 
to see collection of an object while there's still a reference to it in 
scope as a "reordering" at all, except that it runs counter to one 
obvious implementation of garbage collection -- namely, that of using 
the stack pointer to determine which stack slots should be considered 
part of the root set.
Basically, the comparison *depends* on interpreting the JLS in a 
specific way (and one that I happen to disagree with).  If the JLS is 
interpreted that way, then of course it's wrong for the finalizer to be 
run prior to the reference leaving scope.  If it's not interpreted that 
way, though, then the comparison is terribly meaningless.
2. Because, frankly, side-effects of constructors are at least vaguely 
useful.  Finalizers, on the other hand, are just a terrible mistake.  
Aside from believing that the JLS really does not make that restriction 
(and seeing compelling evidence that a majority of people within Sun 
agree), I just really hope that limitations on good VM implementation 
aren't made unreasonably just because of the impact of a failed language 
feature.  This answer wouldn't matter if I thought the JLS required 
delaying finalization, but it certainly tips the scales a bit in favor 
of choosing one reasonable interpretation over another.
> There are, as far as I can see, two reasonable choices for
> interpreting the JLS's "can be accessed" in the phrase above. 
> Certainly, the phrase "could have been accessed" is one of them.
Can you explain, please? "Can be accessed" is about the future, whereas 
"could have been accessed" is about the past, so how can one be 
interpreted as the other?
Well, I think that Dale should explain, since he holds that opinion.  I 
don't.  However, I justify it as at least somewhat reasonable because 
one might have a mental model in mind in which the JRE is simply going 
through a fetch-execute on bytecode and hence "can be accessed", to the 
virtual machine, is independent of what code might be coming up next.  I 
have trouble providing such a description without noting my objections 
to it (such as that the JLS never describes this mental model, and that 
it's not a very accurate description of reality either)... which is why 
Dale is probably better off explaining from his perspective.
I don't see any resemblance to GC at all. It's still the C++ world of destroying objects 
at a specific point.
> C# has a mechanism for RAII with a garbage collector that eliminates the
> need for most finally clauses. It uses an IDisposable interface and a using
> statement. See Jon Skeet's RFE to add a similar feature to Java:
> 
> http://groups.google.com/groups?selm=MPG.1929810f411094bf98c14e%40dnews.pera
> mon.com
The above is just not germane to what we're discussing. Put simply, your basic concept 
is that reachability (as far as GC is concerned) be the same as scope. Let me look at 
some deeper issues here...
1) In general, scope, except method scope, is lost when Java code is translated to 
bytecodes (except when a valid Local Variable Table is included.) Your concept means the 
JVM would have to assume that all local variables (unless their 'slots' are reused) are 
reachable for the entire method.
On the other side, trying to use scope like this:
  {
    Foo o = new Foo();
  }
  callSomeLongMethod();
wouldn't force 'o' to be eligible for GC before callSomeLongMethod() completes.
Additional chances for optimization will be lost here.
2) In 1), the effect of translation to bytecode can cause the scope of a variable to be 
lengthened but never shortened (inappropriately). With your concept, a shortened scope 
could cause incorrect behavior. To avoid this, compilers could only reuse variable slots 
that are no longer in scope. I did a quick test on a couple of Sun's compilers and 
didn't see an instance of this, which tends to support your ideas. However, I don't know 
if it is true for all certified compilers.
> > > But you are talking about allowing the opposite extreme where
> > > the object may be finalized long before (could be hours or days)
> > > the variable goes out of scope. That is just plain wrong.
> >
> > Exactly how is it wrong (other than in your opinion)? I reread
> > the exchange from last year, again.
> 
> Let's say I had code like this:
> 
>     new FooBar();
>     callSomeMethod();
> 
> Would it be OK for the VM to simply omit the creation of the FooBar instance
> or perhaps reorder it to do it after the method call? Of course not. The
> reason is that the constructor can have side effects that affect the state
> of objects outside of itself. So why doesn't the same logic apply to the
> finalize method? It can have side effects that affect the state of objects
> outside of itself. Why should the VM be allowed to do it for finalize, but
> not for constructors? I personally do not see why the same rules do not
> apply.
Agreed, the VM couldn't omit or reorder FooBar creation ... unless it knew there were no 
side effects. However, the same only applies to finalizers if your concept must be 
enforced. That doesn't add anything to your position.
Actually, your concept doesn't apply here anyway. Your concept concerns the scope of 
local variables. There are no variables in the code, just expressions. The scope of an 
expression is no more than the statement it's in.
> > You failed to find any authority for your position, and you failed to
> > convince any posters to the thread.
> 
> I wasn't looking for authority just reasoning.
Fair enough. I am trying to clarify the basic issue and avoid side issues, like RAII and 
expression scoping.
> > You also didn't deal with the finalization aspect. There is some
> > consensus that finalizers should be used carefully and rarely.
> > There are those who recommend not using them at all. I won't go that
> > far; I have implemented finalizers for very solid reasons. I tend to
> > think that finalizers that are vulnerable in this situation are
> > poorly implemented. They really shouldn't have side effects except
> > on external resources under their sole control.
> 
> But that is not enforceable. Finalizers can have side effects. I agree it is
> not recommended practice. But I don't go as far as to say that the JVM is
> free to run them prematurely.
Certainly it is not enforceable. I also don't want finalizers to be run prematurely, 
that is, unless there is no possibility of access (the object is 
unreachable/inaccessible). The issue is that I'm thinking of 'physical' reachability 
(determined by a scan of registers, stacks and then the heap). You're espousing 
'logical' reachability (based on local variable scope).
Your approach can require extra code and reduces optimization options. My point above is 
that its only purpose is to prevent 'unexpected' invocation of poorly implemented 
finalizers.
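For readers who need an object to stay reachable up to a specific point regardless of how the JVM determines reachability, a minimal sketch follows. It assumes a JDK that provides java.lang.ref.Reference.reachabilityFence (added in Java 9, well after the JVMs discussed in this thread). SlotResource, its OPEN table, and FenceDemo are hypothetical names invented for this illustration, and the cleanup is called explicitly so that the sketch stays deterministic.

```java
import java.lang.ref.Reference;

// Hypothetical wrapper around an external resource identified by a slot number.
class SlotResource {
    // Stand-in for an external table of open resources (assumption for this sketch).
    private static final boolean[] OPEN = new boolean[16];
    private final int slot;

    SlotResource(int slot) {
        this.slot = slot;
        OPEN[slot] = true;
    }

    // A cleanup action (a finalizer or similar) would mark the slot closed.
    // It is invoked explicitly here to keep the example deterministic.
    void close() {
        OPEN[slot] = false;
    }

    int work() {
        try {
            int s = this.slot;   // last read of 'this' in the method body
            return useSlot(s);   // only the plain int is needed from here on
        } finally {
            // Keeps 'this' strongly reachable at least until this point,
            // so no cleanup can be triggered while useSlot() is running.
            Reference.reachabilityFence(this);
        }
    }

    private static int useSlot(int s) {
        if (!OPEN[s]) {
            throw new IllegalStateException("slot " + s + " already released");
        }
        return s * 2;
    }
}

class FenceDemo {
    static int run() {
        SlotResource r = new SlotResource(3);
        int result = r.work();
        r.close();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The fence makes the dependency explicit in the code instead of relying on the variable's scope, which is the conservative assumption argued for in this thread.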
By some weird sort of serendipity, this issue is also touched on in the current thread, 
"Declaring a reference in a loop versus outside a loop". In that thread, Neal Gafter 
seems to indicate that Sun doesn't support your idea.
> And I would have no trouble with that if the spec. explicitly said that it
> could make such an optimization and that the behavior is not guaranteed.
> Until that time, I think a JVM should be conservative on that point and
> guarantee it and that developers should also be conservative and follow your
> advice and assume that it is not guaranteed.
I doubt anyone is arguing that clarity in the spec is not desirable.
I do think detailing potential optimizations is generally out of place in the spec, 
though I would accept that this situation is an exception.
> I doubt anyone is arguing that clarity in the spec is not
> desirable. 
> 
> I do think detailing potential optimizations is generally out of
> place in the spec, though I would accept that this situation is an
> exception. 
Here's the non-normative text from the corresponding section in the 
C# specification: 
[Note: Implementations may choose to analyze code to determine which 
references to an object may be used in the future. For instance, if a 
local variable that is in scope is the only existing reference to an 
object, but that local variable is never referred to in any possible 
continuation of execution from the current execution point in the 
procedure, an implementation may (but is not required to) treat the 
object as no longer in use. end note]
Food for thought.
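The situation the note describes can be probed from Java with a weak reference. This is only a sketch: LivenessDemo and its string constants are hypothetical names, System.gc() is merely a hint, and which of the two results you get depends on the JVM and on whether the method has been compiled, so no particular output should be expected.

```java
import java.lang.ref.WeakReference;

class LivenessDemo {
    static final String COLLECTED = "collected while still in scope";
    static final String REACHABLE = "still reachable";

    static String probe() {
        Object o = new Object();                             // 'o' stays in scope until the method returns
        WeakReference<Object> ref = new WeakReference<>(o);  // last read of 'o'
        System.gc();                                         // a request, not a guarantee
        // 'o' is in scope here but is never read again. Whether the JVM
        // still treats the object as reachable is the point under dispute.
        return ref.get() == null ? COLLECTED : REACHABLE;
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

Code that must work on every JVM should be written so that either result is acceptable.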
> "John C. Bollinger" <jobo...@indiana.edu> wrote in message
> news:c2iuhv$1f5$1...@hood.uits.indiana.edu...
>>For instance, in the example quoted above, the Java compiler or VM could
>>determine that there is no read of method()'s local variable "o" in the
>>byte code currently loaded into the VM.  How could that not mean that o
>>cannot be accessed?  I simply don't accept that the compiler or VM is
>>required to take into account the continuum of possible alternative
>>implementations of method().  The whole point of optimization is to
>>shortcut a program in a way that has no effect on the results except for
>>resource consumption and/or running time.  I see absolutely no point to
>>interpreting the spec as you describe.  I guess all this puts me in the
>>same camp as Chris Smith on the matter.
> 
> 
> I am not asking it to look at all possible alternative implementations, but
> that it should respect my code where I declared the actual scope of the
> variable, which says when the variable is accessible.
I agree that the Java source determines where the variable is 
"accessible" in an informal sense, but as I tried to define that sense 
of the term I realized that I was just coming up with something 
equivalent to "in scope".  That leaves me unable to take any useful 
information from your statement.
The JLS does not apply the term "accessible" to local variables, and 
does not use it in its defined sense in the discussion of finalization 
(back to JLS 12.6.1).  The JLS also does not explicitly rely on the Java 
scope of reference variables to define reachability, and it certainly 
does not depend on the semantics of bytecode for that definition.
>                                                        You are saying that
> the JVM should be able to circumvent what is declared in the program.
No, I'm saying that the JLS does not specify that the scope of a local 
reference variable has any special relationship to the reachability of 
its current referent within that scope.  To read such a requirement 
into the spec you are depending on interpreting "can be accessed [...]" 
to include the in-scope reference variable case, but I think that is a 
mistake.  The context of the statement is program _execution_, to which 
questions of variable scope in the source code are irrelevant.
>                                                                        If I
> declare a scope for the variable I may be depending on the object remaining
> alive.
And your program may therefore be incorrect.  As Chris Smith wrote a 
couple days ago, even if your interpretation were technically right, JVM 
implementors may, and some apparently do, use the more permissive 
interpretation.  You may claim the high moral ground for yourself if you 
wish, but if you want your programs to be reliable with respect to this 
detail then you must assume the more permissive interpretation is in use 
in the JVM.
>         It just seems to me that you are being over optimistic and violating
> the correctness of the program.
Funny, I'd say exactly the same thing about your position.
>                                   In the end it probably doesn't matter
> because the only way being over optimistic can matter is if you have side
> effects from a finally clause and even if you did the garbage collector
> usually delays finalization.
Agreed that if a program that assumes finalization will be delayed until 
all references are out of scope runs in a JVM that does not ensure that 
that assumption is valid then actual misbehavior still requires a 
combination of unlikely circumstances.  Still, no matter how unlikely, 
chances are that eventually it will happen, and that the failure will be 
extremely difficult to diagnose.  Even if I agreed with you on the 
technical point, I would be inclined to play it safe.
John Bollinger
jobo...@indiana.edu
And I provided that, raising a number of technical issues. No response?
> ....... this issue is also touched on in the current thread,
> "Declaring a reference in a loop versus outside a loop". In that thread,
> Neal Gafter seems to indicate that Sun doesn't support your idea.
I also discussed this issue with a JVM expert. He asserted that only 'physical' 
reachability is used, and there is no consideration of logical (scope) reachability. He 
also stated that anything else would be very, very expensive.
I would add that a major aspect of optimization is intra-method optimization, often 
dealing with local variable usage. Restrictions in this area would be quite detrimental.
Sorry, couldn't really get to newsgroups for a while.
I haven't seen anyone really address the technically correct semantics. What I
basically have seen is people say that the VM should be able to do this
because it can be more efficient. To me the issue is program correctness. I
see absolutely no real difference between this situation and this code:
    Foo method()
    {
        Foo f = new Foo();
        someMethod();
        return f;
    }
Should the VM be allowed to reorder the execution so that the method was
effectively:
    Foo method()
    {
        someMethod();
        return new Foo();
    }
Would it make any difference if I told you that the code would be more
efficient this way? After all the VM can do the analysis and determine that
the first reference to the variable is not referenced until after the method
call.
Would you allow a VM to be able to make such an optimization? Of course not!
Any VM that reordered the code so that the constructor were called after
some other code that could interact with the side effects of that
constructor would be incorrect.
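A runnable version of the first form shows why the order is observable. This is a sketch under assumptions: the shared LOG list, the body of someMethod(), and the OrderDemo class are invented for illustration; only the shape of method() comes from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Foo's constructor has a side effect that other code can observe.
class Foo {
    Foo() {
        OrderDemo.LOG.add("Foo constructed");
    }
}

class OrderDemo {
    // Shared state standing in for "objects outside of itself".
    static final List<String> LOG = new ArrayList<>();

    static void someMethod() {
        // Reports how many events happened before this call.
        LOG.add("someMethod saw " + LOG.size() + " earlier event(s)");
    }

    static Foo method() {
        Foo f = new Foo();
        someMethod();
        return f;
    }

    static List<String> run() {
        LOG.clear();
        method();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        // prints [Foo constructed, someMethod saw 1 earlier event(s)]
        System.out.println(run());
    }
}
```

If construction were moved after someMethod(), the log would read differently, so the reordering would change the program's visible behavior.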
So why should the same rules not apply to finalization as they do to
construction? Why should it be allowed to reorder the finalization call?
This violates the expected semantics of languages like C++ where it is
guaranteed and relied upon.
I actually have no problem if Sun wants to say that those semantics are not
guaranteed, but if that is what is desired it should be specified and not
simply assumed.
> > ....... this issue is also touched on in the current thread,
> > "Declaring a reference in a loop versus outside a loop". In that thread,
> > Neal Gafter seems to indicate that Sun doesn't support your idea.
>
> I also discussed this issue with a JVM expert. He asserted that only
> 'physical' reachability is used, and there is no consideration of logical
> (scope) reachability. He also stated that anything else would be very,
> very expensive.
Well of course, it cannot use scope because the scope is not present in the
class file. But then whether it can do the optimization is begging the
question.
I did some looking and got some contradictory results. There is a paper at
citeseer that actually gives some experimental numbers for these kinds of
optimizations.
But on my reading of the spec., there is this oft-quoted technical article
by Sun that says:
http://java.sun.com/developer/technicalArticles/ALT/RefObj/
"An executing Java program consists of a set of threads, each of which is
actively executing a set of methods (one having called the next). Each of
these methods can have arguments or local variables that are references to
objects. These references are said to belong to a root set of references
that are immediately accessible to the program...
"All objects referenced by this root set of references are said to be
reachable by the program in its current state and must not be collected.
Also, those objects might contain references to still other objects, which
are also reachable, and so on. "
From which I read that local variables are part of a root set and all
objects referenced by this rootset cannot be collected. I see no mention of
reachability of local variables, but that local variables are part of the
rootset.
So when does a local variable enter and leave the root set? According to the
JLS:
"Local variables are declared by local variable declaration statements
(§14.4). Whenever the flow of control enters a block (§14.2) or for
statement (§14.13), a new variable is created for each local variable
declared in a local variable declaration statement immediately contained
within that block or for statement. A local variable declaration statement
may contain an expression which initializes the variable. The local variable
with an initializing expression is not initialized, however, until the local
variable declaration statement that declares it is executed. (The rules of
definite assignment (§16) prevent the value of a local variable from being
used before it has been initialized or otherwise assigned a value.) The
local variable effectively ceases to exist when the execution of the block
or for statement is complete."
That tells me that the variable ceases to exist when the execution of the
block is complete. The question is the meaning of that "effectively" word.
You might read it to say that it could actually cease to exist before then.
I might read it to say that they have to hedge the statement because the
variable may actually continue to exist after that point (which is actually
what does happen in Java since in the class files variables exist until the
end of the method).
> I would add that a major aspect of optimization is intra-method
> optimization, often dealing with local variable usage. Restrictions in
> this area would be quite detrimental.
As I said, optimization arguments don't mean a lot to me if they involve
breaking correct semantics.
Apologies. I didn't mean to rush you; I was just concerned that you had left the thread. 
Take your time; I'll be here.
> I haven't seen anyone really address the technically correct semantics. What I
> basically have seen is people say that the VM should be able to do this
> because it can be more efficient. To me the issue is program correctness. I
> see absolutely no real difference between this situation and this code:
> 
> --- examples snipped ---
> 
> Would you allow a VM to be able to make such an optimization? Of course not!
> Any VM that reordered the code so that the constructor were called after
> some other code that could interact with the side effects of that
> constructor would be incorrect.
We've gone through this earlier in this thread. I agreed that code reordering was 
inappropriate (unless the JVM could prove it was correct).
> So why should the same rules not apply to finalization as they do to
> construction? Why should it be allowed to reorder the finalization call?
Simply because the spec does not clarify this behavior, and because there is existing 
practice (see below).
> This violates the expected semantics of languages like C++ where it is
> guaranteed and relied upon.
C++ is still irrelevant here. Definite execution of destructors is mandatory in C++ 
because they *must* destroy any held objects. This does not apply in Java. C++ semantics 
are ragged in a number of areas. For instance, C++ does not guarantee that an object 
won't be accessed after the destructor is called. Java guarantees that the object is not 
accessible after the finalizer is called (unless the finalizer overrides it.)
> I did some looking and got some contradictory results. There is a paper at
> citeseer that actually gives some experimental numbers for these kinds of
> optimizations.
> 
> But on my reading of the spec., there is this oft-quoted technical article
> by Sun that says:
> 
> http://java.sun.com/developer/technicalArticles/ALT/RefObj/
> "An executing Java program consists of a set of threads, each of which is
> actively executing a set of methods (one having called the next). Each of
> these methods can have arguments or local variables that are references to
> objects. These references are said to belong to a root set of references
> that are immediately accessible to the program...
> 
> "All objects referenced by this root set of references are said to be
> reachable by the program in its current state and must not be collected.
> Also, those objects might contain references to still other objects, which
> are also reachable, and so on. "
> 
> From which I read that local variables are part of a root set and all
> objects referenced by this rootset cannot be collected. I see no mention of
> reachability of local variables, but that local variables are part of the
> rootset.
This is obviously about reachability. It does not specify which reference variables are 
in the rootset. It most certainly can't be all (reference) local variables in the 
method, since standard compiler generated class files can make that impossible to 
accomplish.
> So when does a local variable enter and leave the root set? According to the
> JLS:
Both quotes (above and below) mention local variables but are not connected. They are 
from different documents! The quote above is about reachability, the one below about 
scope.
> "Local variables are declared by local variable declaration statements
> (§14.4). Whenever the flow of control enters a block (§14.2) or for
> statement (§14.13), a new variable is created for each local variable
> declared in a local variable declaration statement immediately contained
> within that block or for statement. A local variable declaration statement
> may contain an expression which initializes the variable. The local variable
> with an initializing expression is not initialized, however, until the local
> variable declaration statement that declares it is executed. (The rules of
> definite assignment (§16) prevent the value of a local variable from being
> used before it has been initialized or otherwise assigned a value.) The
> local variable effectively ceases to exist when the execution of the block
> or for statement is complete."
> 
> That tells me that the variable ceases to exist when the execution of the
> block is complete. The question is the meaning of that "effectively" word.
> You might read it to say that it could actually cease to exist before then.
> I might read it to say that they have to hedge the statement because the
> variable may actually continue to exist after that point (which is actually
> what does happen in Java since in the class files variables exist until the
> end of the method).
I've already shown that they don't always exist until the end of the method (their slots 
are reused in the class file). Obfuscators also reuse variables.
'Effectively' is the crux of the matter, and it is vague (intentionally?). The JVM 
developer that I spoke to stated --- "All specs should include the implied 'as if' 
rule".
I'm not sure there is any point to 'reason' further. The JLS is not definitive on this 
issue and existing practice weighs against your approach.
Existing practice: Two developers of major JVMs have asserted that only physical 
reachability is considered in GC and that the JVM will not generate code to extend 
reachability to match logical scope (see my example earlier in this thread.)