I recently discovered a problem which I traced back to the specific
behavior of a compiler (jikes 1.06). At first I thought it was a
compiler bug, but after some digging I discovered that it is due to a
change in the language spec (which only seems to be exploited by jikes
starting somewhere between 0.47 and 1.06).
Here's the deal: section 12.3.3 of the Java Language Specification
(http://www.javasoft.com/docs/books/jls/html/12.doc.html#44487), when
talking about the kinds of errors which should be thrown when
resolution of a symbolic reference fails, states:
"NoSuchFieldError: A symbolic reference has been encountered that refers
to a specific field of a specific class or interface, but the class or
interface does not declare a field of that name (it is specifically not
sufficient for it simply to be an inherited field of that class or
interface). This can occur, for example, if a field declaration was
deleted from a class after another class that refers to the field was
compiled (§13.4.7)."
Pay special attention to the phrase in parentheses:
"(it is specifically not sufficient for it simply to be an inherited
field of that class or interface)"
What this all means is that if a class references a field (variable) of
another class, then that field must be found explicitly within the class
referenced. The parenthetical clause specifically disallows
referencing fields "virtually", that is to say, referencing a variable
in a class where it does not actually exist, and relying upon the VM to
determine that the variable actually referenced is from a superclass.
For example:
public class foo {
    int a;
}

public class bar extends foo {
    public bar(int b) {
        a = b;
    }
}
In the above case, bar references the field a from its superclass foo. In
Java 1.0 land, the class file would contain a reference to the field a from
the class foo via a CONSTANT_Fieldref_info structure. All is well and good.
See
http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#42041
for details on the class file format here.
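The difference between a field a class declares and one it merely inherits can be illustrated with Java reflection, because `Class.getDeclaredField` only searches the fields declared directly in a class. This is an approximation under stated assumptions: reflection stands in for constant-pool resolution (a real VM resolves CONSTANT_Fieldref_info entries internally), the nested classes `Foo` and `Bar` are hypothetical stand-ins for `foo` and `bar`, and `resolvesStrictly` is an illustrative helper that checks only the field name, not its descriptor.

```java
public class StrictFieldCheck {
    // Hypothetical stand-ins for the foo and bar classes above.
    static class Foo {
        int a;
    }

    static class Bar extends Foo {
        Bar(int b) {
            a = b;  // source code may use the inherited field freely
        }
    }

    // Strict (original JLS 12.3.3) rule: the class named in the reference
    // must declare the field itself; an inherited field is not sufficient.
    static boolean resolvesStrictly(Class<?> named, String fieldName) {
        try {
            named.getDeclaredField(fieldName);  // searches declared fields only
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Foo.a strict: " + resolvesStrictly(Foo.class, "a"));  // true
        System.out.println("Bar.a strict: " + resolvesStrictly(Bar.class, "a"));  // false
    }
}
```

Under the original wording, only a reference naming `Foo` would be acceptable; a reference naming `Bar` would fail even though the source code reads the field through `Bar`.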
BUT Sun decided to strike out the parenthetical clause with this
"clarification" from the Clarifications and Amendments document which
appeared sometime after 1.0 (in 1.1 I think):
"Resolution of Symbolic References
In JLS 12.3.3 the bullets discussing NoSuchFieldError and
NoSuchMethodError should be amended by omitting the restriction that the
declaration of the field or method must be local to class and may not be
an inherited field or method."
You can find this at:
http://www.javasoft.com/docs/books/jls/clarify.html
Well, now this presents a problem. This "clarification" removes the
requirement that a field reference name a field that is actually
declared in the class named in the reference. In other words, you can
reference a field as though it were declared in a class in which it
doesn't actually exist. For example, the CONSTANT_Fieldref_info
structure in the bar class above could point to the field "a" in the
class "bar" and, according to this "clarification", be perfectly
correct. It would be up to the VM (or whoever else cared; in my case an
obfuscator, for which the details of reference resolution are
critically important) to determine that the field referenced was not
actually in the class named, but in a superclass.
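A minimal sketch of the lookup that the clarified rule pushes onto the VM (or an obfuscator) follows, again using reflection as a stand-in for class-file resolution. The search order (the named class, then its superinterfaces, then its superclass chain) is modeled on the JVM specification's field-resolution procedure; `LenientFieldResolver`, `resolve`, and the nested `Foo`/`Bar` classes are hypothetical names, and matching is by field name only, whereas real resolution also compares the field descriptor.

```java
import java.lang.reflect.Field;

public class LenientFieldResolver {
    // Hypothetical stand-ins for foo and bar.
    static class Foo {
        int a;
    }

    static class Bar extends Foo {
    }

    // Resolves a symbolic reference "named.fieldName" the way the clarified
    // rule allows, failing with NoSuchFieldError if no supertype declares it.
    static Field resolve(Class<?> named, String fieldName) {
        Field f = lookup(named, fieldName);
        if (f == null) {
            throw new NoSuchFieldError(named.getName() + "." + fieldName);
        }
        return f;
    }

    private static Field lookup(Class<?> c, String fieldName) {
        if (c == null) {
            return null;  // ran off the top of the hierarchy
        }
        try {
            return c.getDeclaredField(fieldName);  // declared directly in c
        } catch (NoSuchFieldException ignored) {
            // not declared here; fall through to the supertypes
        }
        for (Class<?> iface : c.getInterfaces()) {
            Field f = lookup(iface, fieldName);  // superinterfaces first
            if (f != null) {
                return f;
            }
        }
        return lookup(c.getSuperclass(), fieldName);  // then the superclass chain
    }

    public static void main(String[] args) {
        Field f = resolve(Bar.class, "a");
        // The reference names Bar, but the field found is declared in Foo.
        System.out.println("Bar.a resolves to a field declared in "
                + f.getDeclaringClass().getSimpleName());
    }
}
```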
I have a lot of problems with this change, and I will lay them out here.
I am hoping that someone out there can give me some good reasons why the
change that was made was valuable, because I certainly can't think of
any.
1. This breaks backwards compatibility. Sun went to *huge* lengths to
ensure that new language features in 1.1 and 1.2 (such as inner classes)
were backwards compatible (i.e. would compile into class files which ran
on 1.0 VMs without incident), and they did a great job. This is the
first time that I have run into any change which is not backwards
compatible. A 1.0 VM will choke when it sees that the reference is to a
field which does not exist in the named class, as rightly it should,
according to the original language specification.
2. This makes it much harder for VMs to do their job. Now, whenever a
field is referenced that is not declared in the named class, the VM has
to look for the field in each superclass in turn until it is found.
Currently not many compilers (only one in my experience, jikes) generate
bytecode that uses these "virtual" field references, so it's generally
not a problem. But since Sun opened this can of worms, I'm sure more and
more compiler writers will decide it's easier to let the VM do this
resolution at runtime than to go to the trouble of making the compiler
do it at compile time. So it's likely that more and more compilers will
generate this slower kind of field reference, which requires the VM to
search through all superclasses for the field being referenced.
3. This makes it much harder for my obfuscator to do its job. Now,
whenever I see a reference to a field that does not actually exist in
the class named, I have to search through all superclasses until I find
a matching field.
4. It serves no useful purpose whatsoever. It's not like the field
which is being referenced can change at runtime. If you are bar and you
reference "a", then it's always going to be foo's "a". That can't
change at runtime. So why bother allowing the compiler to reference it
as bar's "a" and make the VM determine every time it is referenced that
it's foo's "a" (or make the VM have to be smart enough to cache this
information, which it shouldn't have to do in the first place)?
I really would love to speak to the designers of the Java language and
find out what the heck they were thinking when they put this little
non-backwards-compatible, useless "clarification" in. I really think
they are doing a huge disservice to the Java language by breaking
backwards compatibility on something so minor, when they worked so hard
in so many areas to ensure backwards compatibility.
By the way, everything that I have said so far pertains to method
references as well, which have the exact same backwards-compatibility
problem due to the "clarification".
Thank you, and best wishes
Bryan
p.s. Here is an example which, when compiled by jikes 1.06 (perfectly
correctly according to the "clarification"), produces class files that
run on 1.1+ VMs but not on 1.0 VMs, and which demonstrates the problem
at hand:
public class foo {
    public static void main(String[] sArgs) {
        bar b = new bar(7);
        b.doit();
    }

    public void doit() {
        System.out.println("a's value is: " + a);
    }

    int a = 3;
}

class bar extends foo {
    public bar(int b) {
        a = b;
    }
}
[...]
> 4. It serves no useful purpose whatsoever. It's not like the field
> which is being referenced can change at runtime. If you are bar and you
> reference "a", then it's always going to be foo's "a". That can't
> change at runtime. So why bother allowing the compiler to reference it
> as bar's "a" and make the VM determine every time it is referenced that
> it's foo's "a" (or make the VM have to be smart enough to cache this
> information, which it shouldn't have to do in the first place)?
Let's say we have:
class A
{
    public int x;
}
class B extends A
{
}
class C extends B
{
    void m()
    {
        x = 5;
    }
}
Now if you refer to x as C.x, the VM has to do the lookup. You could use
A.x as well in this case... but what if the superclasses change, so that
int x moves to B instead of A? Class C would have to be recompiled,
despite the fact that class B still looks the same when seen from class
C. Always using C.x allows the same class to be reused without
recompiling. Thanks to that, you can make some changes inside libraries
without requiring developers to change their apps. (I doubt it is a
common case, since exposing public fields is not the best thing for a
library to do, but we are talking theoretically.)
Artur
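Artur's scenario can be simulated without recompiling anything by modeling the class files as plain data. In this hypothetical sketch, `ClassInfo`, `strict`, `lenient`, and `hierarchy` are illustrative names rather than any real API: `strict` applies the original 1.0 rule, and `lenient` walks up the superclass chain the way the clarified rule allows. Version 1 declares `x` in `A`; version 2 moves it into `B` while `C` is left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class FieldMoveDemo {
    // Hypothetical, simplified model of a loaded class: its superclass name
    // (null for the root) and the names of the fields it declares.
    static final class ClassInfo {
        final String superName;
        final Set<String> fields;

        ClassInfo(String superName, Set<String> fields) {
            this.superName = superName;
            this.fields = fields;
        }
    }

    // Old (JLS 1.0) rule: the named class must declare the field itself.
    static boolean strict(Map<String, ClassInfo> classes, String named, String field) {
        ClassInfo c = classes.get(named);
        return c != null && c.fields.contains(field);
    }

    // Clarified rule: walk up from the named class until some class declares it.
    static boolean lenient(Map<String, ClassInfo> classes, String named, String field) {
        String current = named;
        while (current != null) {
            ClassInfo c = classes.get(current);
            if (c == null) {
                return false;  // unknown class: treat as unresolvable
            }
            if (c.fields.contains(field)) {
                return true;
            }
            current = c.superName;
        }
        return false;
    }

    // Builds the A <- B <- C hierarchy with the given declared fields.
    static Map<String, ClassInfo> hierarchy(Set<String> aFields, Set<String> bFields) {
        Map<String, ClassInfo> m = new HashMap<>();
        m.put("A", new ClassInfo(null, aFields));
        m.put("B", new ClassInfo("A", bFields));
        m.put("C", new ClassInfo("B", Set.of()));
        return m;
    }

    public static void main(String[] args) {
        Map<String, ClassInfo> v1 = hierarchy(Set.of("x"), Set.of());  // x in A
        Map<String, ClassInfo> v2 = hierarchy(Set.of(), Set.of("x"));  // x moved to B

        // A class file compiled against v1 under the old rule must say "A.x".
        System.out.println("v1 strict A.x=" + strict(v1, "A", "x"));    // true
        // After the move, "A.x" no longer resolves, even leniently...
        System.out.println("v2 lenient A.x=" + lenient(v2, "A", "x"));  // false
        // ...but a reference written as "C.x" still resolves, now to B's field.
        System.out.println("v2 lenient C.x=" + lenient(v2, "C", "x"));  // true
    }
}
```

Under these assumptions, a reference compiled as `A.x` stops resolving once `x` moves, while a reference written as `C.x` keeps resolving, which is the compatibility Artur describes.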
As the reference is in the constant pool, I would expect the lookup to
happen only once (per class).
You're wrong about it serving no purpose. It allows fields (and methods)
to be moved from a subclass into a superclass without breaking code that
has been compiled against the old .class file.
Christian
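Christian's point that the lookup need only happen once can be sketched as a memoized resolver: the superclass walk runs the first time a given (class, field name) pair is requested, and the result is reused afterward, loosely analogous to a constant-pool entry that stays resolved. This is a hypothetical illustration, not how any particular VM implements resolution; `ResolutionCache`, `walks`, and the nested `Foo`/`Bar` classes are invented for the example, and the walk skips superinterfaces for brevity.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResolutionCache {
    // Hypothetical stand-ins for foo and bar.
    static class Foo {
        int a;
    }

    static class Bar extends Foo {
    }

    // Counts how many times the superclass walk actually runs.
    static int walks = 0;

    // One entry per (named class, field name) pair.
    private static final Map<String, Field> cache = new ConcurrentHashMap<>();

    static Field resolve(Class<?> named, String fieldName) {
        String key = named.getName() + "." + fieldName;
        return cache.computeIfAbsent(key, k -> walk(named, fieldName));
    }

    // Superclass-only walk (superinterfaces omitted in this sketch).
    private static Field walk(Class<?> named, String fieldName) {
        walks++;
        for (Class<?> c = named; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(fieldName);
            } catch (NoSuchFieldException ignored) {
                // keep searching in the superclass
            }
        }
        throw new NoSuchFieldError(named.getName() + "." + fieldName);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            resolve(Bar.class, "a");
        }
        System.out.println("walks performed: " + walks);  // walks performed: 1
    }
}
```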
In article <83t8uu$dn0$6...@f40-3.zfn.uni-bremen.de>,
ch...@uni-bremen.de (Christian Kaufhold) wrote:
> bryan...@my-deja.com wrote:
> [...]
> > 4. It serves no useful purpose whatsoever. It's not like the field
> > which is being referenced can change at runtime. If you are bar and you
> > reference "a", then it's always going to be foo's "a". That can't
> > change at runtime. So why bother allowing the compiler to reference it
> > as bar's "a" and make the VM determine every time it is referenced that
> > it's foo's "a" (or make the VM have to be smart enough to cache this
> > information, which it shouldn't have to do in the first place)?
>
> As the reference is in the constant pool, I would expect that the
> look-up would only happen once (for each class).
>
> You're wrong about it serving no purpose. It allows fields (and methods)
> to be moved from a subclass into a superclass without breaking code that
> has been compiled against the old .class file.
>
> Christian
Thank you for your reply.
You're right, the reference would probably only be resolved once per
class.
It breaks backwards compatibility with older VMs. I wonder if it's
really worth breaking such backwards compatibility to allow fields and
methods to be migrated in this way "transparently".
Furthermore, while it makes this kind of transparent move easier, it
also makes errors easier to introduce, because of the resulting
"ambiguity" of reference. Now instead of referring to a field
explicitly, I can refer to "a field of a given name which exists
somewhere in a class inheritance chain". That seems potentially
error-prone to me. Someone can change their class files to the extent
of removing whole methods or fields, and the VM will still happily run
my classes that refer to these no-longer-existing entities, as long as
some superclass still has them (and the superclass's implementation
may be entirely different from the implementation in the class where I
originally referenced them, and which I was depending upon).
Anyway, despite my personal skepticism about the value of this change
that Sun made, I can certainly see your point and I thank you for
bringing it to my attention.
Thanks, and best wishes,
Bryan
Lead Condensity Developer
Thanks for your reply, Artur.
As in my reply to another message in this thread, I can see your point
but I am not convinced that it was worth the breakage of backwards
compatibility.
I am of the opinion that implementation changes which would change the
actual field or method being referenced ought to require a recompile
anyway. If I have a class which thinks that it is referring to A's "x",
is it really safe to assume that the code will still work when "x" is
moved down into B? Of course it all depends on the particular code
involved, but I think that as a general rule it is error-prone to keep
linking against code which has changed so fundamentally as to add or
remove methods or fields which "shadow" the methods or fields that I
thought I was referring to.
Finally, if a compiler behaves correctly whether it turns the
reference into a reference to A's "x" or to C's "x" in the above code,
then how is any kind of consistency supposed to be attained? How can
any programmer know how his or her code will operate if it's up to the
compiler to decide which class the reference names? It would drive
me crazy, as a developer, to find that two compilers which both follow
the language specification generate code which potentially behaves
in entirely different ways.
Anyway, this is once again just my opinion, and I do appreciate your
bringing this point to my attention.
Thank you, and best wishes,
Bryan
Lead Condensity Developer
bryan...@my-deja.com wrote:
> compiler bug but after some digging I discovered that it is due to a
> change in the language spec (which only seems to be exploited by jikes
> starting somewhere between 0.47 and 1.06).
Yup, encountered that too.
> 4. It serves no useful purpose whatsoever. It's not like the field
> which is being referenced can change at runtime. If you are bar and you
> reference "a", then it's always going to be foo's "a". That can't
> change at runtime. So why bother allowing the compiler to reference it
> as bar's "a" and make the VM determine every time it is referenced that
> it's foo's "a" (or make the VM have to be smart enough to cache this
> information, which it shouldn't have to do in the first place)?
I believe the motivation for this is RRBC (release-to-release
binary compatibility). If class B extends A, and A declares
method foo(), then a third class, Main, can call B.foo(). For
virtual methods, it doesn't really matter where the methodref
in Main points; either A or B is fine, since in either case
we still have to respect the actual runtime type of the
object and call the appropriate foo() method.
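The point about virtual methods can be checked directly in Java source. In this sketch (class and method names are hypothetical), the static type at the call site plays the role of the class a methodref names, which is only an approximation of what a compiler emits; in both calls the method that runs is selected from the object's runtime class.

```java
public class DispatchDemo {
    static class A {
        String foo() {
            return "A.foo";
        }
    }

    static class B extends A {
        @Override
        String foo() {
            return "B.foo";
        }
    }

    public static void main(String[] args) {
        B b = new B();
        A viaA = b;  // same object, statically typed as A
        // Whether the call site names A or B, the method that runs is chosen
        // from the object's runtime class, which is B here.
        System.out.println(viaA.foo());  // B.foo
        System.out.println(b.foo());     // B.foo
    }
}
```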
RRBC enters into this when I start adding or deleting methods
without recompiling all the affected classes. Say B also
declared foo(), so Main refers to B.foo. If I choose to
delete the foo() method from B and instead rely on the foo()
method of A to implement B's behaviour, the old style of
methodref would break: Main points to B.foo, and if B.foo
doesn't exist, that is reported as an error (a NoSuchMethodError).
With the new style I can delete B.foo(), recompile just B, and
my .class file set is still consistent.
Similarly, if only A defines foo(), then Main will refer to
A.foo in the old style and B.foo in the new style. If I then
introduce an overriding foo() in B, with the old style my
methodref in Main would be technically incorrect. With
the new style Main would refer to B.foo, and the .class files
would still be fine afterward.
The same RRBC considerations apply to fields, with a little
less motivation (since they're not virtual to begin with).
But this new style would let you add and delete fields within
a hierarchy without necessarily recompiling all your code.
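Fields, by contrast, are not dispatched on the runtime class: a field declared in a subclass hides the superclass field of the same name, and which one an expression reads depends on the static type the compiler sees. A minimal sketch with hypothetical classes `A` and `B`:

```java
public class FieldHidingDemo {
    static class A {
        int x = 1;
    }

    static class B extends A {
        int x = 2;  // hides A.x rather than overriding it
    }

    public static void main(String[] args) {
        B b = new B();
        A viaA = b;  // same object, statically typed as A
        // Field access is bound by the static type, not the runtime class.
        System.out.println(viaA.x);  // 1 (A's x)
        System.out.println(b.x);     // 2 (B's x)
    }
}
```

This is why adding or removing a field within a hierarchy can change which declaration a given reference binds to, the kind of shadowing raised earlier in the thread.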
RRBC becomes an important issue for applications that use
3rd-party libraries. If I write an application using a
3rd-party library, and then once it is deployed the user/customer
decides to upgrade to a newer version of the library
(say for bug fixes or feature enhancements), it would be nice
if I didn't have to recompile and re-deploy my code. With
the old style, preserving compatibility in library code can be
a pain (you have to leave forwarding methods etc. behind), but
the new style gives a bit more flexibility.
OK, that's the only good reason I can think of for this change.
You've mentioned some drawbacks; I'd like to add one more: debugging.
With the old style it was possible to get errors if I
added/deleted fields/methods without recompiling all affected
classes. The new style allows me to change the appearance of
a class without necessarily considering any of the classes that rely
on it. While that is good for RRBC, the errors produced under the old
style are a significant debugging aid; they point out bits
of code I may have forgotten to change or think about.
ttfn,
cv
disclaimer: my opinions do not necessarily reflect the opinions
of my employer.
The reason for this change is the following example:
-----------------
package pkg;
class A {
    public int n;
}

package pkg;
public class B extends A {
}

package other;
class X {
    ... pkg.B.n ....
}
-----------
Since A is not public, X's classfile cannot refer to "A.n" because it is not
allowed to refer to A. This rule has been in place for a long time (see 5.1.1 of
the JVMS) but it wasn't enforced much. Sun needed to resolve this
contradiction. Either they needed to make the above ill-formed or they needed to
change the definition of what constitutes a legal classfile.
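The access rule behind this contradiction can be modeled in a few lines. This is a hypothetical sketch, not Sun's compiler code: `ClassDesc`, `accessibleFrom`, and `qualifyingClass` are invented names, and the access check is simplified to the class-level rule (a class is accessible if it is public or in the referring class's package). Given the class used in the source expression (`B`) and the class that actually declares the field (`A`), it returns a class that X's class file could legally name: the declaring class when X may access it, otherwise the source-level class.

```java
public class QualifyingClassDemo {
    // Hypothetical, simplified description of a class for this sketch.
    static final class ClassDesc {
        final String name;
        final String pkg;
        final boolean isPublic;

        ClassDesc(String name, String pkg, boolean isPublic) {
            this.name = name;
            this.pkg = pkg;
            this.isPublic = isPublic;
        }
    }

    // Simplified class-level access check: public, or in the same package.
    static boolean accessibleFrom(ClassDesc c, String fromPkg) {
        return c.isPublic || c.pkg.equals(fromPkg);
    }

    // Names the declaring class when the referring code may access it
    // (the 1.0-compatible form); otherwise names the source-level class.
    static String qualifyingClass(ClassDesc sourceType, ClassDesc declaring, String fromPkg) {
        return accessibleFrom(declaring, fromPkg) ? declaring.name : sourceType.name;
    }

    public static void main(String[] args) {
        ClassDesc b = new ClassDesc("pkg.B", "pkg", true);
        ClassDesc hiddenA = new ClassDesc("pkg.A", "pkg", false);
        ClassDesc publicA = new ClassDesc("pkg.A", "pkg", true);

        // X lives in package "other", so a non-public A cannot be named.
        System.out.println(qualifyingClass(b, hiddenA, "other"));  // pkg.B
        // If A were public, the reference could name A, as older class files did.
        System.out.println(qualifyingClass(b, publicA, "other"));  // pkg.A
    }
}
```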
The way they resolved the contradiction was to allow "B.n" to be used in X's
classfile. In order to preserve as much compatibility with prior VMs as
possible they changed their compilers so that they would only generate
references to "B.n" when they had to. In other words if class A were declared
public in the above example, Sun's compilers continue to generate references to
"A.n" in X's classfile.
Jerry Schwarz wrote:
>
> The reason for this change is the following example
> -----------------
>
> package pkg;
> class A {
> public int n ;
> }
Surely this is not a very common situation. Why would you need a public
field in a non-public class? The simplest approach is just to make n
inaccessible outside package pkg, and I can't see that this would cause
any problems.
> package pkg;
> public class B extends A {
> }
>
> package other;
> class X {
> ... pkg.B.n ....
> }
>
> -----------
>
> Since A is not public, X's classfile cannot refer to "A.n" because it is
> not allowed to refer to A. This rule has been in place a long time (see
> 5.1.1 of the JVMS) but it wasn't enforced much.
So it should have been enforced, rather than changing the spec incompatibly.
> Sun needed to resolve this
> contradiction. Either they needed to make the above ill-formed or they
> needed to change the definition of what constitutes a legal classfile.
>
> The way they resolved the contradiction was to allow "B.n" to be used
> in X's classfile. In order to preserve as much compatibility with prior
> VM's as possible they changed their compilers so that they would only
> generate references to "B.n" when they had to. In other words if class
> A were declared public in the above example, Sun's compilers continue to
> generate references to "A.n" in X's classfile.
Yuch. What an ugly hack. I wish Sun would discuss proposed spec changes
publicly (e.g. in this group), rather than unilaterally deciding on them
and then not telling anyone all the details.
--
David Hopwood <hop...@zetnet.co.uk>