class file size limits


Per Bothner

Jul 6, 2009, 2:49:01 PM
to jvm-la...@googlegroups.com
When I explained the JVM's method size limit to somebody
trying to run a large compiler-generated Scheme file
on Kawa, he commented:

> I thought Java was a modern language: but modern languages
> are not supposed to have arbitrary and pointless restrictions. :-(

Which of course is a valid point. It's really embarrassing
in this day and age that we're still using such a lame classfile
format.

Binary compatibility should not be an issue: a class file generated
by javac targeting Java 6 is not going to run on Java 5, and
one targeting Java 7 is not going to run on Java 6.

Of course there are ways to split up a large method into multiple
methods or multiple class files, but asking every language
implementation to implement a non-trivial *de*-optimization to
work around a broken class file format is really the wrong
approach.

Of course it's not clear the best way to fix the format.
A simple-minded fix would replace all the u2 types by u4 types.
This would increase the class size by a bit, though perhaps
not all that much after compression. Some adaptive mechanism
would be better, so that compilers generate "large-model
class files" only when necessary.

Now, if we are redoing the class file format, one would want to
make other fixes as well - for example, allowing multiple classes
in the same class file (to reduce constant pool duplication) or
dropping old-style attributes that are subsumed by new attribute types.

However, let's not make the perfect be the enemy of the good
- at the very least we really should fix the size limitations,
because that is a hard limit. Having all these 16-bit limits
when the world is moving to 64-bit is ridiculous.

A strawman proposal - though I'm guessing other and
better-thought-out designs exist out there:

If the u2 constant_pool_count is zero, then the real
constant_pool_count follows as a u4. In that case, the
this_class, super_class, fields_count, methods_count, and
attributes_count are also u4. Likewise indexes and lengths
in field and method descriptors.

Each constant-pool entry has the same 16-bit format as now,
unless the tag has the high-order bit set, in which case
all the u2 indexes are u4.

(Notice it's helpful to allow a "long-mode classfile" to
contain a mix of short-mode and long-mode entries for
constants, fields, and methods, so a compiler can start by
generating short-mode entries, and only emit long-mode entries
when needed.)

For the various attributes, they can be in either short
mode or long mode - and a class file can have a mix.
One way we could indicate a long-mode attribute is that the
attribute_length has the high-order bit set.

More tricky is the actual Code attribute. It's preferable
for it to be "adaptive": rather than a global flag that switches
between 16-bit and 32-bit offsets, perhaps it would be
better if an extension of the wide opcode could be used
to indicate 32-bit offsets, since that makes it possible to
generate code on-the-fly without having to know in advance
if we need a long-mode method.
--
--Per Bothner
p...@bothner.com http://per.bothner.com/

Charles Oliver Nutter

Jul 6, 2009, 7:17:21 PM
to jvm-la...@googlegroups.com
On Mon, Jul 6, 2009 at 1:49 PM, Per Bothner<p...@bothner.com> wrote:
> Of course there are ways to split up a large method into multiple
> methods or multiple class files, but asking every language
> implementation to implement a non-trivial *de*-optimization to
> work around a broken class file format is really the wrong
> approach.

Not to mention that it's sometimes impossible to split things easily.
Imagine the case where you have a lot of embedded conditional or loop
logic manipulating variables at many levels of scoping. How do you
split that? We run into this case even now with JRuby, where a large
Ruby method can produce a gigantic amount of bytecode, with no easy
indication where it could be split into multiple pieces.

> Of course it's not clear the best way to fix the format.
> A simple-minded fix would replace all the u2 types by u4 types.
> This would increase the class size by a bit, though perhaps
> not all that much after compression.  Some adaptive mechanism
> would be better, so that compilers generate "large-model
> class files" only when necessary.
>
> Now, if we are redoing the class file format, one would want to
> make other fixes as well - for example, allowing multiple classes
> in the same class file (to reduce constant pool duplication) or
> dropping old-style attributes that are subsumed by new attribute types.
>
> However, let's not make the perfect be the enemy of the good
> - at the very least we really should fix the size limitations,
> because that is a hard limit.  Having all these 16-bit limits
> when the world is moving to 64-bit is ridiculous.

I know many people have asked for everything you mention, and the
truth is that even in JRuby's case, where we've bent the JVM over
backwards, we still have to work around most of these same
limitations. So would I support fixing them all? Absolutely. But I
suppose the limiting factor is getting someone to lead a JSR. For
better or worse, that's how changes get in.

The alternative would be to just hack these changes into javac and the
verifier ourselves, show how much nicer they are, and get them
fast-tracked through the JCP. That seems to be the most rapid way to
spin changes.

This sounds ok to me, but I'm not a .class format expert.

- Charlie

Frank Wierzbicki

Jul 6, 2009, 8:18:58 PM
to jvm-la...@googlegroups.com
On Mon, Jul 6, 2009 at 2:49 PM, Per Bothner<p...@bothner.com> wrote:

> However, let's not make the perfect be the enemy of the good
> - at the very least we really should fix the size limitations,
> because that is a hard limit.  Having all these 16-bit limits
> when the world is moving to 64-bit is ridiculous.

This would indeed be great - the C implementation of Python can handle
very large method bodies, but Jython currently can't. So we see code
in the wild that we can't run. We will grow an interpreted mode in
the next few months, and so will be able to handle this (but with a
performance hit I'm sure). It would be much better if it just worked.

-Frank

Neal Gafter

Jul 6, 2009, 8:22:19 PM
to jvm-la...@googlegroups.com
On Mon, Jul 6, 2009 at 4:17 PM, Charles Oliver Nutter <hea...@headius.com> wrote:
> I know many people have asked for everything you mention, and the
> truth is that even in JRuby's case, where we've bent the JVM over
> backwards, we still have to work around most of these same
> limitations. So would I support fixing them all? Absolutely. But I
> suppose the limiting factor is getting someone to lead a JSR. For
> better or worse, that's how changes get in.
>
> The alternative would be to just hack these changes into javac and the
> verifier ourselves, show how much nicer they are, and get them
> fast-tracked through the JCP. That seems to be the most rapid way to
> spin changes.

I think a new JSR would be overkill for this kind of change.  Specify and implement it, and it could be added to JDK7 as an update to JSR 202, which explicitly included the expansion of these limits in its charter but never actually made the necessary changes.  However, the changes should probably be done rather soon to get on that train.

John Rose

Jul 6, 2009, 8:37:20 PM
to jvm-la...@googlegroups.com
On Jul 6, 2009, at 4:17 PM, Charles Oliver Nutter wrote:

> The alternative would be to just hack these changes into javac and the
> verifier ourselves, show how much nicer they are, and get them
> fast-tracked through the JCP. That seems to be the most rapid way to
> spin changes.

The implementation part of this would be a good Da Vinci Machine project.

The standardization part of it would need to be a new JSR.  Or an amendment to JSR 202, which would require rebooting that committee; I'm not sure that would be any better.

This fix is (and always has been) very desirable.  Getting it into JDK 7 would require extremely high-level pressure; the people guiding JDK7 have not planned on this, and it's getting late in the cycle.  (I know harried developers always say this, but it's true, really!)

The question about "good" vs. "better" (whether to go for a point-fix or provide a new code-file structure) is a crucial one.  The advantage of "good" is that it is quicker to demonstrate, and has a greater chance of getting into JDK7.  But if that chance is less than 10% (which I think it is), that particular advantage is negligible.  I would favor starting with the point-fix as a "finger exercise" but expecting to end with something more like Pack200 (minus the really woolly compression tricks) and/or Dalvik "dex" files.  I also recommend that we accept this must happen in a post-JDK7 world.

Beyond that, everybody will have a favorite add-on.  Mine is some extra provision for segmenting methods, classes, and packages into hot vs. cold and load eager vs. load lazy and execute vs. debug.

Back to Da Vinci:  Providing a proof of concept is a good way to get the ball rolling.  Anybody want to start on "code.patch"?   It has to be done at some point...  And see:

-- John

John Rose

Jul 6, 2009, 9:03:50 PM
to jvm-la...@googlegroups.com
On Jul 6, 2009, at 11:49 AM, Per Bothner wrote:

> It's really embarrassing
> in this day and age that we're still using such a lame classfile
> format.

Java's not perfect, but anything that supports such an impressive
ecosystem isn't exactly lame either.

But sure, there are imperfections (or equivalently, opportunities for
improvement). The major question from my point of view is not "how do
we fix them" but (a) "where's the code?" and (b) "where's the JSR?"
Supplying either (and both are necessary) is a big job, probably
requiring experience in JVM implementation. And the implementors are
really, really busy.

(I hope the Da Vinci Project will continue to help grow some new ones
to help the rest of us out!)

For small integers serialized into the class format, I suggest using
Pack's UNSIGNED5 format, which scales cleanly from 1 to 5 bytes, is
monotonic and continuous throughout the 32-bit range, and has
efficient (bit-twiddling) encoders and decoders. BTW, the HotSpot JVM
uses this internally for serialized data, so there are already
encoders and decoders written for it in both C++ and Java.
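
A minimal sketch of such an encoder and decoder, as I understand the scheme: bytes below 192 terminate a value, bytes 192 to 255 carry six more bits and signal continuation, and the fifth byte always terminates. The constants and the class name Unsigned5 are assumptions for illustration and should be checked against the Pack200 specification and the HotSpot sources before being relied on.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative UNSIGNED5-style codec; constants are assumptions, not copied from a spec. */
final class Unsigned5 {
    private static final int H = 64;          // number of "high" (continuation) byte codes
    private static final int L = 256 - H;     // 192: any byte below L ends the value
    private static final int LG_H = 6;        // each continuation byte contributes 6 bits
    private static final int MAX_INDEX = 4;   // the byte at index 4 (the fifth) always ends the value

    /** Encodes a value in [0, 2^32) into 1 to 5 bytes. */
    static byte[] encode(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("out of 32-bit unsigned range: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        long sum = value;
        for (int i = 0; ; i++) {
            if (sum < L || i == MAX_INDEX) {
                out.write((int) sum);             // terminating "low" byte, or the fifth byte
                break;
            }
            sum -= L;
            out.write((int) (L + (sum % H)));     // "high" byte: more bytes follow
            sum >>= LG_H;
        }
        return out.toByteArray();
    }

    /** Decodes one value starting at buf[0]. */
    static long decode(byte[] buf) {
        long sum = 0;
        for (int i = 0; ; i++) {
            int b = buf[i] & 0xFF;
            sum += (long) b << (LG_H * i);        // byte i is weighted by 64^i
            if (b < L || i == MAX_INDEX) {
                return sum;
            }
        }
    }
}
```

Under these assumptions, values up to 191 take one byte and 192 is the first value that needs two, so typical small counts and indexes stay as compact as a u1.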

> More tricky is the actual Code attribute. It's preferable
> for it to be "adaptive": rather than a global flag that switches
> between 16-bit and 32-bit offsets, perhaps it would be
> better if an extension of the wide opcode could be used
> to indicate 32-bit offsets, since that makes it possible to
> generate code on-the-fly without having to know in advance
> if we need a long-mode method.

I agree, but I think we would need a new prefix 'wide4', since 'wide'
already means "change 1 byte field to 2 bytes", where it has meaning
(local load/store/inc, ret). Or maybe a 'wide' prefix could be
duplicated for extra effect, where it already means "2 bytes", as with
C's "long long" type? Or maybe we just leave the limitation in of
2**16 locals, and have it mean "2 bytes" or "4 bytes" in an ad hoc
manner.

BTW, I'm expecting we'll use a prefix for encoding tailcall and tuple
references also, not that it has to be the historical "wide" opcode.

-- John

John Rose

Jul 6, 2009, 9:19:33 PM
to jvm-la...@googlegroups.com
On Jul 6, 2009, at 5:22 PM, Neal Gafter wrote:

> JSR 202, which explicitly included the expansion of these limits in
> its charter but never actually made the necessary changes.

Good point... Given that they did not exhaust their charter, I guess
it would be more reasonable to revive the JSR, than to create a new
one. The JCP allows specification amendments.

-- John

Jim Baker

Jul 6, 2009, 9:31:25 PM
to jvm-la...@googlegroups.com
Although I would prefer to have large code objects too, I don't see this as a high priority for Java 7. For Python, such code objects are invariably generated code, typically for initializing data structures, and are not performance critical. I'd imagine that's the typical use case seen in other languages. So unless the required changes are truly minimal, it may make more sense to invest development attention in areas with greater payoff.

It helps that we already have a solution under development. Still, Jython's interpreted mode (implemented through a VM that executes CPython bytecode) is useful for many other purposes.

Incidentally, it would be nice if large switches got faster... :)

- Jim
--
Jim Baker
jba...@zyasoft.com

Charles Oliver Nutter

Jul 6, 2009, 11:11:33 PM
to jvm-la...@googlegroups.com
On Mon, Jul 6, 2009 at 8:31 PM, Jim Baker<jba...@zyasoft.com> wrote:
> Although I would prefer to have large code objects too, I don't see this as
> a high priority for Java 7. For Python, such code objects are invariably
> generated code, typically for initializing data structures, and are not
> performance critical. I'd imagine that's the typical use case seen in other
> languages. So unless the required changes are truly minimal, it may make
> more sense to invest development attention in areas with greater payoff.

In JRuby, it's not really a "problem" to have large bodies of code,
but it does limit some use cases. For example, people often want to
AOT compile everything for obfuscation purposes. If there are large
bodies of code that don't fit into 64k, we're stuck. We also would
like to have a mode that runs 100% compiled, and if there are giant
chunks of code, we can't do anything with them.

We are, on the other hand, working on a new compiler that should make
it easier to split logic into pieces. It's just one of those
implementation headaches we have to deal with...just like other
platforms do.

And as I've said many times, it's all our collective work on JVM
languages that has made improvements like JSR292, JSR223, debugging,
.class file format, standard MOP, and much more important projects. I
would just like to see us as language implementers spending more time
collaborating on common projects, like hacking javac (it's fun,
really!) or improving .class format or what have you. Grousing about
long-time limitations of the platform makes noise but doesn't
necessarily accomplish anything.

> Incidentally, it would be nice if large switches got faster... :)

We would love to have a bytecode interpreter in JRuby, but the large
switch performance is the limiting factor. Alternatively you may want
to look at implementing your interpreter by loading the bytecodes into
their own polymorphic Instruction objects...you'll see *substantially*
improved performance as a result.
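
A rough sketch of the idea, assuming a toy stack machine with three invented opcodes (PUSH, ADD, MUL); none of these names come from JRuby or Jython. The switch runs once, when the bytecode is decoded into objects, and the execution loop is then just a virtual call per instruction. Lambdas are used only for brevity; named Instruction subclasses would do the same job.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy interpreter that pre-decodes bytecode into Instruction objects; illustration only. */
final class TinyVm {
    // Invented opcodes for this sketch.
    static final int PUSH = 0, ADD = 1, MUL = 2;

    /** Minimal fixed-size operand stack. */
    static final class OperandStack {
        private final int[] slots = new int[16];
        private int top;
        void push(int v) { slots[top++] = v; }
        int pop() { return slots[--top]; }
    }

    /** Each decoded bytecode becomes one object with its operands already extracted. */
    interface Instruction {
        void execute(OperandStack stack);
    }

    /** One-time decoding pass: the only place a switch on the opcode is needed. */
    static Instruction[] decode(int[] code) {
        List<Instruction> out = new ArrayList<>();
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case PUSH: {
                    final int value = code[++pc];      // operand follows the opcode
                    out.add(s -> s.push(value));
                    break;
                }
                case ADD:
                    out.add(s -> s.push(s.pop() + s.pop()));
                    break;
                case MUL:
                    out.add(s -> s.push(s.pop() * s.pop()));
                    break;
                default:
                    throw new IllegalArgumentException("unknown opcode " + code[pc]);
            }
        }
        return out.toArray(new Instruction[0]);
    }

    /** Execution loop: no switch, just dispatch through the Instruction interface. */
    static int run(int[] code) {
        OperandStack stack = new OperandStack();
        for (Instruction insn : decode(code)) {
            insn.execute(stack);
        }
        return stack.pop();
    }
}
```

For example, running the sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL computes (2 + 3) * 4. A real interpreter would also need branches, which means each instruction returning or setting the next program counter.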

- Charlie

Per Bothner

Jul 7, 2009, 12:24:25 PM
to jvm-la...@googlegroups.com
On 07/06/2009 06:03 PM, John Rose wrote:
> On Jul 6, 2009, at 11:49 AM, Per Bothner wrote:
>
>> It's really embarrassing
>> in this day and age that we're still using such a lame classfile
>> format.
>
> Java's not perfect, but anything that supports such an impressive
> ecosystem isn't exactly lame either.

Java is not perfect, but parts of it are lame (including parts
I've worked on) - given its size, that's inevitable. The
extensibility of the class file format has worked out very well.
But its size limitations and inefficiencies are unfortunate, and
it should have been updated or replaced a long time ago.

> For small integers serialized into the class format, I suggest using
> Pack's UNSIGNED5 format, which scales cleanly from 1 to 5 bytes, is
> monotonic and continuous throughout the 32-bit range, and has
> efficient (bit-twiddling) encoders and decoders.

That seems a good idea. I'll follow up more in a separate message.

> BTW, the HotSpot JVM
> uses this internally for serialized data, so there are already
> encoders and decoders written for it in both C++ and Java.

Does that mean HotSpot can directly execute Pack200 files?
Or could be made to do so without excessive work?

If so, would using Pack200 more-or-less-directly as a
classfile replacement be worth considering? I assume not:
Pack200 is complicated to both produce and decode, and the
Java API is oriented towards batch conversion to/from a
jar file.

Jim Baker

Jul 7, 2009, 12:42:26 PM
to jvm-la...@googlegroups.com

Cool tip, the polymorphic instruction object approach should have direct applicability to our regex engine, a mini-VM that's a direct port of the sre engine in CPython.

However, if I get the idea correctly, it may be less applicable to our serialization mini-VM (pickle), because of object allocation overhead for the instruction objects. But maybe we can be clever.

--
Jim Baker
jba...@zyasoft.com

John Rose

Jul 7, 2009, 1:03:14 PM
to jvm-la...@googlegroups.com
On Jul 7, 2009, at 9:24 AM, Per Bothner wrote:

> Does that mean HotSpot can directly execute Pack200 files?
> Or could be made to do so without excessive work?

No, the UNSIGNED5 format is just a variable-length 32-bit int encoding
used in most places by pack.

> Pack200 is complicated to both produce and decode, and the
> Java API is oriented towards batch conversion to/from a
> jar file.

Pack is a serialization format for a group of classes and their
methods. It is not tied strongly to JAR, and could skip the JAR step
for loading into the JVM. A strength-reduced version of the format
(sort of like zip -1) would not be hard to decode. Re-thinking the
code pipeline from disk to JIT requires a lot of work, but (and this
is all I'm suggesting) Pack could play a role in that pipeline.

-- John

Per Bothner

Jul 7, 2009, 1:32:07 PM
to jvm-la...@googlegroups.com
On 07/06/2009 06:03 PM, John Rose wrote:
> For small integers serialized into the class format, I suggest using
> Pack's UNSIGNED5 format, which scales cleanly from 1 to 5 bytes, is
> monotonic and continuous throughout the 32-bit range, and has
> efficient (bit-twiddling) encoders and decoders.

A simple, efficient, and easily-implementable idea is to
just replace all the u2 counts and indexes in classfile
format by unsigned5. We can also replace u4 counts and sizes,
to reduce the typical size of class files.

In terms of specification, I'd (mostly) use two new types:
v2 - either u2 or unsigned5, depending on version number/flags
v4 - either u4 or unsigned5, depending on version number/flags

Thus for example the Code attribute:

Code_attribute {
    v2 attribute_name_index;
    v4 attribute_length;
    v2 max_stack;
    v2 max_locals;
    v4 code_length;
    u1 code[code_length];
    v2 exception_table_length;
    {   v2 start_pc;
        v2 end_pc;
        v2 handler_pc;
        v2 catch_type;
    } exception_table[exception_table_length];
    v2 attributes_count;
    attribute_info attributes[attributes_count];
}

This approach has the advantage that it's simple to
modify an existing classfile reader or producer;
the class files are compact; and it's easy to
read/write "legacy" class files depending on a switch
or a version number.
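
To illustrate how small the reader-side change could be, here is a sketch of a reader with a single switch between the legacy and compact encodings. The class VReader and its compact flag are hypothetical; how the flag would be derived from the version number or flags is left open, and the unsigned5 decoding constants (192 as the cutoff, 6 bits per continuation byte, at most 5 bytes) are my understanding of the encoding rather than a quotation from a spec.

```java
/** Hypothetical reader for the proposed v2/v4 types; illustration only. */
final class VReader {
    private final byte[] buf;
    private final boolean compact;   // assumption: true when version/flags select unsigned5
    private int pos;

    VReader(byte[] buf, boolean compact) {
        this.buf = buf;
        this.compact = compact;
    }

    private int u1() {
        return buf[pos++] & 0xFF;
    }

    /** v2: a big-endian u2 in legacy files, unsigned5 in compact files. */
    long readV2() {
        if (compact) {
            return readUnsigned5();
        }
        int hi = u1();
        int lo = u1();
        return (hi << 8) | lo;
    }

    /** v4: a big-endian u4 in legacy files, unsigned5 in compact files. */
    long readV4() {
        if (compact) {
            return readUnsigned5();
        }
        long v = 0;
        for (int i = 0; i < 4; i++) {
            v = (v << 8) | u1();
        }
        return v;
    }

    /** Assumed unsigned5 decoding: bytes below 192 end the value; the fifth byte always does. */
    private long readUnsigned5() {
        long sum = 0;
        for (int i = 0; ; i++) {
            int b = u1();
            sum += (long) b << (6 * i);
            if (b < 192 || i == 4) {
                return sum;
            }
        }
    }
}
```

Every field in the structure above would then be read through readV2 or readV4, so the parsing code itself stays the same for both formats.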

The actual instructions could also be changed to take
an unsigned5 where appropriate. For example invokeXxx
would be followed by an unsigned5 instead of (indexbyte1,
indexbyte2). The if<cond> instructions would be followed
by branch5, rather than (branchbyte1,branchbyte2), etc.

Basically, this uses the Pack200 encoding for the counts and
offsets, but maintains the structure of class files.

Alternatively, we could use 'wide wide' or a 'wide4'
instruction. That has the advantage that we can generate
code for the legacy format until we find out we need the
large model, rather than having to know up-front. I don't
know how useful that is - presumably one would specify a
-target flag if they want legacy format.

The classic encoding of switch statements is for direct
lookup in direct bytecode interpreters. Assuming there
are few or no interpreters that don't do at least *some*
processing of the instruction stream, we could encode
the switches more efficiently.
