Sometimes you can get around "missing" opcodes by compiling to other existing ones, but often that's impossible while still getting good performance. Tail-call elimination, continuations (call/cc), eval(), floats, and long doubles require specialized opcodes to be fast.
Very tangentially, I always thought this post from back in the day
was a good way to explain the "why no VM" stance of Dart:
https://groups.google.com/a/dartlang.org/group/misc/msg/5f86f14a4961e5b7
It made sense to me anyway. I was surprised that more of that post's
points didn't make it into the article. It's less "bytecode VMs are
bad" and more "bytecode VMs don't work for today's JS developers
because they expect the browser to compile/debug/run the source
anyway".
- Stephen
I can't let this one slide :)
Your article on "Why Not A Bytecode VM" contains a multitude of
inaccuracies about the JVM. Even closer to my heart, it contains a
complete falsehood about JRuby.
http://www.dartlang.org/articles/why-not-bytecode/
Your claims, with rebuttal:
0. Hotspot versus JVM versus arbitrary bytecoded VMs
First, a general complaint: what JVM are you talking about here? They
all have different ways of optimizing. Some don't optimize at all.
Your use of "JVM" here is almost meaningless.
JVM is also just one example of a bytecoded VM. There are many others,
some of them designed for dynamic languages (e.g. Ruby 1.9, Rubinius,
Python, Parrot). You're basing a lot of your argument on the JVM (and
on incorrect statements about the JVM, at that).
From here we'll assume that by JVM you mean the reference
implementation, OpenJDK/Hotspot.
1. "performance is generally not on par with specialized VMs (see
JRuby or Rhino)."
JRuby has been one of the fastest Ruby runtimes for a few years now,
and with the addition of invokedynamic to the JVM it has made a big
leap even further ahead. JRuby is the fastest or one of the fastest
Ruby implementations today, and will continue to get faster in the
coming months as we make better use of invokedynamic.
2. JVM does not let you do what Java cannot do
This is blatantly false. JRuby is a very fast dynamic language on the
JVM with or without invokedynamic, and for a while now has been the
Ruby to beat when it comes to Ruby execution performance. This comes
on top of the fact that:
* We don't create a Java class for each Ruby class and have our own
class hierarchy and method table logic.
* Ruby has a limited form of multiple inheritance through mixins/
modules, which we implement correctly.
* Ruby's classes are mutable.
Furthermore, none of these things has *anything* to do with bytecode.
The JVM itself doesn't care if you create classes in the same way as
Java, or if you use a normal JVM class hierarchy at all.
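A minimal sketch of what "our own class hierarchy and method table
logic" can look like in plain Java follows. It is only an
illustration: DynMethod, DynClass and DynObject are hypothetical
names, and this is not JRuby's actual implementation. The point is
just that mutable classes and mixins can live in ordinary objects
without one JVM class per language-level class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; an illustration only, not JRuby's real internals.
interface DynMethod {
    Object call(DynObject self, Object[] args);
}

final class DynClass {
    private final DynClass superclass;                        // null for a root class
    private final List<DynClass> mixins = new ArrayList<>();  // included modules
    private final Map<String, DynMethod> methods = new HashMap<>();

    DynClass(DynClass superclass) {
        this.superclass = superclass;
    }

    // Classes are mutable: methods can be added or replaced at any time.
    void define(String name, DynMethod body) {
        methods.put(name, body);
    }

    // Mixin support: the most recently included module is searched first.
    void include(DynClass module) {
        mixins.add(module);
    }

    // Own methods, then mixins (newest first), then the superclass chain.
    DynMethod lookup(String name) {
        DynMethod found = methods.get(name);
        if (found != null) return found;
        for (int i = mixins.size() - 1; i >= 0; i--) {
            found = mixins.get(i).lookup(name);
            if (found != null) return found;
        }
        return superclass == null ? null : superclass.lookup(name);
    }
}

final class DynObject {
    private final DynClass type;

    DynObject(DynClass type) {
        this.type = type;
    }

    // Dynamic send: resolve the method by name at call time.
    Object send(String name, Object... args) {
        DynMethod method = type.lookup(name);
        if (method == null) {
            throw new IllegalStateException("undefined method: " + name);
        }
        return method.call(this, args);
    }
}
```

An existing DynObject sees a method defined or mixed in after it was
created, because lookup happens on every send. A real implementation
would presumably cache lookups and invalidate the cache on change.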
3. A bytecode VM can't support all possible language features.
No VM or runtime can natively support all possible language features.
Or, put another way, any VM can support all possible features. The
damn things are obviously Turing-complete, so saying that by using a
bytecode VM you can't implement some language feature is obviously
wrong.
Also, the "complexity" from adding new opcodes is generally blunted by
not adding new opcodes. The JVM instruction set has not changed
significantly in 15 years and has supported hundreds of languages --
many of them dynamically typed -- during that time. invokedynamic is a
new addition, but a naive implementation of it adds very little VM
complexity. Yes, optimizing it adds complexity, but it's exposing
optimized dynamic call site binding as a user-exposed API. You'd
expect that to take some heavy lifting to run fast.
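To give a feel for what "call site binding as a user-exposed API"
means, here is a small sketch using the java.lang.invoke classes that
ship alongside invokedynamic. CallSiteDemo and its greet methods are
made-up names for illustration; a language runtime would install the
call site from a bootstrap method rather than by hand like this.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo class; shows rebinding a call site at run time.
final class CallSiteDemo {
    static String greetEnglish(String name) { return "Hello, " + name; }
    static String greetFrench(String name) { return "Bonjour, " + name; }

    // Returns the result of the same call site before and after rebinding.
    static String[] run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class, String.class);
            MethodHandle english = lookup.findStatic(CallSiteDemo.class, "greetEnglish", type);
            MethodHandle french = lookup.findStatic(CallSiteDemo.class, "greetFrench", type);

            // The call site starts out bound to one target...
            MutableCallSite site = new MutableCallSite(english);
            MethodHandle invoker = site.dynamicInvoker();
            String before = (String) invoker.invokeExact("Dart");

            // ...and can be pointed at another one without touching the caller.
            site.setTarget(french);
            String after = (String) invoker.invokeExact("Dart");

            return new String[] { before, after };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The caller only ever holds the invoker; which method actually runs is
decided by whatever target the site currently has.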
4. A bytecode VM is more than just bytecode
You're not wrong here, but I don't see what the point is. Any runtime
you choose to target will have its own shape. Compiling to JavaScript
works best when your feature set matches JavaScript. If it doesn't, you
have to work around the runtime. Compiling to Native Client works best
if your feature set fits what Native Client provides. If it doesn't,
you have to build your own stuff. Compiling to x86 works best if your
feature set fits x86. If it doesn't, you have to work around it. This
is pretty much a tautology... are you trying to say that there's some
runtime out there that can support all possible language features
equally well?
5. JVM can't optimize for non-statically-typed languages
Again, this is completely wrong. The JVM doesn't care about static
types. It just cares about objects and code. Ruby is dynamically
typed, but the JVM is able to inline across dynamic call boundaries,
optimize dynamically-typed code, and efficiently manage (and in some
cases eliminate) allocated objects. It is precisely because of this
that JRuby is able to perform so well.
I'll go toe-to-toe with anyone who claims the JVM has to be running a
statically-typed language to optimize, because I actually read the
assembly code it generates for JRuby on a regular basis.
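As a rough illustration of how dynamically typed code can be expressed
in a form the JVM can optimize, here is a sketch of a type-guarded
fast path built with MethodHandles.guardWithTest. The class and method
names (GuardedDispatch, incrementInteger and so on) are hypothetical,
and this is a generic pattern rather than a copy of what JRuby emits.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; a guarded fast path over untyped values.
final class GuardedDispatch {
    static boolean isInteger(Object value) { return value instanceof Integer; }

    // Fast path: only reached when the guard saw an Integer.
    static Object incrementInteger(Object value) { return ((Integer) value) + 1; }

    // Fallback: any other Number is handled as a double.
    static Object incrementGeneric(Object value) {
        return ((Number) value).doubleValue() + 1.0;
    }

    private static final MethodHandle INCREMENT = buildIncrement();

    private static MethodHandle buildIncrement() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType unary = MethodType.methodType(Object.class, Object.class);
            MethodHandle test = lookup.findStatic(GuardedDispatch.class, "isInteger",
                    MethodType.methodType(boolean.class, Object.class));
            MethodHandle fast = lookup.findStatic(GuardedDispatch.class, "incrementInteger", unary);
            MethodHandle slow = lookup.findStatic(GuardedDispatch.class, "incrementGeneric", unary);
            // if (test(value)) fast(value) else slow(value)
            return MethodHandles.guardWithTest(test, fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object increment(Object value) {
        try {
            return INCREMENT.invoke(value);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Nothing in the handle chain carries a static type more specific than
Object; the guard is an ordinary run-time check.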
6. The JVM is statically typed.
The JVM is not statically typed. http://blog.headius.com/2008/09/first-taste-of-invokedynamic.html
7. Allowing users to run their own bytecode opens up security holes.
You can't possibly be arguing that allowing users to run arbitrary
source is more secure than allowing users to run arbitrary bytecode,
can you? Language grammar can dictate no more than a bytecode
specification, and both are subject to the exact same sorts of
exploits. In fact, bytecode is likely *much* easier to secure because
the set of operations is considerably smaller. OpenJDK's bytecode
verifier has a rigorous proof to accompany it that *proves* it's
secure. Do you have such a thing for Dart? Can you make one? Now, can
you guarantee it will remain secure across all target runtimes?
---
Now, I'm not arguing with your primary point: that Dart is better
compiling to source, rather than to some kind of bytecoded VM. I'm not
arguing for it either...it's your choice, and the runtimes you intend
to target will certainly influence it. But the justifications you give
in this article are misleading at best and often completely wrong.
- Charlie
Thanks for the "merry Christmas".
But at least my Christmas was merrier than yours must have been, writing such mails on 25 Dec.
Actually, it has everything to do with why we don't have a bytecode VM: Focus.
On Friday, December 23, 2011, Charles Oliver Nutter <hea...@headius.com> wrote:
> On Dec 23, 2:54 am, "Seldaiendil D. Flourite" <seldaiend...@gmail.com>
> wrote:
>> I thought he was talking about the more extended JVM, after all if you
>> distribute your code it will run on the client VM
>
> OpenJDK has a "tiered" mode now that can start up as fast as client
> and optimize as well as server. It's likely to be the standard, if Java
> in the browser ever returns from the grave.
>
> In any case, the failing of JVM developers to recognize and address
> the needs of an in-browser VM has little bearing on whether "bytecode
> VMs" are a good or bad idea.
We're not building another Java. We're building Dart, a "programming language for creating structured web applications".
Source code is simple. Bytecodes are an additional complication we don't need.
Sun never really focused on making something that is easy to use. It all got lost in volumes of specifications of things that no developer would ever need, that only other vendors would need.
We make the implementation and our tests available with virtually no strings attached, and specify the things developers actually use. This way we support other vendors even better than Sun or Oracle ever did.
We're interested in supporting other programming languages compiling to Dart, yet there are certain features (threads, most notably) that we don't plan to support. We would not support these things even if we had a specified byte code format.
Byte codes bring nothing to the table but extra complications, some of which Bob and Florian, bless their hearts, tried to illustrate.
Not in Java:
$ cat > Hello.java
public class Hello {
  public static void main(String[] arguments) {
    System.out.println("Hello, World!");
  }
}
$ javac Hello.java
$ wc -c Hello.java Hello.class
116 Hello.java
417 Hello.class
533 total
$ javac -source 1.4 -target 1.4 -g:none Hello.java
$ wc -c Hello.java Hello.class
116 Hello.java
337 Hello.class
453 total
$ bzip2 Hello.java
$ bzip2 Hello.class
$ wc -c Hello.*
286 Hello.class.bz2
134 Hello.java.bz2
420 total
Here is a dump of the information of the latter version of Hello.class
(the former contains debugging information and stack maps):
Hello.class
Magic number: 0xCAFEBABE
Class file version: 48.0
Constant pool {
#1: void <init>() in java.lang.Object
#2: java.io.PrintStream out in java.lang.System
#3: "Hello, World!"
#4: void println(java.lang.String) in java.io.PrintStream
#5: Hello
#6: java.lang.Object
#7: <init>
#8: ()V
#9: Code
#10: main
#11: ([Ljava/lang/String;)V
#12: void <init>()
#13: java.lang.System
#14: java.io.PrintStream out
#15: Hello, World!
#16: java.io.PrintStream
#17: void println(java.lang.String)
#18: Hello
#19: java/lang/Object
#20: java/lang/System
#21: out
#22: Ljava/io/PrintStream;
#23: java/io/PrintStream
#24: println
#25: (Ljava/lang/String;)V
}
[ACC_PUBLIC, ACC_SUPER] class Hello extends java.lang.Object {
  Method: void <init>() [ACC_PUBLIC] {
    Code attribute maxStack = 1, maxLocals = 1, codeLength = 5 {
      0x0000(00000): 0x2a aload_0
        // Load reference from local variable.
      0x0001(00001): 0xb7 invokespecial 1 /* void <init>() in java.lang.Object */
        // Invoke instance method; special handling for superclass, private, and instance initialization method invocations.
      0x0004(00004): 0xb1 return
        // Return void from method.
      Exception table
    }
  }
  Method: void main(java.lang.String[]) [ACC_PUBLIC, ACC_STATIC] {
    Code attribute maxStack = 2, maxLocals = 1, codeLength = 9 {
      0x0000(00000): 0xb2 getstatic 2 /* java.io.PrintStream out in java.lang.System */
        // Get static field from class.
      0x0003(00003): 0x12 ldc 3 /* "Hello, World!" */
        // Push item from runtime constant pool.
      0x0005(00005): 0xb6 invokevirtual 4 /* void println(java.lang.String) in java.io.PrintStream */
        // Invoke instance method; dispatch based on class.
      0x0008(00008): 0xb1 return
        // Return void from method.
      Exception table
    }
  }
}
> which means fewer bytes being sent
> down the wire. There's also the possibility of an in-memory representation
> to avoid any performance overhead required to parse the source at the client
> end.
I think you're talking about a "binary representation", not "byte codes" :-)
We are contemplating using a binary representation in some cases. Most
likely, this will be no more than a binary token stream. If this is to
be used for communication between isolates running on different
machines, it will be specified.
>
> I'm not arguing against the Dart team decision to use source, just pointing
> out that it does bring something to the table other than complications.
>
I'm not convinced :-)
Here are some things that Java byte codes don't (guarantee to) bring
to the table:
* Names of local variables (temporaries)
* Names of parameters (of methods and constructors)
* Reification of generic types
* Comments
* Annotations
* Debugging information (something as simple as line numbers in stack traces)
* Fast application start-up (class data sharing uses an
implementation-specific format and still doesn't do enough). FYI:
http://docs.oracle.com/javase/1.5.0/docs/relnotes/features.html#vm_classdatashare
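
The generics point is easy to check from Java itself. A small sketch
(ErasureDemo is just a name picked for this example): two lists with
different element types share a single runtime class, because the type
argument is not kept on the instance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; shows that type arguments are erased at run time.
final class ErasureDemo {
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // Both objects are plain java.util.ArrayList instances.
        return strings.getClass() == numbers.getClass();
    }
}
```

Calling ErasureDemo.sameRuntimeClass() returns true.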
Cheers,
Peter