Greetings Simon,
Yes, for this particular benchmark and what was involved (iteration, at/atPut prims, optimized integer arith) I suspected 2x for 9.1 and about 1x for 8.6.3. And it turned out to be pretty much exactly that.
Had it involved any appreciable amount of float calculations or certain optimized iteration statements (which I didn't see here), the IBM vm would have lost by a landslide...not even close.
And sometimes it's statement by statement (I give an example at the end).
I wish I could do 8.6.3 on 64-bit...but of course that's not possible, so I'm stuck with a 32-bit-only comparison.
Some interesting notes:
- IBM vm was 32-bit only, and had a 1st-tier JIT (typically a non-optimizing compiler, designed to generate native code quickly)
- IBM vm was i386-optimized and lacked some of the optimizations needed for the deeper-pipeline processors that arrived with the 486 and later.
- IBM vm code generator is all in ENVY/Smalltalk. It is essentially unrestricted and has no runtime rules: it can jump wherever it wants and even use the ESP stack pointer as a general-purpose register (something we can't do in LLVM). That doesn't matter much for 64-bit, but it does for 32-bit x86, where you barely have any registers to work with.
- LLVM vm is 32-bit and 64-bit, and now has a similar 1st-tier JIT with more optimizations applied
- LLVM vm is optimized for modern processors and generates far superior code
- LLVM vm cannot use the ESP register like the IBM vm, so for some small sections of code it might not be possible to beat IBM (because x86 is so register-starved). But usually the LLVM code-gen quality makes up for this.
- LLVM vm code generator is in C++, and we use C++ Classes and C++ Lambdas to recreate the majority of the ENVY/Smalltalk description of the model. Not quite as nice, but we can map one to the other easily enough.
- LLVM vm is coded in a pure SSA style. This caused issues early on, so we built additional infrastructure that lets us write SSA code with far fewer mistakes.
- LLVM vm uses a function-oriented design pattern. Bytecodes, Primitives, Return Points...everything is a function. Each function uses a special calling convention we created, which is the only patch we make to the LLVM base code. This calling convention lets us create guaranteed tail-call-optimized functions and pin registers via the calling convention's argument specification (arguments like pc, sp, bp...). It is also how we tell LLVM to emit jmp instructions instead of call instructions as the interpreter transitions from bytecode to bytecode.
- LLVM vm 1st-tier JIT is generated by doing a second pass over the Interpreter (at compile time), which generates the native templates for all bytecodes, return points, prims... So we mostly reuse the Interpreter's code to build the JIT, though the template version of a bytecode looks slightly different from what the Interpreter actually runs.
As I mentioned, for a given benchmark you almost have to inspect it statement by statement. For example, try the snippet below in <= 8.6.3 and in 9.1. 8.6.3 should win, but it doesn't...and this is because there is just some bad code generation in the IBM vm and some really good code generation in LLVM. There are lots of examples like this, which is why I like to see this run on larger applications and hear feedback. So I look forward to ECAP2 feedback.
"Try in 8.6.3 and 9.1...8.6.3 should win...but won't (or didn't on my machine)"
| t |
t := Time millisecondsToRun: [
100 timesRepeat: [| total |
total := 0.
1 to: 1000000 do: [:each | total := total + each]]
].