[jvm-l] Trace compiler


Rémi Forax

May 23, 2010, 8:39:46 PM
to jvm-la...@googlegroups.com
Last week I wrote a small trace compiler (or at least a kind of one) for phpreboot.
For those who did not attend the last JVM Summit, here are the basics:
http://wiki.jvmlangsummit.com/Trace-based_JIT

In my case, phpr is an interpreter written in Java.
It basically parses an instruction or a block, creates the corresponding AST
and interprets it by walking the AST.
It also comes with a compiler that transforms functions and lambdas
directly into bytecode.
I use JSR292 method handles to jump back and forth between the interpreter
and the compiled code.

During last week, I wrote a trace compiler that compiles a while loop
during its execution. I added a counter that counts each iteration,
and when the counter overflows, it compiles the test of the loop
and its body into a bytecode blob and executes it.
Furthermore, during the first runs, I record the runtime types of the variables,
so when I generate the loop body, I use this information to specialize
the generated bytecode (mainly in order to remove all boxing).
And it works very well!
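A minimal sketch of the idea in plain Java (the names and the threshold here are invented for illustration, not phpreboot's actual code): interpret the loop, count iterations, and swap in a "compiled" body once the counter crosses a threshold.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch only: invented names, not phpreboot's implementation.
class TraceLoop {
    private static final int HOT_THRESHOLD = 10_000;
    private int counter;
    private BooleanSupplier compiledBody; // null until the trace is "compiled"

    // One interpreted iteration: runs the test and body, returns false on exit.
    interface Interpreter { boolean step(); }

    void run(Interpreter interp) {
        while (true) {
            if (compiledBody != null) {
                if (!compiledBody.getAsBoolean()) return; // specialized test + body
            } else {
                if (!interp.step()) return;               // interpreted iteration
                if (++counter >= HOT_THRESHOLD) {
                    // This is where the recorded runtime types would drive bytecode
                    // generation; here the interpreter stands in for compiled code.
                    compiledBody = interp::step;
                }
            }
        }
    }
}
```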

I wonder if someone has already tried to do the same with their own JVM language?

For the curious, the sources are here:
http://code.google.com/p/phpreboot/source/browse/trunk/phpreboot/src/com/googlecode/phpreboot/compiler

Rémi


--
You received this message because you are subscribed to the Google Groups "JVM Languages" group.
To post to this group, send email to jvm-la...@googlegroups.com.
To unsubscribe from this group, send email to jvm-language...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.

Rémi Forax

May 24, 2010, 8:29:00 PM
to jvm-la...@googlegroups.com
Hmm, nobody answers?

I have written a blog post with a small benchmark that can be instructive:
http://weblogs.java.net/blog/forax/archive/2010/05/24/jvm-language-developpers-your-house-burning

Guillaume Laforge

May 25, 2010, 2:44:37 AM
to jvm-la...@googlegroups.com
The article is great and interesting...

But I'll react to the FUD aspect :-)
(otherwise, it *is* really interesting, minus the apples/oranges comparison part)

But your Groovy sample really does not reflect reality, and makes
an apples-and-oranges comparison.
By default Groovy uses BigDecimal for its decimal numbers, which
means you're doing a mix of double and BigDecimal arithmetic, which slows
down Groovy terribly compared to the other languages, which use
doubles by default.
So this is a bit misleading.
Add a 'd' suffix to the numbers and you'll have a fairer
comparison! (ie. 3.4d, 4d, etc.)
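In plain Java terms, the difference looks roughly like this (a sketch of the cost model, not Groovy's internals):

```java
import java.math.BigDecimal;

class DecimalDefaults {
    public static void main(String[] args) {
        // Groovy's default: an unsuffixed literal like 3.4 is a BigDecimal,
        // so every operation allocates a new BigDecimal object.
        BigDecimal a = new BigDecimal("3.4");
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = 0; i < 1000; i++) sum = sum.add(a);

        // With the 'd' suffix (3.4d) it is a plain double: no allocation,
        // straight primitive arithmetic.
        double d = 3.4d;
        double dsum = 0.0;
        for (int i = 0; i < 1000; i++) dsum += d;

        System.out.println(sum + " vs " + dsum);
    }
}
```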

Guillaume
--
Guillaume Laforge
Groovy Project Manager
Head of Groovy Development at SpringSource
http://www.springsource.com/g2one

Charles Oliver Nutter

May 25, 2010, 3:31:26 AM
to jvm-la...@googlegroups.com
JRuby has an optimizing compiler in the works that can lift numeric
operations to primitives. Hopefully we'll have it running interpreted
for JRuby 1.6 and get it compiling shortly after.

Rémi Forax

May 25, 2010, 5:29:44 AM
to jvm-la...@googlegroups.com
It's not FUD.
I was not aware (and I think I am not the only one) that Groovy's default for
real numbers is BigDecimal.

Rémi

Guillaume Laforge

May 25, 2010, 5:46:01 AM
to jvm-la...@googlegroups.com
Salut Rémi,

I was being a bit provocative by saying it was "FUD", sorry ;-)
Thanks a lot for showing how Groovy performs when using doubles
instead -- ie. comparing the same thing across all those languages.
Using BigDecimal by default is a very old decision we made (as I briefly
explained in the comments on your post), and is not something we can
easily go back on.
Nor do we have any intention to go back on it, as (odd as it may
sound) it's been an aspect of Groovy that makes it a great sell to
financial institutions! (insurance companies, banks, hedge fund
managements, trader and bank software vendors, etc.)

Guillaume
--
Guillaume Laforge
Groovy Project Manager
Head of Groovy Development at SpringSource
http://www.springsource.com/g2one

Charles Oliver Nutter

May 25, 2010, 7:06:54 AM
to jvm-la...@googlegroups.com
I profiled the benchmark in JRuby, running with --fast (for all
optimizations in JRuby), and found some interesting things:

* Doing a sampling profile, these two methods gathered the most
samples; JRuby itself barely registered.

Stub + native Method
54.8% 0 + 2633 java.lang.StrictMath.pow
40.3% 0 + 1934 java.lang.StrictMath.exp

* A memory profile shows allocations almost completely dominated by
RubyFloat instances:
          percent          live         alloc'ed     stack  class
 rank    self   accum      bytes  objs     bytes   objs  trace  name
    1  90.41%  90.41%   24433056 763533 235295808 7352994 301908 org.jruby.RubyFloat

Using this measurement, you can see that RubyFloat instances are 32
bytes large for JRuby (64-bit double + 32-bit flags + 32-bit reference
to metaclass + the object itself). Reducing the number of RubyFloat
objects would probably do the most to improve this benchmark (or
getting the JVM to do a better job of eliminating those objects;
DoEscapeAnalysis doesn't appear to help this benchmark at all).

Reducing the size of RubyFloat objects would also help. A normal
java.lang.Double would be 8 bytes smaller (no flags or metaclass). So
this benchmark appears to either bottleneck on the Math methods (seems
a bit unlikely) or on RubyFloat allocations (more likely). Optimizing
to lift RubyFloat objects to actual double primitives wouldn't be
hard...we just haven't done it yet.

It would certainly be nice if the JVM eliminated those RubyFloat
objects, since it appears that almost everything here inlines. Perhaps
there's something missing that would allow escape analysis to
eliminate the objects?
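As a baseline, here is about the simplest allocation pattern escape analysis should handle, handy for comparing against LogCompilation output (a sketch; the box class is invented):

```java
// A minimal pattern HotSpot's escape analysis can usually scalar-replace:
// the Boxed object never escapes the loop body. Running with
// -XX:+UnlockDiagnosticVMOptions and the non-product flags mentioned
// below in this thread (debug builds) shows whether it was eliminated.
class EscapeDemo {
    static final class Boxed {
        final double value;
        Boxed(double value) { this.value = value; }
    }

    static double sum(int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            Boxed b = new Boxed(i * 0.5); // candidate for scalar replacement
            total += b.value;             // b never escapes this iteration
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000));
    }
}
```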

- Charlie

Charles Oliver Nutter

May 25, 2010, 7:08:44 AM
to jvm-la...@googlegroups.com
It's probably also worth mentioning that because of how Float objects
work in Ruby, every float in the body of the loop (and even in the
conditional) is created anew each time it's encountered. Avoiding that
would probably be a simple way to reduce the overhead of this benchmark
and put it roughly where Groovy is.

Guillaume Laforge

May 25, 2010, 7:34:46 AM
to jvm-la...@googlegroups.com
Interesting. On Groovy's side, doing this doesn't change anything; I
get more or less the same results.

Anyhow, the problem with all those micro-benchmarks is that you need to
know the various languages well to give an accurate comparison (ie not
apples and oranges). But not focusing on the comparison per se (which
is always great for FUD, inflammatory emails, and the usual pissing games)
and instead seeing where our implementations are lacking is more interesting.

--
Guillaume Laforge
Groovy Project Manager
Head of Groovy Development at SpringSource
http://www.springsource.com/g2one

Charles Oliver Nutter

May 25, 2010, 2:27:11 PM
to jvm-la...@googlegroups.com
I'm sure this doesn't need pointing out, but performance numbers for
an experimental new language that carries none of the baggage of
existing languages (especially languages not originally written for
the JVM) are a bit suspect :)

Going on to claim that JVM language developers' "house is burning"
because your language has started to do math faster (and not
drastically faster, I'd add; even Java's only 4x faster than JRuby on
this benchmark) is probably going a bit far.

Of course you've seen numeric benchmarks I wrote for the dynamic
language Surinx that were many times faster than JRuby, but of course
Surinx wasn't Ruby and I wouldn't claim that there's something Surinx
was doing (using JSR292) that could potentially improve JRuby the same
amount. Duby now supports dynamic invocation and performs very well
with JSR-292, but again there's no real comparison...it's not Ruby.

JSR-292 is obviously very exciting work, especially if it can optimize
primitive call paths straight through without any boxing along the way
(something that's very difficult for most dynlangs on JVM). Are you
able to get the same performance using the 292 backport on Java 6?
Those of us with users need to continue supporting the JVMs they
run...

Rémi Forax

May 25, 2010, 6:26:15 PM
to jvm-la...@googlegroups.com
Le 25/05/2010 20:27, Charles Oliver Nutter a écrit :

> I'm sure this doesn't need pointing out, but performance numbers for
> an experimental new language that carries none of the baggage of
> existing languages (especially languages not originally written for
> the JVM) is a bit suspect :)
>

You're right.
I designed the language to perform well on the JVM rather than
porting an existing language.
It's unfair :)

> Going on to claim that JVM language developers' "house is burning"
> because your language has started to do math faster (and not
> drastically faster, I'd add; even Java's only 4x faster than JRuby on
> this benchmark) is probably going a bit far.
>

It doesn't only do math faster; it does all code that uses primitives faster.
Math is just an example.

I have a small problem: I easily hit the limit of 10 arguments
max for a method handle imposed by the current state
of JDK 7. That's why I have to limit the tests to only small benchmarks.

> Of course you've seen numeric benchmarks I wrote for the dynamic
> language Surinx that were many times faster than JRuby, but of course
> Surinx wasn't Ruby and I wouldn't claim that there's something Surinx
> was doing (using JSR292) that could potentially improve JRuby the same
> amount. Duby now supports dynamic invocation and performs very well
> with JSR-292, but again there's no real comparison...it's not Ruby.
>

My point is not that JSR292 directly helps to get better performance,
though it most probably will. Above all, it drastically simplifies the
code of the runtime
and lets us focus on what is, in my opinion, more important, i.e. providing
good type profile information to the VM.

If you write a runtime compiler that is able to optimize/deopt and reopt
when necessary based on type profiling,
you will get a boost. This is applicable to any dynamic language.

> JSR-292 is obviously very exciting work, especially if it can optimize
> primitive call paths straight through without any boxing along the way
> (something that's very difficult for most dynlangs on JVM). Are you
> able to get the same performance using the 292 backport on Java 6?
> Those of us with users need to continue supporting the JVMs they
> run...
>

No, I don't get the same perf with the backport.
The perf is better :)

1.3178402348194146E24
real 0m16.291s
user 0m18.500s
sys 0m0.179s

Rémi

Rémi Forax

May 25, 2010, 8:19:42 PM
to jvm-la...@googlegroups.com
Le 26/05/2010 00:26, Rémi Forax a écrit :

Hmm, I was a little bit optimistic about the perf of the backport.
Perf is not good for short-lived scripts.
(phpr16.sh -> phpr for the 1.6 VM, phpr.sh -> phpr for the 1.7 VM)

[forax@localhost phpreboot]$ time bin/phpr16.sh
test/testtraceoptimistic2.phpr
61075
0123456789101112131415161718192021222324252627282930313233343536373839...
real 0m2.004s
user 0m3.633s
sys 0m0.196s

[forax@localhost phpreboot]$ time bin/phpr.sh
test/testtraceoptimistic2.phpr
61075
0123456789101112131415161718192021222324252627282930313233343536373839...

real 0m0.296s
user 0m0.277s
sys 0m0.088s

Rémi

Attila Szegedi

May 26, 2010, 7:45:01 AM
to jvm-la...@googlegroups.com
On 2010.05.26., at 0:26, Rémi Forax wrote:

> If you write a runtime compiler that is able to optimize/deopt and reopt
> when necessary based on type profiling,
> you will get a boost. This is applicable to any dynamic language.

That's my main takeaway from Rémi's post: JSR-292 enables the implementation of incremental type-specializing optimizing/deoptimizing bytecode compilers for your dynamic language. This is the huge deal in the long run. Rémi's current benchmarks are largely irrelevant and I don't think we need to get bogged down in arguing over them.

Emphasis is on "incremental" and "deoptimizing" - you can do type-specializing compilers without JSR-292 today if you want. However, the ability to swap a dynamic language interpreter with type-specialized bytecode at call site granularity, and the ability to switch back to the interpreter at, again, call site granularity, are things you need JSR-292 for.
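With the final java.lang.invoke API (the JSR-292 API was still in flux when this thread was written), that call-site swap can be sketched like this; the two target methods are stand-ins, not anyone's actual runtime:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class CallSiteSwap {
    // Stand-in for the generic interpreter path.
    static Object interpretAdd(Object a, Object b) {
        return ((Number) a).intValue() + ((Number) b).intValue();
    }
    // Stand-in for a type-specialized compiled path.
    static Object specializedAdd(Object a, Object b) {
        return (Integer) a + (Integer) b;
    }

    static final MutableCallSite SITE;
    static final MethodHandle INTERPRETED, SPECIALIZED, INVOKER;
    static {
        try {
            MethodType t = MethodType.methodType(Object.class, Object.class, Object.class);
            MethodHandles.Lookup l = MethodHandles.lookup();
            INTERPRETED = l.findStatic(CallSiteSwap.class, "interpretAdd", t);
            SPECIALIZED = l.findStatic(CallSiteSwap.class, "specializedAdd", t);
            SITE = new MutableCallSite(INTERPRETED); // starts in the interpreter
            INVOKER = SITE.dynamicInvoker();
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static Object call(Object a, Object b) {
        try {
            return INVOKER.invoke(a, b);
        } catch (Throwable th) {
            throw new RuntimeException(th);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(1, 2));   // interpreter path
        SITE.setTarget(SPECIALIZED);      // "compile" this one call site
        System.out.println(call(3, 4));   // specialized path
        SITE.setTarget(INTERPRETED);      // deoptimize back, same granularity
        System.out.println(call(5, 6));
    }
}
```

Each call site carries its own MutableCallSite, so specialization and deoptimization both happen per site, exactly the granularity described above.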

Attila.

Rémi Forax

May 26, 2010, 8:57:28 AM
to jvm-la...@googlegroups.com
Le 26/05/2010 13:45, Attila Szegedi a écrit :

> On 2010.05.26., at 0:26, Rémi Forax wrote:
>
>
>> If you write a runtime compiler that is able to optimize/deopt and reopt
>> when necessary based on type profiling,
>> you will get a boost. This is applicable to any dynamic language.
>>
> That's my main takeaway from Rémi's post: JSR-292 enables the implementation of incremental type-specializing optimizing/deoptimizing bytecode compilers for your dynamic language. This is the huge deal in the long run. Rémi's current benchmarks are largely irrelevant and I don't think we need to get bogged down in arguing over them.

>
> Emphasis is on "incremental" and "deoptimizing" - you can do type-specializing compilers without JSR-292 today if you want. However, the ability to swap a dynamic language interpreter with type-specialized bytecode at call site granularity, and the ability to switch back to the interpreter at, again, call site granularity, are things you need JSR-292 for.
>
> Attila.
>
>

Well said :)

Rémi

Charles Oliver Nutter

May 26, 2010, 5:09:00 PM
to jvm-la...@googlegroups.com
On Tue, May 25, 2010 at 5:26 PM, Rémi Forax <fo...@univ-mlv.fr> wrote:
>> Going on to claim that JVM language developers' "house is burning"
>> because your language has started to do math faster (and not
>> drastically faster, I'd add; even Java's only 4x faster than JRuby on
>> this benchmark) is probably going a bit far.
>>
>
> It doesn't only do math faster; it does all code that uses primitives faster.
> Math is just an example.

Ok, by saying "math" I mean "primitives" in general. Anything that
fits unmodified into Object I would expect to perform very well on
JRuby.

> My point is not that JSR292 directly helps to get better performance,
> though it most probably will. Above all, it drastically simplifies the code
> of the runtime
> and lets us focus on what is, in my opinion, more important, i.e. providing
> good type profile information to the VM.
>
> If you write a runtime compiler that is able to optimize/deopt and reopt
> when necessary based on type profiling,
> you will get a boost. This is applicable to any dynamic language.

Yes, I definitely agree with that. And that's why I've started work
recently (and one of our committers started work some time ago) to
build both a better compiler and type-profiled optimizations for
JRuby. Hopefully we'll start to get some of that in place for JRuby
1.6 later this summer, at least to eliminate boxed math in local
scopes where it can be shown that we're only doing known operations
against numeric types. That much should be pretty easy.

- Charlie

Charles Oliver Nutter

May 26, 2010, 5:12:28 PM
to jvm-la...@googlegroups.com
On Wed, May 26, 2010 at 6:45 AM, Attila Szegedi <szeg...@gmail.com> wrote:
> Emphasis is on "incremental" and "deoptimizing" - you can do type-specializing compilers without JSR-292 today if you want. However, the ability to swap a dynamic language interpreter with type-specialized bytecode at call site granularity, and the ability to switch back to the interpreter at, again, call site granularity, are things you need JSR-292 for.

That's certainly true. Specializing any call path in current JRuby
means specializing it for *all* possible targets, and that's
cumbersome. That's why we currently only arity-specialize up to three
arguments and have no specialized call paths for primitives. Indy
certainly makes that easier, at least for arguments. Return values are
still a problem, however: for example, when doing Fixnum + Fixnum we
may be able to just return Fixnum, and therefore reduce the entire
operation to primitives...or we may need to return Bignum if it
overflows. That requires a hard bailout to the Object version of a
compiled+optimized method.
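The fast/slow split described here usually takes this shape (an illustrative sketch, not JRuby's actual internals): compute in primitive longs, detect overflow, and only then fall back to a boxed arbitrary-precision result.

```java
import java.math.BigInteger;

// Illustrative sketch of an overflow-checked fast path; names are
// invented, not JRuby's actual code.
class OverflowAdd {
    static Object add(long a, long b) {
        long sum = a + b;
        // Overflow occurred iff both operands have the same sign and the
        // result's sign differs (the classic pre-Math.addExact idiom).
        if (((a ^ sum) & (b ^ sum)) < 0) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b)); // slow path
        }
        return sum; // fast path: stays primitive, boxed only at the boundary
    }
}
```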

Floats are relatively easy, since they don't overflow into an
arbitrary-precision type; if we can show that a given method's calls
are all on Float and Float has not been modified...it can be primitive
math. I could implement that in a very short amount of time even
without indy.

- Charlie

Rémi Forax

May 26, 2010, 6:44:24 PM
to jvm-la...@googlegroups.com
Le 26/05/2010 23:09, Charles Oliver Nutter a écrit :

Here is what I'm doing or plan to do:

I work at the loop level and not at the method level.
There are several reasons for that:
  - a loop body can be called often, so the body can be hot
  - you know the closure at that time, and my language
    has no real functions; everything is closures
  - you also know the range of the loop variable,
    the array size, etc.
    In your case, you would know whether you have to use bignums or not.

Rémi

Charles Oliver Nutter

May 26, 2010, 6:51:10 PM
to jvm-la...@googlegroups.com
On Wed, May 26, 2010 at 5:44 PM, Rémi Forax <fo...@univ-mlv.fr> wrote:

> Le 26/05/2010 23:09, Charles Oliver Nutter a écrit :
>> Yes, I definitely agree with that. And that's why I've started work
>> recently (and one of our committers started work some time ago) to
>> build both a better compiler and type-profiled optimizations for
>> JRuby. Hopefully we'll start to get some of that in place for JRuby
>> 1.6 later this summer, at least to eliminate boxed math in local
>> scopes where it can be shown that we're only doing known operations
>> against numeric types. That much should be pretty easy.
>>
>> - Charlie
>>
>>
>
> Here is what I'm doing or plan to do:
>
> I work at the loop level and not at the method level.
> There are several reasons for that:
>   - a loop body can be called often, so the body can be hot
>   - you know the closure at that time, and my language
>     has no real functions; everything is closures
>   - you also know the range of the loop variable,
>     the array size, etc.
>     In your case, you would know whether you have to use bignums or not.

Oh yes, for loops we'll certainly be able to reduce it as long as we
can see a constant stride and a static starting point. That's not
always the case, however.

For arbitrary math, however, we can't really do that. We don't know
until runtime if 2 ** x will need to overflow, and so we basically
have to compile two paths. The automatic overflow into Bignum is a
real hassle since we can't represent operations against both long and
Bignum in the same code. I'm open to suggestions on that end of
things...it's really the only thing preventing us from attempting
primitive specialization for Fixnum right now.

As far as methods versus loop bodies...yes, I figure we'll probably
reserve the right to "outline" any basic block in the code, and in
fact the optimized body of a method (+ inlined code) will very likely
break down into several static methods calling each other, so that
hotspot can choose the hot path for us and so that bytecode size isn't
increased too much by inlining.

- Charlie

Per Bothner

May 26, 2010, 7:17:36 PM
to jvm-la...@googlegroups.com
On 05/26/2010 03:51 PM, Charles Oliver Nutter wrote:
> The automatic overflow into Bignum is a
> real hassle since we can't represent operations against both long and
> Bignum in the same code. I'm open to suggestions on that end of
> things...it's really the only thing preventing us from attempting
> primitive specialization for Fixnum right now.

This is another example of where structs would be helpful. That would
allow:

struct Integer {
    int ivalue;
    int[] iwords;
}

If the value fits in 32 bits then iwords is null. Otherwise, we
allocate an array of "big-digits". The codepath for add(i1, i2)
is:

if (i1.iwords == null && i2.iwords == null) {
    long sum = (long) i1.ivalue + (long) i2.ivalue;
    int isum = (int) sum;
    if (sum == isum)
        return Integer { ivalue: isum; iwords: null };
    /* OPTIONAL - skip if inlining
    else
        return Integer.makeFromLong(sum);
    */
}
return Integer.slowAdd(i1, i2);

Kawa basically does this using a regular class, gnu.math.IntNum,
with "small" integers pre-allocated. That is one reason Kawa's
arbitrary-precision handling is (mostly) noticeably faster than BigInteger.
(Some of that could be achieved by further tweaking BigInteger.)

However, if we could use structs, we can avoid heap-allocation
in those cases where we know the values are integers though
not necessarily fixnums. JVM optimizations could further optimize
this, especially if struct Integer is a standard part of the platform.
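Until structs exist, the closest available shape is a regular class in the spirit described above, as Kawa does; this is an invented illustration, not Kawa's actual code:

```java
// A plain-class version of the ivalue/iwords representation: small values
// share preallocated instances, and the digit array is only allocated when
// a value doesn't fit in 32 bits. Illustrative sketch, not gnu.math.IntNum.
class IntNum {
    final int ivalue;
    final int[] iwords; // null => the value is just ivalue

    private static final IntNum[] SMALL = new IntNum[256]; // -128..127 cache
    static {
        for (int i = 0; i < 256; i++) SMALL[i] = new IntNum(i - 128, null);
    }
    private IntNum(int ivalue, int[] iwords) { this.ivalue = ivalue; this.iwords = iwords; }

    static IntNum make(long v) {
        int i = (int) v;
        if (i == v) {
            return (i >= -128 && i < 128) ? SMALL[i + 128] : new IntNum(i, null);
        }
        return new IntNum(0, new int[] { (int) v, (int) (v >>> 32) }); // little-endian digits
    }

    static IntNum add(IntNum a, IntNum b) {
        if (a.iwords == null && b.iwords == null) {         // fast path: both small
            return make((long) a.ivalue + (long) b.ivalue); // a long can't overflow here
        }
        throw new UnsupportedOperationException("slowAdd not sketched here");
    }

    long longValue() {
        return iwords == null ? ivalue
             : (iwords[0] & 0xFFFFFFFFL) | ((long) iwords[1] << 32);
    }
}
```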
--
--Per Bothner
p...@bothner.com http://per.bothner.com/

John Rose

May 26, 2010, 8:32:56 PM
to jvm-la...@googlegroups.com
On May 25, 2010, at 4:06 AM, Charles Oliver Nutter wrote:

> Perhaps
> there's something missing that would allow escape analysis to
> eliminate the objects?

EA is a fragile optimization. Caching optimizations in factory methods can inhibit it.

Does your factory method do anything other than create a new RubyFloat on every call?

-- John

Charles Oliver Nutter

May 26, 2010, 8:37:45 PM
to jvm-la...@googlegroups.com
On Wed, May 26, 2010 at 7:32 PM, John Rose <john....@oracle.com> wrote:
> EA is a fragile optimization.  Caching optimizations in factory methods can inhibit it.
>
> Does your factory method do anything other than create a new RubyFloat on every call?

Just as easy to show as to explain:

public static RubyFloat newFloat(Ruby runtime, double value) {
    return new RubyFloat(runtime, value);
}

public RubyFloat(Ruby runtime, double value) {
    super(runtime, runtime.getFloat());
    this.value = value;
}

Is it possible for the non-final "metaclass" field to defeat EA in
this case? Are there any good switches we can use to get some
visibility into EA decisions?

Fixnum, of course, does have numerous caching behaviors. If we remove
them, low-valued numeric operations become considerably slower without
EA, but I'm not sure we've ever been able to see EA solve this case.

I hope there's a university out there somewhere about to announce
they've added value types to OpenJDK...we're going to fall behind
other dynlang impls that have fixnums and flonums :(

- Charlie

John Rose

May 26, 2010, 8:51:50 PM
to jvm-la...@googlegroups.com
On May 26, 2010, at 5:37 PM, Charles Oliver Nutter wrote:

> Just as easy to show as to explain:
>
> public static RubyFloat newFloat(Ruby runtime, double value) {
>     return new RubyFloat(runtime, value);
> }
>
> public RubyFloat(Ruby runtime, double value) {
>     super(runtime, runtime.getFloat());
>     this.value = value;
> }

Well, that looks simple and EA-friendly.

> Is it possible for the non-final "metaclass" field to defeat EA in
> this case?

No, EA doesn't depend on final-ness.

> Are there any good switches we can use to get some
> visibility into EA decisions?

Good old LogCompilation will tell you when allocations are eliminated.

Also PrintEscapeAnalysis and PrintEliminateAllocations (non-product switches).

> Fixnum, of course, does have numerous caching behaviors. If we remove
> them, low-valued numeric operations become considerably slower without
> EA, but I'm not sure we've ever been able to see EA solve this case.

We had to put special logic into the dynamic compiler to recognize and discount the effects of such caching in Integer.valueOf. The caching is required by the specification, useful for most apps, and noxious to the EA optimization.
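The Integer.valueOf caching in question is easy to observe from plain Java:

```java
class ValueOfCache {
    public static void main(String[] args) {
        // The JLS requires Integer.valueOf to cache at least -128..127,
        // so autoboxed small values are reference-identical...
        Integer a = Integer.valueOf(127), b = Integer.valueOf(127);
        System.out.println(a == b); // true: same cached instance

        // ...while values outside the mandated range are typically fresh
        // allocations (the very objects EA would like to eliminate).
        Integer c = Integer.valueOf(128), d = Integer.valueOf(128);
        System.out.println(c == d); // usually false
    }
}
```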

> I hope there's a university out there somewhere about to announce
> they've added value types to OpenJDK...we're going to fall behind
> other dynlang impls that have fixnums and flonums :(

The mlvm project is looking for a few good grad students...

-- John

Charles Oliver Nutter

May 26, 2010, 9:08:31 PM
to jvm-la...@googlegroups.com
On Wed, May 26, 2010 at 7:51 PM, John Rose <john....@oracle.com> wrote:
> On May 26, 2010, at 5:37 PM, Charles Oliver Nutter wrote:
>> Are there any good switches we can use to get some
>> visibility into EA decisions?
>
> Good old LogCompilation will tell you when allocations are eliminated.

A quick examination seemed to show the allocations were alive and well.

I assume EA depends on being able to inline all paths completely; that
could be where things get in the way for us. There's bound to be at
least one path that receives RubyFloat that we can't fully inline, and
that one path would mean the object needs to exist...right?

> Also PrintEscapeAnalysis and PrintEliminateAllocations (non-product switches).

I'll have to give that a shot once I get my mlvm build env back up and going...

>> Fixnum, of course, does have numerous caching behaviors. If we remove
>> them, low-valued numeric operations become considerably slower without
>> EA, but I'm not sure we've ever been able to see EA solve this case.
>
> We had to put special logic into the dynamic compiler to recognize and discount the effects of such caching in Integer.valueOf.  The caching is required by the specification, useful for most apps, and noxious to the EA optimization.

Since the EA stuff has filtered down to most Java 6 installs I
wouldn't be opposed to eliminating our cache if we could start to show
EA working. We're not using java.lang.Long/Integer for our fixnum
representation...

>> I hope there's a university out there somewhere about to announce
>> they've added value types to OpenJDK...we're going to fall behind
>> other dynlang impls that have fixnums and flonums :(
>
> The mlvm project is looking for a few good grad students...

What fun that would be! :)

- Charlie

John Rose

May 26, 2010, 9:35:54 PM
to jvm-la...@googlegroups.com
On May 26, 2010, at 6:08 PM, Charles Oliver Nutter wrote:

> There's bound to be at
> least one path that receives RubyFloat that we can't full inline, and
> that one path would mean the object needs to exist...right?

Yes. That's why it's fragile. -- John

Kresten Krab Thorup

May 28, 2010, 6:24:42 AM
to JVM Languages


On May 27, 1:17 am, Per Bothner <p...@bothner.com> wrote:
> This is another example of where structs would be helpful.  That would
> allow:
>
> struct Integer {
>     int ivalue;
>     int[] iwords;
> }
....
>
> Kawa basically does this using a regular class gnu.math.Integer,
> and "small" Integers pre-allocated.  That is one reason Kawa's
> arbitrary-precision handling is (mostly) noticeably faster than BigInteger.
> (Some of that could be achieved by further tweaking BigInteger.)

1: I'm intrigued. How much does this give you?

I can see that you avoid a virtual call for all math operators where
you can determine the integer-ness of operands; so it does not have to
choose between SmallInt and BigInt objects. But it comes at the
overhead of an extra word per integer.

Integer arithmetic (most notably X+1, X-1, and X==<constant>) used in
loops is currently a noticeable performance issue in Erjang (when
comparing to the normal erlang implementation using tags). In many
such cases, I could avoid a virtual call and that does sound
appealing.

2: What is the motivation in Kawa to make your own bignum
implementation. Why not just have

class KawaInteger {
    int ival;
    BigInteger bval;

    ...
}

i.e., fall back on the standard bignum implementation?


Kresten

John Rose

May 28, 2010, 9:55:29 PM
to jvm-la...@googlegroups.com
Part of the advantage of the structs Per is talking about is this: Structs may be passed as arguments and used as locals elementwise, with no heap allocation. Think of it as guaranteed escape analysis, or guaranteed unboxing.

-- John

Per Bothner

May 28, 2010, 10:04:27 PM
to jvm-la...@googlegroups.com
On 05/28/2010 06:55 PM, John Rose wrote:
> Part of the advantage of the structs Per is talking about is this: Structs may be passed as arguments and used as locals elementwise, with no heap allocation.

And more critically: Returned as method results.

It is easy enough to simulate struct arguments or locals
or fields: just use one argument or local for each struct "field".
But you can't do that for method results.

Per Bothner

May 28, 2010, 11:15:49 PM
to jvm-la...@googlegroups.com
On 05/28/2010 03:24 AM, Kresten Krab Thorup wrote:
>
>
> On May 27, 1:17 am, Per Bothner<p...@bothner.com> wrote:
>> This is another example of where structs would be helpful. That would
>> allow:
>>
>> struct Integer {
>>     int ivalue;
>>     int[] iwords;
>> }
> ....
>>
>> Kawa basically does this using a regular class gnu.math.Integer,
>> and "small" Integers pre-allocated. That is one reason Kawa's
>> arbitrary-precision handling is (mostly) noticeably faster than BigInteger.
>> (Some of that could be achieved by further tweaking BigInteger.)
>
> 1: I'm intrigued. How much does this give you?

I don't have numbers handy, but even with the current JVM (i.e.
no "struct" support) the big advantage is that in most cases you
don't allocate a "data" array, as long as the integer fits in 32 bits.
That saves you a lot of memory and GC time. It means you have to explicitly
check for the immediate vs array modes (i.e. iwords==null or not), but
once you have determined you have a 32-bit number the actual work is quick,
and requires fewer memory accesses (and hence cache misses).

BigInteger could (and perhaps should) do the same optimization.
But BigInteger has some further overheads, including some
seldom-used fields.

> I can see that you avoid a virtual call for all math operators where
> you can determine the integer-ness of operands; so it does not have to
> choose between SmallInt and BigInt objects. But it comes at the
> overhead of an extra word per integer.

Right, but that is modest compared to the space used by BigInteger.

> Integer arithmetic (most notably X+1, X-1, and X==<constant>) used in
> loops is currently a noticeable performance issue in Erjang (when
> comparing to the normal erlang implementation using tags). In many
> such cases, I could avoid a virtual call and that does sound
> appealing.
>
> 2: What is the motivation in Kawa to make your own bignum
> implementation. Why not just have
>
> class KawaInteger {
>     int ival;
>     BigInteger bval;
>
>     ...
> }
>
> i.e., fall back on the standard bignum implementation?

That was not an option at the time: gnu.math.IntNum was implemented
before java.math.BigInteger was available. If I were starting from scratch
it would probably make sense to do what you're suggesting. But
since the current implementation is faster and more space-efficient
than using BigInteger, I don't see much point in ripping it out.

You're free to use gnu.math.IntNum (and gnu.math in general); it has
no dependencies on the rest of Kawa.

Kresten Krab Thorup

May 30, 2010, 10:46:35 AM
to JVM Languages
OK, my current implementation for integers [int Erjang] is

class EInteger extends ENumber { ... }
class ESmall extends EInteger { int value; }
class EBig extends EInteger { BigInteger value; }

With this, I run the bench_tak at

MacOSX 16.3-b01-279 -server 385ms/loop
soylatte16-i386-1.0.3 -server 500ms/loop

I tried to implement your data structure, ...

class EInteger extends ENumber { int ival; BigInteger bival; ... }

MacOSX 16.3-b01-279 -server 485ms/loop
soylatte16-i386-1.0.3 -server 497ms/loop (-XX:+DoEscapeAnalysis)
soylatte16-i386-1.0.3 -server 606ms/loop (-XX:-DoEscapeAnalysis)

With escape analysis it runs more stably, i.e. it looks like there is
much less GC going on.

Kresten

Erlang test code looks like this

-----------------------------------------------------------
tak(X,Y,Z) when Y >= X ->
    Z;
tak(X,Y,Z) ->
    tak( tak(X-1, Y, Z),
         tak(Y-1, Z, X),
         tak(Z-1, X, Y) ).

main(N) ->
    Body = fun() ->
               Before = erlang:now(),
               times(10, fun() -> tak(24,16,8) end),
               After = erlang:now(),
               Diff = timer:now_diff(After, Before),
               io:format("run: ~pms~n", [Diff div 1000])
           end,

    timer:tc(?MODULE, times, [N, Body]).
-----------------------------------------------------------






Rémi Forax

unread,
May 30, 2010, 11:04:50 AM5/30/10
to jvm-la...@googlegroups.com
Le 30/05/2010 16:46, Kresten Krab Thorup a écrit :

> OK, my current implementation for integers [in Erjang] is
>
> class EInteger extends ENumber { ... }
> class ESmall extends EInteger { int value; }
> class EBig extends EInteger { BigInteger value; }
>
> With this, I run the bench_tak at
>
> MacOSX 16.3-b01-279 -server 385ms/loop
> soylatte16-i386-1.0.3 -server 500ms/loop
>
> I tried to implement your data structure, ...
>
> class EInteger extends ENumber { int ival; BigInteger bival; ... }
>
> MacOSX 16.3-b01-279 -server 485ms/loop
> soylatte16-i386-1.0.3 -server 497ms/loop (-XX:+DoEscapeAnalysis)
> soylatte16-i386-1.0.3 -server 606ms/loop (-XX:-DoEscapeAnalysis)
>
> With escape analysis it runs more stable, i.e. it looks like there is
> much less GC going on.
>
> Kresten
>
>

Hi Kresten,
Your bench is a good candidate for class hierarchy analysis (CHA).
Note: EBig is never loaded; you only use ESmall instances.
How about mixing EBig and ESmall?

Moreover, there is a flaw in the bench: you don't use the return value of tak,
so JITs will see that and can eliminate the calls to tak entirely.

Rémi

Charles Oliver Nutter

unread,
Jun 3, 2010, 2:17:37 AM6/3/10
to jvm-la...@googlegroups.com
Another data point for JRuby.

JRuby has only two integer data types: Fixnum (implemented by
RubyFixnum, always containing a long) and Bignum (implemented by
RubyBignum, using BigDecimal). All Fixnum math operations have an
overflow check, and the arbitrary-precision nature of integers will
probably be the biggest hindrance to lifting integer math to
primitives.

This is OS X Java 1.6.0_20, on a 2.66GHz Core 2 Duo.

~/projects/jruby ➔ jruby --server -J-d64 bench/bench_tak.rb 5
user system total real
2.861000 0.000000 2.861000 ( 2.793000)
1.987000 0.000000 1.987000 ( 1.987000)
1.978000 0.000000 1.978000 ( 1.978000)
1.981000 0.000000 1.981000 ( 1.982000)
1.980000 0.000000 1.980000 ( 1.980000)

Standard execution. A large part of the overhead here is managing a
Ruby frame, which has backtrace info, etc, for every Ruby call. Tak is
much more tak-call-heavy than math-heavy from what I've seen.

~/projects/jruby ➔ jruby --server -J-d64 --fast bench/bench_tak.rb 5
user system total real
1.984000 0.000000 1.984000 ( 1.939000)
0.786000 0.000000 0.786000 ( 0.786000)
0.782000 0.000000 0.782000 ( 0.782000)
0.793000 0.000000 0.793000 ( 0.793000)
0.781000 0.000000 0.781000 ( 0.782000)

"fast" mode, which eliminates Ruby frames when they aren't needed (and
generates backtraces in another way), dispatches literal Fixnums as
long, and pulls the method objects all the way to the call site so
they can inline. Still several layers between each call, plus all
dynamic calling logic.

~/projects/jruby ➔ jruby --server -J-d64 -J-Djruby.compile.dynopt=true
bench/bench_tak.rb 5
user system total real
0.891000 0.000000 0.891000 ( 0.821000)
0.325000 0.000000 0.325000 ( 0.325000)
0.323000 0.000000 0.323000 ( 0.323000)
0.322000 0.000000 0.322000 ( 0.322000)
0.323000 0.000000 0.323000 ( 0.323000)

Experimental dynopt mode; based on previously-seen calls, generates
direct static dispatches to both Fixnum operations and for the
recursive calls to tak. The least Ruby compatible, so far: no dispatch
guards, no backtrace logic, and if it overflows into Bignum it would
ClassCastException. But it's nearly as fast as writing the same code
against RubyFixnums in Java.

If I go a tiny bit further and turn off some other Rubyisms (updating
a thread-local line number variable, checking for thread events
periodically) I can get it down to this:

~/projects/jruby ➔ jruby --server -J-d64 -J-Djruby.compile.dynopt=true
-J-Djruby.compile.threadless=true -J-Djruby.compile.positionless=true
bench/bench_tak.rb 5
user system total real
0.946000 0.000000 0.946000 ( 0.868000)
0.272000 0.000000 0.272000 ( 0.272000)
0.277000 0.000000 0.277000 ( 0.277000)
0.274000 0.000000 0.274000 ( 0.274000)
0.274000 0.000000 0.274000 ( 0.274000)

I'm curious why your base perf is so close to this final number, since
it seems pretty amazing to me, and still doesn't have the guards it
needs to be really valid.

Maybe you can post the tak bytecode for the final result in Erjang? I'd
love to see what you're doing...

Here's the bytecode for the "most optimized currently possible" Ruby
version of this:

(3, 4, and 5 are the incoming arguments...and yeah, there's obviously
room for improvement here)
*** Dumping ***
ALOAD 3
ASTORE 13
ALOAD 4
ASTORE 14
ALOAD 5
ASTORE 15
L0
L1
LINENUMBER 2 L1
ALOAD 14
CHECKCAST org/jruby/RubyFixnum
ALOAD 1
ALOAD 13
INVOKEVIRTUAL org/jruby/RubyFixnum.op_ge
(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
INVOKEINTERFACE org/jruby/runtime/builtin/IRubyObject.isTrue ()Z
IFEQ L2
L3
LINENUMBER 3 L3
ALOAD 15
ARETURN
GOTO L4
L2
L5
LINENUMBER 5 L5
ALOAD 0
ALOAD 1
ALOAD 2
ALOAD 0
ALOAD 1
ALOAD 2
ALOAD 13
CHECKCAST org/jruby/RubyFixnum
ALOAD 1
LDC 1
INVOKEVIRTUAL org/jruby/RubyFixnum.op_minus
(Lorg/jruby/runtime/ThreadContext;J)Lorg/jruby/runtime/builtin/IRubyObject;
ALOAD 14
ALOAD 15
ACONST_NULL
INVOKESTATIC
ruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89.__file__
(Lruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;
ALOAD 0
ALOAD 1
ALOAD 2
ALOAD 14
CHECKCAST org/jruby/RubyFixnum
ALOAD 1
LDC 1
INVOKEVIRTUAL org/jruby/RubyFixnum.op_minus
(Lorg/jruby/runtime/ThreadContext;J)Lorg/jruby/runtime/builtin/IRubyObject;
ALOAD 15
ALOAD 13
ACONST_NULL
INVOKESTATIC
ruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89.__file__
(Lruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;
ALOAD 0
ALOAD 1
ALOAD 2
ALOAD 15
CHECKCAST org/jruby/RubyFixnum
ALOAD 1
LDC 1
INVOKEVIRTUAL org/jruby/RubyFixnum.op_minus
(Lorg/jruby/runtime/ThreadContext;J)Lorg/jruby/runtime/builtin/IRubyObject;
ALOAD 13
ALOAD 14
ACONST_NULL
INVOKESTATIC
ruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89.__file__
(Lruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;
ACONST_NULL
INVOKESTATIC
ruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89.__file__
(Lruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;
ARETURN
L4
ARETURN
L6
LOCALVARIABLE x Lorg/jruby/runtime/builtin/IRubyObject; L0 L6 13
LOCALVARIABLE y Lorg/jruby/runtime/builtin/IRubyObject; L0 L6 14
LOCALVARIABLE z Lorg/jruby/runtime/builtin/IRubyObject; L0 L6 15
@Lorg/jruby/anno/JRubyMethod;(name="__file__", frame=true,
required=3, optional=0, rest=-1)

John Cowan

unread,
Jun 3, 2010, 4:48:29 PM6/3/10
to jvm-la...@googlegroups.com
On Thu, Jun 3, 2010 at 2:17 AM, Charles Oliver Nutter
<hea...@headius.com> wrote:
> Another data point for JRuby.
>
> JRuby has only two integer data types: Fixnum (implemented by
> RubyFixnum, always containing a long) and Bignum (implemented by
> RubyBignum, using BigDecimal). All Fixnum math operations have an
> overflow check, and the arbitrary-precision nature of integers will
> probably be the biggest hindrance to lifting integer math to
> primitives.

Why bother with this? Java Longs (which is de facto what you have)
aren't that much faster than BigDecimals, and if you add overflow
checks, probably no better. Is it really a win to have more than one
integer type? Or is this something you need for CRuby compatibility?


--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures

Charles Oliver Nutter

unread,
Jun 3, 2010, 4:57:59 PM6/3/10
to jvm-la...@googlegroups.com
On Thu, Jun 3, 2010 at 3:48 PM, John Cowan <johnw...@gmail.com> wrote:
> On Thu, Jun 3, 2010 at 2:17 AM, Charles Oliver Nutter
> <hea...@headius.com> wrote:
>> Another data point for JRuby.
>>
>> JRuby has only two integer data types: Fixnum (implemented by
>> RubyFixnum, always containing a long) and Bignum (implemented by
>> RubyBignum, using BigDecimal). All Fixnum math operations have an
>> overflow check, and the arbitrary-precision nature of integers will
>> probably be the biggest hindrance to lifting integer math to
>> primitives.
>
> Why bother with this?  Java Longs (which is de facto what you have)
> aren't that much faster than BigDecimals, and if you add overflow
> checks, probably no better.  Is it really a win to have more than one
> integer type?  Or is this something you need for CRuby compatibility?

Sorry, I meant BigInteger above. Perhaps that changes the situation?
At any rate, I'll reply as though we both said BigInteger.

Wow, I'd be very surprised if that's true. Big(Integer/Decimal) are
enormous compared to a long, especially on 64-bit JVM, so at a minimum
you have the allocation cost of all that extra data. I may be wrong,
but I believe 64-bit JVMs are able to do long operations using 64-bit
instructions. And add to that the fact that even on 32-bit, math
operations against long are always going to be faster than those
against a byte[] in Big(Integer/Decimal). I'd be absolutely
dumbfounded if it weren't a major performance hit to use
Big(Integer/Decimal).

That said...if I end up being dumbfounded, there's no technical reason
why they couldn't both be Big(Integer/Decimal) under the covers. We'd
just have the object use a different metaclass (though we might still
need the overflow to know which metaclass to use...).

- Charlie

Per Bothner

unread,
Jun 3, 2010, 5:16:55 PM6/3/10
to jvm-la...@googlegroups.com
On 06/03/2010 01:57 PM, Charles Oliver Nutter wrote:
> Wow, I'd be very surprised if that's true. Big(Integer/Decimal) are
> enormous compared to a long, especially on 64-bit JVM, so at a minimum
> you have the allocation cost of all that extra data.

BigInteger has the following fields:

bitCount, bitLength, lowestSetBit, firstNonzeroIntNum
These are all deprecated (in the jdk7 source, at least), and are
just seldomly-used (?) caches, so could reasonably be removed.

The actual data:
int signum
int[] mag

If mag was changed to use 2's complement (as gnu.math.IntNum does)
then one could get rid of signum, leaving just the (renamed) mag field.

Better of course would be to use the gnu.math.IntNum mechanism
of two fields:

public int ival;
public int[] words;

In this case words==null is treated as an optimization of
the case when words.length<=1.

I think this would be a very worthwhile optimization.

Less work and almost as good would be to keep using
signed-magnitude representation in the bignum case,
and merge the signum field with the fixnum optimization.
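The two-field layout Per describes might look like the following stripped-down sketch. All names here are illustrative; the real gnu.math.IntNum handles arbitrary precision and many more operations:

```java
// Hypothetical sketch of the gnu.math.IntNum-style layout: small
// values live entirely in `ival` with words == null; anything wider
// spills into the 2's-complement `words` array, and `ival` then holds
// the number of words in use.
public final class IntNumSketch {
    public int ival;
    public int[] words;

    static IntNumSketch make(long v) {
        IntNumSketch n = new IntNumSketch();
        if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
            n.ival = (int) v;          // the "fixnum" case: no array at all
        } else {
            n.words = new int[] { (int) v, (int) (v >>> 32) };
            n.ival = 2;                // word count when words != null
        }
        return n;
    }

    long longValue() {
        if (words == null) return ival;   // the cheap null check John mentions
        return (words[0] & 0xFFFFFFFFL) | ((long) words[1] << 32);
    }
}
```

The creation-time branch is exactly the conditional test John weighs below against the cost of BigInteger's extra indirection.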

> And add to that the fact that even on 32-bit, math
> operations against long are always going to be faster than those
> against a byte[] in Big(Integer/Decimal).

Actually, it's an int array. (It gets converted to/from a byte
array during serialization.)

John Cowan

unread,
Jun 4, 2010, 4:04:13 AM6/4/10
to jvm-la...@googlegroups.com
On Thu, Jun 3, 2010 at 4:57 PM, Charles Oliver Nutter
<hea...@headius.com> wrote:

> Sorry, I meant BigInteger above. Perhaps that changes the situation?
> At any rate, I'll reply as though we both said BigInteger.

Right, that's what I meant.

> Wow, I'd be very surprised if that's true. Big(Integer/Decimal) are
> enormous compared to a long, especially on 64-bit JVM, so at a minimum
> you have the allocation cost of all that extra data.

I actually tested Integers (with autoboxing) against BigIntegers on a
32-bit system (I don't have a 64-bit system at present), and I made
sure that the arithmetic operations I was performing never overflowed
32 bits. The additional cost of using BigIntegers was between 2x and
3x, which I assume is the joint cost of:

1) overflow tests that fail;

2) the additional indirection: an Integer contains an int, whereas a
BigInteger contains an array with one int.

That seems to me small compared to having to handle two integer
representations, with special provisions for mixed mode and so on.

On Thu, Jun 3, 2010 at 5:16 PM, Per Bothner <p...@bothner.com> wrote:

> just seldomly-used (?) caches

"Seldom" is always an adverb in English, so "seldom-used" or
"rarely-used", take your pick. The latter is probably clearer.

> In this case words==null is treated as an optimization of
> the case when words.length<=1.
>
> I think this would be a very worthwhile optimization.

I would need some numbers to convince me that removing the indirection
would be worth the additional conditional tests, not so much when
testing an integer (null checks are cheap) as when creating one.

Charles Oliver Nutter

unread,
Jun 4, 2010, 1:01:06 PM6/4/10
to jvm-la...@googlegroups.com
On Fri, Jun 4, 2010 at 3:04 AM, John Cowan <johnw...@gmail.com> wrote:
> I actually tested Integers (with autoboxing) against BigIntegers on a
> 32-bit system (I don't have a 64-bit system at present), and I made
> sure that the arithmetic operations I was performing never overflowed
> 32 bits.  The additional cost of using BigIntegers was between 2x and
> 3x, which I assume is the joint cost of:
>
> 1) overflow tests that fail;
>
> 2) the additional indirection: an Integer contains an int, whereas a
> BigInteger contains an array with one int.
>
> That seems to me small compared to having to handle two integer
> representations, with special provisions for mixed mode and so on.

Well dealing with the two isn't that big a deal; if the overflow check
fails, we just construct the BigInteger version and do the math again.
That's essentially the same logic you'd have to do with any
split-brain arbitrary-precision integer implementation. Of course
you're talking about whether just using BigInteger to begin with would
be faster.
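The overflow-then-promote strategy can be sketched like this (illustrative only; JRuby does this with RubyFixnum/RubyBignum rather than plain Object):

```java
import java.math.BigInteger;

// Hypothetical sketch of "try long math, promote on overflow":
// the common case stays a long, and only an overflowing operation
// pays for a BigInteger.
public final class PromoteDemo {
    static Object add(long a, long b) {
        long r = a + b;
        // Overflow iff both operands differ in sign from the result.
        if (((a ^ r) & (b ^ r)) < 0) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
        return r;  // autoboxed Long: the "Fixnum" case
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2));               // prints 3
        System.out.println(add(Long.MAX_VALUE, 1));  // prints 9223372036854775808
    }
}
```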

Looking at size, if we're talking about Long it's object + long field.
We can ignore the object since that's the same for BigInteger, and
then the additional size is just 8 bytes. For BigInteger, it's object
+ int + int[] + four more ints (the remaining fields are all deprecated,
but *still there*) = 24-28 bytes, and can only represent up to 32-bit values
before adding the array, which is something like 8-12 bytes for the
header and then 4 bytes for each entry. So in terms of allocation
alone, BigInteger is several times larger than Long.

The overflow check is nontrivial. Here's the opto assembly for it (no
inlining, full method body, so some of this would disappear when
inlined):

000 B1: # B4 B2 <- BLOCK HEAD IS JUNK Freq: 1
000 PUSHL EBP
SUB ESP,8 # Create frame
007 MOV ECX,[ESP + #24]
MOV EBX,[ESP + #28]
00f XOR ECX.lo,[ESP + #16]
XOR ECX.hi,[ESP + #16]+4
017 MOV EBP,[ESP + #32]
MOV EDI,[ESP + #36]
01f XOR EBP.lo,[ESP + #16]
XOR EBP.hi,[ESP + #16]+4
027 AND ECX.lo,EBP.lo
AND ECX.hi,EBP.hi
02b AND ECX.lo,#-9223372036854775808.lo
AND ECX.hi,#-9223372036854775808.hi
034 MOV EDI,ECX.lo
OR EDI,ECX.hi ! Long is EQ/NE 0?
038 Jne,s B4 P=0.000000 C=19097.000000
038
03a B2: # B3 <- B1 Freq: 1
03a XOR EAX,EAX
03c
03c B3: # N1 <- B2 B4 Freq: 1
03c ADD ESP,8 # Destroy frame
POPL EBP
TEST PollPage,EAX ! Poll Safepoint

046 RET
046
047 B4: # B3 <- B1 Freq: 4.76837e-07
047 MOV EAX,#1
04c JMP,s B3
04c

This from the following code (SIGN_BIT is presumably a constant holding
Long.MIN_VALUE, i.e. only the top bit set):

private static boolean subtractionOverflowed(long original, long other,
                                             long result) {
    return (~(original ^ ~other) & (original ^ result) & SIGN_BIT) != 0;
}
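As a sanity check, the predicate agrees with the exact-arithmetic behavior of Math.subtractExact (Java 8+, so after this thread; the harness and SIGN_BIT value here are assumptions):

```java
// Hypothetical harness checking Charlie's overflow predicate against
// Math.subtractExact on a few edge cases.
public class OverflowCheckDemo {
    static final long SIGN_BIT = Long.MIN_VALUE; // assumed value of SIGN_BIT

    static boolean subtractionOverflowed(long original, long other, long result) {
        return (~(original ^ ~other) & (original ^ result) & SIGN_BIT) != 0;
    }

    public static void main(String[] args) {
        long[][] cases = { {0, 1}, {Long.MIN_VALUE, 1}, {Long.MAX_VALUE, -1} };
        for (long[] c : cases) {
            boolean predicted = subtractionOverflowed(c[0], c[1], c[0] - c[1]);
            boolean actual;
            try { Math.subtractExact(c[0], c[1]); actual = false; }
            catch (ArithmeticException e) { actual = true; }
            System.out.println(predicted == actual); // prints true each time
        }
    }
}
```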

So I guess the question comes down to whether this overflow check plus
the associated branching logic is less costly than always using
BigInteger and paying the allocation costs (not to mention whatever
internal costs it has versus doing "long op long", even if you have to
unbox and box on the way in and out).

In JRuby, RubyFixnum has that long field plus an int flags field and a
reference to metaclass, for an additional 8-12 bytes. Narrows the gap
a bit.

I wasn't able to get +PrintAssembly to work (see my other email) so I
wasn't able to see how 32 and 64-bit Hotspot eventually assembles the
above opto assembly. It may be even tighter.

- Charlie

Rémi Forax

unread,
Jun 4, 2010, 2:36:41 PM6/4/10
to jvm-la...@googlegroups.com
Le 03/06/2010 08:17, Charles Oliver Nutter a écrit :

[...]

> I'm curious why your base perf is so close to this final number, since
> it seems pretty amazing to me, and still doesn't have the guards it
> needs to be really valid.
>

erjang uses a trick: tailcall optimization.

> Maybe you can post the tak bytecode for the final result in Ejang? I'd
> love to see what you're doing...
>

Here is the bytecode generated with invokedynamic and no optimization.
The first argument is the environment (where to print, etc), the second
is a thin wrapper over a MethodHandle ; in my language all functions
are lambdas ; here the method handle references tak itself because
the function is recursive and the others arguments are the arguments of tak.

As you can see all casts are removed.

Rémi

var name:tak readOnly:true type:function value:tak[any X, any Y, any Z]
declaringNode:null bound:true slot:1
tak(Ljava/lang/Object;Lcom/googlecode/phpreboot/model/Function;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
00000 ALOAD 3
00001 ALOAD 2
00002 INVOKEDYNAMIC lt (Ljava/lang/Object;Ljava/lang/Object;)Z
00003 IFNE L0
00004 ALOAD 4
00005 ARETURN
00006 L0
00007 ALOAD 1
00008 INVOKEVIRTUAL
com/googlecode/phpreboot/model/Function.getMethodHandle
()Ljava/dyn/MethodHandle;
00009 ALOAD 0
00010 ALOAD 1
00011 INVOKEVIRTUAL
com/googlecode/phpreboot/model/Function.getMethodHandle
()Ljava/dyn/MethodHandle;
00012 ALOAD 0
00013 ALOAD 2
00014 ICONST_1
00015 INVOKEDYNAMIC minus (Ljava/lang/Object;I)Ljava/lang/Object;
00016 ALOAD 3
00017 ALOAD 4
00018 INVOKEVIRTUAL java/dyn/MethodHandle.invoke
(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
00019 ALOAD 1
00020 INVOKEVIRTUAL
com/googlecode/phpreboot/model/Function.getMethodHandle
()Ljava/dyn/MethodHandle;
00021 ALOAD 0
00022 ALOAD 3
00023 ICONST_1
00024 INVOKEDYNAMIC minus (Ljava/lang/Object;I)Ljava/lang/Object;
00025 ALOAD 4
00026 ALOAD 2
00027 INVOKEVIRTUAL java/dyn/MethodHandle.invoke
(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
00028 ALOAD 1
00029 INVOKEVIRTUAL
com/googlecode/phpreboot/model/Function.getMethodHandle
()Ljava/dyn/MethodHandle;
00030 ALOAD 0
00031 ALOAD 4
00032 ICONST_1
00033 INVOKEDYNAMIC minus (Ljava/lang/Object;I)Ljava/lang/Object;
00034 ALOAD 2
00035 ALOAD 3
00036 INVOKEVIRTUAL java/dyn/MethodHandle.invoke
(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
00037 INVOKEVIRTUAL java/dyn/MethodHandle.invoke
(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
00038 ARETURN

<clinit>()V
00000 LDC Lcom/googlecode/phpreboot/runtime/RT;.class
00001 LDC "bootstrap"
00002 INVOKESTATIC java/dyn/Linkage.registerBootstrapMethod
(Ljava/lang/Class;Ljava/lang/String;)V
00003 RETURN

Charles Oliver Nutter

unread,
Jun 4, 2010, 5:08:47 PM6/4/10
to jvm-la...@googlegroups.com
On Fri, Jun 4, 2010 at 1:36 PM, Rémi Forax <fo...@univ-mlv.fr> wrote:
> Le 03/06/2010 08:17, Charles Oliver Nutter a écrit :
>
> [...]
>
>> I'm curious why your base perf is so close to this final number, since
>> it seems pretty amazing to me, and still doesn't have the guards it
>> needs to be really valid.
>>
>
> erjang uses a trick: tailcall optimization.

Well that could explain a lot; the tail calls in JRuby are normally
going back through the full dyncall path at minimum, and even in the
dynopt version they're still doing a static Java dispatch. At any
rate, I'm less concerned now :)

>> Maybe you can post the tak bytecode for the final result in Ejang? I'd
>> love to see what you're doing...
>>
>
> Here is the bytecode generated with invokedynamic and no optimization.
> The first argument is the environment (where to print, etc), the second
> is a thin wrapper over a MethodHandle ; in my language all functions
> are lambdas ; here the method handle  references tak itself because
> the function is recursive and the others arguments are the arguments of tak.
>
> As you can see all casts are removed.

I assume this is for your PHP, not for Erjang.

Very nice that you can remove the casts; in JRuby currently I am not
doing any runtime type inference yet...just wiring a direct Object
(IRubyObject) path to the next method, or direct plus primitive
argument for known "intrinsics" like Fixnum and Float math operations.
I'll go as far as I can with this and meanwhile Tom and Subbu will
continue working on the newer JRuby compiler that can actually do
things like constant and type propagation. Fun times ahead for us
mixed-mode JVM languages :)

- Charlie

Rémi Forax

unread,
Jun 4, 2010, 6:27:57 PM6/4/10
to jvm-la...@googlegroups.com
Le 04/06/2010 23:08, Charles Oliver Nutter a écrit :
> On Fri, Jun 4, 2010 at 1:36 PM, Rémi Forax<fo...@univ-mlv.fr> wrote:
>
>> Le 03/06/2010 08:17, Charles Oliver Nutter a écrit :

>>
>> [...]
>>
>>
>>> I'm curious why your base perf is so close to this final number, since
>>> it seems pretty amazing to me, and still doesn't have the guards it
>>> needs to be really valid.
>>>
>>>
>> erjang uses a trick: tailcall optimization.
>>
> Well that could explain a lot; the tail calls in JRuby are normally
> going back through the full dyncall path at minimum, and even in the
> dynopt version they're still doing a static Java dispatch. At any
> rate, I'm less concerned now :)
>
>
>>> Maybe you can post the tak bytecode for the final result in Ejang? I'd
>>> love to see what you're doing...
>>>
>>>
>> Here is the bytecode generated with invokedynamic and no optimization.
>> The first argument is the environment (where to print, etc), the second
>> is a thin wrapper over a MethodHandle ; in my language all functions
>> are lambdas ; here the method handle references tak itself because
>> the function is recursive and the others arguments are the arguments of tak.
>>
>> As you can see all casts are removed.
>>
> I assume this is for your PHP, not for Erjang.
>

Yes.

> Very nice that you can remove the casts; in JRuby currently I am not
> doing any runtime type inference yet...just wiring a direct Object
> (IRubyObject) path to the next method, or direct plus primitive
> argument for known "intrinsics" like Fixnum and Float math operations.
>

Sorry, I wasn't clear.
This code is the code without any optimizations, so no cast removal.
If there is no cast in the sample it's not because they are removed
but because they are done by the method handle trees
behind each invokedynamic.
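One way to illustrate "casts done by the method handle tree" is asType adaptation, shown here with today's java.lang.invoke API (the thread predates it; phpreboot used the older java.dyn package, and this sketch is not phpreboot's actual runtime):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// The cast lives inside the adapted method handle, so the calling
// bytecode needs no CHECKCAST of its own.
public class MHCastDemo {
    static int length(String s) { return s.length(); }

    static int callThroughGenericHandle(Object arg) {
        try {
            MethodHandle mh = MethodHandles.lookup().findStatic(
                    MHCastDemo.class, "length",
                    MethodType.methodType(int.class, String.class));
            // Adapt (String)int to (Object)int: the handle now performs
            // the String cast internally on every call.
            MethodHandle generic = mh.asType(
                    MethodType.methodType(int.class, Object.class));
            return (int) generic.invokeExact(arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughGenericHandle("hello")); // prints 5
    }
}
```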

> I'll go as far as I can with this and meanwhile Tom and Subbu will
> continue working on the newer JRuby compiler that can actually do
> things like constant and type propagation. Fun times ahead for us
> mixed-mode JVM languages :)
>

I'm not far from being able to do constant propagation of lambdas,
I hope to come with that for the JVM'Summit :)

> - Charlie
>

Rémi

Charles Oliver Nutter

unread,
Jun 4, 2010, 7:37:13 PM6/4/10
to jvm-la...@googlegroups.com
On Fri, Jun 4, 2010 at 5:27 PM, Rémi Forax <fo...@univ-mlv.fr> wrote:

> Le 04/06/2010 23:08, Charles Oliver Nutter a écrit :
>> I'll go as far as I can with this and meanwhile Tom and Subbu will
>> continue working on the newer JRuby compiler that can actually do
>> things like constant and type propagation. Fun times ahead for us
>> mixed-mode JVM languages :)
>>
>
> I'm not far from being able to do constant propagation of lambdas,
> I hope to come with that for the JVM'Summit :)

We'll race to see who can do more dynamic optimizations by July :)

- Charlie

Kresten Krab Thorup

unread,
Jun 5, 2010, 12:13:57 PM6/5/10
to JVM Languages
For reference, here's the bytecode for tak in Erjang [JAD'ed bytecode
even further below].

In the actual generated code, it moves things around quite a bit in
local variables, but that is because of the way I translate BEAM to
JVM by keeping a "stack" in the local variables. All function calls
then do aload_0, aload_1, ... aload_N before a function call with
N-1. I'm assuming that the JVM will see through all these register
moves and optimize them away.

tak is very easy to translate for Erjang, because a simple local
analysis says that it doesn't suspend. So because of that it
generates very straightforward code. This is often the case for
"leafs" in the call graph.

Except for all the extra moving things around in variables, this is
what it does. The only "issue" with this is that Erjang cannot
suspend the thread while running in this loop (i.e. Kilim is not
pushed on this code). The check_exit() at the end of the loop is
there to make sure I can kill the process. is_ge() and dec()
[decrement] are native method on EObject.

static EObject tak(EProc proc, EObject X, EObject Y, EObject Z)
{
    do {
        if (Y.is_ge(X)) { return Z; }
        EObject X1 = tak(proc, X.dec(), Y, Z);
        EObject Y1 = tak(proc, Y.dec(), Z, X);
        EObject Z1 = tak(proc, Z.dec(), X, Y);
        X = X1; Y = Y1; Z = Z1;
        proc.check_exit();
    } while (true);
}

Kresten



-----[actual generated code]----
Method name:"tak__3" public static Signature:
(erjang.EProc,erjang.EObject,erjang.EObject,erjang.EObject)erjang.EObject
Attribute "Code", length:170, max_stack:4, max_locals:9, code_length:
138
0: goto 3
3: aload_2
4: aload_1
5: invokevirtual <Method erjang.EObject.is_ge
(erjang.EObject)boolean>
8: ifeq 15
11: aload_3
12: astore_1
13: aload_1
14: areturn
15: getstatic <Field erjang.ERT.NIL erjang.ENil>
18: dup
19: nop
20: astore 5
22: dup
23: nop
24: astore 6
26: dup
27: nop
28: astore 7
30: nop
31: astore 8
33: aload_1
34: invokevirtual <Method erjang.EObject.dec ()erjang.EObject>
37: astore 4
39: aload_1
40: astore 5
42: aload 4
44: astore_1
45: aload_3
46: astore 7
48: aload_2
49: astore 6
51: aload_0
52: aload_1
53: aload_2
54: aload_3
55: invokestatic <Method erjang.m.bench_tak.bench_tak.tak__3
(erjang.EProc,erjang.EObject,erjang.EObject,erjang.EObject)erjang.EObject>
58: astore_1
59: aload 6
61: invokevirtual <Method erjang.EObject.dec ()erjang.EObject>
64: astore_2
65: aload_1
66: astore 8
68: aload 5
70: astore_3
71: aload_2
72: astore_1
73: aload 7
75: astore_2
76: aload_0
77: aload_1
78: aload_2
79: aload_3
80: invokestatic <Method erjang.m.bench_tak.bench_tak.tak__3
(erjang.EProc,erjang.EObject,erjang.EObject,erjang.EObject)erjang.EObject>
83: astore_1
84: aload 7
86: invokevirtual <Method erjang.EObject.dec ()erjang.EObject>
89: astore_2
90: aload_1
91: astore 4
93: aload 6
95: astore_3
96: aload_2
97: astore_1
98: aload 5
100: astore_2
101: aload 4
103: astore 5
105: getstatic <Field erjang.ERT.NIL erjang.ENil>
108: astore 7
110: getstatic <Field erjang.ERT.NIL erjang.ENil>
113: astore 6
115: aload_0
116: aload_1
117: aload_2
118: aload_3
119: invokestatic <Method erjang.m.bench_tak.bench_tak.tak__3
(erjang.EProc,erjang.EObject,erjang.EObject,erjang.EObject)erjang.EObject>
122: astore_1
123: aload 5
125: astore_2
126: aload_1
127: astore_3
128: aload 8
130: astore_1
131: aload_0
132: invokevirtual <Method erjang.EProc.check_exit ()void>
135: goto 3

Here's the JAD'ed version of the same code.

public static EObject tak__3(EProc eproc, EObject eobject,
                             EObject eobject1, EObject eobject2)
{
    do
    {
        if(eobject1.is_ge(eobject))
        {
            eobject = eobject2;
            return eobject;
        }
        EObject obj;
        EObject obj1;
        EObject obj2;
        EObject obj3 = obj2 = obj1 = obj = ERT.NIL;
        EObject eobject3 = eobject.dec();
        obj = eobject;
        eobject = eobject3;
        obj2 = eobject2;
        obj1 = eobject1;
        eobject = tak__3(eproc, eobject, eobject1, eobject2);
        eobject1 = obj1.dec();
        obj3 = eobject;
        eobject2 = obj;
        eobject = eobject1;
        eobject1 = obj2;
        eobject = tak__3(eproc, eobject, eobject1, eobject2);
        eobject1 = obj2.dec();
        eobject3 = eobject;
        eobject2 = obj1;
        eobject = eobject1;
        eobject1 = obj;
        obj = eobject3;
        obj2 = ERT.NIL;
        obj1 = ERT.NIL;
        eobject = tak__3(eproc, eobject, eobject1, eobject2);
        eobject1 = obj;
        eobject2 = eobject;
        eobject = obj3;
        eproc.check_exit();
    } while(true);
}

Charles Oliver Nutter

unread,
Jun 5, 2010, 12:27:53 PM6/5/10
to jvm-la...@googlegroups.com
Thanks very much Kresten. That satisfies my curiosity, and it makes
perfect sense that you'd get performance close to the JRuby dynopt
version with the way this compiles. It makes me wish Ruby were as
straightforward at times :)

I also see something else I find interesting: EObject apparently
implements dec() and is_ge()...I'm guessing you have implemented a
full set of numeric operators on EObject so that when they're
explicitly provided by a subtype it can be no more than an interface
or abstract dispatch? We've thought of doing the same for JRuby, but
were unsure whether it would be too cumbersome. The essential idea
would be that the default implementations would proceed to doing a
dynamic dispatch while the overridden versions would provide the
implementation in-place, perhaps with a guard in case someone
monkey-patched them. It would reduce most math operators against core
types to static calls (+ a two-field object comparison for the guard)
and greatly improve their inlinability.

Is that what you've done and why you've done it for Erjang?
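The guarded fast-path idea Charlie describes might look roughly like this sketch (RObject/RFixnum and the guard flag are hypothetical stand-ins, not JRuby's real classes):

```java
// Base class: the default implementation falls back to a full dynamic
// dispatch; a core subtype supplies the operation in-place behind a
// cheap "not monkey-patched" guard.
abstract class RObject {
    Object plus(Object other) { return dynamicCall("+", other); } // slow path
    Object dynamicCall(String name, Object arg) {
        throw new UnsupportedOperationException("dynamic dispatch of " + name);
    }
}

final class RFixnum extends RObject {
    static boolean plusIsBuiltin = true; // guard: flipped if '+' is redefined
    final long value;
    RFixnum(long v) { value = v; }

    @Override
    Object plus(Object other) {
        // Guard plus type check; on success the math inlines right here.
        if (plusIsBuiltin && other instanceof RFixnum) {
            return new RFixnum(value + ((RFixnum) other).value);
        }
        return super.plus(other); // monkey-patched or mixed types: slow path
    }

    public static void main(String[] args) {
        RFixnum sum = (RFixnum) new RFixnum(2).plus(new RFixnum(3));
        System.out.println(sum.value); // prints 5
    }
}
```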

Kresten Krab Thorup

unread,
Jun 5, 2010, 2:48:25 PM6/5/10
to JVM Languages
Yep, EObject has all the arithmetic operators.

EObject has lots of methods, and I have not worried about trying to
reduce them. EObject is an abstract class; and is not intended to be
subclassed outside of the erjang eco-system. I have an EObject
subclass called EPseudoTerm which is intended as an opaque reference
that you can subclass and can be passed around inside the Erjang
system.

dec(), inc() and is_zero() are special cases of arithmetics that are
there simply because they occur very often, especially in loops, which
are typically written like this in Erlang programs:

do_something(0, OtherArgs, ...) -> done;
do_something(N, OtherArgs, ...) ->
    do .. what ... ever ... with .. otherargs,
    do_something(N-1, OtherArgs, ...).

dec(), inc() and is_zero() are implemented generically in EObject:

EObject dec()     { return ESmall.MINUS_ONE.plus(this); }   // (*note1)
EObject inc()     { return ESmall.ONE.plus(this); }
boolean is_zero() { return false; }

and those three are then overridden in ESmall, as ...

EInteger dec() {
    return this.value == Integer.MIN_VALUE
        ? EBig.MIN_INT_MINUS_ONE : new ESmall(this.value - 1);
}
EInteger inc() {
    return this.value == Integer.MAX_VALUE
        ? EBig.MAX_INT_PLUS_ONE : new ESmall(this.value + 1);
}
boolean is_zero() { return this.value == 0; }

(*note1) The compiler will swap order of arguments to arithmetic
operators (if it can be done semantics-preserving) if the second has
static type, and the first has not; since binary operators dispatch
virtually on the first argument. So "X+1" becomes "1+X", "X-3"
becomes "-3 + X", etc.

Arithmetic is double-dispatched, like this:

in ESmall:

ENumber minus(EObject other) {
    return other.r_minus(this.value);
}
EInteger r_minus(int lhs) {
    return EInteger.box((long) lhs - (long) this.value);
}
EDouble r_minus(double lhs) {
    return EDouble.box(lhs - this.value);
}
EInteger r_minus(BigInteger lhs) {
    return EInteger.box(lhs.subtract(BigInteger.valueOf(this.value)));
}

... etc... and all those methods are declared in EObject ... most of
which throw "aritherr" in the generic implementation. But that is not
really an issue as far as I can see.
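Kresten's double-dispatch scheme can be condensed into a self-contained sketch. The Num/Small/Big names below are simplified stand-ins for EObject/ESmall/EBig, not Erjang's actual classes, and overflow promotion from Small to Big is elided:

```java
import java.math.BigInteger;

// Double dispatch in the style described above: minus() dispatches
// virtually on the left operand, which then calls r_minus() on the
// right operand, passing its own value as a statically typed argument.
abstract class Num {
    abstract Num minus(Num other);          // this - other
    abstract Num r_minus(long lhs);         // lhs - this
    abstract Num r_minus(BigInteger lhs);   // lhs - this
}

final class Small extends Num {
    final long value;
    Small(long value) { this.value = value; }

    Num minus(Num other)        { return other.r_minus(this.value); }
    Num r_minus(long lhs)       { return new Small(lhs - this.value); }  // overflow check elided
    Num r_minus(BigInteger lhs) { return new Big(lhs.subtract(BigInteger.valueOf(value))); }
}

final class Big extends Num {
    final BigInteger value;
    Big(BigInteger value) { this.value = value; }

    Num minus(Num other)        { return other.r_minus(this.value); }
    Num r_minus(long lhs)       { return new Big(BigInteger.valueOf(lhs).subtract(value)); }
    Num r_minus(BigInteger lhs) { return new Big(lhs.subtract(value)); }
}
```

A call like `new Small(7).minus(new Small(5))` costs two virtual calls but no instanceof checks, which is what makes the scheme friendly to JIT inlining.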

Kresten




On Jun 5, 6:27 pm, Charles Oliver Nutter <head...@headius.com> wrote:
> Thanks very much Kresten. That satisfies my curiosity, and it makes
> perfect sense that you'd get performance close to the JRuby dynopt
> version with the way this compiles. It makes me wish Ruby were as
> straightforward at times :)
>
> I also see something else I find interesting: EObject apparently
> implements dec() and is_ge()...I'm guessing you have implemented a
> full set of numeric operators on EObject so that when they're
> explicitly provided by a subtype it can be no more than an interface
> or abstract dispatch? We've thought of doing the same for JRuby, but
> were unsure whether it would be too cumbersome. The essential idea
> would be that the default implementations would proceed to doing a
> dynamic dispatch while the overridden versions would provide the
> implementation in-place, perhaps with a guard in case someone
> monkey-patched them. It would reduce most math operators against core
> types to static calls (+ a two-field object comparison for the guard)
> and greatly improve their inlinability.
>
> Is that what you've done and why you've done it for Erjang?
>

Kresten Krab Thorup

Jun 5, 2010, 3:06:54 PM
to JVM Languages
Oh, btw, ...

In HotRuby I do the same. The codegen for X + B is X.fast_plus(B,
SEL_PLUS). (SEL_PLUS is a statically allocated "Selector" object,
which is akin to a CallSite object.) fast_plus is then implemented as
a normal dispatch in RubyObject:

public IRubyObject fast_plus(IRubyObject arg, Selector selector) {
    return this.do_select(selector).call(this, arg);
}

See: http://github.com/krestenkrab/hotruby/blob/master/modules/vm-loaded/src/com/trifork/hotruby/objects/RubyObject.java#L153

In RubyFixnum, it knows you're in an arithmetic plus:

http://github.com/krestenkrab/hotruby/blob/master/modules/vm-loaded/src/com/trifork/hotruby/objects/RubyFixnum.java#L107

So the technique should be adaptable to JRuby as well.

Kresten


John Cowan

Jun 5, 2010, 7:02:23 PM
to jvm-la...@googlegroups.com
On Sat, Jun 5, 2010 at 12:27 PM, Charles Oliver Nutter
<hea...@headius.com> wrote:

> I also see something else I find interesting: EObject apparently
> implements dec() and is_ge()...I'm guessing you have implemented a
> full set of numeric operators on EObject so that when they're
> explicitly provided by a subtype it can be no more than an interface
> or abstract dispatch? We've thought of doing the same for JRuby, but
> were unsure whether it would be too cumbersome. The essential idea
> would be that the default implementations would proceed to doing a
> dynamic dispatch while the overridden versions would provide the
> implementation in-place, perhaps with a guard in case someone
> monkey-patched them. It would reduce most math operators against core
> types to static calls (+ a two-field object comparison for the guard)
> and greatly improve their inlinability.

That's what Jcon does. There are about 150 methods defined on the
root class vDescriptor, most of which do nothing or throw errors.
Jcon is a very interesting early JVM implementation of the Icon
language, which is dynamically typed and has non-LIFO flow control; it
was written back in 1999, and doesn't provide (AFAIK) any escape to
Java: the JVM is treated as a pure implementation target.

Charles Oliver Nutter

Jun 5, 2010, 10:47:21 PM
to jvm-la...@googlegroups.com
On Sat, Jun 5, 2010 at 2:06 PM, Kresten Krab Thorup <kr...@trifork.com> wrote:
> Oh, btw, ...
>
> In HotRuby I do the same.  The codegen for X + B is  X.fast_plus(B,
> SEL_PLUS).  (SEL_PLUS is statically allocated "Selector" object which
> is akin to a CallSite object)   And then fast_plus is implemented as a
> normal dispatch in RubyObject:
>
>        public IRubyObject fast_plus(IRubyObject arg, Selector selector) {
>                return this.do_select(selector).call(this, arg);
>        }

Yup, that looks familiar...I've prototyped this a few times but never
felt like I wanted to special-case arithmetic quite yet. Now that
other Ruby implementations are starting to catch up on performance,
it's time to start tweaking again :)

Actually I had a much more ambitious idea at one point that depended
on interface injection:

* For each method name in the system, generate a one-method (or
N-method, if splitting up arities) interface that takes all
IRubyObject and returns IRubyObject
* As new methods are defined, add additional interfaces
* Compiled invocation of any dynamic method then is done by calling
against the appropriate one-method interface, injecting it if it has
not yet been injected

As the system reaches a steady state, all classes implement N
interfaces for the N method names they define, and dispatch is as fast
as interface injection will allow it to be for all calls.

It's perhaps a gross perversion of interface injection, but I thought
it was a cute idea :)
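For what it's worth, the steady state of that interface-per-method-name idea can be sketched with ordinary, statically declared interfaces; everything below, including the M_plus naming, is hypothetical, since interface injection itself never shipped:

```java
// One interface per method name; a class that defines that method
// implements the interface, so a compiled call site needs only an
// instanceof check followed by an invokeinterface.
interface RObject { }

interface M_plus { RObject plus(RObject arg); }

final class RFixnum implements RObject, M_plus {
    final long value;
    RFixnum(long value) { this.value = value; }

    public RObject plus(RObject arg) {
        // in-place implementation; a real version would also carry a
        // guard in case someone monkey-patched "+"
        return new RFixnum(value + ((RFixnum) arg).value);
    }
}

final class PlusCallSite {
    // compiled call site for "a + b"
    static RObject call(RObject self, RObject arg) {
        if (self instanceof M_plus)
            return ((M_plus) self).plus(arg);   // fast interface path
        throw new UnsupportedOperationException("fall back to full dynamic dispatch");
    }
}
```

With injection, classes defined later would acquire M_plus at runtime instead of declaring it up front; the dispatch path itself would look the same.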

- Charlie

Alessio Stalla

Jun 6, 2010, 4:40:20 AM
to jvm-la...@googlegroups.com

ABCL adopts a similar approach, too. It has a root class, LispObject,
with many methods which in their default implementation signal a type
error.
ABCL is a Common Lisp implementation for the JVM.

-- Alessio

Kresten Krab Thorup

Jun 9, 2010, 5:35:28 AM
to JVM Languages
Charles,

I do something like the interface injection you talk about in HotRuby
too, except it is the CallSite object which I inject methods into, one
for each receiver type.
Here is some pseudo code to show how it works in HotRuby

It works as a "polymorphic inline cache", but since we cannot rewrite
Java bytecode at runtime, I put the replaceable code behind a static
field which can then be replaced:

At the actual callsite there is a

static CallSite CALL_SITE1 =
    new CallSite("+", <info about how to update the static CALL_SITE1>);

// A + B becomes
A.select(CALL_SITE1).call(A, B);



// Each callsite has its own class, eventually, which is re-compiled
// in response to call-site statistics reaching some limit. Here is one
// for a call-site that dispatches receiver types Fixnum and Foo.

class "CallSite#+" implements CallSite_Fixnum, CallSite_Foo {

    // update the static field that points to this callsite; needed
    // when this class is recompiled...
    void become(CallSite site) { ... }

    Callable for_Fixnum() { return RubyClass_Fixnum.PLUS_METHOD; /* this field may be exactly typed */ }
    Callable for_Foo()    { return RubyClass_Foo.PLUS_METHOD;    /* this field may be exactly typed */ }
}

... the receiver is the one that chooses the method to run ...

class RubyClass_Foo {

    Callable select(CallSite site) {
        if (site instanceof CallSite_Foo)
            return ((CallSite_Foo) site).for_Foo();
        else
            return slow_lookup(this, site);
    }
}


Once the cache at this callsite has missed a number of times, the
callsite object is replaced with one that implements one or more of
the callback interfaces.

In the optimal case, as far as I can see, this leaves only two virtual
dispatches in the invocation path: one to call the receiver's select
method, and secondly, because we cannot statically type the callsite,
the instanceof check inside select needs some runtime checking.
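That mechanism can be approximated in plain Java, with an ordinary mutable static field standing in for HotRuby's regenerated call-site classes (all names below are made up for illustration):

```java
// A replaceable call site: the receiver picks the cached method via an
// instanceof check on the site, as in Kresten's select() pseudo-code.
interface Obj      { Callable select(CallSite site); }
interface Callable { Obj call(Obj self, Obj arg); }

// base call site: knows nothing, every call goes through slow lookup
class CallSite {
    final String name;
    CallSite(String name) { this.name = name; }
}

// callback interface a specialized site implements after enough misses
interface CallSite_Fixnum { Callable for_Fixnum(); }

final class Fixnum implements Obj {
    final long value;
    Fixnum(long value) { this.value = value; }

    static final Callable PLUS_METHOD =
        (self, arg) -> new Fixnum(((Fixnum) self).value + ((Fixnum) arg).value);

    public Callable select(CallSite site) {
        if (site instanceof CallSite_Fixnum)              // cache hit
            return ((CallSite_Fixnum) site).for_Fixnum();
        return PLUS_METHOD;                               // stand-in for slow_lookup
    }
}

final class Demo {
    // the "static which can then be replaced" at the call site
    static CallSite CALL_SITE1 = new CallSite("+");

    static Obj addAB(Obj a, Obj b) {
        return a.select(CALL_SITE1).call(a, b);           // codegen for A + B
    }

    static void recompile() {
        // after enough misses, swap in a site implementing the callback interface
        class Site extends CallSite implements CallSite_Fixnum {
            Site() { super("+"); }
            public Callable for_Fixnum() { return Fixnum.PLUS_METHOD; }
        }
        CALL_SITE1 = new Site();
    }
}
```

Before recompile() every call goes through the slow path; afterwards select() takes the instanceof fast path, which is the two-virtual-dispatch case described above.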


I wrote a little about it in a blog entry some months back
http://www.javalimit.com/2009/12/yet-another-rubyonjvm-implementation.html

... kresten ...


On Jun 6, 4:47 am, Charles Oliver Nutter <head...@headius.com> wrote:
> On Sat, Jun 5, 2010 at 2:06 PM, Kresten Krab Thorup <k...@trifork.com> wrote:
>
> > Oh, btw, ...
>
> > In HotRuby I do the same.  The codegen for X + B is  X.fast_plus(B,
> > SEL_PLUS).  (SEL_PLUS is statically allocated "Selector" object which
> > is akin to a CallSite object)   And then fast_plus is implemented as a
> > normal dispatch in RubyObject:
>
> >        public IRubyObject fast_plus(IRubyObject arg, Selector selector) {
> >                return this.do_select(selector).call(this, arg);
> >        }
>
> Yup, that looks familiar...I've prototyped this a few times but never
> felt like I wanted to special-case arithmetic quite yet. Now that
> other Ruby implementations are starting to catch up on performance,
> it's time to start tweaking again :)
>
> > See:http://github.com/krestenkrab/hotruby/blob/master/modules/vm-loaded/s...
>
> > In RubyFixnum, it knows you're in an arithmetic plus:
>
> >http://github.com/krestenkrab/hotruby/blob/master/modules/vm-loaded/s...

apmckinlay

Jun 9, 2010, 1:40:00 PM
to JVM Languages
I take a slightly different approach in jSuneido. Standard operations
(like "add") are implemented by static methods. The methods check
whether an operand is an instance of a jSuneido object and, if so,
cast it and call a virtual method. The static operation methods also
check for Java types like Integer or String, which means I can use
those simple Java types without wrapping them. (Suneido doesn't allow
operator overloading, which simplifies this.)

I have not looked at performance yet (more concerned with getting
things working), so I have no idea what the implications of this
approach are. Presumably there is less memory usage from not wrapping
common Java types. But maybe the static methods won't inline very
well? And there is the cost of the type checks.
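A minimal sketch of that shape, assuming hypothetical Ops/SuValue names (jSuneido's real classes differ):

```java
// Standard operations as static methods: plain Java values (Integer,
// String, ...) are used unwrapped, and only language-level objects get
// a virtual call; the instanceof checks are the cost of the scheme.
final class Ops {
    static Object add(Object x, Object y) {
        if (x instanceof SuValue)
            return ((SuValue) x).add(y);       // cast + virtual dispatch
        if (x instanceof Integer && y instanceof Integer)
            return (Integer) x + (Integer) y;  // unwrapped Java ints
        throw new UnsupportedOperationException("add: bad operand types");
    }
}

// stand-in for a jSuneido language object
abstract class SuValue {
    abstract Object add(Object other);
}
```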

On Jun 6, 2:40 am, Alessio Stalla <alessiosta...@gmail.com> wrote: