AOT performance


shannah

unread,
Dec 3, 2012, 2:44:54 AM
to av...@googlegroups.com
Hi Joel,

I have just finished setting up a port of CodenameOne using Avian for iOS.  The current default port uses XMLVM, and my hunch was that XMLVM's performance wouldn't be as good as Avian's, since it has to convert all code into stack operations in C.  However, I set up a simple benchmark to compare my Avian build against the same application converted with XMLVM, and XMLVM actually outperformed Avian.

The benchmark was to run the Towers of Hanoi.  I ran it with n=30, moving from pole 1 to pole 3, on my iPhone 4S and found that XMLVM completed in approximately 35 seconds vs. 43 seconds for Avian.
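
Roughly, the benchmark looks like this (a simplified sketch, not the exact test code -- the class name and timing boilerplate here are just illustrative):

    // Simplified sketch of the benchmark: pure recursion, no other work.
    public class Hanoi {
        static void move(int n, int from, int to, int via) {
            if (n == 0) return;
            move(n - 1, from, via, to);
            // "move disk n from 'from' to 'to'" -- nothing actually happens here
            move(n - 1, via, to, from);
        }

        public static void main(String[] args) {
            long start = System.currentTimeMillis();
            move(30, 1, 3, 2);   // n=30, pole 1 to pole 3
            System.out.println("elapsed: "
                + (System.currentTimeMillis() - start) + " ms");
        }
    }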

Given your intimate knowledge of how Avian works, do these results make sense to you?  Should Avian's AOT compilation be approaching raw C performance, or are there factors that limit it?  Can you suggest things I can do to try to improve the performance?

Thanks 

Steve

Joel Dice

unread,
Dec 3, 2012, 11:13:25 AM
to av...@googlegroups.com
On Sun, 2 Dec 2012, shannah wrote:

> Hi Joel,
> I have just finished setting up a port of CodenameOne using Avian for iOS.
> The current default port uses XMLVM, and my hunch was that XMLVM's
> performance wouldn't be as good as Avian's, since it has to convert all
> code into stack operations in C.  However, I set up a simple benchmark to
> compare my Avian build against the same application converted with XMLVM,
> and XMLVM actually outperformed Avian.
> The benchmark was to run the Towers of Hanoi.  I ran it with n=30, moving
> from pole 1 to pole 3, on my iPhone 4S and found that XMLVM completed in
> approximately 35 seconds vs. 43 seconds for Avian.
>
> Given your intimate knowledge of how Avian works, do these results make
> sense to you?  Should Avian's AOT compilation be approaching raw C
> performance, or are there factors that limit it?  Can you suggest things I
> can do to try to improve the performance?

I can't comment much on how XMLVM works, but if it translates to C it can
in theory take advantage of the optimizations provided by the native C
compiler for the target platform.

Avian's AOT compiler is really just its JIT compiler run ahead of time,
and the JIT compiler is not very sophisticated.  It doesn't do method
inlining, loop unrolling, autovectorization, code motion, intelligent
register allocation, etc. -- all of which modern C compilers like GCC and
LLVM are really good at.  So, no, Avian won't give you "raw C code"
performance when compared to a good C compiler.

ProGuard can help bridge the gap in theory, since it's capable of doing
many of the optimizations mentioned above on the Java bytecode before
Avian even sees it. However, I've had to disable optimization in e.g. the
hello-ios example because ProGuard crashes otherwise. I've tried to
reduce it to a simple test case so I could submit a useful bug report to
the ProGuard maintainer, but I haven't succeeded yet. If you've got the
time and the interest, you might want to give that a shot. You can start
by removing the "-dontoptimize" line from the makefile and see what
happens.

Besides that, my only suggestion is to run a real-world test app (e.g. not
Towers of Hanoi, unless that's actually the app you care about) through a
profiler to determine (A) if there really is a performance problem in
practice and (B) which code is the bottleneck. In my code at least, the
performance-sensitive parts are all implemented in native code and/or
dedicated hardware anyway (e.g. audio/video decoding and image scaling),
so the performance of Java code has not generally been an issue.

FWIW, I'm planning to do some work on trace-based JIT compilation in the
future, which will ultimately involve adding more sophisticated
optimizations to the compiler which should also be applicable to AOT
compilation. I can't say for sure whether that's months or years away,
though, since it will be strictly a spare time project, and spare time is
always hard to come by.

If you're really passionate about this, I can give you an overview of how
the JIT compiler works in Avian with an eye towards improving it, but I
must warn you that it is the most obscure and complicated part of the VM.
I'll admit I didn't even try to make it easy for other people to
understand, so it reads like a raw brain dump and probably only makes
sense if your brain is wired just like mine :)

Steve Hannah

unread,
Dec 3, 2012, 1:34:52 PM
to av...@googlegroups.com
Thanks for the reply.  It offers a lot of insight and makes sense.  I certainly would be interested in learning more about the JIT compilation process.  I haven't written a compiler before, so I was a bit out of my depth looking through the Avian code, but sometimes all it takes is a few well-placed pointers to get things rolling.

Thanks again for all your help.
(By the way, I posted more information about my benchmark results and experimentation on my blog.)

-Steve


--
Steve Hannah
Web Lite Solutions Corp.

Steve Hannah

unread,
Dec 6, 2012, 5:55:24 PM
to av...@googlegroups.com
Update on this.  I decided to try the same benchmark today using native C code to serve as a baseline, and found that it ran in 0 ms.  This is because the benchmark was poorly designed, and GCC/LLVM was able to optimize it into a no-op, which gave XMLVM a bit of an advantage in the previous results.

After fixing the benchmark (so that GCC/LLVM couldn't cheat anymore) I found that Avian ran about 15% faster than XMLVM.  15% is really negligible, but I thought I'd post to clear up any misconceptions I might have created with my previous results.
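
The fix was essentially to make the result of the recursion observable, so the compiler can't eliminate it as dead code.  Sketched for the Java version below (the C baseline got the equivalent change; again, this is simplified, not the exact test code):

    // Sketch of the fixed benchmark: the move count is accumulated and
    // printed, so the recursion has an observable result and can no longer
    // be optimized away as a no-op.
    public class HanoiFixed {
        static long moves;

        static void move(int n, int from, int to, int via) {
            if (n == 0) return;
            move(n - 1, from, via, to);
            moves++;   // observable work on every call
            move(n - 1, via, to, from);
        }

        public static void main(String[] args) {
            long start = System.currentTimeMillis();
            move(30, 1, 3, 2);
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(moves + " moves in " + elapsed + " ms");
        }
    }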

I have also posted this on my blog at http://sjhannah.com/blog/?p=226

Sorry for the mixup, Joel.  These results make a lot more sense.

-Steve

Joel Dice

unread,
Dec 6, 2012, 7:43:21 PM
to av...@googlegroups.com
On Thu, 6 Dec 2012, Steve Hannah wrote:

> Update on this.  I decided to try the same benchmark today using native C
> code to serve as a baseline, and found that it ran in 0 ms.  This is
> because the benchmark was poorly designed, and GCC/LLVM was able to
> optimize it into a no-op, which gave XMLVM a bit of an advantage in the
> previous results.
> After fixing the benchmark (so that GCC/LLVM couldn't cheat anymore) I
> found that Avian ran about 15% faster than XMLVM.  15% is really
> negligible, but I thought I'd post to clear up any misconceptions I might
> have created with my previous results.
>
> I have also posted this on my blog at
> http://sjhannah.com/blog/?p=226
>
> Sorry for the mixup, Joel.  These results make a lot more sense.

No problem at all. Thanks for the update.

I do want to reiterate that microbenchmarks like this only test a very
narrow part of the execution environment (C, Java, or otherwise). In this
case, you're mainly testing recursive static function call overhead.
That can be interesting if that's where your app spends a big chunk of
time, but most apps will also allocate heap memory, manipulate data
structures, do I/O, execute loops, etc.  Any of those things might be a
performance bottleneck, whereas even a 500% difference in static function
call overhead might not have any measurable effect relative to everything
else.

All of which is to say I agree that 15% is negligible in this context, and
there's little to be extrapolated about XMLVM's or Avian's overall
performance.