Dalvik JIT Compiler


Bill Buzbee

Nov 16, 2009, 5:53:50 PM
to android-platform
As some of you have noticed, the latest Android Open Source Project
tree (eclair) includes source code for a Dalvik JIT compiler. The
Dalvik team has been actively investigating what kind of JIT would
work best over a wide range of memory- and power-constrained portable
Android devices, and the code in AOSP master is an old snapshot of
what we consider a promising proof-of-concept. It is a trace-based
JIT, compiling only hot code traces rather than using the method-at-a-time
strategy typically found in server-class JITs. It attempts to
minimize heap usage, and it requires no persistent storage. The goal
is to give a quick performance boost using very little heap and
battery.

The JIT has progressed significantly since the snapshot in AOSP
eclair, and we're working on pushing out a more current version.
Meanwhile, if you'd like to play with the prototype, you can build it
by creating a buildspec.mk file in your AOSP root which includes the
line "WITH_JIT := true".

Note that the prototype JIT had not been extensively tested at the
time the snapshot was taken, so you can expect some breakage. Also,
it contains few optimizations other than the basic elimination of the
interpreter's fetch/decode cycle. We're looking forward to getting a
newer version into the AOSP tree.

Bill Buzbee, Ben Cheng & the rest of the Dalvik team

Disconnect

Nov 16, 2009, 6:33:41 PM
to android-...@googlegroups.com
That's great news :) Will the new JIT be developed in the public repos or just through code dumps?



Jean-Baptiste Queru

Nov 16, 2009, 6:37:53 PM
to android-...@googlegroups.com
We're aiming for more openness than what we currently have (not hard,
given that less openness wouldn't be open at all at this point). I
can't make any precise commitment at the moment, but it's high up on
my to-do list to help the Dalvik team try to achieve that goal.

JBQ
--
Jean-Baptiste M. "JBQ" Queru
Software Engineer, Android Open-Source Project, Google.

Questions sent directly to me that have no reason for being private
will likely get ignored or forwarded to a public forum with no further
warning.

Cédric Berger

Nov 16, 2009, 6:45:36 PM
to android-...@googlegroups.com
So this is not used yet on the Droid?

Dan Bornstein

Nov 16, 2009, 8:30:19 PM
to android-...@googlegroups.com
2009/11/16 Cédric Berger <cedric....@gmail.com>:
> So this is not used yet on the Droid ?

Correct.

-dan

Aaron

Nov 17, 2009, 12:30:55 AM
to android-platform
Any numbers on how much of a performance gain?


Jim Huang

Nov 17, 2009, 12:34:25 AM
to android-...@googlegroups.com
2009/11/17 Aaron <arr...@gmail.com>:
> Any numbers on how much of a performance gain?
>

hi Aaron,

There are some independent benchmark results done by 0xlab:
http://groups.google.com/group/0xlab-devel/browse_thread/thread/1edef26f4e5b7427

They are still rough and not very detailed, though.

Sincerely,
Jim Huang (jserv)
http://0xlab.org/

Ben Cheng

Nov 17, 2009, 1:49:31 AM
to android-...@googlegroups.com
The performance improvement provided by the JIT is highly dependent on the nature of the program. Our favorite benchmark is the Checkers game from the Market. The app is very CPU-intensive, and at the end of each thinking period it prints a number in the bottom right corner showing how many future moves were computed. From that you can get some sense of the speedups real Android apps can see.

As Bill mentioned earlier, the JIT in the Eclair tree is an early prototype. The JIT is designed to be tightly integrated with the interpreter, so we can single-step back and forth between translated code and the interpreter at any bytecode boundary. This gives us a lot of development freedom, as we can implement and test the JIT one bytecode at a time instead of having to translate all 256 or nothing. Don't be surprised to see the JIT simply punt to the interpreter for certain bytecodes and provide no performance gain for them in the Eclair tree.

-Ben

blindfold

Nov 17, 2009, 7:02:16 AM
to android-platform
Interesting development!

> It is a trace-based JIT, compiling only hot code traces rather than using the method-at-a-time strategy typically found in server-class JITs.

So the upcoming JIT compiler will not include support for explicit JIT
compilation through compileClass()? http://developer.android.com/reference/java/lang/Compiler.html
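
For reference, a minimal usage sketch of that hook (MyCpuIntensiveTask is
a hypothetical class name, and on Dalvik the call is presumably a no-op):

  class JitHintExample {
      static class MyCpuIntensiveTask { /* hypothetical hot class */ }

      static boolean requestCompile() {
          // On a VM without a java.lang.Compiler implementation this
          // simply reports failure rather than compiling anything, so
          // code must not depend on it having an effect.
          return java.lang.Compiler.compileClass(MyCpuIntensiveTask.class);
      }
  }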

I imagine that having some developer control over what gets compiled
could serve cases that would otherwise perhaps not be classified as
"hot" due to, for instance, a low duty cycle. (I have an app that
does its CPU-intensive work in a kind of burst mode, typically for a
fraction of a second per second; on ARM devices I have moved the most
costly parts to native code through the NDK, but on x86 and other
CPUs, performance will still depend largely on what a JIT compiler
makes of it. I used to get a factor-of-6 or so speedup when using the
Java ME JIT on Nokia phones compared to running the equivalent
program under the Dalvik interpreter on the ADP1.)

Thanks

Bill Buzbee

Nov 17, 2009, 10:16:52 AM
to android-platform
> So the upcoming JIT compiler will not include support for explicit JIT
> compilation through compileClass()? http://developer.android.com/reference/java/lang/Compiler.html

No, though we do expect to provide some mechanism to disable the JIT
on an application-by-application basis.

> I imagine that having some developer control over what gets compiled
> could serve cases that would otherwise perhaps not be classified as
> "hot" due to - for instance - a low duty cycle. (I have an app that
> does its CPU intensive work in a kind of burst mode, typically for a
> fraction of a second per second;

One of the key trade-offs between trace and method JIT models is that
trace JITs can detect and exploit hot code faster than method JITs.
On the other hand, method JITs generally have a larger optimization
scope and thus can deliver better-optimized translations. To restate,
the choice is between a near-immediate good performance boost and a
better, but delayed, one.

An important use case for us is when a user downloads a new app from
the market. Anecdotally, we think that for many users there is a very
short window in which the first keep/uninstall decision is made - on
the order of a few seconds. If the application feels sluggish on
first use it may get discarded before it has a chance to shine. We
didn't want to require that an application build up a usage profile
over a series of runs, or introduce a big up-front compilation pause
to benefit from the JIT.

The trace JIT should give an immediate boost, including detecting and
translating your hot, burst-mode fragments. If it doesn't, then let us
know so we can adjust the trace selection mechanism.

...Bill Buzbee

andybak

Nov 20, 2009, 8:29:13 AM
to android-platform


On Nov 17, 3:16 pm, Bill Buzbee <buz...@android.com> wrote:
> Anecdotally, we think that for many users there is a very
> short window in which the first keep/uninstall decision is made - on
> the order of a few seconds.  If the application feels sluggish on
> first use it may get discarded before it has a chance to shine.  We
> didn't want to require that an application build up a usage profile
> over a series of runs, or introduce a big up-front compilation pause
> to benefit from the JIT.

But it would be a shame to reduce the effectiveness of the JIT
because of naive user expectations.

Is there any way applications could be pre-optimized or have some kind
of JIT hinting stored in the downloaded package?

Tomei

Nov 20, 2009, 1:05:44 PM
to android-platform
Hi Ben,

For "switchable between JIT and interpreter at bytecode level" -- is
this a just convenience way for developing/debugging the JIT, or part
of a fundamental design (i.e., to support single-stepping inside the
debugger). I hope it's the former, or else it seems performance gain
will be quite limited.

How much overhead does the trace collection add to interpretation?
There's a large part of the framework code that's executed exactly
once during app initialization. Will app start-up be hurt? With method-
based JIT it's quite easy to turn off JIT sampling until the app has
painted the first screen, etc. However, with trace-based JIT, I
imagine that such a strategy would be much harder to do.

What would be the worst case scenario for trace-based compilation? For
example, if I have a large switch statement like this

  switch (x) { // case 0 .. 255
      case 1:   very simple code; break;
      case 2:   very simple code; break;
      case 3:   very simple code; break;
      ...
      case 255: very simple code; break;
  }

can this be handled well?

Thanks!


Ben Cheng

Nov 20, 2009, 2:34:09 PM
to android-...@googlegroups.com
Hi Tomei,

Please see my replies inline:

On Fri, Nov 20, 2009 at 10:05 AM, Tomei <tomei.n...@gmail.com> wrote:
> For "switchable between JIT and interpreter at bytecode level" -- is
> this just a convenience for developing/debugging the JIT, or part
> of a fundamental design (i.e., to support single-stepping inside the
> debugger)? I hope it's the former, or else it seems the performance
> gain will be quite limited.


It is just a convenient way to develop/debug the JIT. Once a trace is cut for compilation, the JIT'ed instructions are allowed to move across the original bytecode boundaries. For example, we have implemented load/store scheduling to aggressively hoist loads and sink stores until they reach inserted scheduling barriers or memory instructions that cannot be disambiguated. Similarly, redundant stores to the Dalvik registers can be eliminated so that not all computed results are written back to memory.
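
To illustrate the redundant-store case with a hypothetical source-level example (not actual compiler output):

  static int squareOfSum(int v1, int v2) {
      int v0 = v1 + v2;   // add-int v0, v1, v2
      return v0 * v0;     // mul-int v3, v0, v0 -- v0 is dead afterwards,
                          // so within a trace its store back to the Dalvik
                          // register file can be elided; the value lives
                          // only in a machine register.
  }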

We use the "easy switch" feature to handle heavy bytecode instructions like array initializations. Instead of inflating the codegen routine to handle it inline, we issue a single-step instruction so that all compiled instructions prior to that are optimized as usual but memory state is guaranteed to be updated prior to the array initialization instruction. Then we use the interpreter to handle it (most of the time will be spent in memcpy after all), and offer a chance to jump back to the same compilation unit and continue from the next instruction in the trace.
 
> How much overhead does the trace collection add to interpretation?
> There's a large part of the framework code that's executed exactly
> once during app initialization. Will app start-up be hurt? With method-
> based JIT it's quite easy to turn off JIT sampling until the app has
> painted the first screen, etc. However, with trace-based JIT, I
> imagine that such a strategy would be much harder to do.


We have a two-level trace collection mechanism in the interpreter. The first level hashes the branch target address to update a counter in a low-cost way, and the second level kicks in only after the first-level counter has saturated, indicating hotness. As a result, the overhead introduced to the interpreter should be very small.
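
As a rough sketch of the idea (hypothetical names and threshold values, shown in Java for readability; the real logic lives in the interpreter's fast path):

  class TraceProfiler {
      static final int TABLE_SIZE = 512;      // power of two (assumed value)
      static final int HOT_THRESHOLD = 200;   // saturation point (assumed value)
      static final byte[] hotCount = new byte[TABLE_SIZE];

      static void onBranchTaken(int targetPc) {
          int idx = (targetPc >>> 1) & (TABLE_SIZE - 1);  // level 1: cheap hash
          if ((hotCount[idx] & 0xFF) < HOT_THRESHOLD) {
              hotCount[idx]++;                 // low-cost counting only
          } else {
              startTraceSelection(targetPc);   // level 2: hot; go record a trace
          }
      }

      static void startTraceSelection(int targetPc) {
          // switch the interpreter into trace-recording mode (omitted)
      }
  }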

> What would be the worst case scenario for trace-based compilation? For
> example, if I have a large switch statement like this
>
>   switch (x) { // case 0 .. 255
>       case 1:   very simple code; break;
>       case 2:   very simple code; break;
>       case 3:   very simple code; break;
>       ...
>       case 255: very simple code; break;
>   }
>
> can this be handled well?


I am glad you used this code snippet as an example, as it shows a benefit of the trace-based JIT mechanism. When a switch statement turns hot, we first compile all the cases (i.e., we form an empty dispatch table) in which each case occupies a small, fixed amount of space in the code cache. Later, whenever a corresponding "very simple code; break;" turns hot, we compile it and chain its entry in the dispatch table to the newly constructed translation (a sketch of the idea follows). In this way we don't waste space or CPU cycles on rarely exercised code in a big switch statement. However, this optimization didn't make it into the Eclair snapshot in time, so it is absent from the JIT until the next code drop.
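
Conceptually, the chaining works like this (a sketch with hypothetical names, not the actual code-cache layout):

  class SwitchDispatch {
      static final int NUM_CASES = 256;
      final Runnable[] table = new Runnable[NUM_CASES];

      SwitchDispatch(Runnable interpreterPunt) {
          // Initially every slot is the same small, fixed-cost stub
          // that punts back to the interpreter.
          for (int i = 0; i < NUM_CASES; i++) {
              table[i] = interpreterPunt;
          }
      }

      // Called when one case turns hot: chain its slot to the
      // newly compiled translation of that case body.
      void chain(int caseIndex, Runnable compiledBody) {
          table[caseIndex] = compiledBody;
      }

      void dispatch(int x) {
          table[x].run();   // compiled code, or the interpreter stub
      }
  }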

Cheers,
-Ben


Loku

Aug 24, 2011, 9:44:02 AM
to Ben Cheng, android-...@googlegroups.com
Hi Ben
Thanks for the great info. Are there any Android Java constructs to
optimize the code or make it faster through the JIT, like the inline or
register keywords in C/C++?
Thanks
Lokesh

