Dalvik Performance Gap


Jack Harvard

Aug 6, 2012, 11:50:21 AM
to android-...@googlegroups.com
A benchmark (AndEBench, from Google Play) showed the native and Java
versions of the same application differing in performance by more than
20x on Android, which suggests there's a large performance gap for
Dalvik to close. Before profiling and checking where to improve, I'm
wondering whether a 20x difference between native and Java is typical
or not. Any data points for other mobile JVMs?

Kristopher Micinski

Aug 6, 2012, 11:53:53 AM
to android-...@googlegroups.com
I'm pretty suspicious of this really being the case. I believe Google
has perf docs on Dalvik, and even *without* JIT, I believe there isn't
a 20x gap (though maybe that's incorrect). *With* JIT, the gap is a
lot smaller. I would really try to verify that this benchmark isn't
just plain crap before pointing fingers at the VM.

Performance-wise, Dalvik has always been fairly good.

kris

Jack Harvard

Aug 6, 2012, 12:07:15 PM
to android-...@googlegroups.com
Thanks, Kris. That's why I want to find out whether there is any such
data for other mobile JVMs besides Dalvik. I found no reference on
whether 20x is typical or not. AndEBench is relatively new and I don't
trust it fully, but it's from a well-known organization, EEMBC, so I
wanted to give it a go.

Also, on the JIT: according to the Google I/O slides, it improves
performance by 2x to 5x on average, but that depends entirely on how
much code can be JIT-ed from the interpreter trace.

Kristopher Micinski

Aug 6, 2012, 12:59:20 PM
to android-...@googlegroups.com
On Mon, Aug 6, 2012 at 12:07 PM, Jack Harvard <jack.h...@gmail.com> wrote:
> Thanks, Kris. That's why I want to find out, is there any such data on
> other JVMs like Dalvik. I found no reference on whether 20x is typical
> or not. The AndEBench is relatively new, I don't trust it fully, but
> it's from a well-known organization, EEMBC, so I wanted to give it a
> go.
>
> Also, on the JIT, according to Google I/O slides, on average, it
> improves performance by 2x to 5x, but it totally depends on how much
> code can be JIT-ed from the interpreter trace.
>

That will depend on the benchmark, but you should also take into
account the fact that the first time the code is run there won't be
any JIT improvement (and perhaps a slowdown because of the JIT
process!).
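Kris's warmup point can be sketched as a small timing harness (hypothetical code, not from AndEBench; the workload `hotLoop` and the iteration counts are arbitrary placeholders):

```java
// Sketch: observing JIT warmup by timing the same workload across runs.
// On a JIT VM the early runs include interpretation plus compilation cost;
// later runs should settle to compiled-code speed.
public class WarmupProbe {
    // A small, deterministic workload that a JIT can compile once it is hot.
    static long hotLoop(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31) ^ (acc >>> 3);
        }
        return acc;
    }

    public static void main(String[] args) {
        for (int run = 0; run < 5; run++) {
            long t0 = System.nanoTime();
            long result = hotLoop(5_000_000);
            long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("run " + run + ": " + elapsedMs
                    + " ms (result " + result + ")");
        }
    }
}
```

Discarding the first few runs (or reporting them separately) is what keeps a benchmark from charging JIT compilation time to the code being measured.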

kris

Jack Harvard

Aug 6, 2012, 1:03:39 PM
to android-...@googlegroups.com
Other people have also reported that the native-vs-Java gap can be as
high as 30x, as shown here

http://code.google.com/p/android-benchmarks/

and here

http://www.springerlink.com/content/978-3-642-01801-5/#section=31255&page=1&locus=0

Also, there was a discussion on how to improve Dalvik performance:

http://stackoverflow.com/questions/1023502/how-would-you-improve-dalvik-androids-virtual-machine

On a Nexus 7 I saw the native/Java gap widen to 36x.

Kristopher Micinski

Aug 6, 2012, 1:23:38 PM
to android-...@googlegroups.com
I've seen these benchmarks before; hmm, I think we've even argued
about their validity on this list before.

I think it's pretty hard to draw a fair comparison, simply because
it's hard to get an idea of how real apps *use* code.

I.e., is a benchmark on quicksort necessarily indicative of the way
Android apps spend their time? I'm convinced that if you're doing a
math- or processor-heavy operation, it might be worthwhile to write
some native code inside your app, but what's a fair benchmark for the
overall way an app behaves?

You should note that the Stack Overflow post and journal article
you're citing are old enough to be almost irrelevant to Dalvik
performance today. I suspect you really need to redo the performance
tests and characterize the actual slowdown on *today's*
implementations of Dalvik.

Numerical operations are one thing, array operations another. The
benchmarks you point at do the best job of those I've seen so far,
but (as I said before) I'm not entirely sure they're indicative of the
way real apps work :-)

kris

Jack Harvard

Aug 7, 2012, 8:09:37 AM
to android-...@googlegroups.com
I imagine Google has done its share of benchmarking and traded off
different constraints. That's not to say Dalvik has no room left for
improvement; in fact, I think there's still room to improve
performance. What's unclear is at what cost (larger memory footprint,
battery drain, etc.), and whether most apps spend 90% of their time
calling native library code rather than running code written in Java.

https://blogs.oracle.com/javaseembedded/entry/how_does_android_22s_performance_stack_up_against_java_se_embedded

"The results show that although Android's new JIT is an improvement
over its interpreter-only implementation, Android is still lagging
behind the performance of our Hotspot enabled Java SE Embedded. As
you can see from the above results, Java SE Embedded can execute Java
bytecodes from 2 to 3 times faster than Android 2.2."


hagenp

Aug 9, 2012, 4:16:50 PM
to android-...@googlegroups.com
Warning: Long reply ahead.


On Monday, August 6, 2012 5:50:21 PM UTC+2, Jack Harvard wrote:
> A benchmark (AndEBench from Google Play) showed that native vs Java
> version of the same application varies in performance by 20+ times on
> Android, this shows that there's a large performance gap to improve on
> for Dalvik.

So... the statement is valid... for this one implementation. Of this one application.

Did you read this?: http://developer.android.com/guide/practices/performance.html

A co-worker baffled me with the remark that a JIT can even outrank an optimized native implementation.
My first reaction was: WHAT? How?
Easy: if a piece of code contains lots of conditional statements, but these stay fixed over a long period once processing begins (as in a long-running loop), the JIT compiler can optimize the conditions away and save time. A static optimizer simply cannot do this.

[I recall reading in an article that a rather unknown example of "JIT" was the Windows 3.x "blitter" code: the transformation matrix and the operations between the in-memory image and the screen were pre-computed into an opcode list, because otherwise the condition checks ate up too much performance. Unfortunately I could not find the article just now; if you find the reference, please let me know.]


The drawback of JIT compilers is of course that they also need processing cycles. And memory. That's why you want a fast JIT compiler with a small memory footprint. You need to find a good balance: JIT compilation should kick in early enough to speed up your routines as much as possible, but late enough to avoid wasting time compiling one-shot statements.
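The long-running-loop effect described above can be made concrete with a sketch like this (hypothetical code, not from the thread): the flag is fixed for the entire run but only known at runtime, so a JIT observing the hot trace can specialize the loop and drop the branch, while a static compiler must in general keep the check.

```java
// Sketch: a branch whose outcome never changes during a run.
// A tracing JIT can compile the hot path with the check removed; a static
// compiler cannot, because the flag's value is only known at runtime.
public class StableBranch {
    static long process(int n, boolean useFastPath) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            if (useFastPath) {            // same outcome on every iteration
                acc += i;
            } else {
                acc += slowVariant(i);
            }
        }
        return acc;
    }

    static long slowVariant(int i) {
        return (long) Math.sqrt(i) + i;
    }

    public static void main(String[] args) {
        // The flag comes from runtime input, so only the JIT can observe
        // that it is effectively constant during this particular run.
        boolean flag = args.length == 0;
        System.out.println(process(1_000_000, flag));
    }
}
```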

> Before profiling and checking where to improve, wondering
> whether 20x difference between native vs. Java is typical or not?

Problem is: there is no "typical". 20x seems a bit far off, but you always have to take all of these factors into account:

(a) device architecture (How fast are my operations? If I move data to RAM or Flash, does this improve or degrade performance? Is it better to use two cores or have one hi-speed core? How much battery can I afford?)

(b) application architecture (As I have Java and Native parts anyway, where is the best 'split point'? Can I use optimized libraries, perhaps even hardware-accelerated operations, or do I have to use a more flexible and clean object model?)

(c) algorithm (Is the method to compute my result really suitable? --- Depending on your task, you can have gains of more than 100x, if you switch from a bad algorithm to a suitable one.)

(d) implementation (Is my bytecode structure well suited for my tasks and for processing with my device? Are my static bytecode compiler, executable and variable memory layout and my VM and its JIT compiler efficient and well-parametrized for the task?)

To find the answers, you have to test and to measure.

And once you actually do, you could be surprised... e.g. there might be device builds that use the wrong ARM opcode model, throwing away speedups possible by using the correct settings. There might be bottlenecks in libraries or in applications that you could un-block. Even the Linux kernel might play a big role, e.g. its use of mutexes (see the BKL discussion).
 
> Any data points for other mobile JVMs?

From here I get the impression it's actually the wrong question to ask:
http://stackoverflow.com/questions/1984856/java-runtime-performance-vs-native-c-c-code

So "the answer" would be to compare the best Java version of an application against the best native version.
Note the application must not be _ported_ but has to be engineered from scratch to the same specifications.
And the developers of both versions must be top-notch in their field.
Otherwise you compare developer performance, not "language" performance.

Other interesting reads:
http://www.excelsior-usa.com/jetcs00007.html
http://www.koushikdutta.com/2009/01/dalvik-vs-mono.html

---
just my 2 cents of opinion

Kristopher Micinski

Aug 9, 2012, 4:27:33 PM
to android-...@googlegroups.com
On Thu, Aug 9, 2012 at 4:16 PM, hagenp <hagen....@gmail.com> wrote:
> Warning: Long reply ahead.
>
>
> On Monday, August 6, 2012 5:50:21 PM UTC+2, Jack Harvard wrote:
>>
>> A benchmark (AndEBench from Google Play) showed that native vs Java
>> version of the same application varies in performance by 20+ times on
>> Android, this shows that there's a large performance gap to improve on
>> for Dalvik.
>
> So... the statement is valid... for this one implementation. Of this one
> application.
>
> Did you read this?:
> http://developer.android.com/guide/practices/performance.html
>
> A co-worker baffled me with a remark that a JIT can even outrank an
> optimized native implementation.

That wouldn't surprise me at all; in fact, this is frequently cited as
a reason to use a JIT: at runtime you can have extra information, be
it more accurate branch prediction, better information about the
underlying hardware, etc.

(I believe HTC has a binary rewriter that optimizes native arm5 code
into better-performing arm9 code, for example, depending on your
device..? Not exactly JIT, but an example of the above.)
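One concrete (hypothetical) instance of that extra runtime information: a virtual call site that only ever sees one receiver class during a run. A profiling JIT can notice this, devirtualize the call, and inline the single target; an ahead-of-time compiler that only sees the interface generally cannot.

```java
// Sketch: a virtual call site that is monomorphic at runtime.
// If only Inc ever reaches sum() in a given run, a profiling JIT can
// devirtualize op.apply() and inline it into the loop body.
interface Op {
    int apply(int x);
}

class Inc implements Op {
    public int apply(int x) {
        return x + 1;
    }
}

public class MonoSite {
    static int sum(Op op, int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc = op.apply(acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sum(new Inc(), 1_000_000)); // prints 1000000
    }
}
```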

> My first reaction was: WHAT? How?
> Easy: if a piece of code contains lots of conditional statements, but these
> are fixed over a long period of time once processing began (like in a
> long-running loop), the JIT compiler can optimize the conditions away and
> save time. Any static optimization can just not do this.
>

That's not *precisely* true; good static-analysis techniques can do a
lot. But yes, in general a dynamic approach will be much more precise
for individual runs.


> The drawback with JIT compilers is of course that they also need processing
> cycles. And memory. That's why you want a fast JIT compiler that has a small
> memory footprint. You need to find a good balance between a JIT compilation
> to kick in early enough so to speed up your routines as much as possible,
> but late enough to prevent wasting time by compiling one-shot statements.
>

That's right, and remember that Android has a trace based JIT..

>>
>> Any data points for other mobile JVMs?
>
>
> From here I get the impression, it's actually the wrong question to ask:
> http://stackoverflow.com/questions/1984856/java-runtime-performance-vs-native-c-c-code
>
> So "the answer" would be to compare the best Java version of an application
> against the best Native version of an application.
> Note the application may not be _ported_ but has to be engineered from
> scratch to the same specifications.
> And the developers for both versions must be top-notch in their field.
> Otherwise you compare developer performance, not "Language" performance.
>
> Other interesting reads:
> http://www.excelsior-usa.com/jetcs00007.html
> http://www.koushikdutta.com/2009/01/dalvik-vs-mono.html
>
> ---
> just my 2 cents of opinion
>

You're correct that it's hard to compare against other VMs; *mobile*
VMs might be the right comparison, though. It also depends on the
system architecture underlying the platform. The Android platform
and Dalvik were designed for each other (and initially without JIT!
Simply assembly-optimized interpretation with a high-performance and
very cute/small architecture..). If you throw another mobile VM and
system onto Android it probably won't perform well, and a similar
argument can be made the other way...

kris

hagenp

Aug 9, 2012, 4:43:32 PM
to android-...@googlegroups.com
On Thursday, August 9, 2012 10:27:33 PM UTC+2, Kristopher Micinski wrote:
> You're correct that it's hard to compare against other VMs, however
> *mobile* vms might be the right place.

"Myriad Dalvik Turbo" looks interesting. They claim to achieve speedups of 5x with a highly optimized DVM.

Also, I recall my Samsung i5700 came with an app that allowed running real J2ME apps. But I don't think Samsung published any comparison data.

Jack Harvard

Aug 11, 2012, 6:12:35 PM
to android-...@googlegroups.com

On 9 Aug 2012, at 21:16, hagenp wrote:

> Warning: Long reply ahead.
>
> On Monday, August 6, 2012 5:50:21 PM UTC+2, Jack Harvard wrote:
> A benchmark (AndEBench from Google Play) showed that native vs Java
> version of the same application varies in performance by 20+ times on
> Android, this shows that there's a large performance gap to improve on
> for Dalvik.
> So... the statement is valid... for this one implementation. Of this one application.
>
> Did you read this?: http://developer.android.com/guide/practices/performance.html

Thanks, I did, and went through the Google I/O talks on Dalvik and JIT too, which covers similar content.

> A co-worker baffled me with a remark that a JIT can even outrank an optimized native implementation.
> My first reaction was: WHAT? How?
> Easy: if a piece of code contains lots of conditional statements, but these are fixed over a long period of time once processing began (like in a long-running loop), the JIT compiler can optimize the conditions away and save time. Any static optimization can just not do this.

Yes, though such gains can be unstable.

>
> [I recall having read in an article that a rather unknown example of "JIT" was the Windows 3.x "blitter" code. The transformation matrix and operations between in-memory-image and screen is pre-computed into an opcode list, because otherwise the condition checks eat up too much performance. UNfortunately I could not find the article just now, if you find the reference, please let me know.]

Point taken; it's hard to make two implementations absolutely equivalent.

> The drawback with JIT compilers is of course that they also need processing cycles. And memory. That's why you want a fast JIT compiler that has a small memory footprint. You need to find a good balance between a JIT compilation to kick in early enough so to speed up your routines as much as possible, but late enough to prevent wasting time by compiling one-shot statements.

Trace-based JIT is a tradeoff aimed at a fast performance boost; the JIT itself still has more optimisations on its to-do list.

> Before profiling and checking where to improve, wondering
> whether 20x difference between native vs. Java is typical or not?
>
> Problem is: there is no "typical". 20x seems a bit far off, but you always have to take all of these factors into account:
>
> (a) device architecture (How fast are my operations? If I move data to RAM or Flash, does this improve or degrade performance? Is it better to use two cores or have one hi-speed core? How much battery can I afford?)
>
> (b) application architecture (As I have Java and Native parts anyway, where is the best 'split point'? Can I use optimized libraries, perhaps even hardware-accelerated operations, or do I have to use a more flexible and clean object model?)
>
> (c) algorithm (Is the method to compute my result really suitable? --- Depending on your task, you can have gains of more than 100x, if you switch from a bad algorithm to a suitable one.)
>
> (d) implementation (Is my bytecode structure well suited for my tasks and for processing with my device? Are my static bytecode compiler, executable and variable memory layout and my VM and its JIT compiler efficient and well-parametrized for the task?)
>
> To find the answers, you have to test and to measure.

Benchmarking the Java/native performance difference is not an easy problem, for the reasons you already mentioned. There have been efforts to standardise Java benchmarks that evaluate all layers of the Java runtime environment, such as the DaCapo benchmark suite (http://dacapobench.org/).

> And once you actually do, you could be surprised... e.g. there might be device builds that use the wrong ARM opcode model, throwing away speedups possible by using the correct settings. There might be bottlenecks in libraries or in applications that you could un-block. Even the Linux kernel might play a big role, e.g. its use of mutexes (see the BKL discussion).
>
> Any data points for other mobile JVMs?

It's more about benchmarking different JVMs (such as Dalvik and mobile JVMs); indeed, DaCapo has been used for this purpose.

>
> From here I get the impression, it's actually the wrong question to ask:
> http://stackoverflow.com/questions/1984856/java-runtime-performance-vs-native-c-c-code
>
> So "the answer" would be to compare the best Java version of an application against the best Native version of an application.
> Note the application may not be _ported_ but has to be engineered from scratch to the same specifications.
> And the developers for both versions must be top-notch in their field.
> Otherwise you compare developer performance, not "Language" performance.
>
> Other interesting reads:
> http://www.excelsior-usa.com/jetcs00007.html
> http://www.koushikdutta.com/2009/01/dalvik-vs-mono.html

There have been efforts to minimise differences caused by developer skill; that's why standard benchmarks are created, through an industry-strength review process.

hagenp

Aug 12, 2012, 4:58:55 AM
to android-...@googlegroups.com
On Sunday, 12 August 2012 00:12:35 UTC+2, Jack Harvard wrote:
> The benchmarking of Java/Native performance difference is not an easy problem to solve, as the aspects that you already mentioned, there have been efforts of standardising Java benchmarks to evaluate all layers of the Java running environment - a Java benchmark called Dacapo (http://dacapobench.org/).

Very interesting! Hm... to help answer your original question, one would need a really well-made "native DaCapo" suite (truly natively developed and function-identical, not a port) for comparison.

> There's been efforts to minimise the difference caused by developer performances, that's why standard benchmarks are created, which involves industry strength review process.

...hmmm.... compilers have been known to be tweaked to improve on standardized benchmarks.

As I see it, there is probably no answer to the question of how fast or slow a VM is 'in general' compared to native code. (You always have to test a specific hardware/software stack.) But many thanks for the interesting discussion. :-)

Kristopher Micinski

Aug 12, 2012, 12:01:35 PM
to android-...@googlegroups.com
>> There's been efforts to minimise the difference caused by developer
>> performances, that's why standard benchmarks are created, which involves
>> industry strength review process.
>
> ...hmmm.... compilers have been known to be tweaked to improve on
> standardized benchmarks.
>
> As I see it, there is probably no answer to a question of how fast/slow a VM
> is 'in general' in comparison to native code. (You always have to test a
> specific hardware/software stack.) But many thanks for the interesting
> discussion. :-)
>

In the world of high-performance computing, where you have compilers
for low-level languages that can optimize the stuffing out of your
code (everything from vectorizing to cache optimizations), this is
certainly the case, though I don't think I've ever seen a Java
compiler play this kind of trickery. (Remember, you have to optimize
the compiler/VM combo, not just the compiler or the VM.)

kris