The digital sound processing app I've written requires ~50,000 FFTs,
which takes less than 2 seconds on a modern JIT-enabled JVM running
J2ME and less than 1 second as a C program on a similarly powered
mobile phone. However, on an HTC Hero, the same Java-based code takes
15 seconds. For an acceptable user experience, I need a solution that
runs in under 2.5 seconds.
To benchmark Android, I ran very simple tests (results below) on an
HTC Hero. The results show the Dalvik VM to be >20 times slower than
J2ME and 25-50 times slower than a C program performing the same
operations on a similarly powered mobile phone.
For example, this simple iteration over an empty method 2 million
times takes 1.4 seconds even though it doesn’t do anything. The same
iteration is performed in milliseconds by a C program and about 50ms
on a similarly powered J2ME phone.
public void performanceTest1() {
    for (int i = 0; i < 2000000; i++) {
        emptyMethod();
    }
}

private int emptyMethod() {
    return 0;
}
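For reference, timings like the ones quoted above can be reproduced with a minimal harness along these lines. This is an illustrative sketch (the class name and `timeEmptyCalls` are my own, not from the original post); `System.currentTimeMillis()` is coarse, but adequate for multi-second runs.

```java
// Minimal micro-benchmark harness for the empty-method loop above.
public class LoopBenchmark {
    private static int emptyMethod() {
        return 0;
    }

    public static long timeEmptyCalls(int iterations) {
        long start = System.currentTimeMillis();
        int sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += emptyMethod(); // accumulate so the call has an observable effect
        }
        long elapsed = System.currentTimeMillis() - start;
        if (sink != 0) {
            throw new AssertionError("unexpected sink value");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println(timeEmptyCalls(2000000) + " ms");
    }
}
```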
Doing something a little more complex like calculating the imaginary
component of a complex conjugate 2 million times takes 3.2 seconds.
Again, this takes milliseconds on other mobile phones running J2ME or
C.
public void performanceTest2() {
    for (int i = 0; i < 2000000; i++) {
        int a = 5;
        int b = 5;
        int c = 5;
        int x = 5;
        int y = 5;
        y = ((a >> 16) * ((c << 16) >> 16)) + (((a & 0x0000FFFF) * ((c << 16) >> 16)) >> 16);
        y = -y;
        y += ((b >> 16) * (c >> 16)) + (((b & 0x0000FFFF) * (c >> 16)) >> 16);
    }
}
Has anyone else seen this problem on Android? My assumption is that
as Dalvik runs interpreted code without a JIT, then the NDK should
avoid these performance issues...but I wanted to post this reality
check to the NDK forum first.
But it's not clear yet whether the Dalvik JIT will be back-ported to
every existing Android phone.
For that reason alone it seems like it would be best to use native
code for FFT type applications on Android.
--
You received this message because you are subscribed to the Google Groups "android-ndk" group.
On Jan 6, 10:35 pm, Biosopher <astev...@gracenote.com> wrote:
> For example, this simple iteration over an empty method 2 million
> times takes 1.4 seconds even though it doesn’t do anything. The same
> iteration is performed in milliseconds by a C program and about 50ms
> on a similarly powered J2ME phone.
>
> public void performanceTest1() {
> for (int i = 0; i < 2000000; i++) {
> emptyMethod();
> }
>
> }
>
> private int emptyMethod() {
> return 0;
>
> }
>
I think it will depend on the compiler flags, since an optimizing
compiler would eliminate the empty call entirely.
> Doing something a little more complex like calculating the imaginary
> component of a complex conjugate 2 million times takes 3.2 seconds.
> Again, this takes milliseconds on other mobile phones running J2ME or
> C.
>
> public void performanceTest2() {
> for (int i = 0; i < 2000000; i++) {
> int a = 5;
> int b = 5;
> int c = 5;
> int x = 5;
> int y = 5;
>
> y = ((a >> 16) * ((c << 16) >> 16)) + (((a &
> 0X0000FFFF) * ((c <<
> 16) >> 16)) >> 16);
> y = -y;
> y += ((b >> 16) * (c >> 16)) + (((b & 0X0000FFFF) *
> (c >> 16)) >>
> 16);
> }
>
> }
>
> Has anyone else seen this problem on Android. My assumption is that
> as Dalvik runs interpreted code without a JIT, then the NDK should
> avoid these performance issues...but I wanted to post this reality
> check to the NDK forum first.
If I understand correctly, for Android the code is compiled using the
standard desktop JDK, while JavaME uses a different compiler, and that
could be the source of the performance difference, as each would
optimize in its own way. The Android SDK does not provide a separate
compiler for the Android platform, so it is possible that the
difference originates there.
AFAIK, even JavaME has several configurations, like CDC and CLDC,
which differ in (among other things) the presence or absence of
floating-point support. I don't have much experience with JavaME per
se, so let me know if there is __NO__ difference between the Java
compiler for desktop and the one for ME.
Unless you give more information about what device / VM / compiler
flags you are running with, it will be hard for anyone to guess why
the Dalvik VM is slow. From what I could gather from the Android docs,
Dalvik was chosen for speed, even though it might not be perfectly
Java-compliant. But we cannot rule out that Google avoided other VMs
for licensing reasons. Ideally, Google would have published the
performance differences, if any, for whatever benchmarks they ran when
comparing VMs.
Since you have not provided the snippet of code that is doing the FFT,
I am assuming it is all integer based; otherwise the performance
difference could be due to the presence of a VFP (hardware
floating-point unit).
Overall, I think Android definitely needs more information,
documentation, and benchmark runs before its performance can really be
compared with other platforms.
HTH,
DivKis
I would appreciate any pointers on optimizing the compiled Java code
in a way that might improve the performance. The Dalvik performance
tests I ran were using the default compiler flags.
As an FYI, I've now rewritten the same code to run native via
Android's NDK and the performance is great in that case.
The performance tests were run on an HTC Hero using the default
compilation flags. As you noted, the FFTs are fixed-point using ints,
so there's no slowdown due to floats.
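The fixed-point arithmetic in performanceTest2 follows the usual Q16.16 pattern: each 32-bit operand is split into a signed high half and an unsigned low half so the scaled product can be formed without a 64-bit multiply. A standalone sketch of that multiply (class and method names are my own, for illustration):

```java
// Q16.16 fixed-point helpers. mul() computes (a * b) >> 16 from 16-bit
// partial products, mirroring the shift-and-mask style of performanceTest2.
public class Fixed {
    public static final int ONE = 1 << 16; // 1.0 in Q16.16

    public static int mul(int a, int b) {
        int aHi = a >> 16;        // signed high half of a
        int aLo = a & 0x0000FFFF; // unsigned low half of a
        int bHi = b >> 16;        // signed high half of b
        int bLo = b & 0x0000FFFF; // unsigned low half of b
        // (a * b) >> 16, pieced together from partial products; the
        // low*low product is shifted unsigned since it may use all 32 bits.
        return (aHi * b) + (aLo * bHi) + ((aLo * bLo) >>> 16);
    }

    public static int fromInt(int v) { return v << 16; }
    public static int toInt(int v)   { return v >> 16; }
}
```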
From what I've read, Dalvik was selected not so much for speed as to
circumvent Sun's licensing of Java virtual machines. This would
explain why Dalvik hasn't yet adopted modern Java optimizations like a
JIT.
Others have found similar performance issues with Dalvik:
http://android.serverbox.ch/?page_id=28&cpage=1#comment-61
http://occipital.com/blog/2008/10/31/android-performance-2-loop-speed-and-the-dalvik-vm/
A JIT that performs inlining would turn this from 2 million method
calls into 2 million integer increments.
A JIT that performs strength reduction would turn the whole loop into
"i = 2000000".
So it's meaningless as a performance benchmark. It's really more of a
JIT feature detector.
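One common way to keep such a loop meaningful under an optimizing JIT is to feed it a value the compiler cannot see at compile time and to consume the result. A sketch of that idea (names are illustrative, not from the thread):

```java
// A loop body a JIT cannot simply delete: the accumulator depends on a
// runtime-supplied seed, and the result is returned to the caller, so
// neither dead-code elimination nor constant folding can remove the work.
public class JitResistantLoop {
    public static int run(int iterations, int seed) {
        int acc = seed;
        for (int i = 0; i < iterations; i++) {
            acc = acc * 31 + i; // data-dependent chain across iterations
        }
        return acc; // consuming the result prevents dead-code elimination
    }

    public static void main(String[] args) {
        // args.length is unknown at compile time, so the seed is opaque
        System.out.println(run(2000000, args.length + 1));
    }
}
```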
> Doing something a little more complex like calculating the imaginary
> component of a complex conjugate 2 million times takes 3.2 seconds.
> Again, this takes milliseconds on other mobile phones running J2ME or
> C.
This is the sort of thing that JITs do extremely well. (No, I'm not
going to post performance numbers for the Dalvik JIT, since it's not
shipping yet.)
Have you tried executing the code under J2ME with the JIT disabled
there? ("-Xint" might do the trick.) I'm curious how an apples-to-
apples test compares.
As I mentioned back on Jan 7 at 10:02 am, this performance test was a
highly simplified but very useful test of Dalvik's current
configuration. Of course a future Dalvik would ideally optimize this
away, but since the current one doesn't, this test shows an utterly
simple case of how the existing Dalvik underperforms.
The performance test could have been written in a more complex way to
show the same result, but this one proved a major point in a simple
way: "Dalvik's current implementation introduces considerable
overhead even for very simple operations."
A more complex test like this would have shown the same performance
issue but would have complicated the analysis (was passing the
argument slow, was the return value slow, was the method call itself
slow):
public void performanceTest3() {
    int val = 0;
    for (int i = 0; i < 2000000; i++) {
        val += simpleMethod(val);
    }
}

private int simpleMethod(int val) {
    return 1;
}
My first simple case showed that for the current Dalvik
implementation, the slow performance is due entirely to the method
calls, not to passing arguments and return values around.
I contend that, in a more complex test, the function call wouldn't
have been inlined by the JIT. So you would have been mostly comparing
the cost of computation performed by the method, rather than the cost
of "making a method call" vs. "incrementing an integer". "i++;"
always beats "func(); i++;".
If your code has a lot of trivial methods that can be inlined easily,
then your simple test is an accurate reflection of real-world
differences. In practice, unless you're calling getter/setter methods
in inner loops, it's not a result from which one can draw meaningful
conclusions.
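The getter/setter point above is exactly where hand-inlining helped on the JIT-less Dalvik of the time: Android's own performance guidance recommended reading fields directly in hot loops rather than going through accessors. A sketch of the two styles (illustrative class, not from the thread):

```java
// Compares an accessor-based inner loop against a hand-inlined one.
// On an interpreter, every get(i) call pays fixed dispatch overhead;
// the direct version hoists the field into a local and indexes it.
public class Samples {
    private final int[] data;

    public Samples(int[] data) { this.data = data; }

    public int get(int i) { return data[i]; }

    // Accessor version: one method call per element.
    public int sumViaGetter() {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += get(i);
        }
        return sum;
    }

    // Hand-inlined version: no calls inside the loop.
    public int sumDirect() {
        int sum = 0;
        int[] d = data; // local copy of the field reference
        for (int i = 0; i < d.length; i++) {
            sum += d[i];
        }
        return sum;
    }
}
```

Both methods compute the same sum; only the per-element call overhead differs.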
I have benchmarks that show the VM outperforming native code on a
standard benchmark, because the VM can use the floating-point hardware
while the NDK is configured for software FP. It is true that, based on
that benchmark, the VM outperforms native code on float-intensive
computations. I would not, however, claim that the interpreter is
faster than the native CPU.
Benchmarks are easy to write and execute, but deriving meaning from
the results can be tricky.
Ultimately the only result that matters is the speed of your actual
code on the target platform with the now-shipping OS. For the
computations you're performing on the devices you have, the speed is
unacceptably slow, and writing it in native code is the best solution.
I don't suppose there's any way to flip that switch?
Or is this another thing on the wishlist for NDK 2.X?
David Sauter
HTC Ion/Magic specs: Qualcomm® MSM7200A™, 528 MHz
Droid specs: ARM® Cortex™-A8 processor, 550 MHz
If clock speed were the primary difference between the two chips (only
22 MHz), then Android 2.0.1 gave a considerable boost to my NDK
performance!
Does the Android kernel not emulate floating-point instructions on processors that lack an FPU? I was under the impression (despite it being a compile-time option for the kernel) that this is pretty standard for Linux ARM.
Tristan Miller
On Jan 15, 2010 2:44 AM, "Dianne Hackborn" <hac...@android.com> wrote:
On Thu, Jan 14, 2010 at 2:12 PM, David Sauter <del...@gmail.com> wrote:
> I have benchmarks the show the VM outperforming native code on a
> standard benchmark, becau...
Not until you can specify an appropriate CPU target, or else your app would crash on all of the existing devices that support the baseline ARM native code but not the FPU.
--
Dianne Hackborn
Android framework engineer
hac...@android.com
Note: please don't send private questions to me, as I don't have time to provide private support, and so won't reply to such e-mails. All such questions should be posted on public forums, where I and others can see and answer them.
We are porting the TotalCross VM to Android, and I would love to see
the same benchmark run on our VM. We plan to have an alpha version in
one month. I believe you will be able to port your code to our API
without much trouble. In the meantime, you could run it on Pocket PC
or even on win32 (WinXP).
You can download it at: www.totalcross.com
best
guich
> Does the Android kernel not emulate floating point instructions on processors that lack an FPU? I was under the impression (despite being a compile time option for the kernel) that this is pretty standard for Linux ARM.
Either way, running the same SO on two different platforms tests the
kernel and CPU, not the dev kit :)
Higher clock rate, dual-issue CPU, faster memory, etc. The Droid (and
Nexus One) feature substantially greater computing horsepower.
Switching from fixed-point math to float will also be pretty dramatic
on a Droid (esp. with double-precision ops).
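As an illustration of how much simpler the float path is: the shift-and-mask sequences in performanceTest2 are computing pieces of a complex product, which with hardware FP collapses to ordinary arithmetic. A sketch using the standard formulas (names and class are mine, for illustration):

```java
// Imaginary components of complex products in plain float arithmetic.
// On an FPU-equipped core (e.g. Cortex-A8) each is a few multiply-adds,
// versus the Q16.16 shift-and-mask sequences needed for integer-only math.
public class ComplexOps {
    // Imaginary part of (a + bi) * (c + di) = ad + bc.
    public static float imagOfProduct(float a, float b, float c, float d) {
        return a * d + b * c;
    }

    // Imaginary part of (a + bi) * conj(c + di) = bc - ad.
    public static float imagOfConjProduct(float a, float b, float c, float d) {
        return b * c - a * d;
    }
}
```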