Using ARMv7a With Your App


knappador

Nov 2, 2013, 12:51:07 AM
to kivy-...@googlegroups.com
Sharing info. If you want to use ARMv7a with its faster floating point and other advantages, you can publish multiple apks and let Google Play decide which apk to install on the user's device.
http://developer.android.com/google/play/publishing/multiple-apks.html

Using Python-for-Android, inside distribute.sh, you can switch the comment on the following two lines:

        #export ARCH="armeabi"
        export ARCH="armeabi-v7a" # not tested yet.  

It's tested.  Looks good.  I looked up all the compiler options.  You can see them here inside distribute.sh:

        if [ "X$ARCH" == "Xarmeabi-v7a" ]; then
                CFLAGS+=" -march=armv7-a -mfloat-abi=softfp -mfpu=vfp -mthumb"
        fi

These do the following:
  • -march=armv7-a:  target the ARMv7-A architecture, which supports Thumb-2 and other newer instructions
  • -mthumb:  compile to Thumb/Thumb-2, which results in a smaller binary (which loads from memory faster and takes up less cache/memory)
  • -mfloat-abi=softfp:  use hardware floating-point instructions, but keep the soft-float calling convention so the code still links against soft-float binaries
  • -mfpu=vfp:  use the VFP (Vector Floating Point) unit, which despite the name is actually scalar and only kind of good.
To support Neon instructions, it appears the application itself will have to recognize which binary to use and be able to select the right one. The NDK has tools for doing this with Android's weird make system, but this might not be what I want to use for enabling Cython extensions to be built for Kivy apps. Neon does provide real vector FP calculations, but it will only be useful in somewhat rare circumstances.

Also, you can considerably reduce the executable size by combining -mthumb with -Os. -Os skips optimizations that grow the binary for the sake of small gains that aren't guaranteed to be faster anyway, due to memory-subsystem contention. Again, in distribute.sh:

        export OFLAG="-Os"   
        #export OFLAG="-O2" 
        #export OFLAG="-O3"  

Thumb and -Os with some more thorough blacklisting (and customization of Kivy) can produce a much smaller application. I have built Kivy apps below 4.5 MB with only slight customization. They unpack faster and load faster, both of which make for a good user experience. ARMv7a with Thumb and hardware floating point might perform faster on layouts, so I will do more testing on my production applications that animate widgets. My latest app has no such animation speed problem, but all of its animations go through Kivy's OpenGL instruction API, not layouts or widgets.

If anyone has any good ideas about how to use Neon extensions in one binary and select the appropriate binary at runtime, let me know =)
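For the detection half of that question, here is a minimal sketch of one possible check. It assumes the ARM Linux kernel's /proc/cpuinfo format, where a "Features" line lists "neon" on CPUs that have it. The function name has_neon is made up for illustration, and an app would need to do the equivalent check from Python (or through the NDK tools mentioned above) before deciding which extension to load.

```shell
# Hypothetical helper: succeed if the given cpuinfo-style file lists "neon"
# as one of the words on its Features line. Defaults to /proc/cpuinfo.
has_neon() {
    cpuinfo="${1:-/proc/cpuinfo}"
    awk '/^Features/ { for (i = 1; i <= NF; i++) if ($i == "neon") found = 1 }
         END { exit !found }' "$cpuinfo" 2>/dev/null
}

# Report which variant of a binary this device could use.
if has_neon; then
    echo "neon"
else
    echo "no-neon"
fi
```

This only covers detection. Shipping both binaries in one apk and choosing between them at import time is still the open part.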

knappador

Nov 2, 2013, 12:53:46 AM
to kivy-...@googlegroups.com
Also, Tegra 2 supports ARMv7a but not Neon. This is the kind of reason we can't use Neon on all ARMv7a processors without some way to detect Neon support and load the right binary. There is very little reason to use ARMv7, but to support older devices, you can build an apk with this (ARMv8-A will give us the same problem someday) and use the multiple APK support to make things work.

Akshay Arora

Nov 2, 2013, 3:54:28 PM
to kivy-...@googlegroups.com
@knappador nice research. Maybe we should add this to the docs too? It would help a lot of people looking to optimize startup time and speed.



Mathieu Virbel

Nov 2, 2013, 4:41:09 PM
to kivy-...@googlegroups.com

Yep, one of my friends also tried the x86 compiler for an x86 Android tablet.

Would it be possible for you to add a switch in distribute.sh for generating armv7?
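One way such a switch could look, as a rough sketch only: read ARCH from the environment, fall back to armeabi, and append the flags for the chosen architecture. The helper name arch_cflags is hypothetical, the armeabi-v7a flags are the ones quoted from distribute.sh earlier in the thread, and the assumption that plain armeabi needs no extra flags is mine.

```shell
# Hypothetical helper: print the extra CFLAGS for a given ARCH value.
arch_cflags() {
    case "$1" in
        armeabi-v7a)
            # Flags quoted from distribute.sh earlier in this thread.
            echo "-march=armv7-a -mfloat-abi=softfp -mfpu=vfp -mthumb"
            ;;
        armeabi)
            # Assumption: the default architecture needs no extra flags.
            echo ""
            ;;
        *)
            echo "unsupported ARCH: $1" >&2
            return 1
            ;;
    esac
}

# Default to armeabi unless the caller exported ARCH beforehand.
ARCH="${ARCH:-armeabi}"
CFLAGS="${CFLAGS:-} $(arch_cflags "$ARCH")"
export ARCH CFLAGS
```

With something like this in place, the architecture could be chosen by setting ARCH=armeabi-v7a in the environment before running the script, instead of editing the commented lines by hand.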

knappador

Nov 2, 2013, 6:47:11 PM
to kivy-...@googlegroups.com
Hypothesis:
ARMv7a with Thumb and -Os provides the smallest binary.

Setup:
Build canvas_stress.py into an application. Each time, I use the same build script and the same (somewhat custom) blacklist, and change only the compiler options. The blacklist isn't totally tuned, so some of the data in the apk does not react to the compiler options. See the notes for findings that affected the results.

Results:
Oflag  ARM version    Thumb    Bytes    MB
-O2    -march=armv7   -mthumb  5956509  5.9
-Os    -march=armv7   -mthumb  5610859  5.6
-Os    -march=armv7   -marm    5610618  5.6
-O2    -march=armv7a  -marm    5895994  5.8
-O2    -march=armv7a  -mthumb  5710331  5.7
-Os    -march=armv7a  -mthumb  5342759  5.3  (smallest test)
-O3    -march=armv7a  -mthumb  6130323  6.1  (largest test)

Conclusion:
-Os seems to have the largest effect on size. Sure enough, compared to the defaults we get a 10% reduction in apk size using -Os and -mthumb with v7a. Not every recipe honors -Os and not all of the data in the apk is compiled binary, so the effect on the binaries themselves is more than 10%, and still more size reduction can be obtained.
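The 10% figure can be checked against the byte counts in the results. The sketch below assumes the "defaults" baseline is the -O2 / -march=armv7 / -mthumb row, which the post does not state explicitly. The function name size_reduction_pct is made up for illustration.

```shell
# Percentage size reduction going from a baseline byte count to a new one.
size_reduction_pct() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) * 100 / base }'
}

# Assumed baseline row (5956509 bytes) versus the smallest test (5342759 bytes).
size_reduction_pct 5956509 5342759   # prints 10.3
```

Taking the -O2 / -march=armv7a / -marm row (5895994 bytes) as the baseline instead gives about 9.4%, so the rough 10% holds either way.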

Notes:
These tests revealed inconsistencies in our compiler flag support. Not all recipes work correctly, and while building I noticed that many (looking at you, Python and PIL) will list compiler options such as -Os -mthumb -O3, which in fact overrides the build back to -O3, because the configure script doesn't react to the $OFLAG set in distribute.sh. I seem to have successfully modified the Python recipe to react to $OFLAG and am committing the change to github.
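The override happens because GCC honors only the last -O option on the command line. A small sketch of a check that could be run against a recipe's logged flags follows. The helper name effective_oflag is hypothetical and not part of python-for-android.

```shell
# Print the last -O<level> option found in a space-separated flag string,
# which is the one GCC actually applies.
effective_oflag() {
    last=""
    for flag in $1; do
        case "$flag" in
            -O*) last="$flag" ;;
        esac
    done
    printf '%s\n' "$last"
}

effective_oflag "-Os -mthumb -O3"   # prints -O3
```

Comparing this output with $OFLAG for each recipe would show which ones ignore the setting.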

The blacklist still has the largest effect on the binary by far. The smallest file is one that doesn't exist. We could use tools that do dependency tracking, such as snakefood, to create blacklists dynamically. We're using only about 20% of all the files built into the APK right now?
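As a rough illustration of the idea, assuming we already have two plain-text lists (every file that goes into the build, and every file the app actually imports), the blacklist is just the difference between them. How the "used" list gets produced, with snakefood or anything else, is outside this sketch, and the name make_blacklist is made up.

```shell
# Hypothetical helper: print files that are in the build list ($1) but not
# in the list of files the app actually uses ($2), one path per line.
make_blacklist() {
    all_sorted="$(mktemp)"
    used_sorted="$(mktemp)"
    sort -u "$1" > "$all_sorted"
    sort -u "$2" > "$used_sorted"
    # comm -23 keeps only the lines unique to the first (sorted) file.
    comm -23 "$all_sorted" "$used_sorted"
    rm -f "$all_sorted" "$used_sorted"
}
```

The output would still need a manual pass, since a static import graph misses anything loaded dynamically.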

To eliminate unused files, lazy loading is also very important. Import graphs don't help us if every module imports everything. I have customized my application.py to allow blacklisting the kivy/data/fonts directory. Customization was necessary because the file tries to import a lot of settings dependencies that I usually don't care about at all in production applications. The kivy/data/fonts directory adds about 1 MB of extra data.

Thumb2 is noted to have better size/performance characteristics than Thumb1. What we probably want in the end is to select -Os for all of the dependencies such as PIL (rarely used in most apps) and -O2 for a custom Cython file that might be part of the core of the application. No performance tests have been done yet. I need a better benchmarking application to allow for real testing on the device. Canvas and layout animation tests are probably most needed. For applications needing raw speed, building Cython after the distribute step is still more important than tuning the compiler.