Hi android-llvm group,
[1] reports that, despite not yielding any benefits in the past, ARM ETM + AutoFDO can be used to optimize the Linux kernel. I would like to find out whether a trial run on my platform would be worthwhile.
[1] https://lpc.events/event/7/contributions/798/
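[2][3] show simple examples of optimizing a program using simpleperf and perf.
[2] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/collect_etm_data_for_autofdo.md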
[3] https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md
I encountered an error while using the AutoFDO data to build an optimized binary in Step 4 of the example provided in [2]:
error: toolchain/pgo-profiles/sampling/Android.bp:3:1: unrecognized module type "fdo_profile"
Can you assist me in resolving this error?
Separately, I would like to ask about the current feasibility of optimizing the kernel using AutoFDO + ETM on ARM platforms.
Many thanks
Kind Regards
Zack
Hi Zack,
Thanks for the question. I've put my AFDO work on hold since moving
to a Zen2-based workstation, because I'm now having numerous issues with
perf on that uarch.
I haven't had time to play with simpleperf on Android for purposes of
AFDO. It's definitely worth looking into. Yabin (cc'ed) might know
how best to collect profiles of the kernel using `simpleperf`.
On Thu, Jul 20, 2023 at 7:16 AM Zack Tsai <fissu...@gmail.com> wrote:
>
> Hi, android-llvm groups
>
> [1] Despite not yielding any benefits in the past, it has been shown that utilizing ARM ETM+AutoFDO can optimize the Linux kernel. I am interested in determining if conducting a trial run on my platform would be beneficial.
>
> [1] https://lpc.events/event/7/contributions/798/
>
> [2][3] show simple examples of optimizing a program using simpleperf and perf
>
> [2] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/collect_etm_data_for_autofdo.md
>
> [3] https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md
>
> I encountered an error while using AutoFDO data to "build an optimized binary in Step 4" of the example provided in [2].
>
> error: toolchain/pgo-profiles/sampling/Android.bp:3:1: unrecognized module type "fdo_profile"
>
> Can you assist me in resolving this error?
>
> On the other hand, I would like to inquire about the current feasibility of optimizing the kernel using autofdo+etm on ARM platforms.
>
> Many thanks
>
> Kind Regards
>
> Zack
>
> --
> You received this message because you are subscribed to the Google Groups "android-llvm" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to android-llvm...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/android-llvm/540b3849-dc0b-4619-92a2-ce4b70f45fb1n%40googlegroups.com.
--
Thanks,
~Nick Desaulniers
Hi Zack,
> Does this mean that ARM ETM doesn't work? I've tried some benchmark
> programs, but I don't see any benefit.
> I'll use execution time as the measure of benefit, if it's significantly shortened.

I think what Nick means by sampling includes Intel LBR and ARM ETM, in contrast to instrumented PGO. ARM ETM collects the same kind of profile as Intel LBR and instrumented PGO, so ideally they should achieve a similar performance gain (if not, we can compare the profiles they generate).

But in reality it's hard to do correctly. If we don't see benchmark improvements:
1) Maybe we didn't collect enough profile data.
2) The compiler may ignore the profile data because of a function name mismatch or something else.
3) The gain from inlining more functions may be offset by increased instruction cache misses, or may be too small to notice.
4) The benchmark itself may be noisy.

What we see in Android is that using ETM and AutoFDO decreases app startup time by 1.3% on average, and it can significantly improve the performance of some JNI functions. I definitely think we should do more work investigating the performance effect of using AutoFDO, and release our profiles for public verification.
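For point 4 (a noisy benchmark), one rough way to check whether a measured difference stands out from run-to-run variation is to repeat the benchmark several times per build and compare the means against their spread. The following is only a minimal sketch, not part of simpleperf or AutoFDO: the function name `compare_runs` and the timing values are hypothetical, and a Welch-style t statistic is just one reasonable choice of noise measure.

```python
import math
import statistics


def compare_runs(baseline, optimized):
    """Compare execution times (seconds) of a baseline and an AutoFDO build.

    Both arguments are lists of repeated measurements (at least two each).
    Hypothetical helper for illustration only.
    """
    mean_b = statistics.mean(baseline)
    mean_o = statistics.mean(optimized)
    # Sample variances capture the run-to-run noise of each build.
    var_b = statistics.variance(baseline)
    var_o = statistics.variance(optimized)
    # Standard error of the difference between the two means (Welch).
    std_error = math.sqrt(var_b / len(baseline) + var_o / len(optimized))
    diff = mean_b - mean_o  # positive means the optimized build is faster
    if std_error > 0:
        t_stat = diff / std_error
    else:
        # No measured noise at all: any nonzero difference is unbounded.
        t_stat = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return {
        "speedup_pct": 100.0 * diff / mean_b,
        "std_error": std_error,
        "t_stat": t_stat,
    }


if __name__ == "__main__":
    # Made-up timings, purely to show how the helper is called.
    base = [10.0, 10.2, 9.8, 10.1, 9.9]
    afdo = [9.8, 9.9, 9.7, 10.0, 9.6]
    print(compare_runs(base, afdo))
```

A small `t_stat` (say, well under 2) suggests the difference is within the noise and more runs or a quieter setup are needed before drawing conclusions about the profile.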