Question about collecting ETM data for AutoFDO.


Zack Tsai

Jul 20, 2023, 10:16:54 AM
to android-llvm

Hi android-llvm group,

[1] Although it did not yield any benefit in the past, ARM ETM + AutoFDO has been shown to optimize the Linux kernel. I would like to determine whether a trial run on my platform would be worthwhile.

[1] https://lpc.events/event/7/contributions/798/

[2] and [3] show simple examples of optimizing a program using simpleperf and perf, respectively.

[2] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/collect_etm_data_for_autofdo.md

[3] https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md

I encountered an error while using the AutoFDO data to "build an optimized binary" in Step 4 of the example provided in [2]:

error: toolchain/pgo-profiles/sampling/Android.bp:3:1: unrecognized module type "fdo_profile"

Could you help me resolve this error?

Separately, I would like to ask about the current feasibility of optimizing the kernel using AutoFDO + ETM on ARM platforms.

Many thanks,

Kind regards,

Zack

Nick Desaulniers

Jul 24, 2023, 2:27:29 PM
to Zack Tsai, android-llvm, Yabin Cui
Hi Zack,
Thanks for the question. I've put my AFDO work on hold since moving
to a Zen 2-based workstation, as I'm now having numerous issues with
perf on that microarchitecture.

I haven't had time to play with simpleperf on Android for purposes of
AFDO. It's definitely worth looking into. Yabin (cc'ed) might know
how best to collect profiles of the kernel using `simpleperf`.



--
Thanks,
~Nick Desaulniers

Pirama Arumuga Nainar

Jul 27, 2023, 6:35:39 PM
to Nick Desaulniers, Zack Tsai, android-llvm, Yabin Cui
On Thu, Jul 20, 2023 at 7:16 AM Zack Tsai <fissu...@gmail.com> wrote:
> error: toolchain/pgo-profiles/sampling/Android.bp:3:1: unrecognized module type "fdo_profile"

The `fdo_profile` module type was recently added in AOSP.  If you're using an older release, skip this step.

> On the other hand, I would like to inquire about the current feasibility of optimizing the kernel using autofdo+etm on ARM platforms.

@Yabin Cui Do you have any suggestions on collecting ETM profiles for the kernel?
 

Yabin Cui

Jul 28, 2023, 8:28:38 PM
to Pirama Arumuga Nainar, Nick Desaulniers, Zack Tsai, android-llvm
> On the other hand , I would like to inquire about the current feasibility of optimizing the kernel using autofdo+etm on ARM platforms.
In https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/collect_etm_data_for_autofdo.md#examples, we have included steps for collecting ETM data for the kernel.
But it doesn't include steps for using the profile when building Android kernels. That's something we want to do in the future.

Thanks,
Yabin

蔡沅信

Jul 29, 2023, 10:17:51 AM
to Yabin Cui, Pirama Arumuga Nainar, Nick Desaulniers, android-llvm
Hi all



>The `fdo_profile` module type was recently added in AOSP.  If you're using an older release, skip this step.

If I skip this step, how do I apply the final .afdo file when recompiling my program?




>In https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/collect_etm_data_for_autofdo.md#examples, we have included steps for collecting ETM data for the kernel.
>But it doesn't include steps using the profile in building android kernels. That's something we want to do in the future.

Could you show me how to do a trial run at this stage?


I really appreciate your help 😀

Many Thanks
Zack



On Sat, Jul 29, 2023 at 8:27 AM Yabin Cui <yab...@google.com> wrote:

Zack Tsai

Jul 29, 2023, 10:17:55 AM
to android-llvm

Hi Yabin, Pirama Arumuga,


> The `fdo_profile` module type was recently added in AOSP.  If you're using an older release, skip this step.

After omitting that step, how do I recompile the final afdo file into my new program?

>In https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/collect_etm_data_for_autofdo.md#examples, we have included steps for collecting ETM data for the kernel.
>But it doesn't include steps using the profile in building android kernels. That's something we want to do in the future.

Could you show me how to do a trial run at this stage?


Thanks
Zack 


On Saturday, July 29, 2023 at 8:28:38 AM UTC+8, Yabin Cui wrote:

Yabin Cui

Jul 31, 2023, 1:46:47 PM
to Zack Tsai, android-llvm
Hi Zack,

> After omitting that step, how do I recompile the final afdo file into my new program?

When you use "afdo: true" in Android.bp for etm_test_loop, the build system searches for the corresponding afdo file.
Before fdo_profile was introduced, I think the search was based only on the file name (binary_name + ".afdo").
You can check build/soong/cc/afdo.go to see how it works.
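As a rough illustration of that name-based lookup, here is a minimal Python sketch. It assumes profiles live in toolchain/pgo-profiles/sampling/ (the directory used later in this thread) and are named <binary_name>.afdo; the helper names are hypothetical, and the authoritative logic is in build/soong/cc/afdo.go. Under this model, a missing or misnamed file simply means the build proceeds without a profile.

```python
from pathlib import Path
from typing import Optional

# Assumed location, taken from this thread; the real search logic lives in
# build/soong/cc/afdo.go and may differ between releases.
PROFILE_DIR = Path("toolchain/pgo-profiles/sampling")


def expected_afdo_path(binary_name: str, profile_dir: Path = PROFILE_DIR) -> Path:
    """Path a purely name-based lookup would check: <profile_dir>/<binary_name>.afdo."""
    return profile_dir / f"{binary_name}.afdo"


def find_afdo_profile(binary_name: str, profile_dir: Path = PROFILE_DIR) -> Optional[Path]:
    """Return the profile path if the file exists; None means no profile would be used."""
    candidate = expected_afdo_path(binary_name, profile_dir)
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    print(expected_afdo_path("etm_test_loop"))
```

For example, expected_afdo_path("etm_test_loop") points at toolchain/pgo-profiles/sampling/etm_test_loop.afdo, so a profile saved under any other name would not be picked up by this kind of lookup.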

> Could you show me how to do a trial run at this stage?

I don't have detailed steps for using an AFDO profile when building the Android kernel. But it should amount to adding "-fprofile-sample-use=<the generated profile for kernel>.afdo" to cflags and ldflags.
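A minimal Python sketch of what that could look like for an upstream-style kernel build driven by make. It assumes a Clang build (LLVM=1) and uses KCFLAGS, the kbuild variable for extra C compiler flags; the profile path is a placeholder, and Android kernel builds driven by other build scripts would need the flag added in their own configuration. Whether the flag also has to reach the link step (e.g. with LTO) is deliberately not handled here.

```python
import shlex
from pathlib import Path


def afdo_kernel_make_args(profile: Path, jobs: int = 8) -> list[str]:
    """Hypothetical make invocation passing a sample profile to clang via KCFLAGS.

    LLVM=1 selects the Clang/LLVM toolchain in upstream kbuild. Link-time
    handling of the profile (e.g. with LTO) is not covered by this sketch.
    """
    profile_flag = f"-fprofile-sample-use={profile}"
    return ["make", "LLVM=1", f"-j{jobs}", f"KCFLAGS={profile_flag}"]


if __name__ == "__main__":
    # Print a copy-pasteable command line; kernel.afdo is a placeholder path.
    print(shlex.join(afdo_kernel_make_args(Path("kernel.afdo"))))
```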

Thanks,
Yabin

蔡沅信

Aug 2, 2023, 9:57:15 AM
to Yabin Cui, android-llvm
Hi Yabin,

> I don't have detailed steps of using AFDO profile in android kernel building. But it should be adding cflags and ldflags of "-fprofile-sample-use=<the generated profile for kernel>.afdo".
It seems to work, but I don't see any benefit. Do you have any
methods for evaluating the benefit?
Separately, I would like to know how you plan to use AFDO
profiles when building Android kernels. Could you share some
information?

In my simpleperf, I only have an event named cs_etm/autofdo/, not
one named cs-etm. Could you explain the difference between these
two events?

Is it still not possible to profile the entire kernel with this
method? Can it only be done for kernel modules?

Many thanks
Best Regards
Zack


On Tue, Aug 1, 2023 at 1:46 AM Yabin Cui <yab...@google.com> wrote:

Nick Desaulniers

Aug 8, 2023, 11:35:07 AM
to 蔡沅信, Yabin Cui, Pirama Arumuga Nainar, android-llvm, Sami Tolvanen, Bill Wendling
On Mon, Aug 7, 2023 at 10:13 PM 蔡沅信 <fissu...@gmail.com> wrote:
>
> Hi Nick
>
> This is about the previous clang pgo patch that needs to be upstreamed
> to the kernel[1]. But it seems to have been rejected
>
> I have a few questions about this thread [1] that I would like to ask you.
>
> Why is it necessary to apply this patch when using clang pgo. Isn't it
> a built-in feature of the compiler?

I think of PGO as two parts: collecting data on the machine, and
extracting that data from the kernel. These two events occur
separately from one another, so the kernel must buffer the data in
between: the compiler emits instrumentation that writes somewhere,
and that somewhere needs to be provided by the kernel runtime.
That's one half of the patch.

The other is having a means for userspace to ask the kernel to provide
that data, via sysfs nodes. That is the other half.

> Is it not possible to use the clang option directly when compiling the kernel?

You can, but without the drivers for the above, IIRC the build will
fail to link, since the linker will not be able to find the definitions
of the symbols needed by the PGO instrumentation runtime.

> I apologize for not understanding even after reading the entire thread[1].
> If you're willing, could you please provide some additional details to
> help clarify?

Linus thinks sampling is the way to go. That's great for Intel x86
(and maybe PPC), but doesn't work well for any other architecture or
for many x86 microarchitectures.

>
> [1]https://lore.kernel.org/lkml/202106281231.E99B92BB13@keescook/
>
> Many thanks
> Best Regards
> Zack
>
> In my experiments, I put afdo in toolchain/pgo-profiles/sampling/ and
> with afdo:true, the size of the recompiled binary did not change.
> When using simpleperf to check the miss stall ratio with the event
> types 'branch-instructions' and 'branch-misses', there is no
> difference before and after applying afdo.
> On Wed, Aug 2, 2023 at 8:35 PM 蔡沅信 <fissu...@gmail.com> wrote:
--
Thanks,
~Nick Desaulniers

蔡沅信

Aug 9, 2023, 12:01:19 PM
to Yabin Cui, Nick Desaulniers, Pirama Arumuga Nainar, android-llvm
Hi Nick

This is about the earlier Clang PGO patch that was proposed for
upstreaming into the kernel [1]. It seems to have been rejected.

I have a few questions about this thread [1] that I would like to ask you.

Why is it necessary to apply this patch when using Clang PGO? Isn't it
a built-in feature of the compiler?
Is it not possible to use the Clang option directly when compiling the kernel?
I apologize for not understanding even after reading the entire thread [1].
If you're willing, could you please provide some additional details to
help clarify?

[1]https://lore.kernel.org/lkml/202106281231.E99B92BB13@keescook/

Many thanks
Best Regards
Zack

In my experiments, I put the .afdo file in toolchain/pgo-profiles/sampling/
and set afdo: true, but the size of the recompiled binary did not change.
When I used simpleperf to check the branch-miss ratio with the
'branch-instructions' and 'branch-misses' events, there was no
difference before and after applying AFDO.
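Comparing the miss ratio rather than raw counts keeps the check meaningful even if the two runs execute a different number of branches. A minimal Python sketch, with made-up counts (for instance as reported by simpleperf stat -e branch-instructions,branch-misses); note that an unchanged ratio alone does not prove the profile was ignored.

```python
from dataclasses import dataclass


@dataclass
class BranchCounts:
    """Raw event counts for one run, e.g. from simpleperf stat."""
    branch_instructions: int
    branch_misses: int

    @property
    def miss_ratio(self) -> float:
        return self.branch_misses / self.branch_instructions


def miss_ratio_change(before: BranchCounts, after: BranchCounts) -> float:
    """Relative change in branch-miss ratio; negative means fewer misses after AFDO."""
    return (after.miss_ratio - before.miss_ratio) / before.miss_ratio


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    before = BranchCounts(branch_instructions=1_000_000, branch_misses=20_000)
    after = BranchCounts(branch_instructions=1_000_000, branch_misses=18_000)
    print(f"{miss_ratio_change(before, after):+.1%}")
```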
On Wed, Aug 2, 2023 at 8:35 PM 蔡沅信 <fissu...@gmail.com> wrote:



蔡沅信

Aug 21, 2023, 12:20:09 PM
to Nick Desaulniers, Yabin Cui, Pirama Arumuga Nainar, android-llvm, Sami Tolvanen, Bill Wendling
Hi Nick & Yabin,

I'm sorry to bother you.

I'd like to confirm a few things with you; please reply whenever
you have time.

> Linus thinks sampling is the way to go. That's great for Intel x86
> (and maybe PPC), but doesn't work well for any other architecture or
> many x86 uarch's.

Does this mean that ARM ETM doesn't work? I've tried some benchmark
programs, but I don't see any benefit.
I measure the benefit by execution time, i.e. whether it is significantly shortened.

Separately, is there a version of create_llvm_prof that must match
the compiler? If so, how can I check it?
I'm not using create_llvm_prof from AOSP because it's not in my
codebase; I built it myself after cloning autofdo [1].

I'm asking because I don't see any benefit, so I'm not sure whether
AFDO is actually being applied. Is there any way to verify this, for
example by checking whether the recompiled binary gets bigger?

[1]https://github.com/google/autofdo

Many thanks
Best Regards
Zack



On Tue, Aug 8, 2023 at 11:35 PM Nick Desaulniers <ndesau...@google.com> wrote:

Yabin Cui

Aug 21, 2023, 5:01:33 PM
to 蔡沅信, Nick Desaulniers, Pirama Arumuga Nainar, android-llvm, Sami Tolvanen, Bill Wendling
Hi Zack,

> Does this mean that ARM ETM doesn't work? I've tried some benchmark
> programs, but I don't see any benefit.
> I'll use the execution time as a benefit, if he's significantly shortened.

I think what Nick means by sampling includes Intel LBR and ARM ETM, in contrast to instrumented PGO.
ARM ETM collects the same kind of profile as Intel LBR and instrumented PGO, so ideally they should achieve
similar performance gains (if not, we can compare the profiles they generate).
But in reality it's hard to get right. If we don't see benchmark improvements:
1) Maybe we didn't collect enough profile data.
2) The compiler may ignore the profile data because of function name mismatches or other reasons.
3) The gain from inlining more functions may be offset by increased instruction-cache misses, or be too small to notice.
4) The benchmark itself may be noisy.
What we see in Android is that using ETM and AutoFDO reduces app startup time by 1.3% on average, and it can significantly
improve the performance of some JNI functions. I definitely think we should do more work investigating the performance effect
of using AutoFDO, and release our profiles for public verification.

> does create_llvm_prof have a version that matches
> the compiler? If so, how to check?

I don't think so. It's totally fine to use create_llvm_prof built from the AutoFDO GitHub repository.

> This question is confusing, because I don't see the benefits so I'm
> not sure if I'm actually applying afdo,
> is there any way to verify? Is there any way to confirm this, like
> recompiling the binary file to make it bigger?

Generating a bigger binary file is one way to confirm it, because the profile usually causes more functions to be inlined.
I have also tried adding logs to LLVM's SampleProfile.cpp (https://llvm.org/doxygen/SampleProfile_8cpp_source.html) before.
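A minimal Python sketch of that size check, assuming you keep the baseline and AFDO builds of the same binary side by side (the paths are placeholders passed on the command line). Compare like with like, e.g. both unstripped or both stripped; a size change is only a hint that the profile was used, not proof of a speedup.

```python
import os
import sys


def size_delta(baseline_path: str, afdo_path: str) -> tuple[int, float]:
    """Return (bytes added, relative change) of the AFDO build versus the baseline."""
    baseline = os.path.getsize(baseline_path)
    optimized = os.path.getsize(afdo_path)
    delta = optimized - baseline
    return delta, delta / baseline


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python size_delta.py <baseline binary> <afdo binary>
    delta, rel = size_delta(sys.argv[1], sys.argv[2])
    print(f"{delta:+d} bytes ({rel:+.2%})")
```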

Thanks,
Yabin

Stephen Hines

Aug 21, 2023, 5:25:12 PM
to Yabin Cui, 蔡沅信, Nick Desaulniers, Pirama Arumuga Nainar, android-llvm, Sami Tolvanen, Bill Wendling
On Mon, Aug 21, 2023 at 2:01 PM 'Yabin Cui' via android-llvm <androi...@googlegroups.com> wrote:
> What we see in Android is, using ETM and AutoFDO can decrease 1.3% app startup time on average. And it can significantly
> improve performance of some JNI functions. I definitely think we should do more work investigating the performance effect
> of using AutoFDO, and release our profiles for public verification.

Note that the improvements we've seen in Android are purely focused on userspace AutoFDO. We haven't done any AutoFDO with the kernel, so releasing the profile data or not won't help if your main goal is to better optimize the kernel.

Steve
 

蔡沅信

Aug 22, 2023, 12:38:31 PM
to Stephen Hines, Yabin Cui, Nick Desaulniers, Pirama Arumuga Nainar, android-llvm, Sami Tolvanen, Bill Wendling
Hi Yabin & Steve


>I think what Nick means for sampling includes Intel LBR and ARM ETM, in contrast to instrumented PGO.
>ARM ETM collects the same kind of profile as Intel LBR and instrumented PGO. So ideally they should achieve
>similar performance gain (If not, we can compare the profiles they generated).
>But in reality it's hard to do it correctly. If we don't see benchmark improvements:
>1) Maybe we didn't collect enough profile data
>2) The compiler may ignore the profile data, because of function name mismatch or something else.
>3) The gain of inlining more functions may be dragged by increased instruction cache misses, or too small to be noticed.
>4) The benchmark itself may be noisy.

Thank you for the hints.
Regarding 1): is it possible to merge profile data from different workloads, e.g. llvm-profdata merge --output=XXXX.bin XXXX.bin ... --weighted-input=<string>, with a weight set for each input?
Regarding 3): the compiler uses the profile data obtained from ARM ETM to optimize performance.
It inlines functions on frequently executed branches, which leads to more instructions and more cache misses.

Is my understanding correct?
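On point 1, llvm-profdata merge accepts --sample for sample (AutoFDO) profiles and --weighted-input=<weight>,<filename> to scale one input, so a merge should look roughly like llvm-profdata merge --sample workload_a.afdo --weighted-input=3,workload_b.afdo -o merged.afdo (file names hypothetical; check llvm-profdata merge --help for your version). Conceptually, the weight multiplies that input's sample counts before they are summed, as in this toy Python model that tracks only per-function totals:

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def weighted_merge(profiles: Iterable[Tuple[int, Mapping[str, int]]]) -> Counter:
    """Toy weighted merge: scale each profile's per-function samples by its weight, then sum.

    Real AutoFDO profiles also hold per-line and per-callsite samples; this model
    keeps only a total per function name to show how weights bias the result.
    """
    merged: Counter = Counter()
    for weight, profile in profiles:
        for function, samples in profile.items():
            merged[function] += weight * samples
    return merged


if __name__ == "__main__":
    workload_a = {"parse": 10, "render": 5}  # hypothetical sample counts
    workload_b = {"parse": 2, "decode": 4}
    print(weighted_merge([(1, workload_a), (3, workload_b)]))
```

A higher weight makes the compiler treat that workload's hot code as relatively hotter, which is useful when one workload matters more than another.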


>What we see in Android is, using ETM and AutoFDO can decrease 1.3% app startup time on average. And it can significantly
>improve performance of some JNI functions. I definitely think we should do more work investigating the performance effect
>of using AutoFDO, and release our profiles for public verification.

Does this involve changes to third-party apps? Are you targeting the entire Android system or just a few Android services?
It would be good if profiles for apps could also be released in the future.


>Generating a bigger binary file is a way to confirm it. Because the profile usually makes more functions inlined.
>I also tried adding logs in https://llvm.org/doxygen/SampleProfile_8cpp_source.html before.

Does this mean that the binary file will be larger after applying AFDO and recompiling?
Is there any documentation for SampleProfile.cpp? I apologize, but I am not an expert in LLVM. Orz


>Note that the improvements we've seen in Android are purely focused on userspace AutoFDO. We haven't done any AutoFDO with the kernel, so releasing the profile data or not won't help if your main goal is to better optimize the kernel.

Yes. Even so, Android developers would still want these profiles if they were available, since some userspace services are provided by vendors.
Has your team considered applying AutoFDO to the kernel in the future?



Many thanks
Best Regards
Zack


On Tue, Aug 22, 2023 at 5:25 AM Stephen Hines <srh...@google.com> wrote:
