Yes, a relocation-based approach will be more robust and can fix the
.llvm_addrsig issue: https://sourceware.org/bugzilla/show_bug.cgi?id=23817
The relocation type doesn't matter. Your implementation uses R_X86_64_32 and lays the content out as from/to/value/from/to/value/from/to/value.
An alternative design is R_X86_64_NONE + value/value/value, i.e. from/to do not occupy space in the content.
That gives a 3x space saving.
We need to change the section type because the new representation is incompatible with the old one,
and sections from old object files should be ignored.
The new section type will be ignored by old LLVM tools as well.
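To make the content-size comparison concrete, here is a rough C++ sketch. It assumes each of from/to/value occupies one slot of the same width, which is what the 3x figure implies; the slot width and the function names are illustrative and are not taken from the patch.

```cpp
#include <cstddef>

// Assumption: every field (from, to, value) is one slot of slotBytes bytes.

// Layout 1: from/to/value repeated per edge, all stored in the section content.
constexpr std::size_t interleavedContentBytes(std::size_t edges,
                                              std::size_t slotBytes) {
  return edges * 3 * slotBytes;
}

// Layout 2: only value is stored per edge; from/to are carried by
// R_X86_64_NONE relocations and take no space in the content.
constexpr std::size_t valueOnlyContentBytes(std::size_t edges,
                                            std::size_t slotBytes) {
  return edges * slotBytes;
}

static_assert(interleavedContentBytes(1000, 4) ==
                  3 * valueOnlyContentBytes(1000, 4),
              "value-only content is one third of the interleaved content");
```

Note that this counts section content only. The relocations that carry from/to still take space in the relocation section.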
>With this approach post processing tools that handle relocations correctly work for this section also.
>
>One thing is section is marked with SHF_EXCLUDE.
>From spec
>"This section is excluded from input to the link-edit of an executable or shared object. This flag is ignored if the SHF_ALLOC flag is also set, or if relocations exist against the section."
>
>So technically speaking it needs to be kept, and presumably relocations applied, but LLD follows gold and ld and discards sections marked as SHF_EXCLUDE even with relocations. So, I think this approach should be fine. https://reviews.llvm.org/D24966
This is Solaris's spec (since 1996), not standard ELF's. It can be advisory but
our behavior (mostly GNU ABI+Linux ABI+LLVM extensions) does not necessarily
follow it. I cannot find a definition in the x86-64 psABI or a GNU ABI
document.
I think the behavior as implemented in gold and LLD is more useful.
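The difference between the two readings can be sketched in C++ as follows. This is a simplified model under stated assumptions, not code from any linker: SectionInfo and both function names are made up for illustration, and real linkers apply further conditions that are not modeled here.

```cpp
#include <cstdint>

// Standard ELF section flag values.
constexpr std::uint64_t kShfAlloc = 0x2;
constexpr std::uint64_t kShfExclude = 0x80000000;

// Hypothetical summary of an input section, for illustration only.
struct SectionInfo {
  std::uint64_t flags;
  bool hasRelocations; // relocations exist against the section
};

// Literal reading of the quoted wording: SHF_EXCLUDE is ignored if
// SHF_ALLOC is also set or if relocations exist against the section.
inline bool excludedPerQuotedWording(const SectionInfo &s) {
  if (!(s.flags & kShfExclude))
    return false;
  if (s.flags & kShfAlloc)
    return false;
  return !s.hasRelocations;
}

// Simplified model of the behavior described above for gold and LLD:
// an SHF_EXCLUDE section is discarded even when it has relocations.
inline bool excludedAsDescribedForLinkers(const SectionInfo &s) {
  return (s.flags & kShfExclude) != 0;
}
```

For a call graph profile section with SHF_EXCLUDE set and relocations present, the first function returns false (keep) and the second returns true (discard).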
>Finally, this bug seems similar to https://sourceware.org/bugzilla/show_bug.cgi?id=23817. Proposed solution for that was also to use relocations.
>
>Implementation: https://reviews.llvm.org/D103212
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Preserving the structure is not needed because the symbol representation
is changed anyway.
You just need to change the value in
llvm/llvm/include/llvm/BinaryFormat/ELF.h
The name doesn't need to change.
Do you have measurements of how well SHT_LLVM_CALL_GRAPH_PROFILE optimizes?
My understanding is that with ThinLTO+PGO it has very little benefit.
The value of reading the old format is low. I don't expect users to mix
newer object files with old object files and still want optimization
from the old object files. SHT_LLVM_CALL_GRAPH_PROFILE_DEPRICATED
just adds maintenance cost.
Changing the value but retaining the name is sufficient for the various
compatibility issues.
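A small C++ sketch of why changing only the value is enough. The two numeric values below are illustrative placeholders, not copied from ELF.h, and the reader model (skip any section type it does not recognize) is an assumption.

```cpp
#include <cstdint>

// Illustrative placeholder values; the real constant is defined in
// llvm/BinaryFormat/ELF.h.
constexpr std::uint32_t kOldCallGraphProfileType = 0x6fff4c02; // hypothetical
constexpr std::uint32_t kNewCallGraphProfileType = 0x6fff4c09; // hypothetical

enum class Action { ParseCallGraphProfile, Ignore };

// Assumed reader model: a tool knows exactly one value for
// SHT_LLVM_CALL_GRAPH_PROFILE and skips every other section type.
inline Action classify(std::uint32_t typeKnownToReader, std::uint32_t shType) {
  return shType == typeKnownToReader ? Action::ParseCallGraphProfile
                                     : Action::Ignore;
}
```

Under this model a new tool ignores sections from old object files, and an old tool ignores sections from new object files, without a second enumerator for the old format.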
Reid's size concern is valid. Have you measured the size overhead?
We can use the SHT_REL format for SHT_LLVM_CALL_GRAPH_PROFILE
relocations, which brings down the per entry cost from 24 bytes to 16
bytes for ELFCLASS64. (It is a bit unfortunate that the Linux kernel
does not support ELFCLASS32 executables on a 64-bit architecture.
For most usage we are using the small code model and don't benefit
from 64-bit addresses/sizes.)
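The per-entry numbers follow from the standard ELF relocation record layouts. The C++ sketch below mirrors those layouts with local struct names (Rela64, Rel64, and so on are not the names from any header). The helper assumes two relocations per call graph edge, one for from and one for to.

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors of the standard ELF relocation records.
struct Rela64 { std::uint64_t r_offset; std::uint64_t r_info; std::int64_t r_addend; };
struct Rel64  { std::uint64_t r_offset; std::uint64_t r_info; };
struct Rela32 { std::uint32_t r_offset; std::uint32_t r_info; std::int32_t r_addend; };
struct Rel32  { std::uint32_t r_offset; std::uint32_t r_info; };

static_assert(sizeof(Rela64) == 24, "SHT_RELA record, ELFCLASS64");
static_assert(sizeof(Rel64) == 16, "SHT_REL record, ELFCLASS64");
static_assert(sizeof(Rela32) == 12, "SHT_RELA record, ELFCLASS32");
static_assert(sizeof(Rel32) == 8, "SHT_REL record, ELFCLASS32");

// Assumption: each edge needs two relocations (from and to).
constexpr std::size_t relocBytesPerEdge(std::size_t recordSize) {
  return 2 * recordSize;
}
```

With these assumptions an edge costs 48 bytes of relocations with SHT_RELA and 32 bytes with SHT_REL on ELFCLASS64; the ELFCLASS32 sizes are shown only for comparison.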
> Do you have measurements of how well SHT_LLVM_CALL_GRAPH_PROFILE optimizes?
> My understanding is that with ThinLTO+PGO it has very little benefit.
With PLO (BOLT) the benefit is zero, but with PGO and without PLO it was still visible. We’ve seen ~1% RPS improvement in the past when measuring mysql.
PGO does not cover profile-guided function layout; either the linker or PLO needs to take care of that part. We’ve been using standalone HFSort on top of PGO, and we wanted to fold that into either PLO or the linker. Profile-guided function layout exists in lld, but we ran into bugs, hence the RFC and Alexander’s fix.
Thanks,
Wenlei
Circling back on this. https://reviews.llvm.org/D104080 mostly looks good.
.llvm.call-graph-profile mostly affects only the output files produced with
-fprofile-use and -fprofile-sample-use (-fauto-profile).
From the measurement for clang-13, the size does not look like it really matters.
Also note that for instrumentation PGO, -fprofile-generate= object
files are much larger than -fprofile-use= ones, so slight size
increases for -fprofile-use= don't matter.
> The size went up from 107KB to 322KB, aggregate of all the input sections.