Re: Request for new relaxed relocation types to support Basic Block Sections

Sriraman Tallam

Jan 11, 2020, 12:47:08 PM
to x86-6...@googlegroups.com, Rahman Lavaee, Krzysztof Pszeniczny, Han Shen, H.J. Lu, Rui Ueyama, Xinliang David Li
Resending after getting permissions to send to x86-64-abi@

TL;DR: We would like to add two new relocations, R_X86_64_PC32_JUMPX and R_X86_64_PC8_JUMPX, to support basic block sections (analogous to function and data sections) in the LLVM compiler.

Basic Block Sections
----------------------------

Modern compilers support compiling with function and data sections via the options -ffunction-sections and -fdata-sections, respectively, which place each function and data item in a separate section in the native object file. This allows the linker to perform many link-time optimizations, such as dead code and data elimination, identical code folding, and function reordering. Without function sections, performing these optimizations at link time is significantly harder. We are working on supporting basic block sections, a mode that compiles every basic block into its own section. With this support, arbitrary reordering of basic blocks at link time is feasible.

This allows profile-guided basic block reordering at link time, and experiments show that this can further speed up benchmarks that are already heavily optimized with PGO and LTO. We are introducing a new compiler option, -fbasicblock-sections, which places every basic block in a unique ELF text section in the object file, along with a symbol labelling the basic block. The linker can then order the basic block sections in any sequence, which, when done correctly, can encapsulate block layout, function layout, and function splitting optimizations. However, the following need to be addressed for this to be feasible:

* The compiler must not allow any implicit fall-through between any two adjacent basic blocks as they could be reordered at link time to be non-adjacent. In other words, the compiler must make a fall-through between adjacent basic blocks explicit by retaining the direct jump instruction that jumps to the next basic block. These branches can only be removed later in the linking phase after the final ordering is performed as determined by Propeller.

* All inter-basic block branch targets would now need to be resolved by the linker as they cannot be calculated during compile time. This is done using static relocations, which bloat the size of the object files. Further, the compiler tries to use short branch instructions on some ISAs for branch offsets that can be accommodated in one byte. This is not possible with basic block sections as the offset is not determined at compile time, and long branch instructions have to be used everywhere.

* The linker needs to perform a relaxation pass on all the branch instructions after laying out basic blocks. This relaxation removes explicit fall-through branches between adjacent basic blocks and shrinks jump instructions whose offsets can be accommodated in smaller equivalent instructions.
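
The relaxation pass described in the last bullet can be sketched roughly as follows. This is only an illustrative model, not the linker's implementation: the `Block` class, the `relax` function, and the one-trailing-jump-per-block simplification are all assumptions made for the example. It starts with every jump in its long x86-64 form (5 bytes for jmp rel32, 6 bytes for jcc rel32), deletes an unconditional jump whose target is the next block in the layout, and shrinks a jump to its 2-byte rel8 form when the displacement fits in a signed byte. Because sizes only ever shrink, the loop reaches a fixed point.

```python
from dataclasses import dataclass
from typing import Optional

LONG_SIZE = {"jmp": 5, "jcc": 6}   # E9 rel32 / 0F 8x rel32
SHORT_SIZE = 2                      # EB rel8 / 7x rel8


@dataclass
class Block:                        # hypothetical model of one basic block section
    name: str
    body_size: int                  # bytes before the trailing jump
    kind: Optional[str] = None      # "jmp", "jcc", or None (no trailing jump)
    target: Optional[str] = None    # name of the destination block
    jump_size: int = 0              # current encoded size of the trailing jump

    def __post_init__(self):
        if self.kind is not None:
            self.jump_size = LONG_SIZE[self.kind]


def relax(layout):
    """Shrink or delete trailing jumps in place until nothing changes."""
    changed = True
    while changed:
        changed = False
        # Recompute block start addresses for the current jump sizes.
        start, addr = {}, 0
        for b in layout:
            start[b.name] = addr
            addr += b.body_size + b.jump_size
        for i, b in enumerate(layout):
            if b.kind is None or b.jump_size == 0:
                continue
            nxt = layout[i + 1].name if i + 1 < len(layout) else None
            if b.kind == "jmp" and b.target == nxt:
                new_size = 0        # explicit fall-through: delete the jump
            else:
                # rel8 displacement is relative to the end of the short jump.
                end = start[b.name] + b.body_size + SHORT_SIZE
                disp = start[b.target] - end
                if start[b.target] > start[b.name]:
                    # Forward target: it moves closer once this jump shrinks.
                    disp -= b.jump_size - SHORT_SIZE
                new_size = SHORT_SIZE if -128 <= disp <= 127 else b.jump_size
            if new_size < b.jump_size:
                b.jump_size = new_size
                changed = True
    return layout


layout = [
    Block("entry", 10, "jmp", "hot"),
    Block("hot", 20, "jcc", "exit"),
    Block("exit", 4),
    Block("cold", 200, "jmp", "hot"),
]
relax(layout)
print([b.jump_size for b in layout])  # [0, 2, 0, 5]
```

In this made-up layout the jump from `entry` to `hot` disappears, the conditional jump in `hot` shrinks to 2 bytes, and the jump out of the 200-byte `cold` block stays long because its target is more than 128 bytes away.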

In order for the linker to perform the above-mentioned relaxation easily, it would help if the compiler used special relocations for branch instructions across basic block sections. The linker could then look specifically for these relocations and perform relaxations.

After discussing this initially with H.J. Lu, we would like to propose two new relocations:

1) R_X86_64_PC32_JUMPX
2) R_X86_64_PC8_JUMPX

These relocations will only be associated with jcc or jmp instructions. Further, R_X86_64_PC8_JUMPX is an 8-bit relocation that can only appear as the last byte of a basic block section and as part of a jump instruction. The linker would either extend it to 32 bits if the jump offset does not fit in a single byte, or resolve it as a 1-byte relocation.
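
To make the 8-bit versus 32-bit choice concrete, here is a small sketch of the two x86-64 encodings involved. The opcodes are the standard ones (EB/E9 for jmp, 7x and 0F 8x for jcc); the function name `encode_branch` and the idea of calling it at relocation-resolution time are assumptions for illustration, not part of the proposal. The displacement is measured from the end of the instruction, so the chosen form affects the value that gets written.

```python
import struct


def encode_branch(insn_addr, target, cc=None):
    """Encode jmp (cc=None) or jcc (cc = condition code 0..15) at insn_addr.

    Hypothetical helper: picks the 2-byte rel8 form when the displacement
    fits in a signed byte, otherwise the rel32 form.
    """
    short_disp = target - (insn_addr + 2)           # rel8 forms are 2 bytes long
    if -128 <= short_disp <= 127:
        opcode = 0xEB if cc is None else 0x70 | cc  # jmp rel8 / jcc rel8
        return bytes([opcode]) + struct.pack("<b", short_disp)
    if cc is None:
        # jmp rel32 is 5 bytes: E9 followed by a 4-byte displacement.
        return b"\xE9" + struct.pack("<i", target - (insn_addr + 5))
    # jcc rel32 is 6 bytes: 0F 8x followed by a 4-byte displacement.
    return bytes([0x0F, 0x80 | cc]) + struct.pack("<i", target - (insn_addr + 6))


print(encode_branch(0x1000, 0x1010).hex())        # eb0e
print(encode_branch(0x1000, 0x2000).hex())        # e9fb0f0000
print(encode_branch(0x1000, 0x2000, cc=5).hex())  # 0f85fa0f0000
```

Widening a short jump therefore grows the section by 3 bytes for jmp and 4 bytes for jcc, which may be one reason to restrict the 8-bit relocation to the end of a section, where no later bytes in that section need to move.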

For a detailed RFC description of basic block sections and its use in the Propeller post link optimizer framework, please see this:
https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

Thanks
Sri

Florian Weimer

Jan 21, 2020, 2:16:25 AM
to 'Sriraman Tallam' via X86-64 System V Application Binary Interface, Sriraman Tallam, Rahman Lavaee, Krzysztof Pszeniczny, Han Shen, H.J. Lu, Rui Ueyama, Xinliang David Li
* via:

> After discussing this initially with H.J. Lu, we would like to propose
> two new relocations :
>
> 1) R_X86_64_PC32_JUMPX
> 2) R_X86_64_PC8_JUMPX
>
> These relocations will only be associated with jcc or jmp
> instructions. Further, R_X86_64_PC8_JUMPX is a 8-bit relocation that
> can only appear as the last bytes of a basic block section and as part
> of a jump instruction. The linker would either extend it to 32-bits
> if the jump offset cannot be fit in a single byte or resolve it as a 1
> byte relocation.
>
> For a detailed RFC description of basic block sections and its use in
> the Propeller post link optimizer framework, please see this:
> https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

How does this interact with ongoing work to mitigate the performance
impact of the microcode update for the Intel Jcc erratum?

I see some discussion of debugging issues in the document. How far has
this work progressed? Is it still possible to produce correct DWARF
with this optimization?

Thanks,
Florian

Sriraman Tallam

Jan 21, 2020, 1:42:20 PM
to Florian Weimer, 'Sriraman Tallam' via X86-64 System V Application Binary Interface, Rahman Lavaee, Krzysztof Pszeniczny, Han Shen, H.J. Lu, Rui Ueyama, Xinliang David Li, David Blaikie
On Mon, Jan 20, 2020 at 11:16 PM Florian Weimer <fwe...@redhat.com> wrote:
>
> * via:
>
> > After discussing this initially with H.J. Lu, we would like to propose
> > two new relocations :
> >
> > 1) R_X86_64_PC32_JUMPX
> > 2) R_X86_64_PC8_JUMPX
> >
> > These relocations will only be associated with jcc or jmp
> > instructions. Further, R_X86_64_PC8_JUMPX is a 8-bit relocation that
> > can only appear as the last bytes of a basic block section and as part
> > of a jump instruction. The linker would either extend it to 32-bits
> > if the jump offset cannot be fit in a single byte or resolve it as a 1
> > byte relocation.
> >
> > For a detailed RFC description of basic block sections and its use in
> > the Propeller post link optimizer framework, please see this:
> > https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf
>
> How does this interact with ongoing work to mitigate the performance
> impact of the microcode update for the Intel Jcc erratum?

Referring to: https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

We were not aware of this erratum until now. We have to support the
option -mbranches-within-32B-boundaries in the linker, and the linker
must align branch instructions so that they do not cross or end on
32-byte boundaries, just like the assembler would. We will add support
for this.
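
As a rough illustration of the check a linker would need after relaxation, the sketch below tests whether a jump of a given size crosses or ends on a 32-byte boundary, which is the condition Intel's mitigation document describes, and computes how much padding would move it clear. The function names and the pad-before-the-instruction strategy are assumptions for the example; a real implementation could also use instruction prefixes instead of NOPs.

```python
def touches_boundary(addr, size, boundary=32):
    """True if [addr, addr + size) crosses a boundary or ends exactly on one."""
    return addr // boundary != (addr + size) // boundary


def padding_needed(addr, size, boundary=32):
    """Bytes to insert before the jump so that it starts on the next boundary.

    Assumes size < boundary, which holds for jmp/jcc (at most 6 bytes).
    """
    if not touches_boundary(addr, size, boundary):
        return 0
    return boundary - addr % boundary


print(touches_boundary(28, 2))   # False
print(touches_boundary(30, 2))   # True
print(padding_needed(29, 6))     # 3
```

Padding changes the addresses of everything that follows, so a check like this would have to run inside the same fixed-point loop as jump shrinking rather than as a separate final step.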
>
> I see some discussion of debugging issues in the document. How far has
> this work progressed? Is it still possible to produce correct DWARF
> with this optimization?

+David Blaikie

The DWARF information with respect to v4 seems correct. We are
looking at fixing DWARF v5 related issues now. The Debug Info
questions were more about efficiently generating debug ranges, as we
generate a lot of them and many of these can be collapsed easily. We
are working on it.

Thanks
Sri

>
> Thanks,
> Florian
>
> --
> You received this message because you are subscribed to the Google Groups "X86-64 System V Application Binary Interface" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to x86-64-abi+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/x86-64-abi/87eevtth35.fsf%40oldenburg2.str.redhat.com.

Sriraman Tallam

Feb 5, 2020, 4:51:31 PM
to David Blaikie, Florian Weimer, 'Sriraman Tallam' via X86-64 System V Application Binary Interface, Rahman Lavaee, Krzysztof Pszeniczny, Han Shen, H.J. Lu, Rui Ueyama, Xinliang David Li
Making sure x86-64-abi has this conversation as David is not subscribed.

On Tue, Jan 21, 2020 at 11:00 AM David Blaikie <bla...@google.com> wrote:
> Code relaxation (anything that changes the length of a machine code sequence post-compile/during link time) presents another complication in terms of DWARF that I don't think we've really discussed/looked into. Specifically it means a lot of the DWARF size optimizations (especially those in DWARFv5) may have very different tradeoffs and higher costs (because many of them rely on being able to reduce the number of relocations by using machine code label differences computed at compile time). If all those label differences aren't resolved at compile time, then that means many more relocations, and in the case of Split DWARF, moving label differences currently in the .dwo file back into the .o file so they can be readjusted at link time.
>
> If you can provide me with a sample/prototype where I can observe such relaxation I can start to work on some DWARF patch support or advise on how to do so - it might be somewhat complicated to implement.
>
> - Dave

Sriraman Tallam

Feb 7, 2020, 10:25:01 AM
to David Blaikie, Florian Weimer, 'Sriraman Tallam' via X86-64 System V Application Binary Interface, Rahman Lavaee, Krzysztof Pszeniczny, Han Shen, H.J. Lu, Rui Ueyama, Xinliang David Li
David Blaikie's response:

"This seems compatible with DWARF. Producers would need to be careful
with regards to correctness if using Split DWARF (label differences
wouldn't be usable in the Split DWARF sections) & if using label
differences outside Split DWARF would have to ensure size relocations
are used for them (assemblers that produce such relaxing relocations
should produce size relocations for label differences over the
relaxable instructions in general) & may want to be selective about
using such label differences in general since they won't have the same
benefits/tradeoffs when they require a relocation."
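
The concern about label differences can be shown with a toy calculation. This is not DWARF emission code; `label_offsets` and the instruction sizes are made up for the example. It shows that a length computed as `end - begin` at assembly time goes stale once the linker shrinks a jump between the two labels, which is why such differences need a relocation if relaxation is allowed.

```python
def label_offsets(insn_sizes, labels):
    """Map each label (an index into insn_sizes) to its byte offset."""
    offsets, addr = [], 0
    for size in insn_sizes:
        offsets.append(addr)
        addr += size
    offsets.append(addr)  # offset one past the last instruction
    return {name: offsets[idx] for name, idx in labels.items()}


# Assembler view: a 5-byte jmp rel32 sits between the two labels.
compile_time = label_offsets([3, 5, 4], {"begin": 0, "end": 3})
baked_in_length = compile_time["end"] - compile_time["begin"]

# Linker view after relaxing that jump to a 2-byte jmp rel8.
link_time = label_offsets([3, 2, 4], {"begin": 0, "end": 3})
actual_length = link_time["end"] - link_time["begin"]

print(baked_in_length, actual_length)  # 12 9
```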

Sriraman Tallam

Feb 11, 2020, 3:25:56 PM
to David Blaikie, Fangrui Song, Florian Weimer, 'Sriraman Tallam' via X86-64 System V Application Binary Interface, Rahman Lavaee, Krzysztof Pszeniczny, Han Shen, H.J. Lu, Rui Ueyama, Xinliang David Li
+Fangrui Song

How do we take this forward? Should we send a patch with the added relocations?

Thanks
Sri

H.J. Lu

Feb 11, 2020, 3:31:46 PM
to Sriraman Tallam, David Blaikie, Fangrui Song, Florian Weimer, 'Sriraman Tallam' via X86-64 System V Application Binary Interface, Rahman Lavaee, Krzysztof Pszeniczny, Han Shen, Rui Ueyama, Xinliang David Li
On Tue, Feb 11, 2020 at 12:25 PM Sriraman Tallam <tmsr...@google.com> wrote:
>
> +Fangrui Song
>
> How do we take this forward? Should we send a patch with the added relocations?

Please join

https://gitlab.com/x86-psABIs/x86-64-ABI

open an issue and create a merge request to add new relocations.
--
H.J.

Sriraman Tallam

Feb 11, 2020, 3:35:41 PM
to H.J. Lu, David Blaikie, Fangrui Song, Florian Weimer, 'Sriraman Tallam' via X86-64 System V Application Binary Interface, Rahman Lavaee, Krzysztof Pszeniczny, Han Shen, Rui Ueyama, Xinliang David Li
On Tue, Feb 11, 2020 at 12:31 PM H.J. Lu <hjl....@gmail.com> wrote:
>
> On Tue, Feb 11, 2020 at 12:25 PM Sriraman Tallam <tmsr...@google.com> wrote:
> >
> > +Fangrui Song
> >
> > How do we take this forward? Should we send a patch with the added relocations?
>
> Please join
>
> https://gitlab.com/x86-psABIs/x86-64-ABI
>
> open an issue and create a merge request to add new relocations.

Thanks!, will do.
Sri

Ali Bahrami

Feb 11, 2020, 3:56:03 PM
to x86-6...@googlegroups.com
On 2/11/20 1:35 PM, 'Sriraman Tallam' via X86-64 System V Application Binary Interface wrote:
>> Please join
>>
>> https://gitlab.com/x86-psABIs/x86-64-ABI
>>
>> open an issue and create a merge request to add new relocations.
> Thanks!, will do.


Is your work really finished, and ready for this?
Once a relation is in the ABI, it's there forever,
and everyone has to support it whether it pans out or not.
Perhaps you would want to build the system using them,
and then move to formalize it after it's working and
has proven itself?

From your description, I am expecting the number of
relocations to go up dramatically, and since relocation
processing is a big part of link-edit time, I wonder
how well this is going to scale on large programs.
You might find that you need to alter some details, or
that you need something else. Why not wait until you
know for sure?

Sorry if I've misread the situation. I'm just a cursory
bystander in this, watching it go by.

- Ali

Ali Bahrami

Feb 11, 2020, 4:01:59 PM
to x86-6...@googlegroups.com
On 2/11/20 1:55 PM, Ali Bahrami wrote:
> Once a relation is in the ABI, it's there forever,

I'm not sure how my fingers warped "relocation" into "relation".
I trust it was clear, but sorry for any confusion.

Over the years, we've found that relocation processing
can be a big bottleneck in linking, hence my comments here.

- Ali

Sriraman Tallam

Feb 11, 2020, 4:44:06 PM
to Ali Bahrami, 'Sriraman Tallam' via X86-64 System V Application Binary Interface
Hello Ali,

On Tue, Feb 11, 2020 at 12:56 PM Ali Bahrami <Ali.B...@oracle.com> wrote:
>
> On 2/11/20 1:35 PM, 'Sriraman Tallam' via X86-64 System V Application Binary Interface wrote:
> >> Please join
> >>
> >> https://gitlab.com/x86-psABIs/x86-64-ABI
> >>
> >> open an issue and create a merge request to add new relocations.
> > Thanks!, will do.
>
>
> Is your work really finished, and ready for this?
> Once a relation is in the ABI, it's there forever,
> and everyone has to support it whether it pans out or not.
> Perhaps you would want to build the system using them,
> and then move to formalize it after it's working and
> has proven itself?

I agree. I do not intend to have the patches adding the relocations
committed until our work is approved and checked in to LLVM.

>
> From your description, I am expecting the number of
> relocations to go up dramatically, and since relocation
> processing is a big part of link-edit time, I wonder
> how well this is going to scale on large programs.
> You might find that you need to alter some details, or
> that you need something else. Why not wait until you
> know for sure?

This is part of the cost of basic block sections which we looked at
quite a bit over the last few months. Using profiles, we are able to
significantly reduce the number of basic blocks that need sections and
hence need these relocations. For instance, for large programs, less
than 3% of the basic blocks would end up in unique sections requiring
these relocations. Even for the clang benchmark, which was an extreme
candidate, 7% of basic blocks required relocations.

>
> Sorry if I've misread the situation. I'm just a cursory
> bystander in this, watching it go by.

No, you have not misread it. These are great questions!

Thanks
Sri

>
> - Ali