Large Code Model Relaxations


Farid Zakaria

Apr 16, 2026, 5:01:18 PM
to X86-64 System V Application Binary Interface
(I know I have been messaging a bunch recently. I am just focused on this work and you all have been very kind with the feedback)

I recently discussed in another thread (https://groups.google.com/g/x86-64-abi/c/RsJDf06xMJ0/m/p9PETDDqCgAJ) that we have started to explore the large code model, with the goal of optimizing our way down to small code model performance (or near it).

One of the optimizations that seems most appropriate is relaxation, and I noticed these relaxations are missing from the ABI.

Is there an appetite to support 3 additional relaxations?
I would like to propose three new relocation types for the x86-64 ABI to enable linker relaxation of large code model (-mcmodel=large) sequences.

This *should* allow binaries compiled with the large code model to recover most
of the performance cost when targets are within +/-2 GiB, without breaking ABI compatibility.

R_X86_64_LARGE_PC64

; before
movabs $symbol, %rax
; after: when symbol is within +/-2 GiB of %rip
lea    symbol(%rip), %rax
nop3                       # 3-byte NOP padding to maintain code layout
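
To make the byte accounting concrete, here is a minimal Python sketch of what a linker could do for this relaxation. It assumes the usual encodings (`48 B8 imm64` for `movabs $imm64, %rax`, `48 8D 05 disp32` for `lea disp32(%rip), %rax`, and `0F 1F 00` for a 3-byte NOP). The function name and its arguments are hypothetical and not taken from any linker.

```python
import struct

MOVABS_RAX = bytes.fromhex("48b8")     # movabs $imm64, %rax (10 bytes total)
LEA_RAX_RIP = bytes.fromhex("488d05")  # lea disp32(%rip), %rax (7 bytes total)
NOP3 = bytes.fromhex("0f1f00")         # nopl (%rax)


def relax_movabs_to_lea(code: bytes, insn_addr: int, symbol_addr: int) -> bytes:
    """Return the relaxed 10-byte sequence, or `code` unchanged if not relaxable."""
    if len(code) != 10 or code[:2] != MOVABS_RAX:
        return code
    # RIP-relative displacements are measured from the end of the 7-byte LEA.
    disp = symbol_addr - (insn_addr + 7)
    if not -(1 << 31) <= disp < (1 << 31):
        return code  # outside +/-2 GiB: keep the movabs
    return LEA_RAX_RIP + struct.pack("<i", disp) + NOP3
```

The output is always 10 bytes, so no other code in the section moves.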

R_X86_64_LARGE_CALL64

; before
movabs $target, %r11  
call   *%r11  
; after: when target is within +/-2 GiB of %rip
call   target
nop7

R_X86_64_LARGE_GOTPCREL

; before
movabs $symbol@GOT, %rax
movq   (%rax), %rax
; after, option 1: the GOT slot is reachable via RIP-relative addressing, so relax to a standard GOTPCREL load
movq   symbol@GOTPCREL(%rip), %rax
nop6
; after, option 2: the symbol is local and reachable
lea    symbol(%rip), %rax
nop6
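
The choice between the two options can be sketched as a small decision function. This is only an illustration: the names are hypothetical, and trying option 2 first (because it avoids the memory load) is my assumption, not something the proposal specifies.

```python
from typing import Optional


def fits_rel32(pc: int, target: int) -> bool:
    """True if `target` is reachable from `pc` with a signed 32-bit displacement."""
    return -(1 << 31) <= target - pc < (1 << 31)


def choose_got_relaxation(next_insn: int, got_slot: int,
                          symbol: int, is_local: bool) -> Optional[str]:
    """Pick a relaxation for `movabs $symbol@GOT, %rax` + `movq (%rax), %rax`."""
    # next_insn is the address just past the 7-byte replacement instruction.
    if is_local and fits_rel32(next_insn, symbol):
        return "lea"       # option 2: address the symbol directly, no GOT load
    if fits_rel32(next_insn, got_slot):
        return "gotpcrel"  # option 1: RIP-relative load from the GOT slot
    return None            # neither is reachable: keep the large-model sequence
```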

I tested this out internally and, coupled with BOLT (to remove the NOPs), I was able to achieve performance tolerably close to the baseline. I am still evaluating additional workloads.

The one additional challenge with these new relaxations is that they require two adjacent instructions (movabs + call) to be relaxed as a unit. This introduces a new constraint: the compiler must guarantee adjacency. LLVM has intrinsics to couple instructions together for the optimizer.

H.J. Lu

Apr 16, 2026, 5:51:01 PM
to Farid Zakaria, X86-64 System V Application Binary Interface
I prefer the multiple GOT approach.


--
H.J.

Farid Zakaria

Apr 16, 2026, 6:53:23 PM
to X86-64 System V Application Binary Interface
Interesting....
I took the lukewarm reception to thunks as a sign that multiple-GOT would be an even steeper uphill climb, since I presumed it would be a much larger ABI change.
Because I sensed a lack of consensus on how to handle everything (e.g. multiple GOT, r11 as a required scratch register, etc.), I started to look into the large code model and approach the problem from the other direction.

H.J. Lu

Apr 16, 2026, 7:27:51 PM
to Farid Zakaria, X86-64 System V Application Binary Interface
On Fri, Apr 17, 2026 at 6:53 AM 'Farid Zakaria' via X86-64 System V
Application Binary Interface <x86-6...@googlegroups.com> wrote:
>
> Interesting....
> I kind of took the lukewarm reception to thunks as probably the eventual uphill climb for multiple-GOT -- which would be a much larger ABI change I presumed.
> I started to look into the large code-model due to my feeling of lack of consensus on how to handle everything (i.e. multiple GOT, r11 as a required scratch register etc..) started to look at the problem from the other direction.
>
>

Multiple GOT should solve both large data and text sizes. Multiple GOT
is supported by MIPS. We can leverage that.

--
H.J.

Jan Beulich

Apr 17, 2026, 2:05:20 AM
to Farid Zakaria, X86-64 System V Application Binary Interface
On 16.04.2026 23:01, 'Farid Zakaria' via X86-64 System V Application Binary Interface wrote:
> Is there an appetite to support 3 additional relaxations?
> I would like to propose three new relocation types for the x86-64 ABI to
> enable linker relaxation of large code model (-mcmodel=large) sequences.
>
> This *should* allow binaries compiled with the large code model to recover
> most
> of the performance cost when targets are within +/-2 GiB, without breaking
> ABI compatibility.
>
> R_X86_64_LARGE_PC64
>
> ; before
> movabs $symbol, %rax
> ; after: when symbol is within +/-2 GiB of %rip
> lea symbol(%rip), %rax
> nop3 # 3-byte NOP padding to maintain code layout

Is this really a win, performance wise?

> R_X86_64_LARGE_CALL64
>
> ; before
> movabs $target, %r11
> call *%r11
> ; after: when symbol is within +/-2 GiB of %rip
> call target
> noop7

Some care may be needed with CALLs. When putting the NOP(s) last, the
return address put on the stack will change. This may or may not be
confusing to some specialized code. If it wasn't 7 bytes to cover,
I'd suggest using a couple of redundant prefixes, but 7 of them is
possibly too much.
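
The return-address concern can be checked with a little arithmetic. The sketch below assumes the usual encodings (`49 BB imm64` for `movabs $imm64, %r11`, `41 FF D3` for `call *%r11`, `E8 rel32` for a direct call); the layout names are hypothetical labels for illustration only.

```python
MOVABS_R11_LEN = 10  # 49 BB imm64
CALL_R11_LEN = 3     # 41 FF D3
CALL_REL32_LEN = 5   # E8 rel32
PAIR_LEN = MOVABS_R11_LEN + CALL_R11_LEN
PAD_LEN = PAIR_LEN - CALL_REL32_LEN  # bytes of NOP/prefix padding to fill


def return_address(seq_start: int, layout: str) -> int:
    """Address pushed by the CALL for a given layout of the original slot."""
    if layout == "original":       # movabs ; call *%r11
        return seq_start + PAIR_LEN
    if layout == "call_then_nop":  # call target ; padding
        return seq_start + CALL_REL32_LEN
    if layout == "nop_then_call":  # padding ; call target
        return seq_start + PAD_LEN + CALL_REL32_LEN
    raise ValueError(layout)
```

Only the layout with the padding last changes the pushed return address; padding placed first (or folded into prefixes) leaves it where the original sequence put it. Under these assumed lengths the padding comes to 8 bytes when %r11 is the scratch register; 7 would correspond to a 2-byte indirect call such as `call *%rax`.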

> The one additional challenge with these new relaxations are they require
> two adjacent instructions (movabs + call) to be relaxed as a unit. This
> introduces a new constraint: the compiler must guarantee adjacency. LLVM
> has intrinsics to couple instructions together for the optimizer.

R_X86_64_LARGE_PC64 doesn't have this constraint, and we may want to
avoid such a constraint for the other two by instead introducing
relocation pairs (some other architectures have such, iirc). For
R_X86_64_LARGE_CALL64 this may mean using two more bytes (so the
indirect CALL can be separately replaced by a direct one). (In fact,
I wonder if it wouldn't be possible to leave the MOVABS alone, in
which case both relocations could be entirely independent.)

For R_X86_64_LARGE_GOTPCREL the MOVABS would be replaced by MOV or
LEA (plus however big of a NOP or redundant prefixes are still
needed), while the original MOV would simply become NOP.

Separate question: How does use of MOVABS fit with PIC/PIE?

Jan

Farid Zakaria

Apr 17, 2026, 12:33:06 PM
to Jan Beulich, X86-64 System V Application Binary Interface
H.J. Lu,

I think pursuing both is likely necessary, coming at the problem
from both ends, but your support in pushing multiple-GOT & thunks would
be very welcome.
I hit a roadblock trying to upstream the work into LLVM because it
first needs to be codified in the x86-64 ABI.
Perhaps, we can collaborate on my PR for thunks
https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/67 & also
draft something similar for multiple GOT.
(Might as well also do the third one for these new relocations?)

Wdyt?

On Thu, Apr 16, 2026 at 11:05 PM Jan Beulich <jbeu...@suse.com> wrote:
>
> >
> On 16.04.2026 23:01, 'Farid Zakaria' via X86-64 System V Application Binary Interface wrote:
> > Is there an appetite to support 3 additional relaxations?
> > I would like to propose three new relocation types for the x86-64 ABI to
> > enable linker relaxation of large code model (-mcmodel=large) sequences.
> >
> > This *should* allow binaries compiled with the large code model to recover
> > most
> > of the performance cost when targets are within +/-2 GiB, without breaking
> > ABI compatibility.
> >
> > R_X86_64_LARGE_PC64
> >
> > ; before
> > movabs $symbol, %rax
> > ; after: when symbol is within +/-2 GiB of %rip
> > lea symbol(%rip), %rax
> > nop3 # 3-byte NOP padding to maintain code layout
>
> Is this really a win, performance wise?

I didn't benchmark each individual relocation type but rather all of
them together.
I ran it on an extensive workload we rely on internally here
(https://github.com/facebook/hhvm) and I am expanding the benchmark to
other workloads.


> > R_X86_64_LARGE_CALL64
> >
> > ; before
> > movabs $target, %r11
> > call *%r11
> > ; after: when symbol is within +/-2 GiB of %rip
> > call target
> > noop7
>
> Some care may be needed with CALLs. When putting the NOP(s) last, the
> return address put on the stack will change. This may or may not be
> confusing to some specialized code. If it wasn't 7 bytes to cover,
> I'd suggest using a couple of redundant prefixes, but 7 of them is
> possibly too much.
>
> > The one additional challenge with these new relaxations are they require
> > two adjacent instructions (movabs + call) to be relaxed as a unit. This
> > introduces a new constraint: the compiler must guarantee adjacency. LLVM
> > has intrinsics to couple instructions together for the optimizer.
>
> R_X86_64_LARGE_PC64 doesn't have this constraint, and we may want to
> avoid such a constraint for the other two by instead introducing
> relocation pairs (some other architectures have such, iirc). For
> R_X86_64_LARGE_CALL64 this may mean using two more bytes (so the
> indirect CALL can be separately replaced by a direct one). (In fact,
> I wonder if it wouldn't be possible to leave the MOVABS alone, in
> which case both relocations could be entirely independent.)

You might need to expand on this thought a little more for me.
Why would we need two relocations here? The problem is that the
optimizer may still interleave other code between the two instructions,
reducing the percentage of relaxations that succeed.

Arthur Eubanks

Apr 20, 2026, 5:21:37 PM
to Farid Zakaria, Jan Beulich, X86-64 System V Application Binary Interface
We at Google have a slightly different approach to the large binary problem (still nailing down some details), but it does depend on multi-GOT and thunks. Big +1 on the thunk change.


Farid Zakaria

Apr 20, 2026, 5:27:30 PM
to X86-64 System V Application Binary Interface
Arthur, what was the idea again? It's been a while and I am forgetting; the distinction seemed minor to me at the time.

Is there something naive I am missing with the large code model & new relaxations, by the way? If all the relocation targets are close enough to be relaxed, I'm surprised the penalty was so minimal.
Aside from the new relocations to indicate relaxations, this is all mostly "ABI" compliant already. It seems appealing for now since it's likely I can move it along in the short term.
(One large internal patch I had to support auto-expands .eh_frame & .gcc_except encodings from sdata4 to sdata8 when a relocation target is more than 2 GiB away, because we will be mixing small & large code.)
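
For readers unfamiliar with the sdata4/sdata8 distinction, the range test behind such an expansion can be sketched as follows. The `DW_EH_PE_*` values are the standard pointer-encoding constants; the function is a hypothetical illustration of the range check only, and says nothing about how the actual internal patch is structured (for instance, at what granularity it rewrites encodings).

```python
DW_EH_PE_sdata4 = 0x0B  # signed 4-byte value
DW_EH_PE_sdata8 = 0x0C  # signed 8-byte value
DW_EH_PE_pcrel = 0x10   # value is relative to the field's own address


def pick_pointer_encoding(field_addr: int, target_addr: int) -> int:
    """Choose the narrowest PC-relative encoding that can reach the target."""
    delta = target_addr - field_addr
    if -(1 << 31) <= delta < (1 << 31):
        return DW_EH_PE_pcrel | DW_EH_PE_sdata4
    return DW_EH_PE_pcrel | DW_EH_PE_sdata8
```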

Going into this benchmarking, I was expecting the performance to be far more abysmal (a >15% regression).
(Even without relaxations, the performance degradation was a lot less than I expected.)

Anyway, I suspect I'll be pursuing both options.
I'm happy to spin up another meeting to catch up and discuss where we currently are.

Arthur Eubanks

Apr 21, 2026, 2:32:24 AM
to Farid Zakaria, X86-64 System V Application Binary Interface
> Arthur, what was the idea again? I am forgetting it's been a while and it seemed minor the distinction to me at the time.

The main difference was that we only partition .ltext with GOTs so that there is always a GOT within 2GB, and make all global accesses go through the GOT so they can be relaxed when the binary is small. We think this is a lot simpler implementation wise. Will send out an RFC with details soon.

Michael Matz

Apr 21, 2026, 9:59:05 AM
to Farid Zakaria, X86-64 System V Application Binary Interface
Hello,

On Thu, 16 Apr 2026, 'Farid Zakaria' via X86-64 System V Application Binary Interface wrote:

> Is there an appetite to support 3 additional relaxations? I would like
> to propose three new relocation types for the x86-64 ABI to enable
> linker relaxation of large code model (-mcmodel=large) sequences.

Oh god, no. Just no. The large code model should just quietly die. And
I say this as member of the team that originally introduced it in the
psABI for "but we need it, for real!" reasons (that never actually
materialized).

The thunking approach is IMO just better.


Ciao,
Michael.

Farid Zakaria

Apr 21, 2026, 11:42:10 AM
to Michael Matz, X86-64 System V Application Binary Interface
Michael,

I would appreciate any backstory or justification, for my own edification.

Personally, I think "code models" as a whole don't even need to exist,
and the original RFC
(https://docs.google.com/document/d/1UspcVqzPNg99IDWkLlkVp5NdIYtNk0TENr3kmR_w8uQ/edit?tab=t.0#heading=h.jzqywllczpj8)
we proposed at Meta
(or the similar one by Arthur at Google) makes code models irrelevant,
since the code becomes "elastic".

Nonetheless, the realities of trying to solve the immediate need have
forced me to look at other solutions in the interim while we work to
push through the necessary ABI changes.
In that spirit of looking elsewhere, the large code model does seem
appealing and that's why I'm curious for more backstory about its
unloved position :)