Making medium code-model handle large binaries

Farid Zakaria

Jan 12, 2026, 11:39:09 PM
to X86-64 System V Application Binary Interface
Hi everyone,

I was recommended by Fangrui Song (MaskRay) to kick-start a conversation here.

At work, we are starting to produce very large binaries. I have been looking into relocation overflows, largely from LLVM's lld and how to solve them without moving over to the large code-model.

I have been investigating the medium code-model but despite documentation hinting it's meant for binaries whose data may be larger than 2GiB, I have been facing relocation overflows in other sections such as .gcc_except_table.
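For readers less familiar with the failure mode, here is a minimal sketch of the range check behind a "relocation overflow" for a 32-bit PC-relative field. The addresses and the helper name `pc32_fits` are made up for illustration; this is not lld's code.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def pc32_fits(symbol_addr: int, addend: int, place_addr: int) -> bool:
    """Return True if S + A - P fits in a signed 32-bit field."""
    value = symbol_addr + addend - place_addr
    return INT32_MIN <= value <= INT32_MAX

# Hypothetical layout: one target 256 MiB away, one target 3 GiB away.
place = 0x0040_0000
near_target = place + 0x1000_0000
far_target = place + 3 * 2**30

print(pc32_fits(near_target, 0, place))  # True
print(pc32_fits(far_target, 0, place))   # False
```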

In LLVM, for instance, I noticed that although the object format supports multiple byte-encodings (sdata4 and sdata8), it was always fixed to sdata4.

I wanted to propose moving from sdata4 to sdata8 for a variant of the medium code-model, potentially gated by a supplementary flag. The x86-64 code model specifications do not explicitly mandate a specific DW_EH_PE (DWARF Exception Handling pointer encoding) value, but coordinating with GCC/binutils for cross-toolchain compatibility and alignment on encoding expectations would be ideal (especially if we choose an additional gating flag).
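To make the sdata4/sdata8 difference concrete, here is a rough sketch that encodes a PC-relative value in both formats. The DW_EH_PE constants are the commonly documented values; the helper `encode_pcrel` and the addresses are hypothetical, and real emitters go through relocations rather than computing the bytes directly.

```python
import struct

# Commonly documented DW_EH_PE pointer-encoding values.
DW_EH_PE_sdata4 = 0x0B
DW_EH_PE_sdata8 = 0x0C
DW_EH_PE_pcrel = 0x10

def encode_pcrel(target: int, field_addr: int, encoding: int) -> bytes:
    """Encode target relative to field_addr as a little-endian signed value."""
    delta = target - field_addr
    fmt = {DW_EH_PE_sdata4: "<i", DW_EH_PE_sdata8: "<q"}[encoding & 0x0F]
    # struct.pack raises struct.error when delta does not fit the format.
    return struct.pack(fmt, delta)

field = 0x1000
target = field + 3 * 2**30  # hypothetical target 3 GiB away

wide = encode_pcrel(target, field, DW_EH_PE_pcrel | DW_EH_PE_sdata8)
print(len(wide))  # 8

try:
    encode_pcrel(target, field, DW_EH_PE_pcrel | DW_EH_PE_sdata4)
except struct.error:
    print("sdata4 cannot represent this distance")
```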

Looking ahead, I seek to expand medium code-model support to handle large binaries. I have explored implementing range extension thunks. These allow the linker to handle branch targets that exceed the 32-bit reach of CALL/JMP instructions without incurring overhead on every call (many calls will be in reach, so having the compiler emit the multi-instruction sequence for all of them, as the large code-model does, is wasteful). I also have an additional implementation that uses multiple GOTs so that those relocations likewise remain within 2GiB (with a large-data-threshold of 0).
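As a simplified sketch of the thunk idea (not the code in the PRs), the linker only needs to redirect the calls whose rel32 displacement does not fit. The function names are hypothetical, and a real implementation also has to place each thunk within reach of its call site.

```python
CALL_LEN = 5  # E8 opcode byte plus a 4-byte rel32 displacement

def needs_thunk(call_addr: int, target_addr: int) -> bool:
    """True if a direct call at call_addr cannot reach target_addr."""
    disp = target_addr - (call_addr + CALL_LEN)  # rel32 is relative to the next instruction
    return not (-(2**31) <= disp <= 2**31 - 1)

def plan_thunks(calls):
    """Given (call_addr, target_addr) pairs, return those that need a thunk."""
    return [c for c in calls if needs_thunk(*c)]

# Hypothetical call sites: one nearby target, one target 3 GiB away.
calls = [(0x1000, 0x2000), (0x1000, 0x1000 + 3 * 2**30)]
print(plan_thunks(calls))  # only the second pair is out of range
```

In-range calls are left untouched, which is the point: the cost is paid only where the distance demands it.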

On the LLVM side, I have opened:
https://github.com/llvm/llvm-project/pull/174637
https://github.com/llvm/llvm-project/pull/174486

I would also like to upstream the additional GOT handling, depending on reception.

Thanks! Look forward to the discussion.

Jan Beulich

Jan 13, 2026, 2:20:39 AM
to Farid Zakaria, X86-64 System V Application Binary Interface
On 13.01.2026 05:39, Farid Zakaria wrote:
> In LLVM for instance, I noticed that although the object format supports
> multiple byte-encodings (sdata4 and sdata8), it was always fixed to
> sdata4.

One question is - are the EH encodings the only thing which needs dealing
with?

> I wanted to propose moving sdata4 to sdata8 for a variant of the medium
> code-model, potentially with another supplementary flag. While the x86-64
> code model specifications do not explicitly mandate a specific DW_EH_PE
> (DWARF Exception Handling) encoding, coordinating with GCC/binutils ensures
> cross-toolchain compatibility and alignment on encoding expectations would
> be ideal (especially if we choose an additional gating flag).

Well, afaict it won't work without an extra flag (command line option or
directive), as presently gas isn't aware of the code model at all. With
such a flag, the EH aspect looks pretty straightforward to deal with
(assuming it is intended to be sdata8 uniformly, rather than conditional
upon further criteria). Hence also the question above whether this is all
that needs sorting.

Jan

Florian Weimer

Jan 13, 2026, 6:48:07 AM
to 'Jan Beulich' via X86-64 System V Application Binary Interface, Farid Zakaria, Jan Beulich
* Jan Beulich:

> On 13.01.2026 05:39, Farid Zakaria wrote:
>> In LLVM for instance, I noticed that although the object format supports
>> multiple byte-encodings (sdata4 and sdata8), it was always fixed to
>> sdata4.
>
> One question is - are the EH encodings the only thing which needs dealing
> with?

If that's the case, I think the linker should do the translation. I
think it already rewrites the closely related EH frame data anyway.

Or do we need to support individual object files that contain more than
2 GiB of code?

Thanks,
Florian

James Y Knight

Jan 13, 2026, 1:16:11 PM
to Farid Zakaria, X86-64 System V Application Binary Interface
On Mon, Jan 12, 2026 at 11:39 PM Farid Zakaria
<farid.m...@gmail.com> wrote:
>
> Hi everyone,
>
> I was recommended by Fangrui Song (MaskRay) to kick-start a conversation here.
>
> At work, we are starting to produce very large binaries. I have been looking into relocation overflows, largely from LLVM's lld and how to solve them without moving over to the large code-model.

Thanks for sending this out, I'm really happy to see this proposal! At
Google, we had recently also started exploring this exact same
problem, and have internally discussed doing the same things you've
just proposed: range extension thunks for calls, using a large EH
encoding, and supporting multiple GOTs. The aim is for us to compile code
under a single code-model, which will achieve close-to-optimal
performance on small binaries, yet allow developers to never run into
an error when the binary ends up large.

> I have been investigating the medium code-model but despite documentation hinting it's meant for binaries whose data may be larger than 2GiB, I have been facing relocation overflows in other sections such as .gcc_except_table.

However, I don't think this makes sense to contemplate as modification
of the "medium" code model. "Medium" is specified to permit only
specific "large data" sections to grow the binary beyond 2GB.
Everything else in the binary must remain within the 2GB limit (both
text, and other data). Only specified user data should be placed in
the "large data" section, not things like eh_frame -- those should all
remain in the small section near the text and still fit within 2GB.

Instead, I believe what we need to do here is make a new variant of
the "large" code model which is actually usable.

As I'm sure Farid is aware, but maybe other readers are not: the large
code model as currently specified has a truly excessive performance
cost, which makes it nearly useless. Most critically, the
function-call sequence is crazy expensive: we emit every single call
as an indirect call. That's basically a non-starter if performance is
at all important. To make matters worse, retrieving the address from
the GOT is _also_ significantly more expensive than the usual "mov"
w/GOTPCREL reloc, since we cannot assume the GOT is within 2GB of the
caller. (See e.g. https://godbolt.org/z/YE3P85PYs for comparisons).

If we add support in linkers to emit range-extension thunks, we can go
back to emitting a simple call/jmp for function calls. This would
solve most of the performance issue already. If we also then support
emitting multiple GOTs, we go back to using GOTPCREL to get the
address of functions or accessing global data. This "new-large" code
model would thus generate code very similar to "small" -- with the
linker doing all the necessary work.

That raises the question: after this work, could we eliminate code
models? I believe the answer is: almost, but not entirely. The main
difference that remains is: under the small code model, a
known-TU-local or known-DSO-local symbol can be accessed directly,
rather than first retrieving the address with a separate mov
sym@GOTPCREL. The linker's GOTPCREL relaxation will transform the mov
instruction into a lea where possible in the final binary -- but that
still results in an extra instruction.
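To illustrate the relaxation mentioned above, here is a byte-level sketch of turning a RIP-relative `mov` load into an `lea`. Only the opcode byte changes (0x8B to 0x8D). The helper name is made up, and the displacement is left untouched here, whereas a real linker would re-resolve it against the symbol instead of the GOT slot.

```python
def relax_mov_to_lea(insn: bytes) -> bytes:
    """Rewrite `mov sym@GOTPCREL(%rip), %reg` into `lea sym(%rip), %reg`.

    Expects 7 bytes: REX.W prefix, opcode 0x8B, RIP-relative ModRM, disp32.
    """
    if len(insn) != 7:
        raise ValueError("expected a 7-byte instruction")
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    is_rex_w = (rex & 0xF8) == 0x48          # 0100 1RXB
    is_rip_relative = (modrm & 0xC7) == 0x05  # mod=00, rm=101
    if not (is_rex_w and opcode == 0x8B and is_rip_relative):
        raise ValueError("not a RIP-relative 64-bit mov load")
    return bytes([rex, 0x8D, modrm]) + insn[3:]

# mov 0x0(%rip), %rax  ->  lea 0x0(%rip), %rax
before = bytes.fromhex("488b0500000000")
print(relax_mov_to_lea(before).hex())  # 488d0500000000
```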

But that is a low enough cost that I think it _would_ be viable to use
"new-large" as a default code model. (That's definitely what I'd be
aiming to do in Google's internal buildsystem)

Separately -- stepping back to the question of CFI emission:

Regardless of any of the discussion of new code models, or code model
modifications, it seems to me that it's just a _bug_ in the existing
"large" code model support that we do not emit 8-byte relocations for
CFI. LLVM's integrated assembler does so today under -mcmodel=large,
but that is only possible because the "is large code model" flag gets
passed internally to the assembler API (kinda breaking an abstraction
barrier). There exists no asm directive, nor any assembler
command-line flag to specify that. So, with `clang -c
-fno-integrated-as -mcmodel=large` or `gcc -c -mcmodel=large`, you
get 32-bit references in the FDE to the function address -- despite
getting 64-bit relocations for the references to the personality
function and LSDA!

Given that every _other_ part of the CFI specifies an encoding
explicitly in the assembler syntax, it feels like the best option
would be to do so for the FDE encoding, too. That is, either a new
directive to set the encoding for the current FDE, like
".cfi_fde_encoding <encoding>", or a new optional parameter to the
start directive, like ".cfi_startproc [<fde-encoding>]".

Rafael Ávila de Espíndola

Jan 13, 2026, 3:50:18 PM
to James Y Knight, Farid Zakaria, X86-64 System V Application Binary Interface
> Given that every _other_ part of the CFI specifies an encoding
> explicitly in the assembler syntax, it feels like the best option
> would be to do so for the FDE encoding, too. That is, either a new
> directive to set the encoding for the current FDE, like
> ".cfi_fde_encoding <encoding>", or a new optional parameter to the
> start directive, like ".cfi_startproc [<fde-encoding>]".

It has been ages since I worked on this, but I agree: One really nice
thing about CFI is declaring what is needed directly at the
source. Anything that depends on assembler flags feels like a hack in
comparison.

As for having the linker rewrite it, I still hope to have multiple
.eh_frame sections in COMDATs one day, so I would prefer to have the
linker do less, not more, in this area.

Thanks,
Rafael

Farid Zakaria

Jan 13, 2026, 7:34:29 PM
to X86-64 System V Application Binary Interface
Thank you everyone for the thorough responses.
I kind of buried a little too much in the original posting but I enjoyed reading all of the responses :P


> Regardless of any of the discussion of new code models, or code model
> modifications, it seems to me that it's just a _bug_ in the existing
> "large" code model support that we do not emit 8-byte relocations for
> CFI.

Yes -- I hope that will be an easily accepted fix (https://github.com/llvm/llvm-project/pull/174486).


> Only specified user data should be placed in
> the "large data" section, not things like eh_frame -- those should all
> remain in the small section near the text and still fit within 2GB.

Our .text sections are near the limit themselves; as a result, the distance between .text and .gcc_except_table can surpass 2GiB, which makes sdata8 useful.
It seems like a bug to fail with a relocation overflow at that point if the rest is within 2GiB and that data structure supports sdata8.

Since I packed a lot into the original posting, to prioritize: I was hoping the first discussion item could be whether we can add a flag to move from sdata4 to sdata8 in the medium code-model for the sections that support variable encodings.


Arthur Eubanks

Jan 16, 2026, 5:13:38 PM
to X86-64 System V Application Binary Interface
Regarding sdata4 to sdata8, aside from verifying ecosystem support for it (hopefully things already work), it would be nice to show some binary size numbers. If it's negligible then I'm in favor.
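As a back-of-the-envelope sketch of what such numbers might look like, the growth is roughly four extra bytes per widened pointer. The pointer count and binary size below are invented placeholders, not measurements, and `extra_bytes` is a hypothetical helper.

```python
def extra_bytes(num_encoded_pointers: int, old_size: int = 4, new_size: int = 8) -> int:
    """Bytes added when every encoded pointer grows from old_size to new_size."""
    return num_encoded_pointers * (new_size - old_size)

# Placeholder inputs: 2 million encoded pointers in a 4 GiB binary.
pointers = 2_000_000
binary_size = 4 * 2**30

added = extra_bytes(pointers)
print(added)                                 # 8000000
print(f"{100 * added / binary_size:.2f}%")   # growth relative to the binary
```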


Back to the proposed medium/large code model changes, we often have precompiled small code model libraries that don't use -mlarge-data-threshold=0 and have PC32 relocations from text to data. We'd like to link those into binaries that can get very large, and mixing the precompiled text with our compiled-from-source code can cause relocation overflows. I'd propose putting the large code model text into a large text section .ltext which doesn't contribute to relocation pressure. lld just recently got support for placing .ltext outside of the small sections. Recent clang versions already put text built with the large code model in .ltext, but gcc does not: https://godbolt.org/z/K74nbjjcv.

This wouldn't be necessary if everyone were happy with the code model James proposes with multiple GOTs, using relaxable GOT sequences for getting address of known-DSO-local variables, and range extension thunks, but the slight performance hit mentioned makes me think we won't be able to align everyone on this new code model especially since most people don't run into huge binaries.

In the case where people don't want to support multiple GOTs, we could also have a large code model variant that uses range extension thunks but still uses the current large code model instruction sequence for accessing globals. It's pretty gnarly, but function call performance is typically much more important than global data access performance and the performance may be somewhat tolerable...

Farid Zakaria

Jan 21, 2026, 2:00:32 PM
to X86-64 System V Application Binary Interface
I'll try to pull some binary size numbers. As with anything, I suggested a flag to control it -- so I think that makes it "safe" to put forward?

We have the same problem with small code-model objects, which will be the long tail: precompiled libraries such as MKL, or anything that is written directly in assembly.

I'm open to a new "largeish" code-model or just a few additional knobs for the medium code-model.
At Meta, I am trying to spin up a working group to try and tackle this again.

Are there others interested? We seem to have similar interests here.
I could spin up a separate thread or start a discussion meeting.

Also,
I finally got Google Groups access for my work account (@meta.com) -- so I might post soon from there.