On Mon, Jan 12, 2026 at 11:39 PM Farid Zakaria <farid.m...@gmail.com> wrote:
>
> Hi everyone,
>
> Fangrui Song (MaskRay) recommended that I kickstart a conversation here.
>
> At work, we are starting to produce very large binaries. I have been looking into relocation overflows, largely as reported by LLVM's lld, and how to solve them without moving over to the large code model.

Thanks for sending this out -- I'm really happy to see this proposal!
At Google, we have recently started exploring this exact same
problem, and have internally discussed doing the same things you've
just proposed: range extension thunks for calls, using a large EH
encoding, and supporting multiple GOTs. The aim for us is to compile
code under a single code model which achieves close-to-optimal
performance on small binaries, yet lets developers never run into an
error when the binary ends up large.

> I have been investigating the medium code-model, but despite documentation hinting that it's meant for binaries whose data may be larger than 2GiB, I have been facing relocation overflows in other sections such as .gcc_except_table.

However, I don't think it makes sense to approach this as a
modification of the "medium" code model. "Medium" is specified to
permit only specific "large data" sections to grow the binary beyond
2GB. Everything else in the binary must remain within the 2GB limit
(both text and other data). Only explicitly designated user data
should be placed in the "large data" sections, not things like
.eh_frame -- those should all remain in the small sections near the
text and still fit within 2GB.
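
For readers less familiar with "medium": roughly, code compiled with
-mcmodel=medium reaches the designated large data through 64-bit
absolute relocations while everything else stays RIP-relative. A
minimal sketch (non-PIC, AT&T syntax; symbol names are made up):

        # big_array exceeds -mlarge-data-threshold, so it is placed in
        # .ldata and reached via a 64-bit absolute relocation; it may
        # live beyond the 2GB window.
        movabsq $big_array, %rax        # R_X86_64_64
        movl    (%rax), %eax
        # Everything else (text, .rodata, .eh_frame, ordinary data)
        # stays RIP-relative and must remain within +/-2GB.
        movl    small_var(%rip), %ecx   # R_X86_64_PC32
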
Instead, I believe what we need to do here is to make a new variant
of the "large" code model that is actually usable.

As I'm sure Farid is aware, but maybe other readers are not: the large
code model as currently specified has a truly excessive performance
cost, which makes it nearly useless. Most critically, the
function-call sequence is crazy expensive: we emit every single call
as an indirect call. That's basically a non-starter if performance is
at all important. To make matters worse, retrieving an address from
the GOT is _also_ significantly more expensive than the usual "mov"
with a GOTPCREL reloc, since we cannot assume the GOT is within 2GB
of the caller. (See e.g. https://godbolt.org/z/YE3P85PYs for
comparisons).
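
To make the cost concrete, here is roughly the shape of the code the
two models generate for calling a function and taking its address
(PIC, AT&T syntax; a sketch of the pattern, not an exact compiler
dump):

        # Small code model: one PC-relative call, and a single
        # GOTPCREL load when the address of a symbol is needed.
        callq   foo@PLT
        movq    foo@GOTPCREL(%rip), %rax

        # Large code model (PIC): the GOT base must be materialized
        # with 64-bit constants before every GOT access, and the call
        # itself is indirect.
.Lpb:
        leaq    .Lpb(%rip), %rax
        movabsq $_GLOBAL_OFFSET_TABLE_-.Lpb, %rcx
        addq    %rax, %rcx                  # %rcx = GOT base
        movabsq $foo@GOT, %rax              # R_X86_64_GOT64
        movq    (%rcx,%rax), %rax           # load foo's address
        callq   *%rax
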
If we add support in linkers to emit range-extension thunks, we can
go back to emitting a simple call/jmp for function calls. This alone
would solve most of the performance issue. If we also support
emitting multiple GOTs, we can go back to using GOTPCREL to take the
address of a function or to access global data. This "new-large"
code model would thus generate code very similar to "small" -- with
the linker doing all the necessary work.
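
For illustration, a linker-generated range-extension thunk on x86-64
could be as simple as the following sketch (the thunk name is made
up; it assumes the nearest GOT is reachable from the thunk, which is
exactly what multiple GOTs would guarantee):

        # The compiler still emits a plain `callq foo`. If foo ends up
        # more than 2GB away at link time, the linker retargets the
        # call to this thunk instead.
__x86_64_range_thunk_foo:
        movq    foo@GOTPCREL(%rip), %r11    # %r11 is free at call
                                            # boundaries (as in PLT stubs)
        jmpq    *%r11
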
That raises the question: after this work, could we eliminate code
models? I believe the answer is: almost, but not entirely. The main
difference that remains is that, under the small code model, a
known-TU-local or known-DSO-local symbol can be accessed directly,
rather than by first retrieving its address with a separate mov of
sym@GOTPCREL. The linker's GOTPCREL relaxation will turn that mov
into a lea where possible in the final binary -- but that still
leaves an extra instruction.
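
Concretely, the residual cost for a known-DSO-local variable looks
something like this (a sketch; the variable name is made up):

        # Small code model: direct RIP-relative access, one instruction.
        movl    local_var(%rip), %eax

        # "New-large": go through the (nearest) GOT; GOTPCREL relaxation
        # turns the first instruction into a lea, but two instructions
        # remain.
        movq    local_var@GOTPCREL(%rip), %rax  # relaxes to: leaq local_var(%rip), %rax
        movl    (%rax), %eax
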
But that is a low enough cost that I think it _would_ be viable to
use "new-large" as a default code model. (That's definitely what I'd
be aiming to do in Google's internal build system.)

Separately -- stepping back to the question of CFI emission:
Regardless of any discussion of new code models or code-model
modifications, it seems to me that it's simply a _bug_ in the
existing "large" code model support that we do not emit 8-byte
relocations for CFI. LLVM's integrated assembler does emit them
today under -mcmodel=large, but that is only possible because the
"is large code model" flag gets passed internally through the
assembler API (somewhat breaking an abstraction barrier). There is
no asm directive, nor any assembler command-line flag, to request
that behavior. So, with `clang -c -fno-integrated-as -mcmodel=large`
or `gcc -c -mcmodel=large`, you get 32-bit references in the FDE to
the function address -- despite getting 64-bit relocations for the
references to the personality function and LSDA!

Given that every _other_ part of the CFI specifies an encoding
explicitly in the assembler syntax, it feels like the best option
would be to do so for the FDE encoding, too. That is, either a new
directive to set the encoding for the current FDE, like
".cfi_fde_encoding <encoding>", or a new optional parameter to the
start directive, like ".cfi_startproc [<fde-encoding>]".
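
For illustration, the first option might look like this (the
directive name and its placement are hypothetical, as is the choice
of encoding value; 0x1c is DW_EH_PE_pcrel | DW_EH_PE_sdata8):

        .text
        .globl  func
        .type   func, @function
func:
        .cfi_startproc
        # Hypothetical new directive: request a 64-bit encoding for
        # this FDE's initial-location and address-range fields instead
        # of the default 4-byte pcrel encoding.
        .cfi_fde_encoding 0x1c
        # ... function body ...
        retq
        .cfi_endproc
        .size   func, .-func

The second option would instead spell this as something like
".cfi_startproc 0x1c", with the same meaning.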