Hello,

On Tue, 3 Mar 2026, 'Farid Zakaria' via X86-64 System V Application Binary Interface wrote:
> All very good insights!
> I was unfortunately misinformed that %r10 could also be a scratch register
> :(
> Thank you for pointing that out Michael.
>
> I liked the suggestion you offered:
> .align 8
> 1: .quad target - 1b
> thunk:
> lea 1b(%rip), %r11
> addq (%r11), %r11
> jmpq *%r11
>
> If we want to use dynamic relocations I think we can also use a non-PIC
> approach.
> thunk:
> movabsq $target, %r11 # Load the absolute 64-bit address directly into r11
> jmpq *%r11 # Jump to it

I think it would be good to avoid dynamic relocs for what you want.
_Especially_ if the places to relocate are not grouped in a table like a
GOT. Your movabsq thunk, for instance, intersperses the places to relocate
with instruction bytes and hence wastes space in unshared CoW pages (not
much, of course). See also below for some more things related to dynamic
relocs.
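To put a rough number on "not much": below is a back-of-the-envelope sketch, with Python used only as a calculator. It compares how many pages get dirtied when the relocated 8-byte slots sit inside 13-byte thunks (movabsq with a 64-bit immediate is 10 bytes, jmpq *%r11 is 3) versus in a dense table of 8-byte entries. The function name, the 4 KiB page size, the count of 1000 and the assumption that the slots are laid out contiguously from a page boundary are all mine, purely for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def dirtied_pages(count, stride, page_size=PAGE_SIZE):
    """Pages spanned by `count` relocated slots spaced `stride` bytes apart.

    Assumes the slots are contiguous and start at a page boundary, so this
    is an approximation, not a statement about any real linker layout.
    """
    return -(-count * stride // page_size)  # integer ceiling division


MOVABS_THUNK_SIZE = 10 + 3  # movabsq $imm64, %r11 + jmpq *%r11
TABLE_ENTRY_SIZE = 8        # one 64-bit address per entry, GOT-style

thunk_pages = dirtied_pages(1000, MOVABS_THUNK_SIZE)
table_pages = dirtied_pages(1000, TABLE_ENTRY_SIZE)
print(thunk_pages, table_pages)
```

Under those assumptions the interspersed layout dirties a few more pages than the table for the same number of relocs, and the gap grows if the thunks are scattered through the text rather than packed together.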

Actually I think the specific layout and contents of these thunks don't
really need specification in the psABI; what matters is that they can be
jumped to and then magically transfer control to the wanted destination
while clobbering only %r11. The specific instruction sequence doesn't
matter psABI-wise. Even the existence of these thunks is borderline
implementation detail. Obviously it's good to discuss the details in
this forum, but not all of them need to go into the document, or they
can go only into an informative section.
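As a sanity check that the quoted lea/addq/jmpq thunk really reaches its destination at any load address without a dynamic reloc, here is a tiny model of its arithmetic. It is only a sketch: the function name and the example addresses are made up, and registers are modelled as 64-bit wrapping integers.

```python
MASK64 = (1 << 64) - 1  # registers wrap modulo 2**64


def thunk_destination(load_base, link_addr_1b, link_addr_target):
    """Model where the lea/addq/jmpq thunk ends up jumping to.

    `link_addr_1b` and `link_addr_target` are link-time addresses of the
    `1:` label and of `target`; `load_base` is the (hypothetical) offset the
    loader applied to the whole image.
    """
    # Link time: ".quad target - 1b" is a constant, independent of load_base.
    stored_delta = (link_addr_target - link_addr_1b) & MASK64
    # lea 1b(%rip), %r11  -> run-time address of the .quad
    r11 = (load_base + link_addr_1b) & MASK64
    # addq (%r11), %r11   -> add the stored delta
    r11 = (r11 + stored_delta) & MASK64
    # jmpq *%r11
    return r11


for base in (0, 0x555555554000, 0x7F0000000000):
    forward = thunk_destination(base, 0x1000, 0x1_2000_0000)
    backward = thunk_destination(base, 0x1_2000_0000, 0x1000)
    print(hex(forward), hex(backward))
```

The stored delta never changes with the load address, which is why nothing has to be patched at load time, and only %r11 is involved.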

Only the things that matter for interoperability need to go into the
psABI, e.g. new ELF things: new flag constants, new tags, new section
types, new relocations, etc. Right now it looks like you can extend the
linker such that it takes .o files produced by current
compilers/assemblers and combines them into a final ELF file that uses
the thunks and is loadable by current ld.so's, even when text segments
larger than 2GB are needed. That's without dynamic relocs, and if
that's correct it's an indication that no normative extensions to the
psABI are needed.
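For reference, the 2GB limit comes from the signed 32-bit displacement of a direct CALL/JMP, which is taken relative to the address of the next instruction. A minimal sketch of the check a linker might make to decide whether a branch needs to go through a thunk (the function name and the example addresses are hypothetical, not taken from any linker):

```python
REL32_BRANCH_SIZE = 5  # opcode byte (E8 call / E9 jmp) + 4-byte displacement


def fits_rel32(site, target, insn_size=REL32_BRANCH_SIZE):
    """True if a direct rel32 CALL/JMP at address `site` can reach `target`.

    The displacement is relative to the end of the branch instruction and
    must fit in a signed 32-bit integer.
    """
    disp = target - (site + insn_size)
    return -(1 << 31) <= disp < (1 << 31)


print(fits_rel32(0x1000, 0x2000))          # nearby target
print(fits_rel32(0x1000, 0x1_0000_0000))   # target about 4 GiB away
```

A branch for which this check fails is the kind that would have to be redirected to a nearby thunk instead.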

If the thunks do need dynamic relocs, we may have to think about how that
interworks with the current practice of having the read-exec segment not
be writable (outside TEXTREL, boo!). We would then have multiple disjoint
ranges that are writable during relocation processing, and therefore a
need to somehow mark them, or handle them with (perhaps multiple)
GNU_RELRO segments or suchlike. Either way we'd need to do _something_ for
interoperability, and that would need specifying. I'd prefer to not have
to think about that :-)

> > Do the thunks need names in the first place? Duplication can result not only
> > from STT_SECTION symbols, but also from other STB_LOCAL ones afaict.
>
> I don't think they need names as part of the ABI but having them definitely
> helps with testing and validation.

Similar to names for PLT slots (those can even be synthesized on the fly
by disassemblers). Not psABI material, but helpful.

> > ... like this theoretical upper bound there can also be as many CALLs/JMPs
> > in a 1GB region. To me, setting a fixed partition size looks inflexible.
> > By using as big a region as possible, the number of thunks needed may reduce.
>
> I am open to alternate layouts. I largely plan to take inspiration from
> what ARM has already done in lld but I don't think it needs to be cemented
> at all in the ABI.

I think concrete sizes and layouts of such partitions, and strategies for
creating them, are not psABI material; they are implementation details.

Ciao,
Michael.