--
You received this message because you are subscribed to the Google Groups "Generic System V Application Binary Interface" group.
To unsubscribe from this group and stop receiving emails from it, send an email to generic-abi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/generic-abi/20240322042904.ck7sn5ndig4ep5kh%40google.com.
* via Generic System V. Application Binary Interface:
> Its content begins with a ULEB128-encoded value `count * 4 + shift`
> (34-bit or 66-bit unsigned), where:
I want to mention again that ULEB128 is really terrible to decode on
many CPUs.
It would probably be best if we could avoid any form of unary encoding
(so that decoders don't need CLZ-like instructions).
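For reference, a minimal sketch of a byte-at-a-time ULEB128 decoder. The function name `decode_uleb128` is only illustrative. The point it shows is that the length of the value is not known up front: the continuation bit of every byte has to be inspected before the next byte can be consumed.

```python
def decode_uleb128(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one ULEB128 value from buf starting at pos.

    Returns (value, next_pos).
    """
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:           # high bit clear: this was the last byte
            return value, pos
        shift += 7
```

Each byte carries 7 payload bits, so a 34-bit header needs up to 5 bytes and a 66-bit header up to 10.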
> * Delta offset and flags (ULEB128): Holds `delta_offset * 8 + flags` (35-bit or 67-bit unsigned), where:
> + `delta_offset`: Difference in `r_offset` from the previous entry (`Elf_Addr`), right shifted by `shift`.
> + `flags & 1`: Indicate if delta symbol index is present.
> + `flags & 2`: Indicate if delta type is present.
> + `flags & 4`: Indicate if delta addend is present.
> * Delta symbol index (SLEB128, if present): The difference in symbol index from the previous entry
> (32-bit signed).
> * Delta type (SLEB128, if present): The difference in relocation type from the previous entry (32-bit
> signed).
32-bit signed does not permit encoding sequences like 0x0, 0xffff_ffff
or 0xffff_ffff, 0x0. Is this your intent?
I think this also applies to the initial 35-bit or 67-bit estimate
because the delta_offset is affected by this as well.
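To make the quoted layout concrete, here is a rough decoder sketch for this revision of the format. It is not taken from the proposal: the names (`decode_crel`, `MASK32`) are made up, the running state is assumed to start at zero, and the delta addend is assumed to be SLEB128 like the other deltas (the excerpt above only lists its flag). The sketch also assumes symbol index and type wrap modulo 2^32, which is one possible answer to the question above, not something the quoted text confirms.

```python
def decode_uleb128(buf, pos):
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def decode_sleb128(buf, pos):
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            if byte & 0x40:          # sign bit of the last byte set
                value -= 1 << shift  # sign-extend
            return value, pos


MASK32 = 0xFFFFFFFF  # assumption: 32-bit fields wrap around


def decode_crel(buf, addr_bits=64):
    """Return a list of (offset, symidx, type, addend) tuples."""
    header, pos = decode_uleb128(buf, 0)
    count, shift = header >> 2, header & 3  # header = count * 4 + shift
    addr_mask = (1 << addr_bits) - 1
    offset = symidx = rtype = addend = 0    # assumption: state starts at zero
    relocs = []
    for _ in range(count):
        word, pos = decode_uleb128(buf, pos)
        delta_offset, flags = word >> 3, word & 7  # delta_offset * 8 + flags
        offset = (offset + (delta_offset << shift)) & addr_mask
        if flags & 1:
            delta, pos = decode_sleb128(buf, pos)
            symidx = (symidx + delta) & MASK32
        if flags & 2:
            delta, pos = decode_sleb128(buf, pos)
            rtype = (rtype + delta) & MASK32
        if flags & 4:  # assumption: addend delta is SLEB128 as well
            delta, pos = decode_sleb128(buf, pos)
            addend += delta
        relocs.append((offset, symidx, rtype, addend))
    return relocs
```

Under the wrap-around assumption, a symbol index delta of -1 from 0 yields 0xffff_ffff, so the sequence 0x0, 0xffff_ffff would be encodable. Without it, it would not.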
The delta-based encoding of relocation types works on x86-64, but it's
really not ideal on AArch64 because there, R_AARCH64_ABS64 (257) and
R_AARCH64_GLOB_DAT (1025) are common, and their difference is greater than
127. Maybe this could be fixed by introducing an alternative value for
R_AARCH64_ABS64 that is greater than 1024.
On the other hand, this encoding needs two bits for encoding the
*length* of the type delta, when in practice, barely more than one bit
is needed to encode the entire type field. That seems quite wasteful.
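A small sketch of the arithmetic behind the AArch64 point, using a plain SLEB128 encoder (the function name `encode_sleb128` is only illustrative). One SLEB128 byte covers deltas from -64 to 63, so switching between the two relocation types costs two bytes in either direction.

```python
R_AARCH64_ABS64 = 257
R_AARCH64_GLOB_DAT = 1025


def encode_sleb128(value: int) -> bytes:
    """Encode a signed integer as SLEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift, keeps the sign
        sign_bit_set = (byte & 0x40) != 0
        # Stop once the remaining bits are pure sign extension of this byte.
        done = (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


delta = R_AARCH64_GLOB_DAT - R_AARCH64_ABS64  # 768
forward = encode_sleb128(delta)               # ABS64 -> GLOB_DAT
backward = encode_sleb128(-delta)             # GLOB_DAT -> ABS64
```

Here both `forward` and `backward` are two bytes long, whereas any delta in the range -64 to 63 fits in one.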
Thanks,
Florian
Many dynamic relocations have zero addends:
Usually only RELATIVE/IRELATIVE and potentially TPREL/TPOFF might require non-zero addends. Switching from DT_RELA to DT_REL offers a minor size advantage.
I considered defining two separate dynamic tags (DT_CREL and DT_CRELA) to distinguish between implicit and explicit addends. However, this would have introduced complexity:
I placed the delta addend bit next to offset bits so that it can be reused for offsets. Thanks to Stefan O'Rear for making me believe that my original thought of reserving a single bit flag (addend_bit) within the CREL header is elegant. Dynamic loaders prioritizing simplicity can hardcode the desired addend_bit value.
ld.lld -z crel defaults to implicit addends (addend_bit==0), but the option of using in-relocation addends is available with -z crel -z rela.
DT_AARCH64_AUTH_RELR vs CREL
The AArch64 PAuth ABI introduces DT_AARCH64_AUTH_RELR as a variant of RELR for signed relocations. However, its benefit seems limited.
In a release build of Clang 16, using -z crel -z rela resulted in a .crel.dyn section size of only 1.0% of the file size. Notably, enabling implicit addends with -z crel -z rel further reduced the size to just 0.3%. While DT_AARCH64_AUTH_RELR will achieve a noticeably smaller relocation size if most relative relocations are encoded with it, the advantage seems less significant considering CREL's already compact size.
Furthermore, DT_AARCH64_AUTH_RELR introduces additional complexity to the linker due to its 32-bit addend limitation: the in-place 64-bit value encodes a 32-bit schema, giving just 32 bits to the implicit addend. If the addend does not fit into 32 bits, DT_AARCH64_AUTH_RELR cannot be used. CREL with addends would avoid this complexity.
I have filed "Quantifying the benefits of DT_AARCH64_AUTH_RELR".
Hello,
On Mon, 1 Apr 2024, 'Fangrui Song' via Generic System V Application Binary Interface wrote:
> Regarding decoder complexity, the decoder can be implemented in just a
> few lines when decode[US]LEB128 are available:
How to decode/encode things isn't the only dimension of complexity:
* the RELA format can be mmaped and accessed (and modified!) in place. CREL
needs decoding and encoding.
* REL(A) can be random-accessed, CREL cannot.
I.e. the very need for decode/encode itself is already complexity.
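As an illustration of the random-access point, here is a sketch that reads entry `index` of an ELF64 RELA table directly from a byte buffer (which could be an mmap of the file). It assumes a little-endian ELF64 target. The helper name `rela_entry` is made up for this example.

```python
import struct

# Elf64_Rela: r_offset (u64), r_info (u64), r_addend (s64); 24 bytes per entry.
ELF64_RELA = struct.Struct("<QQq")


def rela_entry(table, index):
    """Return (offset, symidx, type, addend) for entry `index`, in O(1)."""
    r_offset, r_info, r_addend = ELF64_RELA.unpack_from(
        table, index * ELF64_RELA.size
    )
    # ELF64_R_SYM is the high 32 bits of r_info, ELF64_R_TYPE the low 32 bits.
    return r_offset, r_info >> 32, r_info & 0xFFFFFFFF, r_addend
```

With a delta-encoded variable-length format there is no equivalent of `index * ELF64_RELA.size`: reaching entry N means decoding entries 0 to N-1 first.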
I have a fancy for optimal encoding of stuff myself, and had I been around
when ELF was designed I certainly would have had something to say about
the RELA format. I probably would have made the addend optional per
reloc, and found better encoding of type and symbol. _Without_ giving up
on random access and mmap-ability. (Ending up in something not quite as
small as CREL but still reasonably small.)
I needed to resist the temptation to suggest improvements for CREL here
and there :-) Because: yes, whenever I look at ELF files and see section
sizes of REL(A) sections I go "meh, what a design". OTOH, I'm not
often worried about sizes of .o files. (I do worry sometimes, for the
reasons you already stated, and for instance because LTO uses temporary .o
files and the way we're doing it with GCC is partly influenced by the wish
to not need too large tmp space for those, and smaller RELs would help a
little with that).
So, it remains a tradeoff, and I haven't convinced myself (yet?) that
another relocation format is really the best thing in the grand scheme of
things. Or that CREL, as it's proposed, is really the best solution if a
new reloc format should be introduced.
(And yes: the timing for CREL is really bad, only a couple moons after
RELR :-( )
So, hmm, I don't know...
Ciao,
Michael.
I wonder if it might be a good idea to restrict this proposal to only relocatable object files and not executable files.
The requirements are somewhat different both in terms of expected processing, and in terms of expected sorts of relocations needed.
E.g. Apple has come up with an interesting scheme for MachO to encode (at least most) relocations in a "chained fixups" encoding for executable binaries. This encoding makes it easy to apply a single page's relocations on demand during page-in from the backing file, inside the kernel. I'm not familiar with all the details of the encoding, but relocations are indexed by page offset within the segment to make it easy to find all of the relocations for a given page.
The purpose is to reduce startup time and memory overhead by avoiding the need to apply relocations to unused parts of the program, and to allow pages to be dropped under memory pressure without having to write them out to swap, as they don't get marked dirty.
I'm not suggesting anyone implement this sort of scheme for ELF now, but maybe it'd be good to avoid the disruption of an executable-file relocation format change unless it could also enable this sort of optimization?
For the relocatable object file case (which is where it seems like CREL is also of the most benefit?), a page-indexed encoding would be useless, so the above concern is irrelevant if CREL is proposed only for relocatable object files.