RELLEB: A compact relocation format

154 views
Skip to first unread message

Fangrui Song

unread,
Mar 10, 2024, 2:49:45 AM (6 days ago) Mar 10
to Generic System V Application Binary Interface
Hi, I'd like to propose a LEB128-based relocation format to replace REL and RELA formats.

The natural size guideline for control structures presents significant drawbacks for relocations.
Since relocations are typically processed sequentially, they don't gain the same random-access advantages.
The large 24-byte Elf64_Rela structure highlights the drawback.

I've written a detailed analysis of the size problem and my solution at https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf ,
and developed a LLVM/Clang/lld prototype at https://github.com/MaskRay/llvm-project/tree/demo-relleb (clang -mrelleb, llvm-readelf -r, ld.lld)

Below, you'll find an overview of the size issue, followed by excerpts from my blog post that detail the "RELLEB relocation format" and my "RELLEB proposal for the generic ABI."

---

In contrast, the WebAssembly object file format uses LEB128 encoding for relocations and other constrol structures, offering a significant size advantage over ELF.

In my Clang+lld prototype,
The total relocation section size has decreased from 28768176 to 5318839, 18.4% of the original size.
The .o size has decreased by 20.5%.

Applying zstd (SHF_COMPRESSED) would reduce SHT_RELLEB section size by 26.4%.
The overall decrease of only 1.088% in `.o` file sizes indicates that RELLEB offers a reasonable compromise.

---

## Use cases

* Dynamic relocations
* Marker relocations (R_RISCV_RELAX, etc)
* `.llvm_addrsig`
* `.llvm.call-graph-profile`
* `.debug_names`

Refer to https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#use-cases

---

## RELLEB relocation format

The 1990 ELF paper _ELF: An Object File to Mitigate Mischievous Misoneism_ says "ELF allows extension and redefinition for other control structures."
Inspired by WebAssembly, let's explore RELLEB, a new and more compact relocation format designed to replace RELA.
Our emphasis is on simplicity over absolute minimal encoding. See the end of the article for a detailed format description.

A `SHT_RELLEB` section (preferred name: `.relleb<name>`) holds compact relocation entries that decode to `Elf32_Rela` or `Elf64_Rela` depending on the object file class (32-bit or 64-bit).
Its content begins with a ULEB128-encoded relocation count, followed by entries encoding `r_offset`, `r_type`, `r_symidx`, and `r_addend`.

Here are key design choices:

*Relocation count (ULEB128)*:

This allows for efficient retrieval of the relocation count without decoding the entire section.
While a `uint32_t` (like [`SHT_HASH`](https://www.sco.com/developers/gabi/latest/ch5.dynamic.html#hash)) could be used, ULEB128 aligns with subsequent entries, removes endianness differences, and offers a slight size advantage in most cases when the number of symbols can be encoded in one to three bytes.

*Delta encoding for `r_offset` (ULEB128)*:

Section offsets can be large, and relocations are typically ordered. Storing the difference between consecutive offsets offers compression potential.
In most cases, a single byte will suffice.
While there are exceptions (general dynamic TLS model of s390/s390x uses a local "out-of-order" pair: `R_390_PLT32DBL(offset=o) R_390_TLS_GDCALL(offset=o-2)`), we are optimizing for the common case.
Switching to SLEB128 would increase the total `.o` size by 0.1%.

For ELFCLASS32, `r_offsets` members are calculated using modular arithmetic modulo 4294967296.

*Delta encoding for `r_type` (SLEB128)*:

Some psABIs utilize relocation types greater than 128.
AArch64's static relocation types begin at 257 and dynamic relocation types begin at 1024, necessitating two bytes with ULEB128/SLEB128 encoding in the absence of delta encoding.
Delta encoding allows all but the first relocation's type to be encoded in a single byte.
An alternative design is to define a base type in the header and encode types relative to the base type, which would introduce slight complexity.

If the AArch32 psABI could be redesigned, allocating `[0,64)` for Thumb relocation types and `[64,*)` for ARM relocation types would optimize delta encoding even further.

While sharing a single type code for multiple relocations would be efficient, it would require reordering relocations.
This conflicts with order requirements imposed by several psABIs and could complicate linker implementations.

*Symbol index and addend presence (SLEB128)*:

ULEB128 optimizes for the common case when the symbol index is encodable in one or two bytes.
Using SLEB128 and delta encoding instead of ULEB128 for the symbol index field would increase the total size by 0.4%.
Potential gains for dynamic relocations with consecutive indexes are outweighed by the loss in static relocations and added complexity, hence avoiding delta encoding.
We indicate addend omission using the sign bit (see below).

*Delta encoding for addend (SLEB128)*:

Consecutive static relocations often have identical addends, especially on architectures with frequent zero addends (AArch64, PowerPC, RISC-V, etc).
Addend omission optimizes this scenario.
Additionally, it benefits architectures like AArch32 and x86, which often have identical negative addends (call instructions) for consecutive relocations.

```c
// Many R_X86_64_PLT32(g-4)
void f() { g(); g(); g(); ... }
```

Dynamic relocations (except `R_*_RELATIVE` and `R_*_IRELATIVE`) typically have zero addends, also benefiting from our scheme.

The sign bit of the previous member flags whether the addend has changed relative to the prior entry.
When the addend changes, we use an addend delta. This offers a slight size advantage (about 0.20%) and optimizes for cases like:

```asm
.quad .data + 0x78
.quad .data + 0x80
.quad .data + 0x88
...
```

Note: when the bitwise NOT code path is taken, the zero delta addend is not utilized.

### RELLEB for dynamic relocations

Redacted. My blog post presents a comparison with Android packed relocations.


## RELLEB proposal for the generic ABI

In <https://www.sco.com/developers/gabi/latest/ch4.sheader.html>, make the following changes.

_In Figure 4-9: Section Types,sh_type, append a row_

`SHT_RELLEB` | 20

_Add text:_

SHT_RELLEB - The section holds compact relocation entries with explicit addends. An object file may have multiple relocation sections. ''Relocation'' below for details.

_In Figure 4-16: Special Sections, append a row_

`.rellebname` | `SHT_RELLEB` | see below

_Change the text below:_

.relname, .relaname, and .rellebname

These sections hold relocation information, as described in ''Relocation''. If the file has a loadable segment that includes relocation, the sections' attributes will include the SHF_ALLOC bit; otherwise, that bit will be off. Conventionally, name is supplied by the section to which the relocations apply. Thus a relocation section for .text normally would have the name .rel.text, .rela.text, or .relleb.text.

_In Figure 4-23: Relocation Entries, add:_

```c
typedef struct {
  Elf32_Addr r_offset;
  Elf32_Word r_type;
  Elf32_Word r_symidx;
  Elf32_Sxword r_addend;
} Elf32_Relleb;

typedef struct {
  Elf64_Addr r_offset;
  Elf64_Word r_type;
  Elf64_Word r_symidx;
  Elf64_Sxword r_addend;
} Elf64_Relleb;
```

_Add text above "A relocation section references two other sections":_

A `SHT_RELLEB` section holds compact relocation entries that decode to `Elf32_Relr` or `Elf64_Relr` depending on the object file class (32-bit or 64-bit).
Its content begins with a ULEB128-encoded relocation count, followed by entries encoding `r_offset`, `r_type`, `r_symidx`, and `r_addend`.
Note that the `r_info` member in traditional REL/RELA formats has been split into separate `r_type` and `r_symidx` members, allowing `uint32_t` relocation types for ELFCLASS32 as well.

* Delta offset: (ULEB128-encoded) The difference in `r_offset` relative to the previous entry. The difference is truncated to 32-bit/64-bit for ELFCLASS32/ELFCLASS64, respectively.
* Delta type: (SLEB128-encoded) The difference in relocation type relative to the previous entry.
* Symbol table index and addend presence: (SLEB128-encoded) If the `r_offset` is equal to the previous `r_addend`, the encoded value represents the symbol index; otherwise, the bitwise NOT of the encoded value indicates the symbol index.
* Delta addend: (SLEB128-encoded, only if indicated in the previous member) The difference in `r_addend` relative to the previous entry. The difference is truncated to 32-bit/64-bit for ELFCLASS32/ELFCLASS64, respectively.

Example C++ encoder for `Elf64_Relleb`:

```cpp
  uint64_t offset = 0, addend = 0;
  uint32_t type = 0;
  encodeULEB128(relocs.size(), os);
  for (const Reloc &rel : relocs) {
    encodeULEB128(rel.offset - offset, os);
    encodeSLEB128(static_cast<int32_t>(rel.type - type), os);
    type = rel.type;
    if (addend == rel.addend) {
      encodeSLEB128(rel.symidx, os);
    } else {
      encodeSLEB128(~symidx, os);
      encodeSLEB128(rel.addend - addend, os);
      addend = rel.addend;
    }
  }
```

For the first relocation entry, the previous offset, type, and addend members are treated as zero.

_In Figure 5-10: Dynamic Array Tags, d_tag, add:_

`DT_RELLEB` | 38 | `d_ptr` | optional | optional

_Add text below:_

* `DT_RELLEB` - This element is similar to `DT_RELA`, except its table uses the RELLEB format.

Florian Weimer

unread,
Mar 11, 2024, 9:42:23 AM (5 days ago) Mar 11
to 'Fangrui Song' via Generic System V Application Binary Interface
* via Generic System V. Application Binary Interface:

> Hi, I'd like to propose a LEB128-based relocation format to replace
> REL and RELA formats.

Do you intend this for run-time relocations as well?

The leb128 encoding is comparatively hard to decode on many CPUs because
they lack a bit gather (parallel bit extract) instruction. It would be
easier to decode if the unary length count is in the first byte
(although that limits the overall magnitude of values, which shouldn't
matter for this application).

Thanks,
Florian

Rui Ueyama

unread,
Mar 11, 2024, 2:08:59 PM (5 days ago) Mar 11
to Generic System V Application Binary Interface
The proposal overall seems reasonable to me, but I'm more interested in how we can transition from the existing relocation format to the new one. Historically, we have a lot of proposals that are good in theory which, however, have failed to be a viable option due to the lack of a migration strategy. I think we need to answer the following questions:

1. How do we make the older linkers print out user-friendly error messages for object files containing only .relleb relocation tables? I believe if we don't do anything, .relleb sections would just be ignored, and such object files would be treated as if they don't contain any relocations, resulting in broken output files.

2. How do we make the new relocation format the standard one for object files in the long term? If there's no migration strategy, people would conservatively stick with the existing format for the sake of compatibility, and the efforts to make the new relocation format usable would become less effective.

I don't know the answer for (1), but for (2), I have an idea to make it happen. Even today, psABIs are gradually evolving, and it is already required to use a newer linker to use new features such as TLSDESC or x86's GOT32X relocations. We can bundle .relleb with such new features -- i.e., when a feature that needs recent versions of the linker is requested, we can safely enable .relleb. For example, we can bundle .relleb with RISC-V's TLSDESC. Or maybe with new relocations for Intel's new instruction format, APX. This bundling strategy might be able to provide a gradual migration path.

Fangrui Song

unread,
Mar 11, 2024, 7:55:23 PM (5 days ago) Mar 11
to gener...@googlegroups.com
On 2024-03-11, Florian Weimer wrote:
>* via Generic System V. Application Binary Interface:
>
>> Hi, I'd like to propose a LEB128-based relocation format to replace
>> REL and RELA formats.
>
>Do you intend this for run-time relocations as well?
>
>The leb128 encoding is comparatively hard to decode on many CPUs because
>they lack a bit gather (parallel bit extract) instruction. It would be
>easier to decode if the unary length count is in the first byte
>(although that limits the overall magnitude of values, which shouldn't
>matter for this application).

I have added more description to https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#relleb-for-dynamic-relocations

RELLEB is not efficient for dynamic relocations, as it does not utilize two properties:

* There are much fewer relocation types.
* The offsets are often adjacent by Elf_Addr. No two dynamic relocations can share the same offset.

Nevertheless, RELLEB improves the non-relative portion, achieving a decent 16.3% reduction.
However, it's overshadowed by Android packed relocations (6.3%).

My blog post proposes a generalized RELR encoding that looks like:

```
// R_*_GLOB_DAT/absolute address relocation group
Encode(length of the group)
Encode(R_*_GLOB_DAT)
Use RELR to encode offsets
Encode symbol indexes separately

// R_*_JUMP_SLOT group
Encode(length of the group)
Encode(R_*_JUMP_SLOT)
Use RELR to encode offsets
Encode symbol indexes separately

...
```

However, relative relocations usually outnumber non-relative relocations.
Perhaps non-relative dynamic relocations are not worth compacting using an
optimal format?

On 2024-03-11, Rui Ueyama wrote:
>The proposal overall seems reasonable to me, but I'm more interested in how
>we can transition from the existing relocation format to the new one.
>Historically, we have a lot of proposals that are good in theory which,
>however, have failed to be a viable option due to the lack of a migration
>strategy. I think we need to answer the following questions:
>
>1. How do we make the older linkers print out user-friendly error messages
>for object files containing only .relleb relocation tables? I believe if we
>don't do anything, .relleb sections would just be ignored, and such object
>files would be treated as if they don't contain any relocations, resulting
>in broken output files.

Thanks for mentioning this.

GNU ld allows certain unknown section types:

* [SHT_LOUSER,SHT_HIUSER] and non-SHF_ALLOC
* [SHT_LOOS,SHT_HIOS] and non-SHF_OS_NONCONFORMING

but reports errors and stops linking for others (unless --no-warn-mismatch is
specified). When linking a relocatable file using SHT_RELLEB, you might
encounter errors like the following:

% clang -mrelleb -fuse-ld=bfd a.c b.c
/usr/bin/ld.bfd: unknown architecture of input file `/tmp/a-1e0778.o' is incompatible with i386:x86-64 output
/usr/bin/ld.bfd: unknown architecture of input file `/tmp/b-9963f0.o' is incompatible with i386:x86-64 output
/usr/bin/ld.bfd: error in /tmp/a-1e0778.o(.eh_frame); no .eh_frame_hdr table will be created
/usr/bin/ld.bfd: error in /tmp/b-9963f0.o(.eh_frame); no .eh_frame_hdr table will be created
clang: error: linker command failed with exit code 1 (use -v to see invocation)

gold, lld, and mold do not report errors. I have filed:

https://github.com/llvm/llvm-project/issues/84812
https://github.com/rui314/mold/issues/1215

In addition, when there is one .eh_frame section with CIE pieces but no
relocation, _bfd_elf_parse_eh_frame will report an error.

>2. How do we make the new relocation format the standard one for object
>files in the long term? If there's no migration strategy, people would
>conservatively stick with the existing format for the sake of
>compatibility, and the efforts to make the new relocation format usable
>would become less effective.
>
>I don't know the answer for (1), but for (2), I have an idea to make it
>happen. Even today, psABIs are gradually evolving, and it is already
>required to use a newer linker to use new features such as TLSDESC or x86's
>GOT32X relocations. We can bundle .relleb with such new features -- i.e.,
>when a feature that needs recent versions of the linker is requested, we
>can safely enable .relleb. For example, we can bundle .relleb with RISC-V's
>TLSDESC. Or maybe with new relocations for Intel's new instruction format,
>APX. This bundling strategy might be able to provide a gradual migration
>path.

For both existing and new architectures, we toolchain maintainers can encourage
them to implement the new relocation format. We try to implement the new
relocation format for important components in the ecosystem so that new
architectures are motivated to start with RELLEB.

We probably would never force any existing architectures to drop REL/RELA.
REL/RELA will never go away anyway.
>--
>You received this message because you are subscribed to a topic in the Google Groups "Generic System V Application Binary Interface" group.
>To unsubscribe from this topic, visit https://groups.google.com/d/topic/generic-abi/yb0rjw56ORw/unsubscribe.
>To unsubscribe from this group and all its topics, send an email to generic-abi...@googlegroups.com.
>To view this discussion on the web visit https://groups.google.com/d/msgid/generic-abi/6995749a-1c3b-47fc-8df5-1716aec0224bn%40googlegroups.com.

727785714

unread,
Mar 11, 2024, 8:08:34 PM (4 days ago) Mar 11
to Rui Ueyama
您好,您的来件我已收到,我会尽快处理!
     谢谢!
 

727785714

unread,
Mar 11, 2024, 8:10:26 PM (4 days ago) Mar 11
to Fangrui Song' via Generic System V Application Binary Interface <>
您好,您的来件我已收到,我会尽快处理!
     谢谢!
 

Rui Ueyama

unread,
Mar 12, 2024, 4:28:18 AM (4 days ago) Mar 12
to Generic System V Application Binary Interface
I didn't suggest we would drop REL/RELA. What I wanted to say is that we need a strategy to enable gradual adoption of the new feature, instead of forcing toolchain maintainers to make a hard decision to do so.

If we do not have a strategy, chances are the new feature would be adopted only within a big organization/company where a developer team has complete control over their toolchains, and the world outside of it would never adopt it because of the fear of breaking backward compatibility.

So, my suggestion is to mandating linker developers support .relleb before supporting future generic/psABI extensions, and encourage toolchain developers to enable .relleb by default if other future generic/psABI extensions are explicitly enabled by the user. This way, we can ensure gradual adoption of .relleb even in an environment where no one controls the toolchain choices.

727785714

unread,
Mar 12, 2024, 4:28:33 AM (4 days ago) Mar 12
to Rui Ueyama
您好,您的来件我已收到,我会尽快处理!
     谢谢!
 

Fangrui Song

unread,
Mar 14, 2024, 9:50:24 PM (2 days ago) Mar 14
to gener...@googlegroups.com
On Mon, Mar 11, 2024 at 8:28 PM Rui Ueyama <rui...@gmail.com> wrote:
>
> I didn't suggest we would drop REL/RELA. What I wanted to say is that we need a strategy to enable gradual adoption of the new feature, instead of forcing toolchain maintainers to make a hard decision to do so.
>
> If we do not have a strategy, chances are the new feature would be adopted only within a big organization/company where a developer team has complete control over their toolchains, and the world outside of it would never adopt it because of the fear of breaking backward compatibility.
>
> So, my suggestion is to mandating linker developers support .relleb before supporting future generic/psABI extensions, and encourage toolchain developers to enable .relleb by default if other future generic/psABI extensions are explicitly enabled by the user. This way, we can ensure gradual adoption of .relleb even in an environment where no one controls the toolchain choices.

When a psABI adds certain drastic changes, we can strongly encourage
someone to implement the new relocation format first:)
For instance, https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/423
proposes vendor-specific relocations
and the Elf32_Rel/Elf32_Rela relocation type constraint has become
more noticeable.

-----

The sign bit of symidx indicated addend omission.
I have changed the design and
https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf
to omit both type and addend.

* Delta offset: (ULEB128-encoded) The difference in r_offset relative
to the previous entry, represented as a 32-bit or 64-bit unsigned
integer for ELFCLASS32/ELFCLASS64, respectively.
* Symbol table index and type/addend presence: (SLEB128-encoded) If
the current type and addend are equal to the previous type and addend,
the encoded value represents the symbol index and the following two
members are omitted; otherwise, the bitwise NOT of the encoded value
(a 64-bit signed integer) indicates the symbol index and the following
two members are present. For example, the bitwise NOT of symbol index
0xffffffff is -0x100000000 (64-bit) instead of 0 (32-bit).
* Delta type: (SLEB128-encoded, only if not omitted) The difference in
relocation type relative to the previous entry, represented as a
32-bit signed integer.
* Delta addend: (SLEB128-encoded, only if not omitted) The difference
in r_addend relative to the previous entry, represented as a 32-bit or
64-bit signed integer for ELFCLASS32/ELFCLASS64, respectively.

Example C++ encoder:

```
// encodeULEB128(uint64_t, raw_ostream &os);
// encodeSLEB128(int64_t, raw_ostream &os);

Elf_Addr offset = 0, addend = 0;
uint32_t type = 0;
encodeULEB128(relocs.size(), os);
for (const Reloc &rel : relocs) {
encodeULEB128(rel.offset - offset, os);
int64_t symidx = rel.symidx;
if (type == rel.type && addend == rel.addend) {
encodeSLEB128(symidx, os);
} else {
encodeSLEB128(~symidx, os);
encodeSLEB128(static_cast<int32_t>(rel.type - type), os);
type = rel.type;
encodeSLEB128(std::make_signed_t<Elf_Addr>(rel.addend - addend), os);
addend = rel.addend;
}
}
```

Now, the new encoding works for static relocations and non-relative
dynamic relocations better.
(RISC-V static relocations might be less optimized now, because its
adjacent relocations often changes types, rendering type/addend
omission to happen less.)

RELLEB compresses the non-relative dynamic relocations down to 9.7%
(0x00239d / 0x016e90).
Android packed relocations compresses the portion down to 6.3%.
Android's advantage is due to RELOCATION_GROUPED_BY_INFO_FLAG sharing
r_info.

% llvm-readelf -S clang | grep ' \.rel.*\.'
[ 8] .rela.dyn RELA 0000000000f28998 f28998
e177d0 18 A 3 0 8
[ 9] .rela.plt RELA 0000000001d40168 1d40168
0025b0 18 AI 3 26 8
% llvm-readelf -S clang.relr | grep ' \.rel.*\.'
[ 8] .rela.dyn RELA 0000000000f28998 f28998
016e90 18 A 3 0 8
[ 9] .relr.dyn RELR 0000000000f3f828 f3f828
02ccd0 08 A 0 0 8
[10] .rela.plt RELA 0000000000f6c4f8 f6c4f8
0025b0 18 AI 3 27 8
% llvm-readelf -S clang.relr+android | grep ' \.rel.*\.'
[ 8] .rela.dyn ANDROID_RELA 0000000000f28998 f28998
001707 01 A 3 0 8
[ 9] .relr.dyn RELR 0000000000f2a0a0 f2a0a0
02ccd0 08 A 0 0 8
[10] .rela.plt RELA 0000000000f56d70 f56d70
0025b0 18 AI 3 27 8
% llvm-readelf -S clang.relr+rellebd | grep ' \.rel.*\.'
[ 8] .relleb.dyn RELLEB 0000000000f28998 f28998
00239d 00 A 3 0 8
[ 9] .relr.dyn RELR 0000000000f2ad38 f2ad38
02ccd0 08 A 0 0 8
[10] .rela.plt RELA 0000000000f57a08 f57a08
0025b0 18 AI 3 27 8
> To view this discussion on the web visit https://groups.google.com/d/msgid/generic-abi/6637a2de-94c7-413d-a71b-d1ee5031331fn%40googlegroups.com.



--
宋方睿

727785714

unread,
Mar 14, 2024, 9:50:43 PM (2 days ago) Mar 14
to Fangrui Song' via Generic System V Application Binary Interface <>
您好,您的来件我已收到,我会尽快处理!
     谢谢!
 

Reply all
Reply to author
Forward
0 new messages