CREL: A compact relocation format


Fangrui Song

Sep 25, 2024, 2:37:02 PM
to gener...@googlegroups.com
A previous version was named RELLEB and posted at
https://groups.google.com/g/generic-abi/c/yb0rjw56ORw .
Since the name has changed, creating a new post to avoid confusion.

LLVM 19.1 introduces full support for static CREL relocations across
the assembler, linker, and binary utilities.
However, it currently uses non-standard values (SHT_CREL = 0x40000014
and DT_CREL = 0x40000026).

For wider compatibility, I'd like to switch to the standard values:
SHT_CREL = 0x14 and DT_CREL = 0x26.

While I measured a size decrease of 18.0% for both x86-64 -O3 -g and
aarch64 -O3 -g builds, certain sanitizer -g builds saw larger
reductions (e.g. 26+% for asan/tsan on an internal large executable).

Chromium is going to adopt CREL static relocations, and several
vendors have expressed interest in the dynamic relocations.

---

https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#crel-proposal-for-the-generic-abi

In <https://www.sco.com/developers/gabi/latest/ch4.sheader.html>, make
the following changes.

_In Figure 4-9: Section Types, sh_type, append a row_

`SHT_CREL` | 20

_Add text:_

SHT_CREL - The section holds compact relocation entries with explicit
addends. An object file may have multiple relocation sections. See
''Relocation'' below for details.

_In Figure 4-16: Special Sections, append_

`.crelname` | `SHT_CREL` | see below

_Change the text below:_

.relname, .relaname, and .crelname

These sections hold relocation information, as described in
''Relocation''. If the file has a loadable segment that includes
relocation, the sections' attributes will include the SHF_ALLOC bit;
otherwise, that bit will be off. Conventionally, name is supplied by
the section to which the relocations apply. Thus a relocation section
for .text normally would have the name .rel.text, .rela.text, or
.crel.text.

_In Figure 4-23: Relocation Entries, add:_

```c
typedef struct {
Elf32_Addr r_offset;
Elf32_Word r_symidx;
Elf32_Word r_type;
Elf32_Sword r_addend;
} Elf32_Crel;

typedef struct {
Elf64_Addr r_offset;
Elf64_Word r_symidx;
Elf64_Word r_type;
Elf64_Sxword r_addend;
} Elf64_Crel;
```

_Add text above "A relocation section references two other sections":_

A `SHT_CREL` section holds compact relocation entries that decode to
`Elf32_Crel` or `Elf64_Crel` depending on the object file class
(32-bit or 64-bit).
Its content begins with a ULEB128-encoded value `count * 8 +
addend_bit * 4 + shift` (35-bit or 67-bit unsigned), where:

* `count`: Relocation count (32-bit or 64-bit unsigned).
* `addend_bit`: 1 indicates that relocation entries encode addends. 0
indicates implicit addends (stored in the location to be modified).
* `shift`: The shift value (0 to 3) applies to `delta_offset` in
relocation entries.

Relocation entries (which encode `r_offset`, `r_symidx`, `r_type`, and
`r_addend`) follow the header.
Note: `r_info` in traditional REL/RELA formats has been split into
`r_symidx` and `r_type`, allowing `uint32_t` relocation types for
ELFCLASS32 as well.

* Delta offset and flags (ULEB128): Holds `delta_offset * (addend_bit
? 8 : 4) + flags` (35-bit or 67-bit unsigned), where:
+ `delta_offset`: Difference in `r_offset` from the previous entry
(truncated to `Elf32_Addr` or `Elf64_Addr`), right shifted by `shift`.
+ `flags`: 0 to 7 if `addend_bit` is 1; otherwise 0 to 3.
+ `flags & 1`: Indicates whether a delta symbol index is present.
+ `flags & 2`: Indicates whether a delta type is present.
+ `flags & 4`: Indicates whether a delta addend is present.
* Delta symbol index (SLEB128, if present): The difference in symbol
index from the previous entry, truncated to a 32-bit signed integer.
* Delta type (SLEB128, if present): The difference in relocation type
from the previous entry, truncated to a 32-bit signed integer.
* Delta addend (SLEB128, if present): The difference in addend from
the previous entry, truncated to a 32-bit or 64-bit signed integer
depending on the object file class.

ULEB128 or SLEB128 encoded values use the canonical representation
(i.e., the shortest byte sequence).
For the first relocation entry, the previous offset, symbol index,
type, and addend members are treated as zero.

Encoding/decoding the delta offset and flags does not need
multi-precision arithmetic; we can unroll and special-case the first
iteration. The header can be encoded/decoded in a similar way. An
implementation can assume that the relocation count is no larger than
2**61 and simplify the code.

Example C++ encoder:

```cpp
// encodeULEB128(uint64_t, raw_ostream &os);
// encodeSLEB128(int64_t, raw_ostream &os);

const uint8_t addendBit = config->isRela ? 4 : 0,
              flagBits = config->isRela ? 3 : 2;
Elf_Addr offsetMask = 8, offset = 0, addend = 0;
uint32_t symidx = 0, type = 0;
for (const Elf_Crel &rel : relocs)
  offsetMask |= rel.r_offset;
int shift = std::countr_zero(offsetMask);
encodeULEB128(relocs.size() * 8 + addendBit + shift, os);
for (const Elf_Crel &rel : relocs) {
  Elf_Addr deltaOffset = (rel.r_offset - offset) >> shift;
  offset = rel.r_offset;
  uint8_t b = (deltaOffset << flagBits) + (symidx != rel.r_symidx) +
              (type != rel.r_type ? 2 : 0) + (addend != rel.r_addend ? 4 : 0);
  if (deltaOffset < (0x80 >> flagBits)) {
    os << char(b);
  } else {
    os << char(b | 0x80);
    encodeULEB128(deltaOffset >> (7 - flagBits), os);
  }
  if (b & 1) {
    encodeSLEB128(static_cast<int32_t>(rel.r_symidx - symidx), os);
    symidx = rel.r_symidx;
  }
  if (b & 2) {
    encodeSLEB128(static_cast<int32_t>(rel.r_type - type), os);
    type = rel.r_type;
  }
  if (b & 4 & addendBit) {
    encodeSLEB128(std::make_signed_t<Elf_Addr>(rel.r_addend - addend), os);
    addend = rel.r_addend;
  }
}
```

Example C++ decoder:

```cpp
// uint64_t decodeULEB128(uint8_t *&p);
// int64_t decodeSLEB128(uint8_t *&p);

const auto hdr = decodeULEB128(p);
const size_t count = hdr / 8, flagBits = (hdr & 4) ? 3 : 2, shift = hdr % 4;
Elf_Addr offset = 0, addend = 0;
uint32_t symidx = 0, type = 0;
for (size_t i = 0; i != count; ++i) {
  const uint8_t b = *p++;
  offset += b >> flagBits;
  if (b >= 0x80)
    offset += (decodeULEB128(p) << (7 - flagBits)) - (0x80 >> flagBits);
  if (b & 1)
    symidx += decodeSLEB128(p);
  if (b & 2)
    type += decodeSLEB128(p);
  if (b & 4 & hdr)
    addend += decodeSLEB128(p);
  rels[i] = {offset << shift, symidx, type, addend};
}
```

Both encoder and decoder can be simplified if the desired `addend_bit`
is hardcoded, making `flagBits` an integer literal.

_In Figure 5-10: Dynamic Array Tags, d_tag, add:_

`DT_CREL` | 38 | `d_ptr` | optional | optional

_Add text below:_

* `DT_CREL` - This element is similar to `DT_REL`, except its table
uses the CREL format. The relocation count can be inferred from the
header.

_Update `DT_PLTREL` and `DT_PLTRELSZ`:_

* `DT_PLTRELSZ`: This element holds the total size, in bytes, of the
relocation entries associated with the procedure linkage table. If an
entry of type `DT_JMPREL` is present and the `DT_PLTREL` entry value
is `DT_REL` or `DT_RELA`, a `DT_PLTRELSZ` must accompany it.
* `DT_PLTREL`: This member specifies the type of relocation entry to
which the procedure linkage table refers. The `d_val` member holds
`DT_REL`, `DT_RELA`, or `DT_CREL`, as appropriate. All relocations in
a procedure linkage table must use the same relocation type.

Florian Weimer

Sep 26, 2024, 5:12:48 AM
to Fangrui Song, gener...@googlegroups.com
* Fangrui Song:

> ULEB128 or SLEB128 encoded values use the canonical representation
> (i.e., the shortest byte sequence).
> For the first relocation entry, the previous offset, symbol index,
> type, and addend members are treated as zero.

I would like to reiterate that ULEB128/SLEB128 is a badly designed bit
encoding, due to unnecessarily difficult decoding (an efficient
implementation needs a parallel bit extract instruction). I don't think
we should promote its use in the ELF specification.

Existing loaders ignore DT_CREL and won't perform relocations, resulting
in obscure crashes. I'd appreciate a generic ABI solution for this
problem (marking dynamic tags as critical/must be supported).

Thanks,
Florian

Fangrui Song

Sep 26, 2024, 1:23:47 PM
to gener...@googlegroups.com, Fangrui Song
https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#leb128-among-variable-length-integer-encodings
discusses the choice of LEB128 & sign extension over zig-zag encoding.

* Implementation simplicity. Very few instructions even when the slow
path is considered
* Sign extension is better than zig-zag encoding
* Natural extension to 67-bit delta offsets and flags
* 1-byte LEB128 dominates

I did not measure a performance difference for the assembler & linker,
even though LEB128 requires more instructions than naturally aligned
entities.

Analysis of CREL for dynamic relocations is at
https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#crel-for-dynamic-relocations .
I don't think parallel decoding is useful (that said, parallel LEB128
decoding is possible, though I'd not do it in dynamic loaders).