Compact section header table

Fangrui Song

Nov 2, 2025, 6:10:33 PM

to Generic System V Application Binary Interface

Previously posted at https://groups.google.com/g/generic-abi/c/-KqpoMYPYhc/m/-uGpNhkyCgAJ ("Light ELF: exploring potential size reduction") on April 1, 2024. I'm reposting here because this is a serious proposal, and I've since switched to a little-endian variant of PrefixVarint.

---

## Background: The ELF Section Header Table

In ELF files, the section header table is an array of 64-byte `Elf64_Shdr` structures (or 40-byte `Elf32_Shdr` for 32-bit).
Each structure describes name, type, flags, address, offset, size, link, info, alignment, and entry size.

```c
typedef struct {
  uint32_t sh_name;      // Name (string table index)
  uint32_t sh_type;      // Type
  uint64_t sh_flags;     // Flags
  uint64_t sh_addr;      // Virtual address
  uint64_t sh_offset;    // File offset
  uint64_t sh_size;      // Size in bytes
  uint32_t sh_link;      // Link to another section
  uint32_t sh_info;      // Extra information, possibly a section header table index
  uint64_t sh_addralign; // Alignment
  uint64_t sh_entsize;   // Entry size if section holds fixed-size entries
} Elf64_Shdr;            // Total: 64 bytes
```

The key inefficiency is that most fields are frequently zero or can use default values (e.g., `sh_type=SHT_PROGBITS` for most code/data sections), yet each header consumes a fixed 64 bytes.
With `-ffunction-sections` creating hundreds or thousands of sections per compilation unit, this overhead accumulates rapidly.

When building llvm-project with `-O3 -ffunction-sections -fdata-sections -Wa,--crel,--allow-experimental-crel`, the section header tables occupy 17.6% of the total `.o` file size.
In a `-g -Wa,--crel,--allow-experimental-crel` build, the section header tables occupy 13.9% of the total `.o` file size.

This overhead multiplies when the compiler creates a metadata section for every code section (e.g., for sanitizer, code coverage, or stack unwinding metadata).
```
.group
.text.f0
.meta.f0

.group
.text.f1
.meta.f1
```

## Solution overview

Building on my previous work on [compact relocations (CREL)](/blog/2024-03-09-a-compact-relocation-format-for-elf), I propose an alternative section header table format that achieves significant space savings through two key techniques:

1. **Variable-length integer encoding**: Instead of fixed-width fields, values are encoded using 1-9 bytes depending on magnitude
2. **Presence flags**: A bitmap indicates which fields differ from default values, allowing omission of zeros and common defaults

The format is backward compatible: existing tools continue using the traditional format (`e_shentsize != 0`), while updated tools can opt into the compact format (`e_shentsize == 0`).

The remainder of this article presents a formal specification suitable for inclusion in the ELF specification, followed by design rationale, evaluation, and discussion of alternatives.

## Proposal for the ELF specification

### Specification Modifications

In _Chapter 2, ELF Header_, section _Contents of the ELF Header_, modify the description of `e_shentsize`:

<blockquote>
e_shentsize

This member holds a section header's size in bytes, or the value zero to indicate a compact section header table format.
When non-zero, a section header is one entry in the section header table; all entries have the same size.
When zero, the section header table uses the compact encoding format described below.
</blockquote>

In _Chapter 3, Sections_, add a new subsection _3.3. Compact Section Header Table_ after _3.2. Section Header Table Entry_ and shift subsequent sections.

#### 3.3. Compact Section Header Table

When `e_shentsize` equals zero, the section header table uses a compact encoding format.

#### Variable-Length Integer Encoding

The compact format employs a variable-length integer (VarInt) encoding scheme that encodes 64-bit unsigned integers using 1 to 9 bytes.

This encoding is a [bijective](https://en.wikipedia.org/wiki/Bijective_numeration) variant of Little Endian Base 128 where the length information is determined by counting trailing zero bits in the first byte.
Specifically, if the first byte has *n-1* trailing zeros, then the encoded integer occupies *n* bytes total.
The special case of a zero first byte signals a 9-byte encoding.

The format supports the following encodings, where 'x' represents value bits:

```
xxxxxxx1: 7 value bits, 1 byte
xxxxxx10 xxxxxxxx: 14 value bits, 2 bytes
xxxxx100 xxxxxxxx xxxxxxxx: 21 value bits, 3 bytes
xxxx1000 xxxxxxxx xxxxxxxx xxxxxxxx: 28 value bits, 4 bytes
xxx10000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx: 35 value bits, 5 bytes
xx100000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx: 42 value bits, 6 bytes
x1000000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx: 49 value bits, 7 bytes
10000000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx: 56 value bits, 8 bytes
00000000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx: 64 value bits, 9 bytes
```

The remaining bits in the first byte, plus all subsequent bytes, contain the actual value in little-endian order.
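The length rule can be sketched in a few lines of C. This is a minimal illustration, not part of the proposed specification text; `varint_encoded_len` is a hypothetical helper name, and it counts trailing zeros with a plain loop so that it does not depend on any particular bit-manipulation header.

```c
#include <assert.h>
#include <stdint.h>

// Number of bytes in an encoded VarInt, derived from its first byte only.
// A non-zero first byte with k trailing zero bits means k + 1 bytes in total;
// a zero first byte selects the 9-byte form.
unsigned varint_encoded_len(uint8_t first_byte) {
  if (first_byte == 0)
    return 9;
  unsigned n = 1;
  while ((first_byte & 1) == 0) {
    first_byte >>= 1;
    ++n;
  }
  return n;
}

// Usage: a few first bytes and the lengths they imply.
void varint_encoded_len_demo(void) {
  assert(varint_encoded_len(0x01) == 1); // xxxxxxx1
  assert(varint_encoded_len(0x4e) == 2); // xxxxxx10
  assert(varint_encoded_len(0x80) == 8); // 10000000
  assert(varint_encoded_len(0x00) == 9); // 00000000
}
```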

**Example:** Consider encoding the value 147 (decimal).
In binary, 147 = 10010011, which requires 8 significant bits.
Since 7 bits fit in one byte but 8 bits require two bytes, we use the 2-byte format (xxxxxx10 xxxxxxxx) which provides 14 value bits.

- Byte 0: 6 low-order bits of the value, shifted left by 2, with the length tag `10` in the two least significant bits
- Byte 1: Remaining high-order bits of the value

Calculation: Byte 0 = ((147 & 0x3f) << 2) | 0x02 = 0x4e; Byte 1 = (147 >> 6) = 0x02.
The encoded representation is `0x4e 0x02`.

**Example 2:** Consider encoding the value 0xfedcba9876543210, which requires 64 significant bits, exceeding the 56 bits available in the 8-byte format.
Therefore, we must use the 9-byte format (00000000 xxxxxxxx...).

- Byte 0: 0x00 (the tag indicating 9-byte format)
- Bytes 1-8: The 64-bit value in little-endian byte order

The encoded representation is `0x00 0x10 0x32 0x54 0x76 0x98 0xba 0xdc 0xfe`.
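Both examples can be reproduced with a portable, byte-at-a-time encoder. The sketch below is only an illustration of the encoding rules above (the name `varint_put` is hypothetical); it trades speed for independence from host endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Byte-at-a-time VarInt encoder; returns the number of bytes written (1 to 9).
unsigned varint_put(uint8_t *buf, uint64_t x) {
  // Find the smallest n in 1..8 whose 7*n value bits can hold x.
  unsigned n = 1;
  while (n <= 8 && (x >> (7 * n)) != 0)
    ++n;
  if (n > 8) {
    // 9-byte form: zero tag byte, then the value in little-endian order.
    buf[0] = 0x00;
    for (unsigned i = 0; i < 8; ++i)
      buf[1 + i] = (uint8_t)(x >> (8 * i));
    return 9;
  }
  // Shift the value past the n-bit length tag; the tag's top bit is set.
  uint64_t tagged = (x << n) | ((uint64_t)1 << (n - 1));
  for (unsigned i = 0; i < n; ++i)
    buf[i] = (uint8_t)(tagged >> (8 * i));
  return n;
}

// Usage: the two worked examples from the text.
void varint_put_demo(void) {
  static const uint8_t expected[9] = {0x00, 0x10, 0x32, 0x54, 0x76,
                                      0x98, 0xba, 0xdc, 0xfe};
  uint8_t buf[9];
  assert(varint_put(buf, 147) == 2);
  assert(buf[0] == 0x4e && buf[1] == 0x02);
  assert(varint_put(buf, 0xfedcba9876543210ULL) == 9);
  assert(memcmp(buf, expected, sizeof expected) == 0);
}
```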

#### Compact Section Header Table Format

The compact section header table, located at file offset `e_shoff`, begins with a VarInt-encoded section count, immediately followed by that many compact section headers.

Each compact section header begins with a single-byte `presence` field indicating which `Elf_Shdr` members are explicitly encoded.
Fields are stored in the following order:

* `sh_name`: VarInt encoded (always present)
* `sh_offset`: VarInt encoded (always present)
* `sh_type`: VarInt encoded if `presence & 0x01`; otherwise defaults to `SHT_PROGBITS`
* `sh_flags`: VarInt encoded if `presence & 0x02`; otherwise defaults to 0
* `sh_addr`: VarInt encoded if `presence & 0x04`; otherwise defaults to 0
* `sh_size`: VarInt encoded if `presence & 0x08`; otherwise defaults to 0
* `sh_link`: VarInt encoded if `presence & 0x10`; otherwise defaults to 0
* `sh_info`: VarInt encoded if `presence & 0x20`; otherwise defaults to 0
* `sh_addralign`: VarInt encoded as a log2 value if `presence & 0x40`; otherwise defaults to 1
* `sh_entsize`: VarInt encoded if `presence & 0x80`; otherwise defaults to 0

The `sh_addralign` field is encoded as the base-2 logarithm of the alignment value.
A default value of 1 (representing 2⁰) indicates no alignment constraint.
In the traditional format, `sh_addralign` may be 0 or a positive integral power of two, where both 0 and 1 mean no alignment constraint.
The compact encoding does not distinguish between these cases, treating all unspecified alignments as 1, which preserves the intended semantics.
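To make the layout concrete, here is a hedged sketch of the writer side: it emits the two mandatory fields, then each optional field only when it differs from its default, and stores `sh_addralign` as a log2 value. The struct and the helper names (`struct shdr`, `varint_put`, `align_log2`, `compact_shdr_put`) are assumptions made for this illustration, not code from the prototype.

```c
#include <assert.h>
#include <stdint.h>

#define SHT_PROGBITS 1 // value from the ELF specification

// Illustrative in-memory header; field names follow Elf64_Shdr.
struct shdr {
  uint32_t sh_name, sh_type;
  uint64_t sh_flags, sh_addr, sh_offset, sh_size;
  uint32_t sh_link, sh_info;
  uint64_t sh_addralign, sh_entsize;
};

// Byte-at-a-time VarInt writer; returns bytes written (1 to 9).
static unsigned varint_put(uint8_t *buf, uint64_t x) {
  unsigned n = 1;
  while (n <= 8 && (x >> (7 * n)) != 0)
    ++n;
  if (n > 8) {
    buf[0] = 0x00;
    for (unsigned i = 0; i < 8; ++i)
      buf[1 + i] = (uint8_t)(x >> (8 * i));
    return 9;
  }
  uint64_t tagged = (x << n) | ((uint64_t)1 << (n - 1));
  for (unsigned i = 0; i < n; ++i)
    buf[i] = (uint8_t)(tagged >> (8 * i));
  return n;
}

// Base-2 logarithm of a power-of-two alignment; 0 and 1 both map to 0.
static unsigned align_log2(uint64_t align) {
  unsigned shift = 0;
  while (align > 1) {
    align >>= 1;
    ++shift;
  }
  return shift;
}

// Encode one compact section header and return the number of bytes written.
// buf needs room for 1 + 10 * 9 = 91 bytes in the worst case.
unsigned compact_shdr_put(uint8_t *buf, const struct shdr *s) {
  uint8_t presence = 0;
  uint8_t *p = buf + 1; // byte 0 holds the presence bitmap, filled in last
  p += varint_put(p, s->sh_name);
  p += varint_put(p, s->sh_offset);
  // Emit an optional field only when it differs from its default.
#define PUT_IF(cond, bit, value)                                               \
  if (cond) {                                                                  \
    presence |= (bit);                                                         \
    p += varint_put(p, (value));                                               \
  }
  PUT_IF(s->sh_type != SHT_PROGBITS, 0x01, s->sh_type)
  PUT_IF(s->sh_flags != 0, 0x02, s->sh_flags)
  PUT_IF(s->sh_addr != 0, 0x04, s->sh_addr)
  PUT_IF(s->sh_size != 0, 0x08, s->sh_size)
  PUT_IF(s->sh_link != 0, 0x10, s->sh_link)
  PUT_IF(s->sh_info != 0, 0x20, s->sh_info)
  PUT_IF(s->sh_addralign > 1, 0x40, align_log2(s->sh_addralign))
  PUT_IF(s->sh_entsize != 0, 0x80, s->sh_entsize)
#undef PUT_IF
  buf[0] = presence;
  return (unsigned)(p - buf);
}

// Usage: a typical .text.* section where only flags, size, and alignment
// are non-default. The field values are made up for the example.
void compact_shdr_put_demo(void) {
  struct shdr s = {.sh_name = 27, .sh_type = SHT_PROGBITS, .sh_flags = 0x6,
                   .sh_offset = 0x40, .sh_size = 0x20, .sh_addralign = 16};
  uint8_t buf[91];
  assert(compact_shdr_put(buf, &s) == 6); // versus 64 bytes for Elf64_Shdr
  assert(buf[0] == 0x4a); // sh_flags, sh_size, and sh_addralign are present
}
```

In this made-up example the header shrinks from 64 bytes to 6: one presence byte plus five single-byte VarInts.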

### Reference implementation

The following pseudocode illustrates the decoding process for a section header:

```cpp
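// uint64_t getVarInt(const uint8_t *&p) decodes one VarInt and advances p.
// offsets[i] is the start of the i-th compact header relative to the table,
// as recorded by a prior scan.

const uint8_t *sht = base + ehdr->e_shoff;
const uint8_t *p = sht + offsets[i];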
uint8_t presence = *p++;
Elf_Shdr shdr = {};
shdr.sh_name = getVarInt(p);
shdr.sh_offset = getVarInt(p);
shdr.sh_type = presence & 0x01 ? getVarInt(p) : ELF::SHT_PROGBITS;
shdr.sh_flags = presence & 0x02 ? getVarInt(p) : 0;
shdr.sh_addr = presence & 0x04 ? getVarInt(p) : 0;
shdr.sh_size = presence & 0x08 ? getVarInt(p) : 0;
shdr.sh_link = presence & 0x10 ? getVarInt(p) : 0;
shdr.sh_info = presence & 0x20 ? getVarInt(p) : 0;
shdr.sh_addralign = presence & 0x40 ? uint64_t(1) << getVarInt(p) : 1;
shdr.sh_entsize = presence & 0x80 ? getVarInt(p) : 0;
```

The following C code provides a reference implementation for encoding and decoding VarInt values:

```c
#include <stdbit.h> // C23: stdc_leading_zeros, stdc_trailing_zeros
#include <stdint.h>
#include <string.h>

// 64-bit integer type that may be accessed at any alignment (GNU attribute).
typedef uint64_t uu64 [[gnu::aligned(1)]];

// Normalize to little-endian, i.e. byte swap on big-endian.
static inline uint64_t norm_le64(uint64_t val) {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  return __builtin_bswap64(val);
#else
  return val;
#endif
}

// Write variable-length unsigned 64-bit integer.
unsigned put_le_prefix_varint(unsigned char *buf, uint64_t x) {
  // Fast path for n == 1
  if (x < 128) {
    buf[0] = (x << 1) | 1;
    return 1;
  }

  unsigned sig_bits = 64 - stdc_leading_zeros(x);
  unsigned n = (sig_bits + 6) / 7;
  if (n > 8) {
    // 9 bytes: 00000000 xxxxxxxx ...
    buf[0] = 0x00;
    *(uu64 *)(buf + 1) = norm_le64(x);
    return 9;
  }

  // Value bits sit above the n-bit tag, whose highest bit is set.
  uint64_t tagged = norm_le64((x << n) | ((uint64_t)1 << (n - 1)));
  memcpy(buf, &tagged, n);
  return n;
}

// Read variable-length unsigned 64-bit integer.
uint64_t get_le_prefix_varint(unsigned char **buf) {
  // Fast path for n == 1
  uint8_t b0 = (*buf)[0];
  if (b0 & 1) {
    *buf += 1;
    return b0 >> 1;
  }

  if (b0 == 0x00) {
    // 9 bytes: 00000000 xxxxxxxx ...
    *buf += 9;
    return norm_le64(*(uu64 *)(*buf - 8));
  }

  // 2 <= n <= 8: the trailing zero count of the first byte gives the length.
  unsigned n = stdc_trailing_zeros(b0) + 1;
  uint64_t x = 0;
  memcpy(&x, *buf, n);
  *buf += n;
  x = (norm_le64(x) >> n) & (((uint64_t)1 << (7 * n)) - 1);
  return x;
}
```

## Why this VarInt encoding?

The unsigned prefix-based variable-length integer encoding is identical to the one in [MLIR bytecode](https://mlir.llvm.org/docs/BytecodeFormat/#signed-variable-width-integers), rather than the more common Little Endian Base 128 (LEB128) encoding used in DWARF and WebAssembly.

**Advantages over LEB128:**

* The encoded length is determined from the first byte alone, eliminating the need to scan subsequent bytes for continuation bits.
This allows an efficient implementation that dispatches on all available lengths.
* The maximum length is 9 rather than 10.

**Trade-offs:**

* The common one-byte path requires a shift operation (to extract 7 bits from an 8-bit byte), whereas LEB128 can use the byte directly. However, this minor overhead is offset by significantly faster decoding for multi-byte values, which are common in section header tables.
* LEB128 is self-synchronizing, a property we don't need.
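For comparison, here is a small sketch of standard unsigned LEB128. It shows the two points made above: the length is found only by scanning for the terminating byte, and a full 64-bit value takes 10 bytes. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Standard unsigned LEB128: 7 value bits per byte, with bit 7 set on every
// byte except the last. Returns the number of bytes written (1 to 10).
unsigned uleb128_put(uint8_t *buf, uint64_t x) {
  unsigned n = 0;
  do {
    uint8_t byte = x & 0x7f;
    x >>= 7;
    if (x != 0)
      byte |= 0x80; // continuation bit
    buf[n++] = byte;
  } while (x != 0);
  return n;
}

// The length is known only after scanning for the byte with bit 7 clear.
unsigned uleb128_len(const uint8_t *buf) {
  unsigned n = 1;
  while (buf[n - 1] & 0x80)
    ++n;
  return n;
}

// Usage: 147 takes two bytes in both formats; UINT64_MAX takes 10 bytes
// here, one more than the 9-byte maximum of the prefix encoding.
void uleb128_demo(void) {
  uint8_t buf[10];
  assert(uleb128_put(buf, 147) == 2);
  assert(buf[0] == 0x93 && buf[1] == 0x01);
  assert(uleb128_len(buf) == 2);
  assert(uleb128_put(buf, UINT64_MAX) == 10);
}
```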

This format can be viewed as a little-endian variant of Chromium's PrefixVarint, which places the length tag in the most significant bits.
While PrefixVarint offers advantages for big-endian systems (single-byte efficiency matching LEB128), the little-endian approach adopted here is better suited for modern architectures.
PrefixVarint requires `stdc_leading_ones` for branch-free decoding of multi-byte values (2≤n≤8), which is less efficient than `stdc_trailing_zeros`, and necessitates byte-swapping on little-endian architectures.
<https://gist.github.com/MaskRay/80b705d903688870bac96da64e7e243b> provides implementations for both variants.

This proposal does not introduce variable-length signed integers. If we ever introduce them, we should use sign extension instead of zig-zag encoding.

## Performance characteristics

The variable-length encoding means O(1) random access to arbitrary section headers is not directly supported—accessing the *i*-th header requires decoding all preceding headers.
However, this limitation is often addressed by applications building their own in-memory data structures after an initial scan.
Alternatively, for simpler applications, a prescan can be performed to determine the starting offset of each header beforehand.
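A prescan along these lines could look like the sketch below. It is an assumption-laden illustration: the helper names are hypothetical, the offsets are relative to the start of the table, and there is no bounds checking, which a real reader would need for untrusted input. Because each field's length is known from its first byte, the scan skips fields without decoding them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Encoded length of a VarInt, from its first byte alone.
static unsigned varint_len(uint8_t b0) {
  if (b0 == 0)
    return 9;
  unsigned n = 1;
  for (; (b0 & 1) == 0; b0 >>= 1)
    ++n;
  return n;
}

// Portable byte-at-a-time VarInt reader; advances *p past the value.
static uint64_t varint_get(const uint8_t **p) {
  unsigned n = varint_len(**p);
  unsigned skip = n == 9 ? 1 : 0; // the 9-byte form has a whole tag byte
  uint64_t raw = 0;
  for (unsigned i = 0; i + skip < n; ++i)
    raw |= (uint64_t)(*p)[i + skip] << (8 * i);
  *p += n;
  return n == 9 ? raw : raw >> n;
}

// Record the start offset of each compact header, relative to `sht`.
// Returns the number of offsets written (at most max_headers).
size_t compact_sht_prescan(const uint8_t *sht, size_t *offsets,
                           size_t max_headers) {
  const uint8_t *p = sht;
  uint64_t count = varint_get(&p);
  size_t i = 0;
  for (; i < count && i < max_headers; ++i) {
    offsets[i] = (size_t)(p - sht);
    uint8_t presence = *p++;
    // sh_name and sh_offset are always present; every set presence bit
    // contributes one more VarInt.
    unsigned fields = 2;
    for (uint8_t bits = presence; bits != 0; bits &= (uint8_t)(bits - 1))
      ++fields;
    while (fields-- != 0)
      p += varint_len(*p);
  }
  return i;
}

// Usage: a two-entry table (a null section, then a small text section).
void compact_sht_prescan_demo(void) {
  static const uint8_t table[] = {
      0x05,                               // section count = 2
      0x01, 0x01, 0x01, 0x01,             // presence=sh_type; name, offset, type = 0
      0x4a, 0x37, 0x81, 0x0d, 0x41, 0x09, // flags, size, addralign present
  };
  size_t offsets[2];
  assert(compact_sht_prescan(table, offsets, 2) == 2);
  assert(offsets[0] == 1 && offsets[1] == 5);
}
```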

## Alternatives considered

Several alternative approaches were evaluated during the design of this format:

### WebAssembly-style inline metadata

The WebAssembly object file format implemented by LLVM embeds section metadata directly with each section, eliminating the section header table entirely:

```
# start section foo
section_id
size
content
# end section

# start section bar
section_id
size
content
# end section
```

**Pros**: Could eliminate the `sh_offset` field entirely, as sections are discovered sequentially.

**Cons**: Requires scanning the entire file to build a section index, making access to offsets expensive. This is particularly problematic for linkers that need to quickly locate specific sections. Paging in the section header table is more efficient for a parallel linking strategy.

### DWARF-style Abbreviation Tables

Inspired by DWARF's `.debug_abbrev`, this approach would define a small set of section "shapes" with predefined field layouts (e.g., a shape for typical `SHT_PROGBITS` sections), and each header would reference a shape and fill in only the varying fields.

While this could achieve better compression for files with many similar sections, it adds complexity with an additional indirection layer and abbreviation table.

```
struct Csht_Template {
varint sh_type; // SHT_PROGBITS, SHT_NOBITS, etc.
varint sh_flags; // Compressed common flags
varint sh_addralign_log2; // log2(alignment)
varint sh_entsize; // Usually 0
};

struct Csht_Entry {
uint8_t template_id : 7; // Which template
uint8_t has_overrides : 1;

varint sh_name;
varint sh_offset;
varint sh_size;

// Optional overrides (only if has_overrides = 1), indicating which fields follow
uint8_t override_mask;
// ... variable override data
};
```

### Mach-O `.subsections_via_symbols`

To work around the limitation of section count, Mach-O uses symbols as markers in the section data stream to describe subsection boundaries.

**Cons**:
Mach-O's subsection feature imposes restrictions on label differences.
Coalescing symbols or adjusting values in sections would change subsection boundaries, leading to complexity in binary manipulation tools.
Additionally, there is a loss of flexibility, as subsection properties (type, flags, linked-to section) cannot be changed.

## Evaluation

To validate the design and measure its effectiveness in practice, I implemented a working prototype in Clang and lld.
The implementation is available at <https://github.com/MaskRay/llvm-project/tree/demo-cshdr>.

An earlier implementation using LEB128 is available at <https://github.com/MaskRay/llvm-project/tree/demo-cshdr-2024>.

The following table shows measurements from building llvm-project with different options (sizes in bytes):

   .o size |  sht size | build
-----------+-----------+-------------------------------------------------
 142435560 |  20767296 | -O3
 117643000 |  20767296 | -O3 -Wa,--crel,--allow-experimental-crel
 101229095 |   4351215 | -O3 -Wa,--crel,--allow-experimental-crel,--cshdr
1595953680 |  22474624 | -O3 -g
1278305368 |  22474624 | -O3 -g -Wa,--crel,--allow-experimental-crel
1260602248 |   4766392 | -O3 -g -Wa,--crel,--allow-experimental-crel,--cshdr
2627892024 | 294078080 | -g
2112127000 | 294078080 | -g -Wa,--crel,--allow-experimental-crel
1888962427 |  70911315 | -g -Wa,--crel,--allow-experimental-crel,--cshdr
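The ratios implied by the table can be checked with a few lines of arithmetic. The sketch below uses only numbers from the table; `percent` and the demo function are hypothetical names. With `--cshdr`, the section header tables come out at roughly 21% (`-O3`), 21% (`-O3 -g`), and 24% (`-g`) of their original size.

```c
#include <assert.h>
#include <stdint.h>

// Percentage of `part` relative to `whole`.
double percent(uint64_t part, uint64_t whole) {
  return 100.0 * (double)part / (double)whole;
}

void sht_ratio_demo(void) {
  // Share of the .o size taken by section header tables with CREL only.
  double share_o3 = percent(20767296, 117643000);
  double share_g = percent(294078080, 2112127000);
  assert(share_o3 > 17.6 && share_o3 < 17.7);
  assert(share_g > 13.9 && share_g < 14.0);

  // Compact table size relative to the traditional table size.
  double o3 = percent(4351215, 20767296);
  double o3_g = percent(4766392, 22474624);
  double g = percent(70911315, 294078080);
  assert(o3 > 20.9 && o3 < 21.0);
  assert(o3_g > 21.2 && o3_g < 21.3);
  assert(g > 24.1 && g < 24.2);
}
```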

---

The write-up is also available at the updated <https://maskray.me/blog/2024-03-31-a-compact-section-header-table-for-elf>. I'll update the blog post as needed.

Rafael Ávila de Espíndola

Nov 6, 2025, 6:51:01 AM

to Fangrui Song, Generic System V Application Binary Interface

Hi,

It has been some time since I worked on a linker, so I can't comment on
the specifics, but I can say the results look awesome. The flexibility
in what gets put in its own section is one of ELF's good design decisions
IMHO, and this makes it better!

Thanks,
Rafael

Reid Kleckner

Nov 6, 2025, 1:23:31 PM

to gener...@googlegroups.com, Arthur Eubanks, Fangrui Song

Thanks for the updated proposal!

I want to emphasize that I think this work is really valuable. There are many potential creative applications of comdat groups for tracking per-function metadata in a way that smoothly interoperates with standard comdat deduplication, linker GC, and other operations, but they are often deemed infeasible due to the excessive overhead of ELF object file sections.

If you're building C++ with `-ffunction-sections`, the number of .text sections is already large and O(#functions), and if you store per-function metadata, such as coverage counters, unwind info, debug info, EH tables, etc into a separate section, that approach will multiply the existing number of .text sections, doubling it at a minimum. Reducing per-section object file overhead can enable valuable new functionality. To make this more concrete, there are active conversations about how to improve .eh_frame, and some designs would be a lot more palatable if ELF object file sections were smaller.

This multiplier is something the PE/COFF ecosystem suffers from particularly, because there every function can potentially generate 5 sections: .text/.pdata/.xdata/.debug$S/.text$x, and at least historically, MSVC did not transparently enable /bigobj to raise the section index field width from 16 bits to 32 bits, so you were often limited to a paltry 4k function instantiations per file. Hopefully with /bigobj, ~500M functions in a single object is enough for most users. The advantage of this model is that unwind info and debug info are transparently discarded by the linker, so you don't have dead data bloating the final output that has to be tombstoned and ignored by debuggers.

Another virtue of this proposal is that it is just a transparent re-encoding of existing ELF data structures, so it has minimal impact on the rest of the binary tool ecosystem. When applied to object files, this is just a handshake between the assembler and linker, and it should have no impact on linked images and their ABI.

Roland McGrath

Nov 6, 2025, 2:12:27 PM

to gener...@googlegroups.com

I think the overall proposal is quite strong. I think you've already proven the efficacy of SHT_CREL quite well and this shares a lot with that. As Reid mentioned, a crucial and wise choice here is to make this a semantically transparent pure encoding change. This is entirely information-preserving, with the one pedantic exception of losing the distinction between 0 and 1 in sh_addralign, which was in a way always a wart anyway, and that semantic normalization is already specified; the fact that some bit patterns of straightforwardly invalid ELF data (such as sh_addralign that's not a power of two) cannot be represented seems like more of a boon to tools and users than a limitation. Whatever the shortcomings and ambiguities of ELF linking semantics, it's far simpler to leave the semantics entirely out of a revamp of the encoding--in every implementation, the new support will be a very thin addition at just the physical encoding / decoding layer.

A meta-point just to smooth the discussion: please ASAP choose a short canonical abbreviation for the LEB128 variant encoding so we can refer to it compactly (pun intended!), perhaps CLEB128 ("counted" or "compact"), perhaps preemptively CULEB128 or UCLEB128 in anticipation of CSLEB128 / SCLEB128 being of use somewhere in the future.

Now, just a couple of substantive points.

I don't have particular reservations about the CLEB128 format (I'll just use that name for now). The stated advantages over LEB128 make sense. I think that if we were starting over (at the dawn of DWARF or whenever) we probably would choose this over LEB128 for those reasons. I'm not sure how compelling its advantages over LEB128 are so as to justify doing something new instead of just using LEB128 because it's already well-established (not that I personally need much convincing; I'm fine with it for new formats). To that end, I think in the discussion of its merits it would be worthwhile to consider the use of SLEB128 for semantically unsigned values. For 64-bit address values, it's pretty common that if any of the high 8 bits are set, then in fact all of the high 8 bits will be set (often the high 16 bits). If, in practice, there are only a very tiny number of 10-byte encoding cases if you just use SLEB128 in place of ULEB128, then the compactness argument for the new format loses steam. (I think the constant-time length determination is the best feature of the new format, personally. But I'd be hard-pressed to argue that it merits adopting a new format in places where the tools in question almost certainly already have some LEB128 decoding to do elsewhere and thus must fundamentally already handle the scan-to-determine-length decoding structure anyway.)
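The SLEB128 point can be illustrated with a size calculation. This is a rough sketch with hypothetical helper names; the sample address is made up to represent a value whose high bits are all set. Reinterpreted as a signed value, such an address needs far fewer than 10 bytes.

```c
#include <assert.h>
#include <stdint.h>

// Bytes needed by unsigned LEB128 for x.
unsigned uleb128_size(uint64_t x) {
  unsigned bits = 1;
  while ((x >>= 1) != 0)
    ++bits;
  return (bits + 6) / 7;
}

// Bytes needed by signed LEB128 when the same 64 bits are reinterpreted
// as a two's-complement value.
unsigned sleb128_size(uint64_t x) {
  // For a negative value, the leading one bits are redundant sign bits.
  uint64_t magnitude = (x >> 63) ? ~x : x;
  unsigned bits = 1; // one bit for the sign
  for (; magnitude != 0; magnitude >>= 1)
    ++bits;
  return (bits + 6) / 7;
}

void sleb128_size_demo(void) {
  // An address with the high 33 bits set (illustrative value).
  uint64_t high = 0xffffffff80000000ULL;
  assert(uleb128_size(high) == 10);
  assert(sleb128_size(high) == 5);
  // A small address costs the same either way here.
  assert(uleb128_size(0x400000) == 4);
  assert(sleb128_size(0x400000) == 4);
}
```

The cost of the signed form shows up at the other end of the range: values from 64 to 127 need two bytes instead of one.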

My other nit is just with the bootstrapping and compatibility issues of the new shdr format, in particular e_shentsize. My first thought was just that we might use a nonzero value like 0xffff instead as a more clear, intentional marker and then anticipate using 0xfffe and so on later as a way to locally version the compact shdr format itself. However, I then realized a larger problem with e_shentsize. The PN_XNUM encoding for e_phnum that's too large relies on the normal e_shentsize and shdr[0] having the full canonical size and layout, though it only actually uses one field of it. So how about an alternative encoding to indicate the new format, like this: ehdr fields are as now, but with e_shnum 1; shdr[0] uses sh_flags and/or sh_type to indicate the new format is in use (and future versioning for it perhaps), and sh_offset / sh_size to point to where the new-format table sits; shdr[0].sh_info stays free for the PN_XNUM protocol. That adds sizeof(Shdr) overhead back to each file to buy the compatibility and flexibility, but that's a tiny bit of what will be saved overall.

Fangrui Song

Nov 9, 2025, 12:23:38 AM

to Generic System V Application Binary Interface

Thanks to all for the encouraging feedback!

CLEB128 (Counted LEB128) is a fantastic, unambiguous name. We should adopt it.

Note: PrefixVarInt as implemented in Chromium base/prefixvarint.h is a big-endian variant, so we want to avoid reusing PrefixVarInt.
Additionally, while the length information is encoded in prefix bytes, "prefix" might be misinterpreted as referring to the most significant bits rather than the least significant bits.

CLEB128 actually represents a family of encodings. The 10-byte encoding that strictly mirrors LEB128 would be:

00000000 xxxxxxx1 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx 0000000x: 64 value bits, 10 bytes

However, since we're encoding uint64_t, we could instead use the following, sacrificing further length extension:

00000000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx: 64 value bits, 9 bytes

For clarity in documentation, I prefer "signed CLEB128" or "unsigned CLEB128", while using [SU]Leb128 in code to avoid too many consecutive uppercase letters in identifiers.

I decided to switch from LEB128 to CLEB128 to make erroneous states (non-canonical LEB128) unrepresentable by construction, which is a valuable property.

(CREL's 67-bit delta offsets and flags require 10-byte (70-bit) LEB128 or CLEB128 encoding.
While the LEB128 code is slightly simpler, CREL is still experimental (enabled via clang -Wa,--crel,--allow-experimental-crel), so we could switch it over to improve consistency.)

---

**On signed CLEB128 for addresses:**

Signed variants of variable-length integer encodings are typically implemented using either sign extension or zig-zag encoding.
Sign extension is better for both LEB128 and CLEB128 as the sign extension also folds entirely with the existing shift operations.
(The WebAssembly and Protocol Buffers encoding discussion at https://news.ycombinator.com/item?id=11263378 is incorrect on this point.)

Both approaches sacrifice one bit to represent the sign. This means that while the unsigned variant allows one-byte encoding to cover the range [0,128), a signed variant only covers [-64,64)—and the negative range is rarely used in practice.
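A small sketch of the two decoding styles for the value bits of a hypothetical signed variant follows. Nothing here is specified by the proposal; `sign_extend` and `zigzag_decode` are illustrative names, and `sign_extend` assumes the value bits have already been extracted from an n-byte form with n between 1 and 8. Both map the 7 value bits of a one-byte encoding onto [-64,64).

```c
#include <assert.h>
#include <stdint.h>

// Sign-extend the low `bits` value bits of v (bits = 7 * n, n in 1..8).
// Assumes v < 2^bits.
int64_t sign_extend(uint64_t v, unsigned bits) {
  int64_t sign = (int64_t)1 << (bits - 1);
  return ((int64_t)v ^ sign) - sign;
}

// Zig-zag decoding, shown only for comparison: the sign lives in bit 0.
int64_t zigzag_decode(uint64_t v) {
  return (int64_t)(v >> 1) ^ -(int64_t)(v & 1);
}

void signed_decode_demo(void) {
  // 7 value bits with sign extension cover [-64, 64).
  assert(sign_extend(0x3f, 7) == 63);
  assert(sign_extend(0x40, 7) == -64);
  assert(sign_extend(0x7f, 7) == -1);
  // Zig-zag covers the same range with a different bit assignment.
  assert(zigzag_decode(0x7e) == 63);
  assert(zigzag_decode(0x7f) == -64);
  assert(zigzag_decode(0x01) == -1);
}
```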

**On e_shentsize and PN_XNUM:** This is a critical catch :)
We also face the e_shnum limitation: "If the number of sections is greater than or equal to SHN_LORESERVE (0xff00)... [use the] sh_size field of the section header at index 0."
I encountered this issue with my -DCMAKE_BUILD_TYPE=Debug build of llvm-project (using -g -ffunction-sections):

```
% fllvm-readelf -S lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilder.cpp.o
...
[69805] .crel.debug_line CREL 0000000000000000 d8bccc 01209a 01 I 69808 69804 1
[69806] .debug_line_str PROGBITS 0000000000000000 b3cfb2 0030f3 01 MS 0 0 1
[69807] .llvm_addrsig LLVM_ADDRSIG 0000000000000000 d9dd66 00a883 00 E 69808 0 1
[69808] .symtab SYMTAB 0000000000000000 b842d0 134070 18 1 26870 8
[69809] .symtab_shndx SYMTAB SECTION INDICES 0000000000000000 cb8340 033568 04 69808 0 4
```

The reader in llvm/include/llvm/Object/ELF.h has to decode shdr[0].sh_size and use that value to re-decode the full section header table.
(The following patch modifies the recently-landed PN_XNUM support code https://github.com/llvm/llvm-project/pull/165284 )
```
uint32_t NumSections = getHeader().e_shnum;
if (getHeader().e_shentsize == 0) {
  bool Uninit = FakeSections.empty();
  // When e_shnum is 0, getShNum() calls sections() to read section 0,
  // which may populate FakeSections with a single entry. Clear and rebuild to
  // ensure consistency.
  if (Expected<uint64_t> ShNumOrErr = getShNum())
    NumSections = *ShNumOrErr;
  else
    return ShNumOrErr.takeError();
  if (Uninit) {
    FakeSections.clear();
    auto *P = reinterpret_cast<const uint8_t *>(base() + SectionTableOffset);
    for (uint32_t i = 0; i != NumSections; ++i) {
      Elf_Shdr Shdr = {};
      uint8_t Presence = *P++;
      Shdr.sh_name = decodeUPLeb128AndIncUnsafe(P);
      Shdr.sh_offset = decodeUPLeb128AndIncUnsafe(P);
      Shdr.sh_type =
          Presence & 1 ? decodeUPLeb128AndIncUnsafe(P) : ELF::SHT_PROGBITS;
      Shdr.sh_flags = Presence & 2 ? decodeUPLeb128AndIncUnsafe(P) : 0;
      Shdr.sh_addr = Presence & 4 ? decodeUPLeb128AndIncUnsafe(P) : 0;
      Shdr.sh_size = Presence & 8 ? decodeUPLeb128AndIncUnsafe(P) : 0;
      Shdr.sh_link = Presence & 16 ? decodeUPLeb128AndIncUnsafe(P) : 0;
      Shdr.sh_info = Presence & 32 ? decodeUPLeb128AndIncUnsafe(P) : 0;
      Shdr.sh_addralign =
          Presence & 64 ? uintX_t(1) << decodeUPLeb128AndIncUnsafe(P) : 1;
      Shdr.sh_entsize = Presence & 128 ? decodeUPLeb128AndIncUnsafe(P) : 0;
      FakeSections.push_back(Shdr);
    }
  }
  return FakeSections;
}
```

I believe this current approach is likely simpler than the alternative of using e_shnum=1.

Ali Bahrami

Nov 10, 2025, 12:48:06 PM

to gener...@googlegroups.com

Hi,

On Sun, Nov 2, 2025, at 4:10 PM, Fangrui Song wrote:

> When building llvm-project with `-O3 -ffunction-sections -fdata-sections -Wa,--crel,--allow-experimental-crel`, the section header tables occupy 17.6% of the total `.o` file size.

My first thought on reading this, is that the size of a section header
is normally small compared to the size of the function or data it
represents, so this must be a case where a vast number of tiny things
are being generated. That's not a normal scenario for most links, so
we're being asked to cater specifically to an uncommon case. I'm not
saying it's not common for you, but just generally.

My second thought was that it's complex, pulls in LEB-like concepts
that I don't love for ELF, and is bespoke for this one particular case
of section headers, rather than taking a more general approach.

Roland's point about shdr[0] being needed by the runtime linker
for e_phnum is also very important. It's not OK to add any overhead
at runtime.

Frankly, it doesn't bother me that these objects have relatively large
section headers compared to their content, and if it was up to me,
I'd leave things as they are, preferring stability, and compatibility
with all the existing code. Making already small objects a bit smaller
is useful, but not the most important thing to me. I've been outvoted on
this argument before, and don't expect to win it now, but I'll put it out there.

Perhaps more helpfully, I wonder if you've considered applying
compression to this case, rather than developing a unique
scheme specific to section headers?

As an experiment, I extracted the section header data from a
175M object with extended headers that I happened to have lying
around, into a file named shdr:

% elfdump bigprog-rel.2.o | more

ELF Header
ei_magic: { 0x7f, E, L, F }
ei_class: ELFCLASS32 ei_data: ELFDATA2LSB
ei_osabi: ELFOSABI_NONE ei_abiversion: 0
e_machine: EM_386 e_version: EV_CURRENT
e_type: ET_REL
e_flags: 0
e_entry: 0 e_ehsize: 52 e_shstrndx: SHN_XINDEX (see shdr[0].sh_link)
e_shoff: 0xa898ca0 e_shentsize: 40 e_shnum: 0 (see shdr[0].sh_size)
e_phoff: 0 e_phentsize: 0 e_phnum: 0

Section Header[0]: (ELF Ehdr extensions)
sh_addr: 0 sh_flags: 0
sh_size: 166519 (e_shnum) sh_type: [ SHT_NULL ]
sh_offset: 0 sh_entsize: 0
sh_link: 83262 (e_shstrndx) sh_info: 0 (e_phnum)
sh_addralign: 0

The section header is 6.3M, which is 4% of the on disk file. I then
compressed it with zlib and zstd:

% cp shdr shdr_zlib
% gzip shdr_zlib
% cp shdr shdr_zstd
% zstd shdr_zstd
shdr_zstd : 8.30% ( 6.35 MiB => 540 KiB, shdr_zstd.zst)

You can see that compresses very well:

% ls -alh shdr shdr_zlib.gz shdr_zstd.zst
-rw-r--r-- 1 alib staff 6.35M Nov 10 09:05 shdr
-rw-r--r-- 1 alib staff 902K Nov 10 09:07 shdr_zlib.gz
-rw-r--r-- 1 alib staff 540K Nov 10 09:06 shdr_zstd.zst

My rough calculation is that this reduces section header data
to under 1% of the total object. It will probably be more for your case,
due to the small section sizes, so I don't really know if this beats
the results posted for the original proposal, but perhaps it's
good enough? If so, we can reuse existing code already developed
to deal with compressed sections.

We would have to decide how to indicate that section headers
have been compressed. I'll defer that for now, but an e_flags bit
could be assigned, or we could use a precious e_ident[] entry holding
compression flags if we wanted a bitfield (is there anything else left
to compress?).

I would imagine that the layout of the section header in such a case
would have to be something like:

shdr[0]
Compression header (Elf32_Chdr, or Elf64_Chdr)
compressed bytes

Where shdr[0] is there for the benefit of the runtime linker
primarily, but also for other code that deals with extended
section headers. The compressed bytes would also include
shdr[0], containing identical data to the stub one above the
compression header, to simplify access once decompressed.

We would also need to consider how libelf would deal with these.
My initial guess is that it would unconditionally decompress such
a case and not expose it to the libelf consumer at all, unlike the
way we handle compressed section data. That would minimize the
need to change other tools.

This doesn't win any prizes for simplicity or elegance either, but
compression itself is a more general solution, and if we ignore the
internal complexity of libz/libzstd, simpler overall.

- Ali

Fangrui Song

Nov 11, 2025, 2:56:38 AM

to Generic System V Application Binary Interface

On Monday, November 10, 2025 at 9:48:06 AM UTC-8 Ali Bahrami wrote:

> Hi,
>
> On Sun, Nov 2, 2025, at 4:10 PM, Fangrui Song wrote:
>> When building llvm-project with `-O3 -ffunction-sections -fdata-sections -Wa,--crel,--allow-experimental-crel`, the section header tables occupy 17.6% of the total `.o` file size.
>
> My first thought on reading this, is that the size of a section header
> is normally small compared to the size of the function or data it
> represents, so this must be a case where a vast number of tiny things
> are being generated. That's not a normal scenario for most links, so
> we're being asked to cater specifically to an uncommon case. I'm not
> saying it's not common for you, but just generally.

This proposal optimizes for -ffunction-sections -fdata-sections -Wl,--gc-sections, which is increasingly common rather than uncommon.
These -f options are among the most-used GCC and Clang options.
All large projects I'm aware of enable section GC, as do all corporate users I know, from embedded systems to large server executables.
Section GC is also the foundation for section reordering optimizations.
While Linux distributions don't generally enable -ffunction-sections -fdata-sections -Wl,--gc-sections by default, the practice is widespread in production environments.

My second thought was that it's complex, pulls in LEB-like concepts
that I don't love for ELF, and is bespoke for this one particular case
of section headers, rather than taking a more general approach.

Variable-length integer encoding is widely used across object file formats.
It's present in DWARF, WebAssembly binary format, Mach-O export trie, .llvm_addrsig, BOLT address translation tables, RISC-V attribute sections, GSYM, and more.
I would prefer a better format than LEB128 for new use cases. cshdr can be a start.

Roland's point about shdr[0] being needed by the runtime linker
for e_phnum is also very important. It's not OK to add any overhead
at runtime.

I don't believe the runtime overhead argument applies here.
The only use case we've found for PN_XNUM is core dumpers.
Core dumpers don't generate section headers (except the index 0 entry to provide sh_info), so they don't need to handle the compact section header table format.
There is no motivation for dynamic loaders to support PN_XNUM (0xffff) or more program headers.

Debuggers parse both section headers and program headers. They decode section headers as usual, then decode program headers.
When PN_XNUM is used, shdr[0].sh_info provides the number of program headers—this is orthogonal to whether the compact section header table format is used.
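
The PN_XNUM rule is small enough to sketch. This is a minimal illustration, assuming trimmed-down stand-in structs (`StubEhdr`, `StubShdr`) rather than the system `<elf.h>` types; `real_phnum` is a hypothetical helper name. The point is that the lookup only needs section header index 0, regardless of how the reader obtained it.

```c
#include <assert.h>
#include <stdint.h>

#define PN_XNUM 0xffff /* gABI escape value stored in e_phnum */

/* Hypothetical stand-ins holding only the fields this sketch needs. */
typedef struct { uint16_t e_phnum; } StubEhdr;
typedef struct { uint32_t sh_info; } StubShdr;

/* Return the real program header count.
   shdr0 is the section header at index 0. */
static uint32_t real_phnum(const StubEhdr *ehdr, const StubShdr *shdr0) {
    if (ehdr->e_phnum != PN_XNUM)
        return ehdr->e_phnum;  /* the count fits in e_phnum directly */
    return shdr0->sh_info;     /* escape value: the count lives in shdr[0] */
}
```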

Frankly, it doesn't bother me that these objects have relatively large
section headers compared to their content, and if it was up to me,
I'd leave things as they are, preferring stability, and compatibility
with all the existing code. Making already small objects a bit smaller
is useful, but not the most important thing to me. I've been outvoted on
this argument before, and don't expect to win it now, but I'll put it out there.

Perhaps more helpfully, I wonder if you've considered applying
compression to this case, rather than developing a unique
scheme specific to section headers?

With general compression, users must pay the cost of fully decompressing the entire section header table upfront. This could be significant for binary utilities that don't want to depend on zstd.

While SHF_COMPRESSED sections exist, many utilities don't need to read the compressed content, so the decompression cost isn't always paid.

But with compressed section headers, every tool must decompress the entire table just to read the file. The decompression cost can't be avoided.

If a runtime library ever needs section headers for executable and shared object files, implementing the compact section header table format would be far easier and safer than integrating a general compression library.

Additionally, many utilities maintain their own in-memory representation of the section header table. Decompressing first means they need both this in-memory representation and a separate buffer for the decompressed content, doubling the memory overhead.

I posted "Light ELF: exploring potential size reduction" on April 1, 2024, anticipating that this kind of proposal would face objections.
However, it's a valuable thought experiment. If we ever bump the ELF version, I would certainly propose dropping the traditional section header table and keeping the compact one.
The semantic-aware encoding preserves the benefits while avoiding pitfalls like sh_addralign, and provides sh_flags with 32 additional bits.
With CREL and CSHDR combined, ELF will be more compact than the WebAssembly binary format.

I have built llvm-mc with three build configurations: default, crel, and crel+cshdr, and then compressed the section header tables with zstd.

#!/bin/zsh
conf() { configure-llvm $@ -DCMAKE_{EXE,SHARED}_LINKER_FLAGS='-Wl,-z,pack-relative-relocs -fuse-ld=lld' -DCMAKE_CXX_ARCHIVE_CREATE="/tmp/Rel/bin/llvm-ar qc --thin <TARGET> <OBJECTS>"; }

conf s2-custom-none -DCMAKE_{C,CXX}_FLAGS=''
conf s2-custom-crel -DCMAKE_{C,CXX}_FLAGS='--start-no-unused-arguments -Wa,--crel,--allow-experimental-crel --end-no-unused-arguments'
conf s2-custom-cshdr -DCMAKE_{C,CXX}_FLAGS='--start-no-unused-arguments -Wa,--crel,--allow-experimental-crel,--cshdr --end-no-unused-arguments'

for i in none crel cshdr; do ninja -C /tmp/out/s2-custom-$i llvm-mc opt; done

After removing the CMakeFiles/3.28.3 directories, I used a Ruby program to extract uncompressed and compressed section header table data.

#!/usr/bin/env ruby
# Sum the section header table (SHT) sizes of all .o files under a directory
# and estimate their size after zstd compression.
require 'find'
directory = ARGV[0]
files = total_sht = total_compressed = total_compressed_with_hdr = total_compressed_with_hdr2 = total_file = 0
log = []

Find.find(directory) do |path|
  next unless File.file?(path) && path.end_with?('.o')
  elf_header = `/tmp/Rel/bin/llvm-readelf -h "#{path}" 2>/dev/null`
  next if $?.exitstatus != 0

  sht_start = nil
  elf_header.lines.each do |line|
    sht_start = line.split(':').last.strip.to_i if line.include?("Start of section headers:")
  end
  next unless sht_start

  # Treats everything from e_shoff to the end of the file as the SHT.
  file_size = File.size(path)
  sht_size = file_size - sht_start

  # Extract the section header table and compress it with zstd -3.
  compressed = `dd if="#{path}" bs=1 skip=#{sht_start} count=#{sht_size} 2>/dev/null | zstd -3 -c 2>/dev/null`.bytesize
  log << "#{path}: #{sht_size} -> #{compressed}"

  total_sht += sht_size
  total_compressed += compressed
  total_compressed_with_hdr += compressed + 24        # 24 bytes: size of an Elf64_Chdr
  total_compressed_with_hdr2 += compressed + 64 + 24  # plus a 64-byte uncompressed shdr[0]
  total_file += file_size
  files += 1
end
puts log.sort.join("\n")

adjusted_file_size_compressed = total_file - total_sht + total_compressed
adjusted_file_size_with_hdr = total_file - total_sht + total_compressed_with_hdr
adjusted_file_size_with_hdr2 = total_file - total_sht + total_compressed_with_hdr2

puts "Files: #{files}"
puts "Total uncompressed: #{total_sht} bytes (#{(total_sht.to_f/total_file*100).round(1)}%)"
puts "Total compressed: #{total_compressed} bytes (#{(total_compressed.to_f/adjusted_file_size_compressed*100).round(1)}%)"
puts "Total (compressed + 24): #{total_compressed_with_hdr} bytes (#{(total_compressed_with_hdr.to_f/adjusted_file_size_with_hdr*100).round(1)}%)"
puts "Total (compressed + 64 + 24): #{total_compressed_with_hdr2} bytes (#{(total_compressed_with_hdr2.to_f/adjusted_file_size_with_hdr2*100).round(1)}%)"
puts "Total file size: #{total_file} bytes"

Then I analyzed the three build directories.

% tail -n 5 none.txt crel.txt cshdr.txt
==> none.txt <==
Total uncompressed: 21161984 bytes (14.6%)
Total compressed: 2636086 bytes (2.1%)
Total (compressed + 24): 2680126 bytes (2.1%)
Total (compressed + 64 + 24): 2797566 bytes (2.2%)
Total file size: 145217704 bytes

==> crel.txt <==
Total uncompressed: 21161984 bytes (17.6%)
Total compressed: 2603098 bytes (2.6%)
Total (compressed + 24): 2647138 bytes (2.6%)
Total (compressed + 64 + 24): 2764578 bytes (2.7%)
Total file size: 119929440 bytes

==> cshdr.txt <==
Total uncompressed: 4435961 bytes (4.3%)
Total compressed: 2702497 bytes (2.7%)
Total (compressed + 24): 2746585 bytes (2.7%)
Total (compressed + 64 + 24): 2864153 bytes (2.8%)
Total file size: 103205593 bytes

Overall File Size Reduction:

- Baseline: 145,217,704 bytes
- CREL only: 119,929,440 bytes (~17.4% reduction)
- CREL + CSHDR: 103,205,593 bytes (~28.9% reduction from baseline)
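
The percentages above follow directly from the byte totals in the three reports. A small sketch of the arithmetic (the helper names `reduction_pct` and `share_pct` are illustrative, not from the thread):

```c
#include <assert.h>
#include <stdint.h>

/* Percentage by which `after` is smaller than `before`. */
static double reduction_pct(uint64_t before, uint64_t after) {
    return 100.0 * (double)(before - after) / (double)before;
}

/* Section header table share of the total file size, in percent. */
static double share_pct(uint64_t sht_bytes, uint64_t total_bytes) {
    return 100.0 * (double)sht_bytes / (double)total_bytes;
}
```

For example, `reduction_pct(145217704, 103205593)` reproduces the ~28.9% figure, and `share_pct(21161984, 119929440)` reproduces the 17.6% share in the CREL build. Applying `reduction_pct` to the CREL and CREL + CSHDR totals gives roughly 13.9%, the part of the saving attributable to cshdr alone.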

Both compressed traditional headers and cshdr make section header overhead mostly negligible, but cshdr achieves comparable savings with much simpler encoding.

Moreover, as a byte-oriented format, cshdr can be further compressed, even though the result would be slightly larger than compressing the section header table directly.

Fangrui Song

Nov 11, 2025, 3:52:04 AM

to Generic System V Application Binary Interface

The fact that compressed cshdr is slightly larger than compressed traditional headers reminds me of an observation I made about DWARF a few years ago:

> Compressed `.debug_rnglists` is larger than compressed `.debug_ranges`. `.debug_rnglists` uses a manually tuned byte-oriented encoding, which is less amenable to LZ77 compression.

We could use delta encoding to make sh_offset more compressible, but I don't want to introduce signed CLEB128 just for that purpose.
.strtab string order is arbitrary, and I don't want to require a specific ordering solely for better compressibility.

For completeness, I've also measured -O3 -fno-unique-section-names -ffunction-sections -fdata-sections builds. In this case, zstd+cshdr is better than zstd+normal SHT.

This demonstrates that the cshdr encoding is a good filter (think of a Branch/Call/Jump (BCJ) filter) for relocatable files.

#!/bin/zsh
conf() { configure-llvm $@ -DCMAKE_{EXE,SHARED}_LINKER_FLAGS='-Wl,-z,pack-relative-relocs -fuse-ld=lld' -DCMAKE_CXX_ARCHIVE_CREATE="/tmp/Rel/bin/llvm-ar qc --thin <TARGET> <OBJECTS>"; }

conf s2-custom-none -DCMAKE_{C,CXX}_FLAGS='-fno-unique-section-names'
conf s2-custom-crel -DCMAKE_{C,CXX}_FLAGS='-fno-unique-section-names --start-no-unused-arguments -Wa,--crel,--allow-experimental-crel --end-no-unused-arguments'
conf s2-custom-cshdr -DCMAKE_{C,CXX}_FLAGS='-fno-unique-section-names --start-no-unused-arguments -Wa,--crel,--allow-experimental-crel,--cshdr --end-no-unused-arguments'

for i in none crel cshdr; do ninja -C /tmp/out/s2-custom-$i llvm-mc opt; done

% tail -n 7 none.txt crel.txt cshdr.txt
==> none.txt <==
Files: 1811
Total uncompressed: 20766528 bytes (14.8%)
Total compressed (zstd): 2054869 bytes (1.7%)
Total compressed (xz): 1783432 bytes (1.5%)
Total (compressed + 24): 2098333 bytes (1.7%)
Total (compressed + 64 + 24): 2214237 bytes (1.8%)
Total file size: 140650008 bytes

==> crel.txt <==
Files: 1811
Total uncompressed: 20766528 bytes (17.9%)
Total compressed (zstd): 2026193 bytes (2.1%)
Total compressed (xz): 1718940 bytes (1.8%)
Total (compressed + 24): 2069657 bytes (2.1%)
Total (compressed + 64 + 24): 2185561 bytes (2.2%)
Total file size: 115855776 bytes

==> cshdr.txt <==
Files: 1811
Total uncompressed: 4283714 bytes (4.3%)
Total compressed (zstd): 2011285 bytes (2.1%)
Total compressed (xz): 1762488 bytes (1.8%)
Total (compressed + 24): 2054749 bytes (2.1%)
Total (compressed + 64 + 24): 2170653 bytes (2.2%)
Total file size: 99372962 bytes

Ali Bahrami

Nov 11, 2025, 1:31:41 PM

to gener...@googlegroups.com

Hi,

On Tue, Nov 11, 2025, at 12:56 AM, Fangrui Song wrote:

This proposal optimizes for -ffunction-sections -fdata-sections -Wl,--gc-sections, which is increasingly common rather than uncommon.

That was clear. With that starting point: For the section header sizes to form a
significant fraction of the file, the functions themselves would have to be small.
If they are small, then while the resulting object may be larger than it theoretically
has to be, it will usually still be small in an absolute sense. That leaves what I assume
is a corner case, objects that are really huge due mainly to the section headers of a
vast number of small functions.

Variable-length integer encoding is widely used across object file formats.

That doesn't make it ELF-like, or a good fit. One can always find examples
of places where something is done. In particular, DWARF and ELF serve
different purposes, and make different tradeoffs.

Roland's point about shdr[0] being needed by the runtime linker
for e_phnum is also very important. It's not OK to add any overhead
at runtime.

I don't believe the runtime overhead argument applies here.
The only use case we've found for PN_XNUM is core dumpers.

Core dumpers don't generate section headers (except the index 0 entry to provide sh_info), so they don't need to handle the compact section header table format.

I'm not worried about core dumpers. I agree that PN_XNUM shouldn't be
used outside of cores, nor do I expect executable or shared objects to have
more than a handful of program headers. PN_XNUM is legal in other object
types though, and so our kernel, runtime linker, and libproc, all check. Currently,
their code examines shdr[0] if PN_XNUM is seen, and drives on. While I guess
they could instead error out in those cases, I'd rather not do that, and I definitely
don't want to pull decompression code into all of those contexts. Allowing for
shdr[0] to always be visible supports their existing implementation.

With general compression, users must pay the cost of fully decompressing the entire section header table upfront. This could be significant for binary utilities that don't want to depend on zstd.

I think that most binary utilities that look at section headers, outside of the
PN_XNUM case with shdr[0], use libelf. It is no big deal for libelf to depend
on zstd, and indeed it already does, to support section compression.

While SHF_COMPRESSED sections exist, many utilities don't need to read the compressed content, so the decompression cost isn't always paid.

That's definitely so for section data, though note that on input, a link-editor will
decompress everything, and then possibly recompress it on output. Some costs
are buried.

But with compressed section headers, every tool must decompress the entire table just to read the file. The decompression cost can't be avoided.

Utilities that need to read the section header table beyond shdr[0]
usually want to see the whole table. In addition, such utilities usually
use libelf, and libelf itself needs section headers to operate. Hence, the
"have to decompress the entire table" aspect seems necessary in either
scheme. My assumption is that not many objects really need to compress
the section header table, and that it's not unreasonable for those cases to
pay a modest cost for it.

If a runtime library ever needs section headers for executable and shared object files, implementing the compact section header table format would be far easier and safer than integrating a general compression library.

We'll have to disagree here. Multiple implementations of a bespoke scheme
invites the possibility of incompatible differences. In contrast, I trust that
libz or libzstd will do exactly the same thing on any platform, and that any
failure to do that will be caught and fixed very quickly. In contrast, I've seen
ELF mismatches take years to uncover, and backward compatibility makes
those very hard to repair.

Additionally, many utilities maintain their own in-memory representation of the section header table. Decompressing first means they need both this in-memory representation and a separate buffer for the decompressed content, doubling the memory overhead.

Which is also an argument for leaving the section headers alone
for the vast majority of cases. However, for those cases that want
smaller object files, it's not a big ask for them to spend a little more
memory to access it. Unless the section headers start trending into
hundreds of megabytes, it seems unimportant on machines made
in the last 20 years.

I posted "Light ELF: exploring potential size reduction" on April 1, 2024, anticipating that this kind of proposal would face objections.
However, it's a valuable thought experiment. If we ever bump the ELF version, I would certainly propose dropping the traditional section header table and keeping the compact one.
The semantic-aware encoding preserves the benefits while avoiding pitfalls like sh_addralign, and provides sh_flags with 32 additional bits.

Thought experiments are important, so that's great. You can expect a
healthy debate when you move to make those proposals, but let's
not do that now.

Both compressed traditional headers and cshdr make section header overhead mostly negligible, but cshdr achieves comparable savings with much simpler encoding.
Moreover, as a byte-oriented format, cshdr can be further compressed, even though the result would be slightly larger than compressing the section header table directly.

The cshdr approach doesn't seem simpler, and the option to
squeeze it further seems unnecessary. Order of magnitude
reductions can be useful in some situations, but additional
rounding error improvements don't move the dial much.

- Ali

Fangrui Song

Nov 12, 2025, 3:46:47 AM

to Generic System V Application Binary Interface

Hi Ali,

Thanks for the detailed response. Let me address your points:

On Tuesday, November 11, 2025 at 10:31:41 AM UTC-8 Ali Bahrami wrote:

> That was clear. With that starting point: For the section header sizes to form a
> significant fraction of the file, the functions themselves would have to be small.
> If they are small, then while the resulting object may be larger than it theoretically
> has to be, it will usually still be small in an absolute sense. That leaves what I assume
> is a corner case, objects that are really huge due mainly to the section headers of a
> vast number of small functions.

I respectfully disagree that this is a corner case. Section header tables occupy 17.6% of the total .o file size in my CREL llvm-project build, and 16.8% for a Chromium non-CREL build. The ratio consistently exceeded 10% for many applications I measured when I was at Google. With a rough estimate of 480 bytes per function in a Clang build, the 64-byte sizeof(Elf64_Shdr) overhead represents a significant fraction of each function's total cost.

> That doesn't make it ELF-like, or a good fit. One can always find examples
> of places where something is done. In particular, DWARF and ELF serve
> different purposes, and make different tradeoffs.

If we insist on the natural size and alignment guidelines for control structures described in the original ELF paper, then variable-length encoding is indeed not traditionally "ELF-like." However, I believe such space savings are essential to keeping ELF relevant in the decades ahead. Variable-length encoding is a natural fit for this problem, allowing efficient encoding without the complexity of general compression.

> I'm not worried about core dumpers. I agree that PN_XNUM shouldn't be
> used outside of cores, nor do I expect executable or shared objects to have
> more than a handful of program headers. PN_XNUM is legal in other object
> types though, and so our kernel, runtime linker, and libproc, all check. Currently,
> their code examines shdr[0] if PN_XNUM is seen, and drives on. While I guess
> they could instead error out in those cases, I'd rather not do that, and I definitely
> don't want to pull decompression code into all of those contexts.
>

> I think that most binary utilities that look at section headers, outside of the
> PN_XNUM case with shdr[0], use libelf. It is no big deal for libelf to depend
> on zstd, and indeed it already does, to support section compression.

> Utilities that need to read the section header table beyond shdr[0]
> usually want to see the whole table. In addition, such utilities usually
> use libelf, and libelf itself needs section headers to operate. Hence, the
> "have to decompress the entire table" aspect seems necessary in either
> scheme.

Regarding libelf: I'm not sure why a specific library implementation should drive an ELF specification proposal. There are multiple libelf implementations: the illumos one (https://src.illumos.org/source/xref/illumos-gate/usr/src/head/libelf.h) is likely close to the Solaris version, while Linux generally uses the one from elfutils.
Not all implementations support every ELF feature, nor should they be required to for a specification to be valid.

More importantly, many tools don't use libelf at all. llvm-readelf, GNU readelf, and numerous custom tools parse ELF directly.
For these tools, decoding a compact section header table is significantly easier than integrating a compression library and handling the decompression workflow.
When I have time, I plan to contribute cshdr support to binutils and hope that elfutils maintainers will also adopt it.
I suppose that is mostly sufficient - very few tools need to deal with relocatable files.

> We'll have to disagree here. Multiple implementations of a bespoke scheme
> invites the possibility of incompatible differences. In contrast, I trust that
> libz or libzstd will do exactly the same thing on any platform, and that any
> failure to do that will be caught and fixed very quickly.

That's precisely why a clear specification is essential. The encoding rules for cshdr are deterministic and precisely specified—all implementations must follow the specification. I've provided a reference implementation that demonstrates just how straightforward the format is:

for (uint32_t i = 0; i < nshdr; i++) {
  // One presence byte per entry: each bit says whether an optional field follows.
  uint8_t presence = *p++;
  Elf_Shdr shdr = {};
  // sh_name and sh_offset are always encoded.
  shdr.sh_name = getVarInt(p);
  shdr.sh_offset = getVarInt(p);
  // Absent fields take defaults: SHT_PROGBITS for sh_type, 1 for sh_addralign, 0 otherwise.
  shdr.sh_type = presence & 0x01 ? getVarInt(p) : SHT_PROGBITS;
  shdr.sh_flags = presence & 0x02 ? getVarInt(p) : 0;
  shdr.sh_addr = presence & 0x04 ? getVarInt(p) : 0;
  shdr.sh_size = presence & 0x08 ? getVarInt(p) : 0;
  shdr.sh_link = presence & 0x10 ? getVarInt(p) : 0;
  shdr.sh_info = presence & 0x20 ? getVarInt(p) : 0;
  // sh_addralign is stored as a single byte holding log2 of the alignment.
  shdr.sh_addralign = presence & 0x40 ? (uint64_t)1 << *p++ : 1;
  shdr.sh_entsize = presence & 0x80 ? getVarInt(p) : 0;

  // use shdr
}

Given this simplicity, I don't expect incompatible differences to arise between implementations.
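
To make the loop concrete, here is a self-contained sketch that decodes a single entry. This message does not spell out the varint format, so `get_varint` below is a plain unsigned LEB128 reader standing in for the proposal's little-endian PrefixVarint variant; it is not the proposal's encoding. `MiniShdr`, `decode_one`, and `sample_entry` are likewise illustrative assumptions, and the sketch does no bounds checking.

```c
#include <assert.h>
#include <stdint.h>

#define SHT_PROGBITS 1

/* Local stand-in with the same fields as Elf64_Shdr. */
typedef struct {
    uint32_t sh_name, sh_type;
    uint64_t sh_flags, sh_addr, sh_offset, sh_size;
    uint32_t sh_link, sh_info;
    uint64_t sh_addralign, sh_entsize;
} MiniShdr;

/* Stand-in varint reader: unsigned LEB128 (NOT the proposal's format). */
static uint64_t get_varint(const uint8_t **p) {
    uint64_t value = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        value |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 64);
    return value;
}

/* Decode one entry and advance *p past it. */
static MiniShdr decode_one(const uint8_t **p) {
    MiniShdr s = {0};
    uint8_t presence = *(*p)++;
    s.sh_name = (uint32_t)get_varint(p);
    s.sh_offset = get_varint(p);
    s.sh_type = presence & 0x01 ? (uint32_t)get_varint(p) : SHT_PROGBITS;
    s.sh_flags = presence & 0x02 ? get_varint(p) : 0;
    s.sh_addr = presence & 0x04 ? get_varint(p) : 0;
    s.sh_size = presence & 0x08 ? get_varint(p) : 0;
    s.sh_link = presence & 0x10 ? (uint32_t)get_varint(p) : 0;
    s.sh_info = presence & 0x20 ? (uint32_t)get_varint(p) : 0;
    s.sh_addralign = presence & 0x40 ? (uint64_t)1 << *(*p)++ : 1;
    s.sh_entsize = presence & 0x80 ? get_varint(p) : 0;
    return s;
}

/* Made-up .text-like entry: presence 0x4a = flags | size | addralign,
   name 1, offset 0x40, flags 0x6, size 300 (0xac 0x02), log2(align) 4. */
static const uint8_t sample_entry[] = {0x4a, 0x01, 0x40, 0x06, 0xac, 0x02, 0x04};
```

Under this stand-in encoding the sample entry takes 7 bytes, against 64 for a fixed-size Elf64_Shdr.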

> Which is also an argument for leaving the section headers alone
> for the vast majority of cases. However, for those cases that want
> smaller object files, it's not a big ask for them to spend a little more
> memory to access it. Unless the section headers start trending into
> hundreds of megabytes, it seems unimportant on machines made
> in the last 20 years.

Many developers and users work on resource-constrained systems or in build environments with strict memory and disk limits.
People shipping prebuilt archive files would benefit from smaller section header tables.
In large link jobs, memory pressure is real and input file sizes are a significant contributor.
Most ELF linkers maintain their own in-memory section representation, so decoding the on-disk compact representation directly to that in-memory form is both simpler and more memory-efficient than first decompressing to a temporary buffer.

> The cshdr approach doesn't seem simpler, and the option to
> squeeze it further seems unnecessary. Order of magnitude
> reductions can be useful in some situations, but additional
> rounding error improvements don't move the dial much.

My data shows that cshdr builds make .o files over 10% smaller (14.2% smaller with -fno-unique-section-names)—this is not a "rounding error improvement." More importantly, cshdr achieves these savings without requiring any compression library dependency, with lower memory overhead, and with the ability to access entries on-demand.

I understand the preference for stability and compatibility. That's why this proposal is opt-in and carefully designed to coexist with traditional section headers. Tools that don't want to support cshdr simply won't generate it, while tools that need the space savings can opt in.

Ali Bahrami

Nov 13, 2025, 1:03:23 AM

to gener...@googlegroups.com

On 11/12/25 1:46 AM, Fangrui Song wrote:
> I respectfully disagree that this is a corner case. Section header
> tables occupy 17.6% of the total .o file size in my CREL llvm-project
> build, and 16.8% for a Chromium non-CREL build. The ratio consistently
> exceeded 10% for many applications I measured when I was at Google. With
> a rough estimate of 480 bytes per function in a Clang build, the 64-byte
> sizeof(Elf64_Shdr) overhead represents a significant fraction of each
> function's total cost.

I was speculating about linking in general, while you're talking about
a specific case. You experience it a lot, so it's not a corner case
to you.

> If we insist on the natural size and alignment guidelines for control
> structures described in the original ELF paper, then variable-length
> encoding is indeed not traditionally "ELF-like." However, I believe such
> space savings are essential to keeping ELF relevant in the decades
> ahead. Variable-length encoding is a natural fit for this problem,
> allowing efficient encoding without the complexity of general compression.

I think the natural size and direct access aspects are important.
Given that ELF was lightweight enough for machines of the 80's, I don't
see how it could be too heavy now, on machines that are vastly more capable.
I think that a simple format has a better chance of remaining relevant
than one that has lots of specialized addons. Things change, and old
optimizations don't always age well.

General compression may be more complex by some absolute metric,
but the interface it presents to ELF is simpler, and far more
people are dedicated to its ongoing testing and maintenance.

> Regarding libelf: I'm not sure why a specific library implementation
> should drive an ELF specification proposal. There are multiple libelf
> implementations: the illumos one
> (https://src.illumos.org/source/xref/illumos-gate/usr/src/head/libelf.h)
> is likely close to the Solaris version, while Linux generally uses the
> one from elfutils.
> Not all implementations support every ELF feature, nor should they be
> required to for a specification to be valid.
>
> More importantly, many tools don't use libelf at all. llvm-readelf, GNU
> readelf, and numerous custom tools parse ELF directly.
> For these tools, decoding a compact section header table is
> significantly easier than integrating a compression library and handling
> the decompression workflow.

libelf is conceptually organized around managing sections,
so I think any implementation of it would need the full
section header array. If that's untrue, I'm interested in
knowing the details.

libelf is the libc of ELF. Not using it is to reinvent a very large
wheel. Of course you're not required to use it, nor do I insist, but
if you do, it's a central place where things like this can be handled.
They can be solved in other ways too. It's really not complicated to
make any code use libz or libzstd. Odds are that some of that code
already does, if only to support SHF_COMPRESSED.

> > The cshdr approach doesn't seem simpler, and the option to
> > squeeze it further seems unnecessary. Order of magnitude
> > reductions can be useful in some situations, but additional
> > rounding error improvements don't move the dial much.
>
> My data shows that cshdr builds make .o files over 10% smaller (14.2%
> smaller with -fno-unique-section-names)—this is not a "rounding error
> improvement."

I was responding to this:

> Moreover, as a byte-oriented format, cshdr can be further
> compressed, even though the result would be slightly larger than
> compressing the section header table directly.

I thought you meant that cshdr represents the larger win, while the
"can be further compressed" part would be a smaller incremental
improvement, and so, maybe not worth worrying about. I apologize if I
got that wrong. 10% is not a rounding error. What about not beating
the results from direct compression though? It seems that direct
compression would give smallest files, and avoid complicating the
format as much.

I think we're starting to repeat. Others should weigh in if they have
opinions.

- Ali

James Y Knight

Nov 13, 2025, 10:20:44 AM

to gener...@googlegroups.com

On Thu, Nov 13, 2025 at 1:03 AM Ali Bahrami <ali_e...@emvision.com> wrote:

I think we're starting to repeat. Others should weigh in if they have
opinions.

I don't really have a strong opinion on whether a general-purpose compression algorithm or a varint header encoding is better. I'd be inclined to go with a varint proposal myself, but I'll let folks with more experience figure that part out.

I do want to chime in and say that I think there is a real need to make _some_ improvement here, for the future health of the ELF ecosystem.

Objects built with -ffunction-sections -fdata-sections are very common -- this is an important use-case, definitely not an edge case. Of course, reducing the size of such object files would be greatly appreciated! Yet, while there is an overhead here, it's already deployed today, and current users are already used to this size. The cost is "baked in", in some sense.

So, ISTM the _more critical_ problem to be solved is the added size that's observed when outputting new kinds of metadata.

Currently, whenever someone wants to add some new metadata to an ELF file, this ends up triggering a debate on whether to emit the metadata into a single section in the output, or to emit multiple sections corresponding to each function section, and then section groups to associate the metadata with the data. The argument for using a single section is that the added overhead of multiple sections overwhelms the size of the data, and is simply too high a cost to pay. The argument for using multiple grouped sections is that this is the way ELF is designed to work, and doing it that way won't require any special-casing to handle section GC.

The strong incentive to use a single section seems unhealthy for the ELF ecosystem. And this argument has to be repeated every time some new proposal is made. So the outcome I'd like to see is that the entire debate can be mooted, by removing the excessive overhead for doing things the "ELF way", and thus removing the incentive to shove everything into a single section.

I'm a little worried that this current proposal may not actually be sufficient to moot this debate, since section-groups will still have substantial overhead. It may be useful to see some new size measurements done with this goal in mind. E.g., what is the growth in object file size that'd occur when adding a new (small) metadata section and an associated section group corresponding to each function-section in the output? Both with the current ELF format, and with the proposed format(s).
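
As a starting point for such a measurement, the fixed cost in the current format can be estimated from structure sizes alone. The sketch below is a back-of-the-envelope lower bound under stated assumptions, not a measurement: it assumes ELF64, that the function section is not already in a group, and that each function gains one metadata section plus one SHT_GROUP section. It ignores section names, symbols, relocation sections, and the metadata payload itself. `group_overhead_bytes` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SHDR64_SIZE = 64,    /* sizeof(Elf64_Shdr) */
    GROUP_WORD_SIZE = 4  /* a group section is an array of Elf32_Word */
};

/* Bytes added per function in the traditional format: two new section
   headers (metadata + group) plus the group section contents, which are
   one flag word followed by one section index per member. */
static size_t group_overhead_bytes(size_t members) {
    size_t new_headers = 2 * SHDR64_SIZE;
    size_t group_contents = GROUP_WORD_SIZE * (1 + members);
    return new_headers + group_contents;
}
```

With two members (the function section and its metadata section) this comes to 140 bytes per function, of which 128 are section headers.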

Michael Matz

Nov 13, 2025, 12:24:52 PM

to gener...@googlegroups.com

Heyho,

On Wed, 12 Nov 2025, Ali Bahrami wrote:

> I think we're starting to repeat. Others should weigh in if they have
> opinions.

As much as I like pondering about specific encodings of stuff, and hence
have much sympathy for the shdr-as-varint-with-optional-fields proposal,
I'm with Ali on this. If we were designing a file format anew, sure.
But we aren't. Compatibility and composability matter. We have a fine
general mechanism for section compression, we should use it if we think we
need to make the section headers section smaller. It does have a couple
disadvantages over a new encoding, as well as advantages, all mentioned in
this thread.

To me the disadvantages of a new encoding even outweigh not doing anything
(i.e. simply continue to pay the size cost in .o files). But as we have
the alternative of using section compression the disadvantages of a new
encoding become even worse in comparison.

Ciao,
Michael.

Ali Bahrami

Nov 14, 2025, 1:10:00 AM

to gener...@googlegroups.com

On 11/10/25 10:47, Ali Bahrami wrote:
> I would imagine that the layout of the section header in such a case
> would have to be something like:
>
> shdr[0]
> Compression header (Elf32_Chdr, or Elf64_Chdr)
> compressed bytes

Although the above was enough to get the general idea
across, it's not really right, and I've been thinking about
how to do it for real since sending it. I wrote what follows
as an exercise to convince myself that it can be done cleanly.
I'm not assuming that we'll do this, nor do I think it's a
finished design, but I'm posting it now as a better starting
point for that discussion than what was said earlier.

I'm going to be gone for a few days, and will be back to
reading email by Tuesday.

- Ali

-----

The general idea seems workable: provide shdr[0], and have a compression
header, followed by the compressed bytes. But Elf[32|64]_Chdr are intended
for section data, and don't capture the right information for the section
header table. Consider:

typedef struct {
Elf32_Word ch_type;
Elf32_Word ch_size;
Elf32_Word ch_addralign;
} Elf32_Chdr;

The definition of ch_type is fine, but the other 2 fields
pose problems:

ch_addralign: We don't need to record an alignment for
the section header table, as that is known a priori.

ch_size: A Chdr stores the decompressed size in ch_size, but in
this case, we need to do the reverse, and store the compressed
size. We could just give ch_size that meaning when used in this
context, but I think that might prove to be confusing.

This leads me to think that a different compression header, catering
to the shdr table might be worth defining for this purpose.

Another issue is how to indicate that shdr table compression is being
used. One idea is to define an EF_CSHDR bit in e_flags. However, e_flags
is reserved to the psabi, so the gABI has no bits available to assign.
As our gABI says:

e_flags

This member holds processor-specific flags associated with
the file. Flag names take the form EF_machine_flag.

Maybe we could carve out one non-machine flag, but that would be
a messy slow process since it needs to be agreed to by many different
platform ABI groups.

Another idea is to use one of the unallocated bytes in e_ident for flags.
Those bytes are precious though, and each can only provide 8 flags. It
might be best to look elsewhere, and save these for other uses.

My last thought feels a bit awkward, but I'll note that sh_flags of
shdr[0] is not currently used, and we have a history of spilling
information related to the overall object into it, treating it as
an extension of the ELF header. So should we say that setting
SHF_COMPRESSED for shdr[0].sh_flags indicates a compressed section
header table?

For the moment, I'm going to go with that idea.

-----
Let's see how those things might fit together.

A compression header for the shdr table might look something like:

typedef struct {
Elf32_Shdr sch_shdr0; /* placeholder shdr[0] */
Elf32_Word sch_type; /* compress type (ELFCOMPRESS_xxx) */
Elf32_Word sch_size; /* # compressed bytes following Schdr */
} Elf32_Schdr;

typedef struct {
Elf64_Shdr sch_shdr0;
Elf64_Word sch_type;
Elf64_Word sch_reserved;
Elf64_Xword sch_size;
} Elf64_Schdr;

The purpose of sch_shdr0 is to provide an shdr[0] for any naive
application that might be looking for extended index details.

An application that supports a compressed shdr table would always
start by looking at shdr[0].sh_flags for SHF_COMPRESSED. This might
be libelf, or it might be a program doing raw access. On seeing
SHF_COMPRESSED, they would:

- Allocate a buffer of (e_shentsize * e_shnum) bytes, or
(e_shentsize * shdr[0].sh_size) bytes when extended section
indexes are in use.

- Decompress the sch_size bytes of compressed data that follow the
Schdr found at offset e_shoff into this newly allocated buffer.

- Use the allocated buffer as the section header table, rather
than the data in the object referenced by e_shoff.

As always, the size of this recovered section header table is
(e_shentsize * e_shnum), as with a normal shdr table, because the meaning
of e_shentsize and e_shnum are not changed by the use of compression.
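
To make the steps above concrete, here is a minimal C sketch, not a finished design. The names Shdr64, Schdr64, ShdrTablePlan and plan_shdr_table are hypothetical stand-ins (Schdr64 mirrors the Elf64_Schdr layout proposed above), and it assumes the compressed bytes begin immediately after the Schdr. It only works out where the compressed bytes sit and how large the output buffer must be; the decompression call itself is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SHF_COMPRESSED_BIT 0x800u /* value of SHF_COMPRESSED in the gABI */

/* Minimal stand-in for Elf64_Shdr; field order follows the gABI. */
typedef struct {
    uint32_t sh_name, sh_type;
    uint64_t sh_flags, sh_addr, sh_offset, sh_size;
    uint32_t sh_link, sh_info;
    uint64_t sh_addralign, sh_entsize;
} Shdr64;

/* Hypothetical: mirrors the Elf64_Schdr layout sketched in this message. */
typedef struct {
    Shdr64   sch_shdr0;    /* placeholder shdr[0] */
    uint32_t sch_type;     /* compression type (ELFCOMPRESS_xxx) */
    uint32_t sch_reserved;
    uint64_t sch_size;     /* number of compressed bytes following the Schdr */
} Schdr64;

/* What a reader needs to know before it can decompress the table. */
typedef struct {
    int      compressed;   /* 1 if shdr[0].sh_flags has SHF_COMPRESSED */
    uint64_t data_offset;  /* file offset of the compressed bytes */
    uint64_t data_size;    /* copy of sch_size */
    uint64_t table_bytes;  /* size of the recovered section header table */
} ShdrTablePlan;

ShdrTablePlan plan_shdr_table(const Schdr64 *sch, uint64_t e_shoff,
                              uint16_t e_shentsize, uint16_t e_shnum)
{
    ShdrTablePlan p = {0, 0, 0, 0};

    /* With extended section indexes, e_shnum is 0 and the real count
       is stored in shdr[0].sh_size. */
    uint64_t count = e_shnum ? e_shnum : sch->sch_shdr0.sh_size;
    p.table_bytes = (uint64_t)e_shentsize * count;

    if (sch->sch_shdr0.sh_flags & SHF_COMPRESSED_BIT) {
        p.compressed = 1;
        /* Assumption: compressed bytes start right after the Schdr. */
        p.data_offset = e_shoff + sizeof(Schdr64);
        p.data_size = sch->sch_size;
    }
    return p;
}
```

With this layout the 64-bit Schdr is 80 bytes (a 64-byte shdr[0] plus 16 bytes of compression fields), and a reader that does not see SHF_COMPRESSED only ever touches the sch_shdr0 part, which is an ordinary shdr[0].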

Note that this decompressed shdr table includes a copy of shdr[0],
along with the other headers. The contents of this resulting
shdr[0] should be identical to the contents of sch_shdr0, except
that the SHF_COMPRESSED flag could be removed. Alternatively, we
could leave SHF_COMPRESSED in both, as documentation when examining
dumped object details. This ensures that once the shdr table has
been recovered, it can be treated normally, with no special handling
required.

I think libelf should probably do this decompression automatically
and non-optionally, as it needs the table itself, and as the
compressed version is not generally useful. However, diagnostic/dumping
programs like elfdump, objdump, readelf, etc, may want to access the
Schdr to print its contents. Programs that do not use libelf can simply
do this by accessing the data at the offset given by e_shoff. For libelf,
we might want to add functions to make that information
available, something like:

Elf32_Schdr *elf32_getschdr(Elf *elf);
Elf64_Schdr *elf64_getschdr(Elf *elf);

GElf_Schdr *gelf_getschdr(Elf *elf);

To get libelf to create a compressed section header when writing
out an object, we need to provide a way to tell it to do that.
I haven't thought this through deeply yet, but I think we could
define an ELF_F_CSHDR flag to be used with elf_flagelf() to handle
it. I suspect that this will only be used when creating relocatable
objects, as only those objects suffer from large numbers of section
headers, but there is no reason not to support it for all objects.

Naturally, trying to use an object with a compressed shdr table
on a system with a toolchain that doesn't yet understand the concept
is going to fail with odd errors. As with other similar cases we've
discussed recently, I think the answer here is to have the config
scripts used to build compilers probe their link-editors for support,
and disable the option when built against a non-supporting link-editor.

Fangrui Song

Nov 14, 2025, 3:20:25 AM

to Generic System V Application Binary Interface

Hi Ali,

Thank you for exploring how compressed section header tables would work as an alternative.

Your design for Elf32/64_Schdr is well-structured, and using a bit from the ELF header is thoughtful. However, I still believe cshdr offers significant practical advantages over compression.

Besides the drawbacks of library dependency and memory usage, general compression introduces additional complexity and failure modes.

General compression also brings more potential failure types: unrecognized compression type, bitstream corruption, insufficient output buffer, memory allocation failure, and checksum mismatch. These represent additional error paths that every implementation must handle.

Furthermore, the same section header table can be compressed into different bitstreams, making memcmp infeasible for determining identity. An assembler may start to generate different output simply because the system's zstd library has been upgraded.
This poses challenges for distributed compilers (this is also a minor problem of compressed debug information).

I also want to elaborate on a point from my previous reply:

> If a runtime library ever needs section headers for executable and shared object files, implementing the compact section header table format would be far easier and safer than integrating a general compression library.

While section header table size is primarily an issue for relocatable files, I don't want to prevent this feature from being used with executables.

A program may retrieve its own executable via argv[0], the /proc/self/exe symlink, or by parsing /proc/self/maps. It may implement self-profiling or provide better crash reporters (.symtab typically has far more entries than .dynsym). For these runtime use cases, general compression would introduce a significant and unwelcome dependency.

> I was responding to this:
>
> > Moreover, as a byte-oriented format, cshdr can be further
> > compressed, even though the result would be slightly larger than
> > compressing the section header table directly.
>
> I thought you meant that cshdr represents the larger win, while the
> "can be further compressed" part would be a smaller incremental
> improvement, and so, maybe not worth worrying about. I apologize if I
> got that wrong. 10% is not a rounding error. What about not beating
> the results from direct compression though? It seems that direct
> compression would give smallest files, and avoid complicating the
> format as much.

My point is that when filesystem-level compression is applied, cshdr achieves similar or smaller file sizes compared to directly compressed section header tables. The cshdr encoding is compression-friendly—it doesn't degrade compression effectiveness the way some custom encodings might.

---

On Thursday, November 13, 2025 at 7:20:44 AM UTC-8 James Y Knight wrote:
> I don't really have a strong opinion on whether a general-purpose compression algorithm or a varint header encoding is better. I'd be inclined to go with a varint proposal myself, but I'll let folks with more experience figure that part out.
>
> I do want to chime in and say that I think that there is a real need, to make _some_ improvement here, for the future health of the ELF ecosystem.
>
> Objects built with -ffunction-sections -fdata-sections are very common -- this is an important use-case, definitely not an edge case. Of course, reducing the size of such object files would be greatly appreciated! Yet, while there is an overhead here, it's already deployed today, and current users are already used to this size. The cost is "baked in", in some sense.
>
> So, ISTM the _more critical_ problem to be solved is the added size that's observed when outputting new kinds of metadata.
>
> Currently, whenever someone wants to add some new metadata to an ELF file, this ends up triggering a debate on whether to emit the metadata into a single section in the output, or to emit multiple sections corresponding to each function section, and then section groups to associate the metadata with the data. The argument for using a single section is that the added overhead of multiple sections overwhelms the size of the data, and is simply too high a cost to pay. The argument for using multiple grouped sections is that this is the way ELF is designed to work, and doing it that way won't require any special-casing to handle section GC.
>

> The strong incentive to use a single section seems unhealthy for the ELF ecosystem. And this argument has to be repeated every time some new proposal is made. So the outcome I'd like to see is that the entire debate can be mooted, by removing the excessive overhead for doing things the "ELF way", and thus removing the incentive to shove everything into a single section.

Agreed, this is pretty critical to ensure health of ELF linkers for the coming years. I revived this proposal in response to the recent SFrame stack walking format. Interested readers can find more details at https://maskray.me/blog/2025-09-28-remarks-on-sframe#object-file-format-design-issues under "Section group compliance and garbage collection issues".

This approach would create multiple SFrame sections within relocatable files, making the size optimization benefits of a simplified linking view format even more compelling. While this incurs the overhead of additional section headers (each Elf64_Shdr consumes 64 bytes), it's a cost we should pay to be good ELF citizens. This reinforces the value of my section header reduction proposal.

(BTW, as a digression: I'm unsure whether SFrame is heading in the right direction. I believe a metadata format should support both asynchronous stack walking and synchronous C++ exceptions.
https://maskray.me/blog/2020-11-08-stack-unwinding#compact-unwind-information We can modify .eh_frame for relocatable files instead of introducing a new format.)

> I'm a little worried that this current proposal may not actually be sufficient to moot this debate, since section-groups will still have substantial overhead. It may be useful to see some new size measurements done with this goal in mind. E.g., what is the growth in object file size that'd occur when adding a new (small) metadata section and an associated section group corresponding to each function-section in the output? Both with the current ELF format, and with the proposed format(s).

Here are statistics about clang -ffunction-sections -fdata-sections -fprofile-generate="/tmp/out/s2-custom-ins-crel/profiles" -O3 -DNDEBUG builds

#!/bin/zsh
conf() { configure-llvm $@ -DCMAKE_{EXE,SHARED}_LINKER_FLAGS='-Wl,-z,pack-relative-relocs -fuse-ld=lld' -DCMAKE_CXX_ARCHIVE_CREATE="/tmp/Rel/bin/llvm-ar qc --thin <TARGET> <OBJECTS>" -DLLVM_BUILD_INSTRUMENTED=IR -DLLVM_ENABLE_PROJECTS='llvm;clang'; }

conf s2-custom-ins-none -DCMAKE_{C,CXX}_FLAGS='-fno-unique-section-names'
conf s2-custom-ins-crel -DCMAKE_{C,CXX}_FLAGS='-fno-unique-section-names --start-no-unused-arguments -Wa,--crel,--allow-experimental-crel --end-no-unused-arguments'
conf s2-custom-ins-cshdr -DCMAKE_{C,CXX}_FLAGS='-fno-unique-section-names --start-no-unused-arguments -Wa,--crel,--allow-experimental-crel,--cshdr --end-no-unused-arguments'
for i in none crel cshdr; do ninja -C /tmp/out/s2-custom-ins-$i llvm-mc; done

% tail -n 7 ins-none.txt ins-crel.txt ins-cshdr.txt
==> ins-none.txt <==
Files: 406
Total uncompressed: 9612992 bytes (18.2%)
Total compressed (zstd): 777055 bytes (1.8%)
Total compressed (xz): 634964 bytes (1.4%)
Total (compressed + 24): 786799 bytes (1.8%)
Total (compressed + 64 + 24): 812783 bytes (1.8%)
Total file size: 52771384 bytes

==> ins-crel.txt <==
Files: 406
Total uncompressed: 9612992 bytes (23.1%)
Total compressed (zstd): 766734 bytes (2.3%)
Total compressed (xz): 617904 bytes (1.9%)
Total (compressed + 24): 776478 bytes (2.4%)
Total (compressed + 64 + 24): 802462 bytes (2.4%)
Total file size: 41659608 bytes

==> ins-cshdr.txt <==
Files: 406
Total uncompressed: 1968614 bytes (5.8%)
Total compressed (zstd): 780714 bytes (2.4%)
Total compressed (xz): 668256 bytes (2.0%)
Total (compressed + 24): 790458 bytes (2.4%)
Total (compressed + 64 + 24): 816442 bytes (2.5%)
Total file size: 34015270 bytes
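
For readers who want to check the rows, here is a small C sketch of how the figures appear to be derived. This is my reading of the labels, inferred from the arithmetic rather than stated by the script: "+ 24" looks like one Elf64_Chdr-sized header per file, "+ 64" one uncompressed shdr[0] per file, and each percentage is the table's share of the file after the table is replaced by its stored form. The function names are hypothetical.

```c
#include <stdint.h>

/* Add a fixed per-file cost (e.g. a compression header) to a compressed total. */
uint64_t with_per_file_overhead(uint64_t compressed_total, uint64_t files,
                                uint64_t bytes_per_file)
{
    return compressed_total + files * bytes_per_file;
}

/* Share of the file taken by the section header table when the table is
   stored in `stored` bytes instead of its `uncompressed` size. */
double table_share(uint64_t stored, uint64_t uncompressed, uint64_t total_file)
{
    uint64_t rest = total_file - uncompressed; /* everything except the table */
    return (double)stored / (double)(rest + stored);
}
```

For the crel build, with_per_file_overhead(766734, 406, 24) gives 776478 and with_per_file_overhead(766734, 406, 64 + 24) gives 802462, matching the two "Total (compressed + ...)" rows.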

A typical relocatable file looks like the following.
If my calculations are correct, a .group section incurs an overhead of 11 bytes per entry: presence (1 byte) + offset (3 bytes) + size (1 byte) + ent_size (1 byte) + link (2 bytes) + info (2 bytes) + alignment (1 byte).
The per-section overhead could be reduced by introducing optimizations like Least Recently Used (LRU) caching or utilizing a DWARF abbreviation table. However, a section header table ratio of 5.8% appears to be acceptable for current implementation simplicity.

% fllvm-readelf -W -S s2-custom-ins-crel/lib/MC/CMakeFiles/LLVMMC.dir/MCExpr.cpp.o
There are 215 section headers, starting at offset 0x71f0:

Section Headers:
[Nr]  Name                 Type         Address          Off    Size   ES Flg  Lk Inf Al
[ 0]                       NULL         0000000000000000 000000 000000 00       0   0  0
[ 1]  .strtab              STRTAB       0000000000000000 0060b8 001137 00       0   0  1
[ 2]  .text                PROGBITS     0000000000000000 000040 000000 00 AX    0   0  4
[ 3]  .text                PROGBITS     0000000000000000 000040 000c8b 00 AX    0   0 16
[ 4]  .crel.text           CREL         0000000000000000 005620 00030c 01 I   214   3  1
[ 5]  .rodata              PROGBITS     0000000000000000 000ccc 0000b0 00 A     0   0  4
[ 6]  .crel.rodata         CREL         0000000000000000 00592c 00007f 01 I   214   5  1
[ 7]  .group               GROUP        0000000000000000 004394 00000c 04     214 117  4
[ 8]  .text                PROGBITS     0000000000000000 000d80 0000a3 00 AXG   0   0 16
[ 9]  .crel.text           CREL         0000000000000000 0059ab 000023 01 G   214   8  1
[10]  .text                PROGBITS     0000000000000000 000e30 0000aa 00 AX    0   0 16
...
[199] .group               GROUP        0000000000000000 0045d8 000010 04     214 158  4
[200] __llvm_prf_cnts      PROGBITS     0000000000000000 003f68 000010 00 WAG   0   0  8
[201] __llvm_prf_data      PROGBITS     0000000000000000 003f78 000040 00 WAG   0   0  8
[202] .crel__llvm_prf_data CREL         0000000000000000 006059 00000a 01 G   214 201  1
[203] .group               GROUP        0000000000000000 0045e8 000010 04     214 162  4
[204] __llvm_prf_cnts      PROGBITS     0000000000000000 003fb8 000008 00 WAG   0   0  8
[205] __llvm_prf_data      PROGBITS     0000000000000000 003fc0 000040 00 WAG   0   0  8
[206] .crel__llvm_prf_data CREL         0000000000000000 006063 00000a 01 G   214 205  1
[207] __llvm_prf_vnds      PROGBITS     0000000000000000 004000 000108 00 WAR   0   0  8
[208] __llvm_prf_names     PROGBITS     0000000000000000 004108 000233 00 AR    0   0  1
[209] .group               GROUP        0000000000000000 0045f8 000008 04     214 171  4
[210] .rodata              PROGBITS     0000000000000000 004340 000038 00 AG    0   0 16
[211] .comment             PROGBITS     0000000000000000 004378 000019 01 MS    0   0  1
[212] .note.GNU-stack      PROGBITS     0000000000000000 004391 000000 00       0   0  1
[213] .llvm_addrsig        LLVM_ADDRSIG 0000000000000000 00606d 00004b 00 E   214   0  1
[214] .symtab              SYMTAB       0000000000000000 004600 001020 18       1 103  8
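
As a sanity check on the 11-byte estimate for a .group header, here is a rough C sketch that counts bytes for section [199] above. It assumes a varint with 7 payload bits per byte, which matches LEB128 and, for values this small, PrefixVarint-style encodings; it is not the actual cshdr encoder, and varint_len and cshdr_entry_len are hypothetical helper names. Only the fields listed in the estimate are counted.

```c
#include <stdint.h>

/* Bytes needed for a varint carrying 7 payload bits per byte
   (assumption: adequate for the small values in a relocatable file). */
unsigned varint_len(uint64_t v)
{
    unsigned n = 1;
    while (v >= 0x80) {
        v >>= 7;
        n++;
    }
    return n;
}

/* Estimated size of one encoded header: 1 presence byte plus one
   varint per listed field. */
unsigned cshdr_entry_len(const uint64_t *fields, unsigned nfields)
{
    unsigned total = 1; /* presence byte */
    for (unsigned i = 0; i < nfields; i++)
        total += varint_len(fields[i]);
    return total;
}
```

For section [199] the fields are offset 0x45d8 (3 bytes), size 0x10 (1), ent_size 4 (1), link 214 (2), info 158 (2) and alignment 4 (1), so cshdr_entry_len returns 11. Section [7] comes out one byte smaller under the same assumptions because its info value, 117, still fits in a single byte.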

Michael Matz

Nov 25, 2025, 9:15:20 AM

to Generic System V Application Binary Interface

Hello,

On Fri, 14 Nov 2025, Fangrui Song wrote:

> Besides the drawbacks of library dependency and memory usage, general
> compression introduces additional complexity and failure modes.
>
> General compression also brings more potential failure types:
> unrecognized compression type, bitstream corruption, insufficient output
> buffer, memory allocation failure, and checksum mismatch. These
> represent additional error paths that every implementation must handle.

If you worry about bitstream corruption in general compression, you have
to worry about bitstream corruption on all levels of ELF files. At that
point you realize that the solution to that is _not_ disallowing
compression, but lies simply on a different level. I.e. it cannot be used as
argument for or against compression. You're right about error paths about
memory allocation, but that doesn't seem a very convincing counter
argument: those error paths will be deep within the generic "open ELF
file" thingy, which, to client code, already has methods to communicate
any errors, including memory allocation errors: It's very difficult to
imagine a design that does anything useful with ELF files for client code
that doesn't already allocate any memory at all and hence wouldn't need to
communicate any memalloc errors now, but would have to with generic
compression.

> Furthermore, the same section header table can be compressed into different
> bitstreams, making memcmp infeasible for determining identity. An assembler
> may start to generate different output simply because the system's zstd
> library has been upgraded.

It is correct that the same information can be encoded in different bit
streams for generic compression. But that is already the case right now
without compression, and also the case in your proposal. (a) e.g. the
section headers may be sorted differently, or the section header blob may
be placed somewhere else. (b) your proposal relies on
not emitting detected redundancies (or rather encoding them with a "still
the same" bit). Different implementations may detect different
redundancies, and it depends on the input order of section headers. You
can heal that by making your proposal very strict in what are acceptable
encodings. But that is usually a bad design for a compression scheme.

So, no counter argument either.

> > If a runtime library ever needs section headers for executable and shared
> object files, implementing the compact section header table format would be
> far easier and safer than integrating a general compression library.
>
> While section header table size is primarily an issue for relocatable files,
> I don't want to prevent this feature from being used with executables.
>
> A program may retrieve its own executable via argv[0], the /proc/self/exe
> symlink, or by parsing /proc/self/maps. It may implement self-profiling or
> provide better crash reporters (.symtab typically has far more entries than
> .dynsym). For these runtime use cases, general compression would introduce a
> significant and unwelcome dependency.

Depends on what infrastructure is used to open/inspect argv[0]. Anything
similar to an existing libelf based library will essentially handle that
dependency transparently (and in fact will already have that dependency
right now because of compressed sections). In fact: anything that
wouldn't handle that transparently right now has to have its own
implementation of ELF parsing. These are _exactly_ the things that would
need amending for a custom-designed compression scheme without which they
wouldn't be able to open the new argv[0] at all. Both (new dependency vs.
needing to amend random ELF parsers) are disadvantages, but in my book the
new dependency is the smaller one at this time.

(Essentially the ship about worrying for new dependencies in ELF parsers
has sailed when we added a generic section compression scheme)

Ciao,
Michael.
