PC-relative reloc that resolves to the next instruction when undefined

Fangrui Song

May 31, 2026, 6:22:15 PM
to X86-64 System V Application Binary Interface, Rahman Lavaee, James Y Knight, Jan Beulich, hjl....@gmail.com

## Request

A new relocation, provisionally `R_X86_64_PCNEXT32` (`word32`), self-contained:

| symbol | calculation |
|--------|-------------|
| defined | `S + A - P` (identical to `R_X86_64_PC32`) |
| undefined | `0` (no diagnostic) |

A `0` PC-relative displacement targets the instruction immediately
following the relocated field, so an undefined
reference degrades to a harmless, position-independent reference to
the next instruction — a valid instruction boundary.
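
The table can be read as the following minimal sketch. The helper names (`pcnext32_value`, `encode_word32`) and the example addresses are hypothetical, and the signed 32-bit range check is an assumption carried over from how linkers commonly treat `R_X86_64_PC32`; the proposal itself only gives the two table rows.

```python
import struct

def pcnext32_value(defined: bool, s: int, a: int, p: int) -> int:
    """Value stored in the relocated field (hypothetical helper)."""
    if not defined:
        # Undefined symbol: zero displacement, no diagnostic.
        return 0
    value = s + a - p  # identical to R_X86_64_PC32
    # Assumption: overflow is checked against a signed 32-bit field.
    if not -(1 << 31) <= value < (1 << 31):
        raise OverflowError("displacement does not fit in word32")
    return value

def encode_word32(value: int) -> bytes:
    """Little-endian signed 32-bit encoding of the field."""
    return struct.pack("<i", value)

# Example addresses (made up): symbol at 0x401000, field at 0x400103,
# addend -4 because the field ends where the next instruction starts.
defined_value = pcnext32_value(True, 0x401000, -4, 0x400103)
undefined_value = pcnext32_value(False, 0, -4, 0x400103)
```

In the defined case the field address plus 4 plus the stored value gives back the symbol address; in the undefined case the stored bytes are all zero, so the operand refers to the next instruction.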

`R_X86_64_PC32` against an undefined symbol (weak, or with `-z undefs`)
resolves to VA 0, but a single rule can't serve both uses:

- Data (`.long sym - . + A`): value A is sensible
- `prefetchit1 sym(%rip)`: value should be 0 (next instruction), but
the field sits 4 bytes before the next instruction, so A=-4, and a
value of A would point into the middle of the current instruction
(a potential security vulnerability)
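
The arithmetic behind the second bullet can be worked through as below. This is only an illustration: the 7-byte instruction length, the displacement offset of 3, and the address `0x1000` are assumptions chosen to match a RIP-relative operand whose 4-byte displacement ends the instruction (hence A = -4).

```python
INSN_LEN = 7      # assumed length of the prefetch instruction
DISP_OFFSET = 3   # assumed offset of the disp32 field within it
ADDEND = -4       # field ends exactly at the next instruction

def rip_relative_target(insn_addr: int, disp: int) -> int:
    """Address a RIP-relative operand refers to: next instruction + disp."""
    next_insn = insn_addr + INSN_LEN
    return next_insn + disp

insn_addr = 0x1000                 # made-up address
place = insn_addr + DISP_OFFSET    # P, the relocated field

# Storing A (sensible for data) lands on the field itself.
target_if_a = rip_relative_target(insn_addr, ADDEND)
# Storing 0 lands on the next instruction boundary.
target_if_zero = rip_relative_target(insn_addr, 0)
```

With these numbers a stored value of A refers to the displacement field itself, strictly inside the current instruction, while a stored value of 0 refers to the first byte after it.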

## Motivation

LLVM's code-prefetch insertion (llvm/llvm-project
[#166324](https://github.com/llvm/llvm-project/pull/166324)) inserts

```
prefetchit1 __llvm_prefetch_target_<fn>_<bbid>_<cs>(%rip)
```

ahead of an anticipated front-end miss. The target is an address
*inside* a callee, usually in another object file, so the compiler
synthesizes a symbol at that address and the prefetch is a PC-relative
(`R_X86_64_PC32`) reference the linker resolves.

The referenced target symbol may be **undefined in the final build**:
the profile's `<function, bb_id, callsite_index>` triple can fail to
match the final image (function renamed, block structure changed,
callsite count differs), or the target may simply not be linked in.
To avoid an undefined-symbol error and degrade gracefully, the
referencing module emits a **weak fallback definition** immediately
after the prefetch:

```
prefetchit1 __llvm_prefetch_target_foo_x_y(%rip)
.weak __llvm_prefetch_target_foo_x_y
__llvm_prefetch_target_foo_x_y:
```

Standard ELF resolution then does everything with no linker change: a
real `STB_GLOBAL` target overrides the weak fallback; if the target is
absent, the fallback is used and the prefetch resolves to the next
instruction (harmless — code prefetches are non-blocking hints).

However, this approach does not allow the prefetch target itself to be
a weak definition, since the linker may then resolve the relocation
against the undesired weak fallback instead of the desired weak
definition (probably in a different translation unit).
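
A toy model of the binding precedence makes the problem concrete. It is a sketch, not any real linker's algorithm: it assumes a `GLOBAL` definition replaces a `WEAK` one, that among `WEAK` definitions the first one seen is kept, and that the referencing object is processed first. The `Definition` type, the `resolve` function, and the object names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Definition:
    name: str
    binding: str  # "GLOBAL" or "WEAK"
    origin: str   # object file providing the definition

def resolve(definitions):
    """Pick one definition; assumes all entries share the same name."""
    chosen = None
    for d in definitions:
        if chosen is None:
            chosen = d  # first definition seen
        elif chosen.binding == "WEAK" and d.binding == "GLOBAL":
            chosen = d  # a global definition overrides a weak one
        # weak after weak: keep the earlier one (assumed rule)
    return chosen

SYM = "__llvm_prefetch_target_foo_x_y"
fallback = Definition(SYM, "WEAK", "caller.o")

# Real target is STB_GLOBAL: it overrides the weak fallback.
global_case = resolve([fallback, Definition(SYM, "GLOBAL", "callee.o")])

# Real target is itself weak: the fallback seen first is kept.
weak_case = resolve([fallback, Definition(SYM, "WEAK", "callee.o")])
```

Under these assumptions `global_case` comes from `callee.o` as intended, while `weak_case` stays with the fallback in `caller.o`, so the prefetch silently targets the next instruction even though the real target was linked in.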

## Alternatives considered

- Reuse `R_X86_64_PC32` — rejected: changing it would alter the data
contract and make branch resolution unsafe
(pointing to mid-instruction).
- PATCHINST-dual (NOP-patch). Modeled on `R_AARCH64_PATCHINST`: when
the symbol is undefined, *overwrite the instruction
with a NOP* (the dual of PATCHINST, which patches when its symbol is
defined). This is more general — "deactivate any
instruction when its symbol is undefined" — and removes the
instruction entirely rather than leaving a harmless
prefetch. It is heavier on x86, though: variable-length encoding
means the patch span must be conveyed (e.g.
instruction length in the addend) and it pairs with a companion
`PC32`. For the prefetch alone the runtime difference
is nil (a `prefetchit1` of the next instruction is a
microarchitectural no-op, and the `0F 18 /6,/7` encoding is
already a NOP on CPUs without `PREFETCHI`), so the displacement form
above is preferred; the NOP-patch is the better
choice only if a general deactivation primitive is wanted.
- `st_other` bit / linker-private rule: wrong field; see
https://github.com/llvm/llvm-project/issues/200246#issuecomment-4588004989
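
For comparison, the NOP-patch alternative from the list above could look roughly like the sketch below. It is hypothetical throughout: `nop_patch` is an invented name, the instruction length is passed in explicitly (standing in for "length in the addend"), and single-byte `0x90` NOPs are used for simplicity where a real implementation might prefer a multi-byte NOP.

```python
def nop_patch(code: bytearray, insn_offset: int, insn_len: int) -> None:
    """Overwrite one instruction with single-byte NOPs, in place."""
    if insn_len <= 0 or insn_offset < 0 or insn_offset + insn_len > len(code):
        raise ValueError("patch span outside the section")
    # The span must be supplied by the caller: x86 instruction lengths
    # cannot be inferred from the relocation offset alone.
    code[insn_offset:insn_offset + insn_len] = b"\x90" * insn_len

# Seven bytes standing in for a RIP-relative prefetchit1 (0F 18 /6 with a
# zero disp32), followed by a one-byte ret.
section = bytearray(b"\x0f\x18\x35\x00\x00\x00\x00\xc3")
nop_patch(section, insn_offset=0, insn_len=7)
```

The need to pass `insn_len` is the extra information the displacement form avoids: `R_X86_64_PCNEXT32` only has to write four zero bytes at the relocated field.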