[RFC][PATCH] APX: Add R_X86_64_REX2_GOTPCRELX

H.J. Lu

Jul 25, 2023, 1:14:05 PM
to x86-6...@googlegroups.com
Intel Advanced Performance Extensions:

https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

adds the REX2 prefix for the additional general-purpose registers, r16-r31,
in

mov name@GOTPCREL(%rip), %reg
test %reg, name@GOTPCREL(%rip)
binop name@GOTPCREL(%rip), %reg

where binop is one of adc, add, and, cmp, or, sbb, sub, xor instructions.
Add

# define R_X86_64_REX2_GOTPCRELX 43

if the REX2 prefix is present. It is similar to R_X86_64_GOTPCRELX.
Linker can treat R_X86_64_REX2_GOTPCRELX as R_X86_64_GOTPCREL or convert
the above instructions to

lea name(%rip), %reg
mov $name, %reg
test $name, %reg
binop $name, %reg

when possible.
---
x86-64-ABI/linker-optimization.tex | 5 +++--
x86-64-ABI/object-files.tex | 6 ++++--
2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/x86-64-ABI/linker-optimization.tex b/x86-64-ABI/linker-optimization.tex
index 246350c..d681829 100644
--- a/x86-64-ABI/linker-optimization.tex
+++ b/x86-64-ABI/linker-optimization.tex
@@ -66,8 +66,9 @@ into an infinite loop at run-time.
\label{opt_gotpcrelx}

The \xARCH instruction encoding supports converting certain instructions
-on memory operand with \texttt{R_X86_64_GOTPCRELX} or
-\texttt{R_X86_64_REX_GOTPCRELX} relocations against symbol, \texttt{foo},
+on memory operand with \texttt{R_X86_64_GOTPCRELX},
+\texttt{R_X86_64_REX_GOTPCRELX} or \texttt{R_X86_64_REX2_GOTPCRELX}
+relocations against symbol, \texttt{foo},
into a different form on immediate operand if \texttt{foo} is defined
locally and the relocation addend is -4.

diff --git a/x86-64-ABI/object-files.tex b/x86-64-ABI/object-files.tex
index 7f20c0c..5a5cad6 100644
--- a/x86-64-ABI/object-files.tex
+++ b/x86-64-ABI/object-files.tex
@@ -486,6 +486,7 @@ or \texttt{Elf32_Rel} relocation entries.
\texttt{Deprecated} & 40 & & \\
\texttt{R_X86_64_GOTPCRELX} & 41 & \textit{word32} & \texttt{G + GOT + A - P} \\
\texttt{R_X86_64_REX_GOTPCRELX} & 42 & \textit{word32} & \texttt{G + GOT + A - P} \\
+ \texttt{R_X86_64_REX2_GOTPCRELX} & 43 & \textit{word32} & \texttt{G + GOT + A - P} \\
\cline{1-4}
\multicolumn{4}{l}{\small $^\dagger$ This relocation is used only for LP64.}\\
\multicolumn{4}{l}{\small $^{\dagger\dagger}$ This relocation only
@@ -538,8 +539,9 @@ instructions:
where \code{binop} is one of \code{adc}, \code{add}, \code{and},
\code{cmp}, \code{or}, \code{sbb}, \code{sub}, \code{xor}
instructions, the \texttt{R_X86_64_GOTPCRELX} relocation,
-or the \texttt{R_X86_64_REX_GOTPCRELX} relocation if the
-\code{REX} prefix is present, should be generated,
+the \texttt{R_X86_64_REX_GOTPCRELX} relocation if the
+\code{REX} prefix is present, or the \texttt{R_X86_64_REX2_GOTPCRELX}
+relocation if the \code{REX2} prefix is present should be generated,
instead of the \texttt{R_X86_64_GOTPCREL} relocation. See also
section~\ref{opt_gotpcrelx}.
\end{sloppypar}
--
2.41.0
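
As an illustration of the conversion described in the commit message, here is a minimal C sketch of the mov-to-lea case. It is not taken from any linker: the helper name relax_mov_to_lea and its parameters are hypothetical, and it assumes the instruction uses a one-byte opcode followed by ModRM, so the opcode sits two bytes before the relocation offset. The preconditions (symbol defined locally, addend of -4) are the ones stated in linker-optimization.tex.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rewrite "mov name@GOTPCREL(%rip), %reg" into
   "lea name(%rip), %reg" in place.  `code` is the section contents and
   `off` the relocation offset, i.e. the start of the 32-bit field. */
static bool relax_mov_to_lea(uint8_t *code, size_t off, int64_t addend,
                             bool defined_locally)
{
    if (off < 2 || addend != -4 || !defined_locally)
        return false;

    uint8_t opcode = code[off - 2];
    uint8_t modrm = code[off - 1];

    /* 0x8b is "mov r/m, reg"; mod=00 with rm=101 is RIP-relative. */
    if (opcode != 0x8b || (modrm & 0xc7) != 0x05)
        return false;

    code[off - 2] = 0x8d; /* lea: same ModRM, prefix bytes untouched */
    return true;
}
```

Only the opcode byte changes in this case, so the sketch does not need to inspect whatever prefix precedes it. After such a rewrite the field would have to be resolved PC-relative to the symbol itself rather than to its GOT slot.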

Florian Weimer

Jul 25, 2023, 3:41:20 PM
to H.J. Lu, x86-6...@googlegroups.com
* H. J. Lu:

> Intel Advanced Performance Extensions:
>
> https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
>
> adds the REX2 prefix for the additional general-purpose registers, r16-r31,
> in
>
> mov name@GOTPCREL(%rip), %reg
> test %reg, name@GOTPCREL(%rip)
> binop name@GOTPCREL(%rip), %reg
>
> where binop is one of adc, add, and, cmp, or, sbb, sub, xor instructions.
> Add
>
> # define R_X86_64_REX2_GOTPCRELX 43
>
> if the REX2 prefix is present. It is similar to R_X86_64_GOTPCRELX.
> Linker can treat R_X86_64_REX2_GOTPCRELX as R_X86_64_GOTPCREL or convert
> the above instructions to
>
> lea name(%rip), %reg
> mov $name, %reg
> test $name, %reg
> binop $name, %reg
>
> when possible.

You seem to have four relaxations and three original instructions in the
commit message?

Do the APX three-operand instructions support RIP-relative addressing
and would benefit from relaxation, too?

Thanks,
Florian

H.J. Lu

Jul 25, 2023, 4:02:55 PM
to Florian Weimer, x86-6...@googlegroups.com
On Tue, Jul 25, 2023 at 12:41 PM Florian Weimer <fwe...@redhat.com> wrote:
>
> * H. J. Lu:
>
> > Intel Advanced Performance Extensions:
> >
> > https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
> >
> > adds the REX2 prefix for the additional general-purpose registers, r16-r31,
> > in
> >
> > mov name@GOTPCREL(%rip), %reg
> > test %reg, name@GOTPCREL(%rip)
> > binop name@GOTPCREL(%rip), %reg
> >
> > where binop is one of adc, add, and, cmp, or, sbb, sub, xor instructions.
> > Add
> >
> > # define R_X86_64_REX2_GOTPCRELX 43
> >
> > if the REX2 prefix is present. It is similar to R_X86_64_GOTPCRELX.
> > Linker can treat R_X86_64_REX2_GOTPCRELX as R_X86_64_GOTPCREL or convert
> > the above instructions to
> >
> > lea name(%rip), %reg
> > mov $name, %reg
> > test $name, %reg
> > binop $name, %reg
> >
> > when possible.
>
> You seem to have four relaxations and three original instructions in the
> commit message?

Correct.

mov name@GOTPCREL(%rip), %reg

may be converted to

lea name(%rip), %reg

or

mov $name, %reg

> Do the APX three-operand instructions support RIP-relative addressing
> and would benefit from relaxation, too?
>

It may be possible. But I haven't looked into it yet.

Thanks.

--
H.J.
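
The second alternative for mov, the immediate form, needs more than an opcode change. The following C sketch shows one way it could look for the REX-prefixed case only. The function name relax_rex_mov_to_imm is hypothetical, and the sketch assumes a single REX byte immediately before a one-byte opcode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rewrite a REX-prefixed
   "mov name@GOTPCREL(%rip), %reg" into "mov $name, %reg" in place.
   `off` is the relocation offset (start of the 32-bit field). */
static bool relax_rex_mov_to_imm(uint8_t *code, size_t off)
{
    if (off < 3)
        return false;

    uint8_t rex = code[off - 3];
    uint8_t opcode = code[off - 2];
    uint8_t modrm = code[off - 1];

    if ((rex & 0xf0) != 0x40 || opcode != 0x8b || (modrm & 0xc7) != 0x05)
        return false;

    uint8_t reg = (modrm >> 3) & 7;

    /* The destination moves from ModRM.reg to ModRM.rm, so its high bit
       moves from REX.R (0x04) to REX.B (0x01). */
    code[off - 3] = (uint8_t)((rex & ~0x05) | ((rex & 0x04) >> 2));
    code[off - 2] = 0xc7;                  /* mov $imm32, r/m */
    code[off - 1] = (uint8_t)(0xc0 | reg); /* mod=11, /0, rm=reg */
    return true;
}
```

For example, the bytes 4c 8b 05 (mov name@GOTPCREL(%rip), %r8) become 49 c7 c0 (mov $name, %r8). This form is only usable when the symbol's address fits the 32-bit immediate, so a linker would choose between it and lea depending on the kind of output it produces.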

Michael Matz

Jul 26, 2023, 10:29:31 AM
to H.J. Lu, x86-6...@googlegroups.com
Hello,

On Tue, 25 Jul 2023, H.J. Lu wrote:

> Intel Advanced Performance Extensions:
>
> https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
>
> adds the REX2 prefix for the additional general-purpose registers, r16-r31,
> in
>
> mov name@GOTPCREL(%rip), %reg
> test %reg, name@GOTPCREL(%rip)
> binop name@GOTPCREL(%rip), %reg
>
> where binop is one of adc, add, and, cmp, or, sbb, sub, xor instructions.
> Add
>
> # define R_X86_64_REX2_GOTPCRELX 43
>
> if the REX2 prefix is present.

Hmm hmm. There's a proliferation of relocations that effectively only
differ in how many bytes to look back from the relocated place to
precisely determine which instruction we're dealing with. REX_GOTPCRELX
looks back one byte from mainopcode (or three if mainopcode+modrm are
counted), GOTPCRELX zero (or two), and now REX2_GOTPCRELX looks back two
bytes (or four, from relocated place). Well, it doesn't need to actually
look back two bytes to find the REX2-introducer (0xD5), but conceptually
that's the start of the full opcode.

I wonder if we finally should submit and introduce a relocation that
points _to_ the first interesting opcode byte (REX prefix, REX2 prefix,
(E)VEX prefix, whatever other future prefix is invented) and instead
determine the relocated place by interpreting the byte(s) found there.

For instance, the three-operand (EVEX) forms that Florian also mentioned:
in the current scheme they probably will need still another relocation to
be relaxable.

This is more food for thought. But now would be the right time to go
that way if we wanted to, instead of adding more and more variants of
existing relocations.

Otherwise I don't have comments, it seems a natural enough extension.
(also for the first patch, introducing r16-r31).


Ciao,
Michael.
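
To make the look-back distances in this message concrete, here is a small C sketch that maps each relocation type to the number of bytes between the start of the full opcode and the relocated field. The numbers 41 and 42 come from the psABI table quoted in the patch and 43 is the value the patch proposes; the function name lookback_distance is hypothetical.

```c
#include <stdint.h>

enum {
    R_X86_64_GOTPCRELX = 41,
    R_X86_64_REX_GOTPCRELX = 42,
    R_X86_64_REX2_GOTPCRELX = 43 /* proposed in this patch */
};

/* Hypothetical helper: bytes from the first opcode byte (including any
   REX or REX2 prefix) to the relocated field, or -1 for other types. */
static int lookback_distance(uint32_t type)
{
    switch (type) {
    case R_X86_64_GOTPCRELX:
        return 2; /* opcode + ModRM */
    case R_X86_64_REX_GOTPCRELX:
        return 3; /* REX + opcode + ModRM */
    case R_X86_64_REX2_GOTPCRELX:
        return 4; /* 0xd5 + payload byte + opcode + ModRM */
    default:
        return -1;
    }
}
```

Seen this way, the three relocation types differ only in the constant returned, which is the observation behind the suggestion of a more general relocation.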

Jan Beulich

Jul 26, 2023, 10:48:40 AM
to Michael Matz, x86-6...@googlegroups.com, H.J. Lu
But this is fragile. Consider such a reloc was introduced before we
learned of REX2. The code trying to find the to-be-relocated place
would simply need to give up when finding one, yet relocations (once
known) should always work on both legacy and new code. (Granted we're
talking of an optimization here, so not doing anything for such a
relocation is an option, but not a very nice one.)

That said, in principle I of course favor a single more general-
purpose reloc over a bunch of special purpose ones.

Jan

Michael Matz

Jul 26, 2023, 11:18:47 AM
to Jan Beulich, x86-6...@googlegroups.com, H.J. Lu
Hello,

On Wed, 26 Jul 2023, Jan Beulich wrote:

> > Hmm hmm. There's a proliferation of relocations that effectively only
> > differ in how many bytes to look back from the relocated place to
> > precisely determine which instruction we're dealing with. REX_GOTPCRELX
> > looks back one byte from mainopcode (or three if mainopcode+modrm are
> > counted), GOTPCRELX zero (or two), and now REX2_GOTPCRELX looks back two
> > bytes (or four, from relocated place). Well, it doesn't need to actually
> > look back two bytes to find the REX2-introducer (0xD5), but conceptually
> > that's the start of the full opcode.
> >
> > I wonder if we finally should submit and introduce a relocation that
> > points _to_ the first interesting opcode byte (REX prefix, REX2 prefix,
> > (E)VEX prefix, whatever other future prefix is invented) and instead
> > determine the relocated place by interpreting the byte(s) found there.
>
> But this is fragile. Consider such a reloc was introduced before we
> learned of REX2. The code trying to find the to-be-relocated place
> would simply need to give up when finding one, yet relocations (once
> known)

That parenthetical remark is the important point: it seems there's no
difference in old tooling behaviour between not yet knowing a new
relocation type (and erroring out) and not knowing what to do with the
found opcode combination (and erroring out).

The problem you're alluding to might be one when as-of-yet unknown
prefixes would be mis-recognized by old tools as random opcodes and the
to-be-relocated place be mis-calculated. But in all the cases we have
until now the bytes at the places (4x, c4, c5, 62, now d5) were invalid
opcodes, and I'm assuming any future prefixes to follow the same scheme,
otherwise the proposed catch-all reloc simply wouldn't be used.

We could for instance document that the new reloc would error out on bytes
so-and-so (and amend that list from time to time, as prefixes are
introduced).

If I'm not missing anything at least :-)

> should always work on both legacy and new code. (Granted we're
> talking of an optimization here, so not doing anything for such a
> relocation is an option, but not a very nice one.)
>
> That said, in principle I of course favor a single more general-
> purpose reloc over a bunch of special purpose ones.

Yeah. OTOH, I also see that it might sound much easier to just introduce
20 new relocations that merely differ in lookback offset :)


Ciao,
Michael.
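
A rough C sketch of how a tool might interpret the byte that such a catch-all relocation points to, using only the prefix introducers listed in this message (4x, c4, c5, 62, d5). The function name prefix_length is hypothetical. The sketch deliberately ignores legacy prefixes such as 0x66, multi-byte opcodes and SIB bytes, all of which a real implementation would have to handle before it could locate the relocated field.

```c
#include <stdint.h>

/* Hypothetical helper: length in bytes of the prefix introduced by `b`
   in 64-bit mode, or 0 if `b` is not one of the introducers above. */
static int prefix_length(uint8_t b)
{
    if ((b & 0xf0) == 0x40)
        return 1; /* REX: 0x40..0x4f */

    switch (b) {
    case 0xc5:
        return 2; /* 2-byte VEX */
    case 0xc4:
        return 3; /* 3-byte VEX */
    case 0x62:
        return 4; /* EVEX */
    case 0xd5:
        return 2; /* REX2 */
    default:
        return 0; /* unknown here: give up or report an error */
    }
}
```

A tool that predates a given prefix would take the default branch for it, which corresponds to the "error out on bytes so-and-so" behaviour suggested above.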

Jan Beulich

Jul 26, 2023, 11:56:38 AM
to Michael Matz, x86-6...@googlegroups.com, H.J. Lu
On 26.07.2023 17:18, Michael Matz wrote:
> Hello,
>
> On Wed, 26 Jul 2023, Jan Beulich wrote:
>
>>> Hmm hmm. There's a proliferation of relocations that effectively only
>>> differ in how many bytes to look back from the relocated place to
>>> precisely determine which instruction we're dealing with. REX_GOTPCRELX
>>> looks back one byte from mainopcode (or three if mainopcode+modrm are
>>> counted), GOTPCRELX zero (or two), and now REX2_GOTPCRELX looks back two
>>> bytes (or four, from relocated place). Well, it doesn't need to actually
>>> look back two bytes to find the REX2-introducer (0xD5), but conceptually
>>> that's the start of the full opcode.
>>>
>>> I wonder if we finally should submit and introduce a relocation that
>>> points _to_ the first interesting opcode byte (REX prefix, REX2 prefix,
>>> (E)VEX prefix, whatever other future prefix is invented) and instead
>>> determine the relocated place by interpreting the byte(s) found there.
>>
>> But this is fragile. Consider such a reloc was introduced before we
>> learned of REX2. The code trying to find the to-be-relocated place
>> would simply need to give up when finding one, yet relocations (once
>> known)
>
> That parenthetical remark is the important point: it seems there's no
> difference in old tooling behaviour between not yet knowing a new
> relocation type (and erroring out) and not knowing what to do with the
> found opcode combination (and erroring out).

Hmm, my thinking has been somewhat different so far: For a new reloc
type, updates to static and dynamic linkers are of course necessary.
But once they know of a relocation type, any use of that type ought
to work, no matter how it's used (I know this already isn't the case
for sibling relocations to the one we're discussing here, but in a
way I view those as quirky anyway). In fact I thought you'd call out
the other parenthesized remark further down as the important one.

> The problem you're alluding to might be one when as-of-yet unknown
> prefixes would be mis-recognized by old tools as random opcodes and the
> to-be-relocated place be mis-calculated. But in all the cases we have
> until now the bytes at the places (4x, c4, c5, 62, now d5) were invalid
> opcodes, and I'm assuming any future prefixes to follow the same scheme,
> otherwise the proposed catch-all reloc simply wouldn't be used.
>
> We could for instance document that the new reloc would error out on bytes
> so-and-so (and amend that list from time to time, as prefixes are
> introduced).
>
> If I'm not missing anything at least :-)
>
>> should always work on both legacy and new code. (Granted we're
>> talking of an optimization here, so not doing anything for such a
>> relocation is an option, but not a very nice one.)
>>
>> That said, in principle I of course favor a single more general-
>> purpose reloc over a bunch of special purpose ones.
>
> Yeah. OTOH, I also see that it might sound much easier to just introduce
> 20 new relocations that merely differ in lookback offset :)

Right, that's the quick-and-cheap approach.

Since we're dealing with only rela relocations, could we perhaps use
the relocated field to hold the offset to the first prefix byte (or
the major opcode, in case that's all we need to know, albeit I doubt
that would suffice)?

Jan

H.J. Lu

Jul 28, 2023, 12:02:10 PM
to Jan Beulich, Michael Matz, x86-6...@googlegroups.com
And the new relocation is consistent with how ELF relocation is
handled.

> Since we're dealing with only rela relocations, could we perhaps use
> the relocated field to hold the offset to the first prefix byte (or
> the major opcode, in case that's all we need to know, albeit I doubt
> that would suffice)?
>

RELA relocation ignores the value at the relocation offset.

--
H.J.

Jan Beulich

Jul 31, 2023, 2:09:53 AM
to H.J. Lu, Michael Matz, x86-6...@googlegroups.com
Well, I know of course, but my understanding is that the psABI could
assign meaning.

Jan

H.J. Lu

Aug 4, 2023, 2:00:43 PM
to Jan Beulich, Michael Matz, x86-6...@googlegroups.com
How about adding a new relocation, R_X86_64_OPCODE_GOTPCRELX,
and storing 0xd5 in the first byte at the relocation offset to indicate that
instruction has the REX2 prefix?

--
H.J.

Jan Beulich

Aug 7, 2023, 3:14:09 AM
to H.J. Lu, Michael Matz, x86-6...@googlegroups.com
How would this information be used then? In the answer a hypothetical
further prefix with yet different distance to the to-be-relocated
field would also want considering. Especially with the possibility of
SIB being involved in addressing (and, at least in an abstract manner,
vSIB), I don't see how encoding anything other than the distance is
going to be useful.

Jan

Michael Matz

Aug 7, 2023, 8:20:31 AM
to Jan Beulich, H.J. Lu, x86-6...@googlegroups.com
Hello,

On Mon, 7 Aug 2023, Jan Beulich wrote:

> >>> On Wed, Jul 26, 2023 at 8:56 AM Jan Beulich <jbeu...@suse.com> wrote:
> >>>> Since we're dealing with only rela relocations, could we perhaps use
> >>>> the relocated field to hold the offset to the first prefix byte (or
> >>>> the major opcode, in case that's all we need to know, albeit I doubt
> >>>> that would suffice)?
> >>>
> >>> RELA relocation ignores the value at the relocation offset.
> >>
> >> Well, I know of course, but my understanding is that the psABI could
> >> assign meaning.
> >>
> >
> > How about adding a new relocation, R_X86_64_OPCODE_GOTPCRELX,
> > and storing 0xd5 in the first byte at the relocation offset to indicate that
> > instruction has the REX2 prefix?
>
> How would this information be used then? In the answer a hypothetical
> further prefix with yet different distance to the to-be-relocated
> field would also want considering. Especially with the possibility of
> SIB being involved in addressing (and, at least in an abstract manner,
> vSIB), I don't see how encoding anything other than the distance is
> going to be useful.

Yes, if anything we want distances. I would perhaps also be okay to
encode the distance in the relocation type itself (and name them
appropriately then), not in the addend. This is effectively done already,
just named as if all of that were special cases. So, like:
R_X86_64_CODE_GOTPCREL_2
R_X86_64_CODE_GOTPCREL_3
and so on. The existing R_X86_64_REX_GOTPCRELX would be an alias for
one of these (or the new one an alias for it).

Not as extensible, but OTOH new distances will not be added often (and
there's a limit at 11 :) ) and we wouldn't need the content of the
relocated place.


Ciao,
Michael.

Florian Weimer

Aug 7, 2023, 8:46:21 AM
to Michael Matz, Jan Beulich, H.J. Lu, x86-6...@googlegroups.com
* Michael Matz:

>> How would this information be used then? In the answer a hypothetical
>> further prefix with yet different distance to the to-be-relocated
>> field would also want considering. Especially with the possibility of
>> SIB being involved in addressing (and, at least in an abstract manner,
>> vSIB), I don't see how encoding anything other than the distance is
>> going to be useful.
>
> Yes, if anything we want distances. I would perhaps also be okay to
> encode the distance in the relocation type itself (and name them
> appropriately then), not in the addend. This is effectively done already,
> just named as if all of that were special cases. So, like:
> R_X86_64_CODE_GOTPCREL_2
> R_X86_64_CODE_GOTPCREL_3
> and so on. The existing R_X86_64_REX_GOTPCRELX would be an alias for
> one of these (or the new one an alias for it).
>
> Not as extensible, but OTOH new distances will not be added often (and
> there's a limit at 11 :) ) and we wouldn't need the content of the
> relocated place.

And if the instruction sequence is not recognized, a link editor can
just stupidly apply the relocation at the indicated offset and be done
with it?

That sounds nice to me.

Thanks,
Florian
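
The fallback mentioned here, applying the relocation as a plain GOT-relative one when the instruction is not recognized, could look roughly like the C sketch below. It computes G + GOT + A - P as in the psABI table and stores it as a little-endian word32. The function name apply_gotpcrel and its parameters are hypothetical; got_entry stands for G + GOT and place for P.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fallback: resolve the field as G + GOT + A - P.
   Returns false if the result does not fit in a signed 32-bit word. */
static bool apply_gotpcrel(uint8_t *field, uint64_t got_entry,
                           int64_t addend, uint64_t place)
{
    int64_t value = (int64_t)(got_entry + (uint64_t)addend - place);

    if (value < INT32_MIN || value > INT32_MAX)
        return false;

    uint32_t v = (uint32_t)value;
    for (int i = 0; i < 4; i++)
        field[i] = (uint8_t)(v >> (8 * i)); /* little-endian store */
    return true;
}
```

Nothing in this path looks at the bytes before the field, so it works regardless of which prefix the instruction carries.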

Michael Matz

Aug 7, 2023, 9:01:51 AM
to Florian Weimer, Jan Beulich, H.J. Lu, x86-6...@googlegroups.com
Hello,

On Mon, 7 Aug 2023, Florian Weimer wrote:

> > Yes, if anything we want distances. I would perhaps also be okay to
> > encode the distance in the relocation type itself (and name them
> > appropriately then), not in the addend. This is effectively done already,
> > just named as if all of that were special cases. So, like:
> > R_X86_64_CODE_GOTPCREL_2
> > R_X86_64_CODE_GOTPCREL_3
> > and so on. The existing R_X86_64_REX_GOTPCRELX would be an alias for
> > one of these (or the new one an alias for it).
> >
> > Not as extensible, but OTOH new distances will not be added often (and
> > there's a limit at 11 :) ) and we wouldn't need the content of the
> > relocated place.
>
> And if the instruction sequence is not recognized, a link editor can
> just stupidly apply the relocation at the indicated offset and be done
> with it?

Yes. (Note that the same could also be done with the proposed catch-all
reloc which encodes a distance in the relocated field).

We could even specify these relocs up to distance 11 (or a saner lower
value, like 4) right now and be done with them.

> That sounds nice to me.


Ciao,
Michael.

H.J. Lu

Aug 10, 2023, 12:56:05 PM
to Michael Matz, Florian Weimer, Jan Beulich, x86-6...@googlegroups.com
How about adding R_X86_64_CODE_4_GOTPCREL which applies to
instructions starting 4 bytes before the relocation offset?


--
H.J.

Michael Matz

Aug 15, 2023, 6:17:29 AM
to H.J. Lu, Florian Weimer, Jan Beulich, x86-6...@googlegroups.com
Hello,

On Thu, 10 Aug 2023, H.J. Lu wrote:

> > > > encode the distance in the relocation type itself (and name them
> > > > appropriately then), not in the addend. This is effectively done already,
> > > > just named as if all of that were special cases. So, like:
> > > > R_X86_64_CODE_GOTPCREL_2
> > > > R_X86_64_CODE_GOTPCREL_3
> > > > and so on. The existing R_X86_64_REX_GOTPCRELX would be an alias for
> > > > one of these (or the new one an alias for it).
> > > >
> > > > Not as extensible, but OTOH new distances will not be added often (and
> > > > there's a limit at 11 :) ) and we wouldn't need the content of the
> > > > relocated place.
> > >
> > > And if the instruction sequence is not recognized, a link editor can
> > > just stupidly apply the relocation at the indicated offset and be done
> > > with it?
> >
> > Yes. (Note that the same could also be done with the proposed catch-all
> > reloc which encodes a distance in the relocated field).
> >
> > We could even specify these relocs up to distance 11 (or a saner lower
> > value, like 4) right now and be done with them.
> >
> > > That sounds nice to me.
> >
>
> How about adding R_X86_64_CODE_4_GOTPCREL which applies to
> instructions starting 4 bytes before the relocation offset?

In line with what I suggested above, so fine by me :-)


Ciao,
Michael.