RFC: Require REX prefix to encode R_X86_64_GOTTPOFF

H.J. Lu

Jan 19, 2020, 2:46:39 PM
to x86-64-abi
LEA instruction with R_X86_64_GOTPC32_TLSDESC relocation must be encoded
with REX prefix even if it isn't required by destination register. If
the LEA encoding has a variable length, linker can't tell where it starts
and can't safely perform GDesc -> IE/LE optimization. See:

https://sourceware.org/bugzilla/show_bug.cgi?id=25416
https://gitlab.com/x86-psABIs/x86-64-ABI/merge_requests/4

--
H.J.

Jan Beulich

Jan 20, 2020, 4:33:08 AM
to H.J. Lu, x86-64-abi
I've looked at both, but neither the underlying issue nor how adding a
REX prefix would address it has become clear to me. This may
largely be due to the terminology used: What exactly does
"variable length" here mean? The relocation type implies a
32-bit displacement, so the variable part - afaict - is whether
there's a SIB byte. How would adding a REX prefix make the
situation any better? Same goes for "linker can't tell where it
starts", whether or not that's related to the "variable length"
statement.

Jan

H.J. Lu

Jan 20, 2020, 8:11:20 AM
to Jan Beulich, x86-64-abi
Here is an example:

0: 8d 05 00 00 00 00 lea 0x0(%rip),%eax # 0x6
        2: R_X86_64_GOTPC32_TLSDESC foo-0x4
6: 44 8d 1d 00 00 00 00 lea 0x0(%rip),%r11d # 0xd
        9: R_X86_64_GOTPC32_TLSDESC foo-0x4

When linker performs GDesc -> IE/LE optimization, it rewrites LEA to MOV
with the relocation offset of R_X86_64_GOTPC32_TLSDESC relocation,
assuming that "r_offset - 3" is the start of LEA. If LEA has a variable
encoding length, "r_offset - 3" may be the last byte of the previous
instruction.
The fixed assembler generates a dummy REX byte if needed:

0: 40 8d 05 00 00 00 00 rex lea 0x0(%rip),%eax # 0x7
        3: R_X86_64_GOTPC32_TLSDESC foo-0x4
7: 44 8d 1d 00 00 00 00 lea 0x0(%rip),%r11d # 0xe
        a: R_X86_64_GOTPC32_TLSDESC foo-0x4

so that "r_offset - 3" is always the start of LEA.
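
A minimal C sketch of why the unprefixed form breaks the "r_offset - 3" assumption. The helper names are hypothetical and not taken from any assembler or linker; the byte sequences are the two LEAs from the first listing in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the first byte of a RIP-relative LEA,
   return how many bytes precede its 32-bit displacement, or -1 if
   the bytes are not one of the two forms discussed here. */
int lea_disp_offset(const uint8_t *insn)
{
    if ((insn[0] & 0xf0) == 0x40 && insn[1] == 0x8d)
        return 3;               /* REX, opcode, ModRM */
    if (insn[0] == 0x8d)
        return 2;               /* opcode, ModRM */
    return -1;
}

/* Returns 1 if "r_offset - 3" lands on the first byte of the LEA
   that starts at code[insn_start], 0 otherwise. */
int r_offset_minus_3_is_lea_start(const uint8_t *code, long insn_start)
{
    int disp = lea_disp_offset(code + insn_start);
    assert(disp > 0);
    long r_offset = insn_start + disp;  /* relocation points at disp32 */
    return r_offset - 3 == insn_start;
}
```

For the 6-byte LEA at offset 0 the relocation sits at offset 2, so "r_offset - 3" is -1, one byte before the instruction. For the 7-byte LEA at offset 6 the relocation sits at offset 9 and "r_offset - 3" is 6, the REX byte.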

--
H.J.

Jan Beulich

Jan 20, 2020, 9:20:04 AM
to H.J. Lu, x86-64-abi
I see, thanks. There are a lot of assumptions on what the
producer of the assembly code may (not) do here, though.

Jan

Cary Coutant

Jan 20, 2020, 4:28:59 PM
to H.J. Lu, Jan Beulich, x86-64-abi
> When linker performs GDesc -> IE/LE optimization, it rewrites LEA to MOV
> with the relocation offset of R_X86_64_GOTPC32_TLSDESC relocation,
> assuming that "r_offset - 3" is the start of LEA. If LEA has a variable
> encoding length, "r_offset - 3" may be the last byte of the previous
> instruction.
> The fixed assembler generates a dummy REX byte if needed:
>
> 0: 40 8d 05 00 00 00 00 rex lea 0x0(%rip),%eax # 0x7
>         3: R_X86_64_GOTPC32_TLSDESC foo-0x4
> 7: 44 8d 1d 00 00 00 00 lea 0x0(%rip),%r11d # 0xe
>         a: R_X86_64_GOTPC32_TLSDESC foo-0x4
>
> so that "r_offset - 3" is always the start of LEA.

It sounds like there are legacy objects out there that use the TLSDESC
relocation with the shorter form of the instruction. If that's the
case, I think you're going to need a new relocation to enable this
link-time transformation.

-cary

H.J. Lu

Jan 20, 2020, 4:43:22 PM
to Cary Coutant, Jan Beulich, x86-64-abi
-mtls-dialect=gnu2 never worked for x32:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93319
https://sourceware.org/bugzilla/show_bug.cgi?id=25416

I just fixed it for GCC 10 and binutils 2.35. It isn't a problem.

BTW, we also need to fix gold and update

commit 63887f3df5f9d17a88da98bdd2a761f830d61191
Author: H.J. Lu <hjl....@gmail.com>
Date: Mon Jan 30 21:13:30 2012 +0000

Check if -fpic -mtls-dialect=gnu2 works

2012-01-30 H.J. Lu <hongj...@intel.com>

* configure.ac: Check if -fpic -mtls-dialect=gnu2 works.
* configure: Regenerated.


--
H.J.

Cary Coutant

Jan 20, 2020, 5:17:13 PM
to H.J. Lu, Jan Beulich, x86-64-abi
> > It sounds like there are legacy objects out there that use the TLSDESC
> > relocation with the shorter form of the instruction. If that's the
> > case, I think you're going to need a new relocation to enable this
> > link-time transformation.
>
> -mtls-dialect=gnu2 never worked for x32:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93319
> https://sourceware.org/bugzilla/show_bug.cgi?id=25416
>
> I just fixed it for GCC 10 and binutils 2.35. It isn't a problem.

OK, thanks for the explanation.

-cary

H.J. Lu

Jan 20, 2020, 11:11:14 PM
to Cary Coutant, Jan Beulich, x86-64-abi

Michael Matz

Jan 27, 2020, 1:03:12 PM
to H.J. Lu, Jan Beulich, x86-64-abi
Hello,

On Mon, 20 Jan 2020, H.J. Lu wrote:

> > > https://sourceware.org/bugzilla/show_bug.cgi?id=25416
> > > https://gitlab.com/x86-psABIs/x86-64-ABI/merge_requests/4
> >
> > I've looked at both, but neither the underlying issue nor how adding a
> > REX prefix would address it has become clear to me. This may
> > largely be due to the terminology used: What exactly does
> > "variable length" here mean? The relocation type implies a
> > 32-bit displacement, so the variable part - afaict - is whether
> > there's a SIB byte. How would adding a REX prefix make the
> > situation any better? Same goes for "linker can't tell where it
> > starts", whether or not that's related to the "variable length"
> > statement.
> >
>
> Here is an example:
>
> 0: 8d 05 00 00 00 00 lea 0x0(%rip),%eax # 0x6
>         2: R_X86_64_GOTPC32_TLSDESC foo-0x4
> 6: 44 8d 1d 00 00 00 00 lea 0x0(%rip),%r11d # 0xd
>         9: R_X86_64_GOTPC32_TLSDESC foo-0x4
>
> When linker performs GDesc -> IE/LE optimization, it rewrites LEA to MOV
> with the relocation offset of R_X86_64_GOTPC32_TLSDESC relocation,
> assuming that "r_offset - 3" is the start of LEA. If LEA has a variable
> encoding length, "r_offset - 3" may be the last byte of the previous
> instruction.

I still don't see it. The opcode of lea is at 'r_offset - 2', not at
r_offset-3, no matter if it's prefixed or not. To change it into a mov
you replace that byte with 0x8b, the prefix (if it was there) will remain
correct.

So it rather seems you are trying to work around a simple bug in the
relaxation code?

> The fixed assembler generates a dummy REX byte if needed:
>
> 0: 40 8d 05 00 00 00 00 rex lea 0x0(%rip),%eax # 0x7
>         3: R_X86_64_GOTPC32_TLSDESC foo-0x4
> 7: 44 8d 1d 00 00 00 00 lea 0x0(%rip),%r11d # 0xe
>         a: R_X86_64_GOTPC32_TLSDESC foo-0x4
>
> so that "r_offset - 3" is always the start of LEA.


Ciao,
Michael.

H.J. Lu

Jan 27, 2020, 1:48:58 PM
to Michael Matz, Jan Beulich, x86-64-abi
No, please see:

[hjl@gnu-cfl-1 tmp]$ cat l.s
leaq foo@TLSDESC(%rip), %rax
[hjl@gnu-cfl-1 tmp]$ gcc -c l.s
[hjl@gnu-cfl-1 tmp]$ objdump -dwr l.o

l.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 0x7
        3: R_X86_64_GOTPC32_TLSDESC foo-0x4
[hjl@gnu-cfl-1 tmp]$

Linker expects that LEA starts at r_offset-3. This is true for both
LP64 and x32.

> So it rather seems you are trying to work around a simple bug in the
> relaxation code?
>
> > The fixed assembler generates a dummy REX byte if needed:
> >
> > 0: 40 8d 05 00 00 00 00 rex lea 0x0(%rip),%eax # 0x7
> >         3: R_X86_64_GOTPC32_TLSDESC foo-0x4
> > 7: 44 8d 1d 00 00 00 00 lea 0x0(%rip),%r11d # 0xe
> >         a: R_X86_64_GOTPC32_TLSDESC foo-0x4
> >
> > so that "r_offset - 3" is always the start of LEA.
>
>
> Ciao,
> Michael.



--
H.J.

Michael Matz

Jan 27, 2020, 2:04:13 PM
to H.J. Lu, Jan Beulich, x86-64-abi
Hello,
What I'm trying to say is that this is the bug. The linker should expect
LEA to be at r_offset-2, which indeed it is: the main opcode of LEA is
0x8d, and that sits at offset 1, i.e. r_offset-2. That there is also a
REX prefix here is immaterial, replacing that byte with 0x8b will
transform it into a MOV, the REX prefix will stay and thereby still
correctly amend the address and the target register of the
MOV-that-once-was-LEA. So, I'm saying there's a bug in the linker
relaxation code if it indeed expects the whole lea to be at r_offset-3,
and that bug needs to be fixed, instead of another relocation being added to
the psABI.

If you disagree, please provide a code sequence that would be incorrectly
transformed by a linker fixed in this way (i.e. one that assumes the main
opcode to be at r_offset-2, instead of assuming that the fully-prefixed
instruction starts at r_offset-3, which, of course, is not a very
useful assumption).


Ciao,
Michael.

H.J. Lu

Jan 27, 2020, 2:16:49 PM
to Michael Matz, Jan Beulich, x86-64-abi
Linker can't tell whether the byte before 0x8d is the REX prefix for LEA or
the last byte of the previous instruction.

BTW, R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX were
added to address a similar issue with R_X86_64_GOTPCREL.

--
H.J.

Michael Matz

Jan 27, 2020, 3:09:55 PM
to H.J. Lu, Jan Beulich, x86-64-abi
Hi,
Correct, but why should the linker need to know? Again, please come up
with a code sequence that's incorrectly changed with a linker that does
essentially this:

  if (r_type == R_X86_64_GOTPC32_TLSDESC
      && bytes[r_offset - 2] == 0x8d) {
    bytes[r_offset - 2] = 0x8b;
  }

> BTW, R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX were
> added to address a similar issue with R_X86_64_GOTPCREL.

The _REX_GOTPCRELX form was added to cater for this situation; the X
forms themselves were added to differentiate between relocs where relaxation
can be applied (to code, the GOTPCRELX forms) and those where it cannot
(the GOTPCREL form). The difference between GOTPCRELX and REX_GOTPCRELX
is necessary, because there the linker _does_ need to know if a REX prefix
is there, because it must be modified when transforming a (%rip) memory
operand into an immediate (and that is merely caused by unfortunate
encoding choices with the various REX bits; but it is as it is).

The TLSDESC relaxations are different: you only switch between mov and
lea, the memory operand remains a memory operand, and the destination
register remains the same register (and same width), so no changes in the
REX prefix are necessary, and so the linker doesn't need to care if there
is one.

If you disagree with any of that, see above, a mistransformed sequence
should be easy to construct then.
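
The opcode-only rewrite argued for in this message can be sketched in C as follows. This is an illustration only: the function name and signature are hypothetical, and a real linker would also have to resolve the 32-bit displacement against the GOT entry, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the GDesc -> IE opcode rewrite: turn
   "lea disp32(%rip), %reg" into "mov disp32(%rip), %reg" by replacing
   the LEA opcode (0x8d) at r_offset - 2 with the MOV opcode (0x8b).
   Any REX prefix at r_offset - 3 is left untouched.
   Returns 0 on success, -1 if no LEA opcode is found there. */
int relax_gdesc_to_ie(uint8_t *bytes, size_t size, size_t r_offset)
{
    assert(bytes != NULL);
    if (r_offset < 2 || r_offset + 4 > size)
        return -1;
    if (bytes[r_offset - 2] != 0x8d)
        return -1;
    bytes[r_offset - 2] = 0x8b;
    return 0;
}
```

Because the ModRM byte and any prefix are unchanged, the same call handles both `48 8d 05 ...` (relocation at offset 3) and the unprefixed `8d 05 ...` (relocation at offset 2).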


Ciao,
Michael.

H.J. Lu

Jan 27, 2020, 9:49:48 PM
to Michael Matz, Jan Beulich, x86-64-abi
There are 2 kinds of TLSDESC relaxations: GDesc -> IE transition and
GDesc -> LE transition. What you described only works for GDesc -> IE
transition. For GDesc -> LE transition, we can have

[hjl@gnu-cfl-2 gcc]$ cat x.s
        .text
        .p2align 4
        .globl  test
        .type   test, @function
test:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        leaq    foo@TLSDESC(%rip), %r9
        movq    %r9, %rax
        call    *foo@TLSCALL(%rax)
        addq    %fs:0, %rax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
        .size   test, .-test
        .section        .tdata,"awT",@progbits
        .align 4
        .type   foo, @object
        .size   foo, 4
foo:
        .long   30
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 gcc]$ cat main.c
extern int *test (void);

int
main ()
{
  return *test ();
}
[hjl@gnu-cfl-2 gcc]$ gcc -c main.c x.s
[hjl@gnu-cfl-2 gcc]$ objdump -dwr x.o

x.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <test>:
0: 48 83 ec 08 sub $0x8,%rsp
4: 4c 8d 0d 00 00 00 00 lea 0x0(%rip),%r9 # b <test+0xb>
        7: R_X86_64_GOTPC32_TLSDESC foo-0x4
b: 4c 89 c8 mov %r9,%rax
e: ff 10 callq *(%rax)
        e: R_X86_64_TLSDESC_CALL foo
10: 64 48 03 04 25 00 00 00 00 add %fs:0x0,%rax
19: 48 83 c4 08 add $0x8,%rsp
1d: c3 retq
[hjl@gnu-cfl-2 gcc]$ gcc main.o x.o
[hjl@gnu-cfl-2 gcc]$ objdump -dw --disassemble=test a.out

a.out: file format elf64-x86-64


Disassembly of section .init:

Disassembly of section .text:

0000000000401120 <test>:
401120: 48 83 ec 08 sub $0x8,%rsp
401124: 49 c7 c1 fc ff ff ff mov $0xfffffffffffffffc,%r9

Linker rewrites

4c 8d 0d 00 00 00 00 lea 0x0(%rip),%r9 # leaq foo@tlsdesc(%rip), %r9

to

49 c7 c1 fc ff ff ff mov $0xfffffffffffffffc,%r9 # movq $foo@tpoff, %r9

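Linker changes opcode 0x8d to 0xc7, not 0x8b. To do that, linker
may need to update the REX byte: here 0x4c becomes 0x49, because the
destination register moves from the ModRM reg field to the rm field.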

40112b: 4c 89 c8 mov %r9,%rax
40112e: 66 90 xchg %ax,%ax
401130: 64 48 03 04 25 00 00 00 00 add %fs:0x0,%rax
401139: 48 83 c4 08 add $0x8,%rsp
40113d: c3 retq

Disassembly of section .fini:
[hjl@gnu-cfl-2 gcc]$
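
The LEA to MOV-immediate rewrite shown in the session above can be sketched in C as follows. This is a rough illustration under stated assumptions, not the code of any particular linker: the function name, signature, and error handling are hypothetical, and the thread-pointer offset is simply passed in as `tpoff`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the GDesc -> LE rewrite:
     REX  8d ModRM  disp32   (lea foo@tlsdesc(%rip), %reg)
   becomes
     REX' c7 ModRM' imm32    (mov $foo@tpoff, %reg)
   Returns 0 on success, -1 if the bytes at r_offset - 3 do not look
   like a REX-prefixed RIP-relative LEA. */
int relax_gdesc_to_le(uint8_t *bytes, size_t size, size_t r_offset,
                      int32_t tpoff)
{
    assert(bytes != NULL);
    if (r_offset < 3 || r_offset + 4 > size)
        return -1;

    uint8_t rex = bytes[r_offset - 3];
    uint8_t opcode = bytes[r_offset - 2];
    uint8_t modrm = bytes[r_offset - 1];

    /* Expect a REX byte (0x40..0x4f), the LEA opcode, and a
       RIP-relative ModRM (mod == 00, rm == 101). */
    if ((rex & 0xf0) != 0x40 || opcode != 0x8d || (modrm & 0xc7) != 0x05)
        return -1;

    /* Keep REX.W. The destination moves from ModRM.reg to ModRM.rm,
       so its high bit moves from REX.R (bit 2) to REX.B (bit 0). */
    bytes[r_offset - 3] = (uint8_t)((rex & 0x48) | ((rex >> 2) & 1));
    bytes[r_offset - 2] = 0xc7;
    /* mod == 11 (register operand), reg == /0, rm == old reg field. */
    bytes[r_offset - 1] = (uint8_t)(0xc0 | ((modrm >> 3) & 7));

    /* Store the offset as a little-endian 32-bit immediate. */
    uint32_t imm = (uint32_t)tpoff;
    for (int i = 0; i < 4; i++)
        bytes[r_offset + i] = (uint8_t)(imm >> (8 * i));
    return 0;
}
```

With the bytes `4c 8d 0d 00 00 00 00`, a relocation at offset 3, and `tpoff` of -4, the buffer becomes `49 c7 c1 fc ff ff ff`, matching the linked output above. The REX check in the sketch cannot by itself distinguish a real prefix from a previous instruction that happens to end in 0x40..0x4f, which is why the proposal requires the prefix to always be present.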


--
H.J.

Michael Matz

Jan 28, 2020, 10:00:21 AM
to H.J. Lu, Jan Beulich, x86-64-abi
Hello H.J.,

On Mon, 27 Jan 2020, H.J. Lu wrote:

> There are 2 kinds of TLSDESC relaxations: GDesc -> IE transition and
> GDesc -> LE transition. What you described only works for GDesc -> IE
> transition. For GDesc -> LE transition, we can have
...
> Linker rewrites
>
> 4c 8d 0d 00 00 00 00 lea 0x0(%rip),%r9 # leaq foo@tlsdesc(%rip), %r9
>
> to
>
> 49 c7 c1 fc ff ff ff mov $0xfffffffffffffffc,%r9 # movq $foo@tpoff, %r9

Thanks. Yes, if you rewrite mem-ops into immediates you do need changes
in the REX prefix, and for that you need to rely on its existence. I
hadn't noticed that the ->LE transition uses immediate operands. Sorry
for the pushback, objection withdrawn.


Ciao,
Michael.