RFC: Add a new relocation to x86-64/i386 psABIs


H.J. Lu

May 18, 2015, 8:40:44 AM
To: ia32...@googlegroups.com, x86-6...@googlegroups.com, gcc-p...@gcc.gnu.org, binu...@sourceware.org
To avoid indirect branches to internal functions, I am proposing to add a
new relocation, R_X86_64_RELAX_GOTPCREL, to the x86-64 psABI:

1. When branching to an external function, foo, the compiler may generate
call/jmp *foo@GOTRELAX(%rip)
which generates an R_X86_64_RELAX_GOTPCREL relocation, instead of
call/jmp foo[@PLT]
2. When function foo is locally defined, the linker converts
call/jmp *foo@GOTRELAX(%rip)
to
nop call/jmp foo
3. Otherwise, the linker treats R_X86_64_RELAX_GOTPCREL the same way as
R_X86_64_GOTPCREL.
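As an illustration of step 2, here is a minimal Python sketch of the byte rewrite for the local-function case. It is not the binutils implementation; the helper name relax_x86_64 and its interface (the original instruction bytes, the instruction's run-time address, and the resolved address of foo) are hypothetical. It assumes the compiler emitted the standard encodings FF 15 disp32 for call *disp32(%rip) or FF 25 disp32 for jmp *disp32(%rip), and it produces 90 (NOP) followed by E8 rel32 or E9 rel32.

```python
import struct

# Hypothetical helper modelling step 2 only (foo turned out to be local).
# Indirect forms the compiler emits (6 bytes each):
#   call *foo@GOTRELAX(%rip)  ->  FF 15 disp32
#   jmp  *foo@GOTRELAX(%rip)  ->  FF 25 disp32
CALL_INDIRECT_MODRM = 0x15
JMP_INDIRECT_MODRM = 0x25
NOP = 0x90
CALL_REL32 = 0xE8  # direct call, 5 bytes
JMP_REL32 = 0xE9   # direct jmp, 5 bytes


def relax_x86_64(insn: bytes, insn_addr: int, target: int) -> bytes:
    """Rewrite a RIP-relative indirect call/jmp as 'nop; call/jmp target'."""
    if len(insn) != 6 or insn[0] != 0xFF:
        raise ValueError("expected FF /2 or FF /4 with a RIP-relative disp32")
    if insn[1] == CALL_INDIRECT_MODRM:
        opcode = CALL_REL32
    elif insn[1] == JMP_INDIRECT_MODRM:
        opcode = JMP_REL32
    else:
        raise ValueError("unsupported ModRM byte")
    # NOP at insn_addr, direct branch at insn_addr + 1; the branch ends at
    # insn_addr + 6, the same end address as the original instruction.
    rel32 = target - (insn_addr + 6)
    return bytes([NOP, opcode]) + struct.pack("<i", rel32)
```

For example, relax_x86_64(bytes.fromhex("ff1500000000"), 0x401000, 0x401100) returns the bytes 90 e8 fa 00 00 00. Because the NOP and the 5-byte direct branch fill exactly the 6 bytes of the original instruction, the rel32 field sits at the same offset as the old disp32 and the branch ends at the same address, so the fixup keeps both its offset and its addend.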

For the i386 psABI, we add R_386_RELAX_GOT32:

1. When branching to an external function, foo, in non-PIC mode, the
compiler may generate
call/jmp *foo@GOTRELAX
which generates an R_386_RELAX_GOT32 relocation, instead of
call/jmp foo
and, in PIC mode,
call/jmp *foo@GOTRELAX(%reg)
which generates an R_386_RELAX_GOT32 relocation, where REG holds the
address of the GOT, instead of
call/jmp foo@PLT
2. When function foo is locally defined, the linker converts
call/jmp *foo@GOTRELAX[(%reg)]
to
nop call/jmp foo
3. Otherwise,
a. In PIC mode, the linker treats R_386_RELAX_GOT32 the same way as
R_386_GOT32, and "call/jmp *foo@GOTRELAX" is unsupported.
b. In non-PIC mode, the linker computes the relocation value as the
R_386_GOT32 relocation value plus the address of the GOT, and converts
call/jmp *foo@GOTRELAX(%reg)
to
call/jmp *foo@GOTRELAX
if needed.
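Step 3b can be sketched in Python at the byte level. This is a hypothetical helper (relax_i386_nonpic), not the binutils code. It assumes the compiler emitted FF /2 (call) or FF /4 (jmp) with either a disp32(%reg) operand (ModRM mod=10) or the absolute disp32 form (mod=00, rm=101), both of which encode in 6 bytes, and it assumes got32_value is the value R_386_GOT32 would resolve to, i.e. the offset of foo's slot from the GOT base.

```python
import struct

CALL_EXT = 2  # ModRM reg field for FF /2 (indirect call)
JMP_EXT = 4   # ModRM reg field for FF /4 (indirect jmp)


def relax_i386_nonpic(insn: bytes, got_base: int, got32_value: int) -> bytes:
    """Hypothetical model of step 3b: emit 'call/jmp *abs32'.

    abs32 = R_386_GOT32 value + address of the GOT, i.e. the absolute
    address of foo's GOT slot.
    """
    if len(insn) != 6 or insn[0] != 0xFF:
        raise ValueError("expected a 6-byte FF /2 or FF /4 instruction")
    modrm = insn[1]
    mod, ext, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if ext not in (CALL_EXT, JMP_EXT):
        raise ValueError("not an indirect call or jmp")
    already_absolute = mod == 0b00 and rm == 0b101
    reg_disp32 = mod == 0b10 and rm != 0b100  # rm=100 would need a SIB byte
    if not (already_absolute or reg_disp32):
        raise ValueError("unsupported addressing mode")
    slot_addr = got_base + got32_value
    # mod=00, rm=101 selects an absolute disp32 operand in 32-bit mode,
    # so the base register (if any) is dropped from the instruction.
    new_modrm = (ext << 3) | 0b101
    return bytes([0xFF, new_modrm]) + struct.pack("<I", slot_addr)
```

For instance, call *foo@GOTRELAX(%ebx), encoded as FF 93 disp32, with an assumed GOT at 0x0804A000 and a GOT32 value of 0x10, becomes FF 15 10 A0 04 08. Since the base register only appears in the ModRM byte, the rewrite is the same whichever register the compiler chose to hold the GOT address.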

This new relocation effectively turns off lazy binding for function foo.

For i386, the compiler is free to choose any register to hold the address
of the GOT, and there is no need to make EBX a fixed register when
branching to an external function in PIC mode.

With this new relocation, only a one-byte NOP prefix overhead is added
when function foo, which the compiler determines is external, turns out
to be local at link time because of -Bsymbolic or a definition in another
input object file that the compiler has no knowledge of: the 6-byte
indirect call/jmp becomes a 5-byte direct call/jmp, and the remaining
byte is filled with a NOP.

The new -fno-plt GCC option can use the R_X86_64_RELAX_GOTPCREL and
R_386_RELAX_GOT32 relocations, if the linker supports them, to avoid
indirect branches to internal functions.


H.J.

H.J. Lu

May 18, 2015, 11:05:06 AM
To: Michael Matz, IA32 System V Application Binary Interface, x86-6...@googlegroups.com, GCC Patches, Binutils
On Mon, May 18, 2015 at 6:13 AM, Michael Matz <ma...@suse.de> wrote:
> Hi,
>
> On Mon, 18 May 2015, H.J. Lu wrote:
>
>> To avoid indirect branch to internal functions, I am proposing to add a
>> new relocation, R_X86_64_RELAX_GOTPCREL, to x86-64 psABI:
>>
>> 1. When branching to an external function, foo, compiler may generate
>> call/jmp *foo@GOTRELAX(%rip)
>> which generates R_X86_64_RELAX_GOTPCREL relocation, instead of
>> call/jmp foo[@PLT]
>> 2. When function foo is locally defined, linker converts
>> call/jmp *foo@GOTRELAX(%rip)
>> to
>> nop call/jmp foo
>
> For the jmp case the nop can also be added after it, to not even disturb

Yes, we should convert it to

nop call foo (for call)
jmp foo nop (for jmp)

I implemented it on the users/hjl/relax branch in the binutils git repo.

> the insn decoder. For calls as well of course, but there it might be
> better to have it before the call.
>

I think a NOP prefix is better for call. We won't mandate "nop call foo"
in the psABI; the linker is free to use either a NOP prefix or a NOP suffix.
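Using a NOP suffix for jmp changes where the displacement lives. The hypothetical helper below (a Python sketch, not the code on the users/hjl/relax branch) assumes the original instruction is jmp *disp32(%rip), encoded as FF 25 disp32, and rewrites it as E9 rel32 followed by 90:

```python
import struct


def relax_jmp_with_nop_suffix(insn: bytes, insn_addr: int, target: int) -> bytes:
    """Hypothetical: rewrite 'jmp *disp32(%rip)' (FF 25 disp32) as 'jmp target; nop'."""
    if len(insn) != 6 or insn[:2] != b"\xff\x25":
        raise ValueError("expected jmp *disp32(%rip)")
    # The direct jmp now starts at insn_addr and ends at insn_addr + 5,
    # so rel32 is measured from there; the NOP occupies insn_addr + 5.
    rel32 = target - (insn_addr + 5)
    return b"\xe9" + struct.pack("<i", rel32) + b"\x90"
```

With insn_addr = 0x401000 and target = 0x401100 it returns e9 fb 00 00 00 90. Compared with the prefix form, the rel32 field moves one byte earlier and the branch ends one byte earlier, so the linker must adjust both the fixup offset and its value. The trailing NOP is never executed, because an unconditional jmp does not fall through; a NOP placed after a call, by contrast, would execute when the callee returns.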

Should we move forward with it?

Thanks.

--
H.J.

H.J. Lu

May 18, 2015, 12:12:16 PM
To: Michael Matz, IA32 System V Application Binary Interface, x86-6...@googlegroups.com, GCC Patches, Binutils
> I think I have only nit-picking left: why call the whole thing relax? To
> me relax implies some length-changing transformation (like jump target
> relaxing, emitting shorter jumps when in range), but perhaps that's just
> me. OTOH I can't think of a better name right now.

Me neither. The linker manual has:

'--relax'
'--no-relax'
An option with machine dependent effects. This option is only
supported on a few targets. *Note 'ld' and the H8/300: H8/300.
*Note 'ld' and the Intel 960 family: i960. *Note 'ld' and Xtensa
Processors: Xtensa. *Note 'ld' and the 68HC11 and 68HC12:
M68HC11/68HC12. *Note 'ld' and the Altera Nios II: Nios II. *Note
'ld' and PowerPC 32-bit ELF Support: PowerPC ELF32.

On some platforms the '--relax' option performs target specific,
global optimizations that become possible when the linker resolves
addressing in the program, such as relaxing address modes,
synthesizing new instructions, selecting shorter version of current
instructions, and combining constant values.

On some platforms these link time global optimizations may make
symbolic debugging of the resulting executable impossible. This is
known to be the case for the Matsushita MN10200 and MN10300 family
of processors.

In this sense, calling this scheme "relax" is not entirely inaccurate.

> Then one remark: there's a small interaction between this scheme and
> taking the address of a function. I _think_ it's all taken care of, but

I believe so.

> just want to make sure it is: the relax scheme must only apply to the
> .got.plt slot, not to the normal .got slot (which must continue to hold
> the final function address), and with the recent sharing you have
> implemented (when both are needed) it must be ensured that also an
> existing RELAX_GOTPCREL reloc doesn't overwrite that .got slot with the
> .plt entry address.

Since the .got and .got.plt slots serve the same purpose, ld already
combines them into a single .got slot, with the .plt entry pointing to
that .got slot. That is, if you take the address of the function and
also branch to it, the linker will arrange for the .plt entry to do an
indirect branch via its .got slot.

--
H.J.