[PATCH] x86/boot: allow a relocatable kernel to be linked with lld

235 views
Skip to first unread message

Dmitry Golovin

unread,
May 1, 2020, 4:42:43 AM5/1/20
to clang-bu...@googlegroups.com, Dmitry Golovin, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x...@kernel.org, H. Peter Anvin, Nick Desaulniers, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
LLD by default disallows relocations in read-only segments. For a
relocatable kernel, we pass -z notext to the linker to explicitly
allow relocations. This behavior is the default for BFD.

Link: https://github.com/ClangBuiltLinux/linux/issues/579
Signed-off-by: Dmitry Golovin <di...@golovin.in>
---
arch/x86/boot/compressed/Makefile | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 5f7c262bcc99..7214751e1671 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -57,6 +57,9 @@ else
KBUILD_LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
&& echo "-z noreloc-overflow -pie --no-dynamic-linker")
endif
+ifeq ($(CONFIG_RELOCATABLE), y)
+KBUILD_LDFLAGS += -z notext
+endif
LDFLAGS_vmlinux := -T

hostprogs := mkpiggy
--
2.25.1

Nathan Chancellor

unread,
May 1, 2020, 11:43:10 PM5/1/20
to Dmitry Golovin, clang-bu...@googlegroups.com, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x...@kernel.org, H. Peter Anvin, Nick Desaulniers, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
On Fri, May 01, 2020 at 08:42:13AM +0000, Dmitry Golovin wrote:
> LLD by default disallows relocations in read-only segments. For a
> relocatable kernel, we pass -z notext to the linker to explicitly
> allow relocations. This behavior is the default for BFD.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/579
> Signed-off-by: Dmitry Golovin <di...@golovin.in>

I was able to link a Clang built i386_defconfig kernel with ld.lld and
boot it in QEMU 5.0 after this change. A GCC built kernel links still
with ld.bfd and also boots in QEMU successfully. x86_64_defconfig with
both compilers and their respective linkers did not regress.

Tested-by: Nathan Chancellor <natecha...@gmail.com>

Dmitry Golovin

unread,
May 15, 2020, 12:18:43 PM5/15/20
to clang-bu...@googlegroups.com, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x...@kernel.org, H. Peter Anvin, Nick Desaulniers, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
01.05.2020, 11:42, "Dmitry Golovin" <di...@golovin.in>:
Ping for review and acceptance.

~DG

Borislav Petkov

unread,
May 15, 2020, 2:50:56 PM5/15/20
to Dmitry Golovin, clang-bu...@googlegroups.com, Thomas Gleixner, Ingo Molnar, x...@kernel.org, H. Peter Anvin, Nick Desaulniers, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
On Fri, May 01, 2020 at 08:42:13AM +0000, Dmitry Golovin wrote:
> LLD by default disallows relocations in read-only segments. For a

I need more info here about which segment is read-only?

Is this something LLD does by default or what's happening?

Because my BFD-linked vmlinux has:

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000001000 0x0000000000000000 0x0000000000000000
0x000000000070fa28 0x0000000000726b00 RWE 0x1000
LOAD 0x0000000000000000 0x0000000000727000 0x0000000000727000
0x0000000000000000 0x0000000000007000 RW 0x1000
DYNAMIC 0x00000000007108f8 0x000000000070f8f8 0x000000000070f8f8
0x0000000000000130 0x0000000000000130 RW 0x8
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RWE 0x10

so what's up?

> relocatable kernel, we pass -z notext to the linker to explicitly
> allow relocations. This behavior is the default for BFD.

Or are you saying that ld.bfd makes the text segment by default RW while
ld.lld makes it read-only like the elf manpage says:

"p_flags
This member holds a bit mask of flags relevant to the segment:

PF_X An executable segment.
PF_W A writable segment.
PF_R A readable segment.

A text segment commonly has the flags PF_X and PF_R."

IOW, don't be afraid to be more verbose in the commit message. :)

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Dmitry Golovin

unread,
May 16, 2020, 4:08:50 AM5/16/20
to Borislav Petkov, clang-bu...@googlegroups.com, Thomas Gleixner, Ingo Molnar, x...@kernel.org, H. Peter Anvin, Nick Desaulniers, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
15.05.2020, 21:50, "Borislav Petkov" <b...@alien8.de>:
>
> I need more info here about which segment is read-only?
>
> Is this something LLD does by default or what's happening?
>

Probably should have quoted the original error message:

ld.lld: error: can't create dynamic relocation R_386_32 against
symbol: _bss in readonly segment; recompile object files with -fPIC
or pass '-Wl,-z,notext' to allow text relocations in the output

>
> IOW, don't be afraid to be more verbose in the commit message. :)
>

Tried both BFD and LLD for linking to understand the difference more and
rewrite the commit message, and came to the conclusion that the patch is
wrong. I will submit v2 when I figure out the correct solution.

Regards,
Dmitry

Fangrui Song

unread,
May 17, 2020, 3:44:34 PM5/17/20
to Dmitry Golovin, Borislav Petkov, clang-bu...@googlegroups.com, Thomas Gleixner, Ingo Molnar, x...@kernel.org, H. Peter Anvin, Nick Desaulniers, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
On 2020-05-16, Dmitry Golovin wrote:
>15.05.2020, 21:50, "Borislav Petkov" <b...@alien8.de>:
>>
>> I need more info here about which segment is read-only?
>>
>> Is this something LLD does by default or what's happening?
>>
>
>Probably should have quoted the original error message:
>
> ld.lld: error: can't create dynamic relocation R_386_32 against
> symbol: _bss in readonly segment; recompile object files with -fPIC
> or pass '-Wl,-z,notext' to allow text relocations in the output

Do we know where do these R_386_32 come from?

When linking in -shared mode, the linker assumes the image is a shared
object and has undetermined image base at runtime. An absolute
relocation needs a text relocation (a relocation against a readonly
segment).

When neither -z notext nor -z text is specified, GNU ld is in an
indefinite state where it will enable text relocations (DT_TEXTREL
DF_TEXTREL) on demand. It is not considered a good practice for
userspace applications to do this.

Of course the kernel is different....... I know little about the kernel,
but if there is a way to make the sections containing R_386_32
relocations writable (SHF_WRITE), that will be a better solution to me.
In LLD, -z notext is like making every section SHF_WRITE.

>>
>> IOW, don't be afraid to be more verbose in the commit message. :)
>>
>
>Tried both BFD and LLD for linking to understand the difference more and
>rewrite the commit message, and came to the conclusion that the patch is
>wrong. I will submit v2 when I figure out the correct solution.
>
>Regards,
>Dmitry
>
>--
>You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
>To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-li...@googlegroups.com.
>To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/602331589572661%40mail.yandex.ru.

Arvind Sankar

unread,
May 17, 2020, 4:25:30 PM5/17/20
to Fangrui Song, Dmitry Golovin, Borislav Petkov, clang-bu...@googlegroups.com, Thomas Gleixner, Ingo Molnar, x...@kernel.org, H. Peter Anvin, Nick Desaulniers, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
On Sun, May 17, 2020 at 12:44:29PM -0700, Fangrui Song wrote:
> On 2020-05-16, Dmitry Golovin wrote:
> >15.05.2020, 21:50, "Borislav Petkov" <b...@alien8.de>:
> >>
> >> I need more info here about which segment is read-only?
> >>
> >> Is this something LLD does by default or what's happening?
> >>
> >
> >Probably should have quoted the original error message:
> >
> > ld.lld: error: can't create dynamic relocation R_386_32 against
> > symbol: _bss in readonly segment; recompile object files with -fPIC
> > or pass '-Wl,-z,notext' to allow text relocations in the output
>
> Do we know where do these R_386_32 come from?
>
> When linking in -shared mode, the linker assumes the image is a shared
> object and has undetermined image base at runtime. An absolute
> relocation needs a text relocation (a relocation against a readonly
> segment).
>
> When neither -z notext nor -z text is specified, GNU ld is in an
> indefinite state where it will enable text relocations (DT_TEXTREL
> DF_TEXTREL) on demand. It is not considered a good practice for
> userspace applications to do this.
>
> Of course the kernel is different....... I know little about the kernel,
> but if there is a way to make the sections containing R_386_32
> relocations writable (SHF_WRITE), that will be a better solution to me.
> In LLD, -z notext is like making every section SHF_WRITE.

The assembly files head_32.S and head_64.S in arch/x86/boot/compressed
create bogus relocations in .head.text and .text.

This is the source of the common warning when using the bfd linker, for
eg on 64-bit:
LD arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating a DT_TEXTREL in object

These relocations are "bogus", i.e. they're unwanted and the kernel
won't actually boot if anything were to perform those relocations before
running it. They're also the cause of the 64-bit kernel requiring linker
support for the -z noreloc-overflow option to link it as PIE.

From arch/x86/boot/compressed/Makefile:

# To build 64-bit compressed kernel as PIE, we disable relocation
# overflow check to avoid relocation overflow error with a new linker
# command-line option, -z noreloc-overflow.
KBUILD_LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z
noreloc-overflow" \
&& echo "-z noreloc-overflow -pie --no-dynamic-linker")

The relocations come from code like
leal gdt(%ebp), %eax
which should really be
leal (gdt-startup_32)(%ebp), %eax
to be technically correct.

I've played around with fixing the head code to avoid creating the
relocations in the first place, but never got around to submitting
anything: if there is interest in this, I can polish it up and send it
around.

Thanks.

Nick Desaulniers

unread,
May 18, 2020, 3:10:25 PM5/18/20
to Arvind Sankar, Fangrui Song, Dmitry Golovin, Borislav Petkov, clang-bu...@googlegroups.com, Thomas Gleixner, Ingo Molnar, x...@kernel.org, H. Peter Anvin, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
We'd be happy to help test and review. :)
--
Thanks,
~Nick Desaulniers

Arvind Sankar

unread,
May 24, 2020, 5:28:19 PM5/24/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
The compressed kernel currently contains bogus runtime relocations in
the startup code in head_{32,64}.S, which are generated by the linker,
but must not actually be processed at runtime.

This generates warnings when linking with the BFD linker, and errors
with LLD, which defaults to erroring on runtime relocations in read-only
sections. It also requires the -z noreloc-overflow hack for the 64-bit
kernel, which prevents us from linking it as -pie on an older BFD linker
(<= 2.26) or on LLD, because the locations that are to be apparently
relocated are only 32-bits in size and so cannot normally have
R_X86_64_RELATIVE relocations.

This series aims to get rid of these relocations. It is based on
efi/next (efi-changes-for-v5.8), where the latest patches touch the
head code to eliminate the global offset table.

The first patch is an independent fix for LLD, to avoid an orphan
section in arch/x86/boot/setup.elf [0].

The second patch gets rid of almost all the relocations. It uses
standard PIC addressing technique for 32-bit, i.e. loading a register
with the address of _GLOBAL_OFFSET_TABLE_ and then using GOTOFF
references to access variables. For 64-bit, there is 32-bit code that
cannot use RIP-relative addressing, and also cannot use the 32-bit
method, since GOTOFF references are 64-bit only. This is instead handled
using a macro to replace a reference like gdt with (gdt-startup_32)
instead. The assembler will generate a PC32 relocation entry, with
addend set to (.-startup_32), and these will be replaced with constants
at link time. This works as long as all the code using such references
lives in the same section as startup_32, i.e. in .head.text.

The third patch addresses a remaining issue with the BFD linker, which
insists on generating runtime relocations for absolute symbols. We use
z_input_len and z_output_len, defined in the generated piggy.S file, as
symbols whose absolute "addresses" are actually the size of the
compressed payload and the size of the decompressed kernel image
respectively. LLD does not generate relocations for these two symbols,
but the BFD linker does. To get around this, piggy.S is extended to also
define two u32 variables (in .rodata) with the lengths, and the head
code is modified to use those instead of the symbol addresses.

An alternative way to handle z_input_len/z_output_len would be to just
include piggy.S in head_{32,64}.S instead of as a separate object file,
since the GNU assembler doesn't generate relocations for symbols set to
constants.

The last patch adds a check in the linker script to ensure that no
runtime relocations get reintroduced. Since the GOT has been eliminated
as well, the compressed kernel has no runtime relocations whatsoever any
more.

[0] https://lore.kernel.org/lkml/20200521152459.5...@alum.mit.edu/

Arvind Sankar (4):
x86/boot: Add .text.startup to setup.ld
x86/boot: Remove runtime relocations from .head.text code
x86/boot: Remove runtime relocations from head_{32,64}.S
x86/boot: Check that there are no runtime relocations

arch/x86/boot/compressed/Makefile | 36 +---------
arch/x86/boot/compressed/head_32.S | 59 +++++++--------
arch/x86/boot/compressed/head_64.S | 99 +++++++++++++++-----------
arch/x86/boot/compressed/mkpiggy.c | 6 ++
arch/x86/boot/compressed/vmlinux.lds.S | 11 +++
arch/x86/boot/setup.ld | 2 +-
6 files changed, 109 insertions(+), 104 deletions(-)

--
2.26.2

Arvind Sankar

unread,
May 24, 2020, 5:28:20 PM5/24/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
gcc puts the main function into .text.startup when compiled with -Os (or
-O2). This results in arch/x86/boot/main.c having a .text.startup
section which is currently not included explicitly in the linker script
setup.ld in the same directory.

The BFD linker places this orphan section immediately after .text, so
this still works. However, LLD git, since [1], is choosing to place it
immediately after the .bstext section instead (this is the first code
section). This plays havoc with the section layout that setup.elf
requires to create the setup header, for eg on 64-bit:

LD arch/x86/boot/setup.elf
ld.lld: error: section .text.startup file range overlaps with .header
>>> .text.startup range is [0x200040, 0x2001FE]
>>> .header range is [0x2001EF, 0x20026B]

ld.lld: error: section .header file range overlaps with .bsdata
>>> .header range is [0x2001EF, 0x20026B]
>>> .bsdata range is [0x2001FF, 0x200398]

ld.lld: error: section .bsdata file range overlaps with .entrytext
>>> .bsdata range is [0x2001FF, 0x200398]
>>> .entrytext range is [0x20026C, 0x2002D3]

ld.lld: error: section .text.startup virtual address range overlaps
with .header
>>> .text.startup range is [0x40, 0x1FE]
>>> .header range is [0x1EF, 0x26B]

ld.lld: error: section .header virtual address range overlaps with
.bsdata
>>> .header range is [0x1EF, 0x26B]
>>> .bsdata range is [0x1FF, 0x398]

ld.lld: error: section .bsdata virtual address range overlaps with
.entrytext
>>> .bsdata range is [0x1FF, 0x398]
>>> .entrytext range is [0x26C, 0x2D3]

ld.lld: error: section .text.startup load address range overlaps with
.header
>>> .text.startup range is [0x40, 0x1FE]
>>> .header range is [0x1EF, 0x26B]

ld.lld: error: section .header load address range overlaps with
.bsdata
>>> .header range is [0x1EF, 0x26B]
>>> .bsdata range is [0x1FF, 0x398]

ld.lld: error: section .bsdata load address range overlaps with
.entrytext
>>> .bsdata range is [0x1FF, 0x398]
>>> .entrytext range is [0x26C, 0x2D3]

Explicitly pull .text.startup into the .text output section to avoid
this.

[1] https://reviews.llvm.org/D75225

Signed-off-by: Arvind Sankar <nive...@alum.mit.edu>
Reviewed-by: Fangrui Song <mas...@google.com>
---
arch/x86/boot/setup.ld | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 24c95522f231..ed60abcdb089 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -20,7 +20,7 @@ SECTIONS
.initdata : { *(.initdata) }
__end_init = .;

- .text : { *(.text) }
+ .text : { *(.text.startup) *(.text) }
.text32 : { *(.text32) }

. = ALIGN(16);
--
2.26.2

Arvind Sankar

unread,
May 24, 2020, 5:28:21 PM5/24/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
The assembly code in head_{32,64}.S, while meant to be
position-independent, generates run-time relocations because it uses
instructions such as
leal gdt(%edx), %eax
which make the assembler and linker think that the code is using %edx as
an index into gdt, and hence gdt needs to be relocated to its run-time
address.

With the BFD linker, this generates a warning during the build:
LD arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating a DT_TEXTREL in object

With lld, Dmitry Golovin reports that this results in a link-time error
with default options (i.e. unless -z notext is explicitly passed):
LD arch/x86/boot/compressed/vmlinux
ld.lld: error: can't create dynamic relocation R_386_32 against local
symbol in readonly segment; recompile object files with -fPIC or pass
'-Wl,-z,notext' to allow text relocations in the output

Start fixing this by removing relocations from .head.text:
- On 32-bit, use a base register that holds the address of the GOT and
reference symbol addresses using @GOTOFF, i.e.
leal gdt@GOTOFF(%edx), %eax
- On 64-bit, most of the code can (and already does) use %rip-relative
addressing, however the .code32 bits can't, and the 64-bit code also
needs to reference symbol addresses as they will be after moving the
compressed kernel to the end of the decompression buffer.
For these cases, reference the symbols as an offset to startup_32 to
avoid creating relocations, i.e.
leal (gdt-startup_32)(%bp), %eax
This only works in .head.text as the subtraction cannot be represented
as a PC-relative relocation unless startup_32 is in the same section
as the code. Move efi32_pe_entry into .head.text so that it can use
the same method to avoid relocations.

Signed-off-by: Arvind Sankar <nive...@alum.mit.edu>
---
arch/x86/boot/compressed/head_32.S | 40 +++++++------
arch/x86/boot/compressed/head_64.S | 95 ++++++++++++++++++------------
2 files changed, 77 insertions(+), 58 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index dfa4131c65df..66657bb99aae 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -73,10 +73,10 @@ SYM_FUNC_START(startup_32)
leal (BP_scratch+4)(%esi), %esp
call 1f
1: popl %edx
- subl $1b, %edx
+ addl $_GLOBAL_OFFSET_TABLE_+(.-1b), %edx

/* Load new GDT */
- leal gdt(%edx), %eax
+ leal gdt@GOTOFF(%edx), %eax
movl %eax, 2(%eax)
lgdt (%eax)

@@ -89,14 +89,16 @@ SYM_FUNC_START(startup_32)
movl %eax, %ss

/*
- * %edx contains the address we are loaded at by the boot loader and %ebx
- * contains the address where we should move the kernel image temporarily
- * for safe in-place decompression. %ebp contains the address that the kernel
- * will be decompressed to.
+ * %edx contains the address we are loaded at by the boot loader (plus the
+ * offset to the GOT). The below code calculates %ebx to be the address where
+ * we should move the kernel image temporarily for safe in-place decompression
+ * (again, plus the offset to the GOT).
+ *
+ * %ebp is calculated to be the address that the kernel will be decompressed to.
*/

#ifdef CONFIG_RELOCATABLE
- movl %edx, %ebx
+ leal startup_32@GOTOFF(%edx), %ebx

#ifdef CONFIG_EFI_STUB
/*
@@ -107,7 +109,7 @@ SYM_FUNC_START(startup_32)
* image_offset = startup_32 - image_base
* Otherwise image_offset will be zero and has no effect on the calculations.
*/
- subl image_offset(%edx), %ebx
+ subl image_offset@GOTOFF(%edx), %ebx
#endif

movl BP_kernel_alignment(%esi), %eax
@@ -124,10 +126,10 @@ SYM_FUNC_START(startup_32)
movl %ebx, %ebp // Save the output address for later
/* Target address to relocate to for decompression */
addl BP_init_size(%esi), %ebx
- subl $_end, %ebx
+ subl $_end@GOTOFF, %ebx

/* Set up the stack */
- leal boot_stack_end(%ebx), %esp
+ leal boot_stack_end@GOTOFF(%ebx), %esp

/* Zero EFLAGS */
pushl $0
@@ -138,8 +140,8 @@ SYM_FUNC_START(startup_32)
* where decompression in place becomes safe.
*/
pushl %esi
- leal (_bss-4)(%edx), %esi
- leal (_bss-4)(%ebx), %edi
+ leal (_bss@GOTOFF-4)(%edx), %esi
+ leal (_bss@GOTOFF-4)(%ebx), %edi
movl $(_bss - startup_32), %ecx
shrl $2, %ecx
std
@@ -152,14 +154,14 @@ SYM_FUNC_START(startup_32)
* during extract_kernel below. To avoid any issues, repoint the GDTR
* to the new copy of the GDT.
*/
- leal gdt(%ebx), %eax
+ leal gdt@GOTOFF(%ebx), %eax
movl %eax, 2(%eax)
lgdt (%eax)

/*
* Jump to the relocated address.
*/
- leal .Lrelocated(%ebx), %eax
+ leal .Lrelocated@GOTOFF(%ebx), %eax
jmp *%eax
SYM_FUNC_END(startup_32)

@@ -169,7 +171,7 @@ SYM_FUNC_START_ALIAS(efi_stub_entry)
add $0x4, %esp
movl 8(%esp), %esi /* save boot_params pointer */
call efi_main
- leal startup_32(%eax), %eax
+ /* efi_main returns the possibly relocated address of startup_32 */
jmp *%eax
SYM_FUNC_END(efi32_stub_entry)
SYM_FUNC_END_ALIAS(efi_stub_entry)
@@ -182,8 +184,8 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
* Clear BSS (stack is currently empty)
*/
xorl %eax, %eax
- leal _bss(%ebx), %edi
- leal _ebss(%ebx), %ecx
+ leal _bss@GOTOFF(%ebx), %edi
+ leal _ebss@GOTOFF(%ebx), %ecx
subl %edi, %ecx
shrl $2, %ecx
rep stosl
@@ -197,9 +199,9 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
pushl %ebp /* output address */

pushl $z_input_len /* input_len */
- leal input_data(%ebx), %eax
+ leal input_data@GOTOFF(%ebx), %eax
pushl %eax /* input_data */
- leal boot_heap(%ebx), %eax
+ leal boot_heap@GOTOFF(%ebx), %eax
pushl %eax /* heap area */
pushl %esi /* real mode pointer */
call extract_kernel /* returns kernel location in %eax */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 706fbf6eef53..f6ba32cd5702 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -42,6 +42,23 @@
.hidden _ebss

__HEAD
+
+/*
+ * This macro gives the link address of X. It's the same as X, since startup_32
+ * has link address 0, but defining it this way tells the assembler/linker that
+ * we want the link address, and not the run-time address of X. This prevents
+ * the linker from creating a run-time relocation entry for this reference.
+ *
+ * The macro should be used as a displacement with a base register containing
+ * the run-time address of startup_32 [i.e. la(X)(%reg)], or as an
+ * immediate [$ la(X)].
+ *
+ * This macro can only be used from within the .head.text section, since the
+ * expression requires startup_32 to be in the same section as the code being
+ * assembled.
+ */
+#define la(X) ((X) - startup_32)
+
.code32
SYM_FUNC_START(startup_32)
/*
@@ -64,10 +81,10 @@ SYM_FUNC_START(startup_32)
leal (BP_scratch+4)(%esi), %esp
call 1f
1: popl %ebp
- subl $1b, %ebp
+ subl $ la(1b), %ebp

/* Load new GDT with the 64bit segments using 32bit descriptor */
- leal gdt(%ebp), %eax
+ leal la(gdt)(%ebp), %eax
movl %eax, 2(%eax)
lgdt (%eax)

@@ -80,7 +97,7 @@ SYM_FUNC_START(startup_32)
movl %eax, %ss

/* setup a stack and make sure cpu supports long mode. */
- leal boot_stack_end(%ebp), %esp
+ leal la(boot_stack_end)(%ebp), %esp

call verify_cpu
testl %eax, %eax
@@ -107,7 +124,7 @@ SYM_FUNC_START(startup_32)
* image_offset = startup_32 - image_base
* Otherwise image_offset will be zero and has no effect on the calculations.
*/
- subl image_offset(%ebp), %ebx
+ subl la(image_offset)(%ebp), %ebx
#endif

movl BP_kernel_alignment(%esi), %eax
@@ -123,7 +140,7 @@ SYM_FUNC_START(startup_32)

/* Target address to relocate to for decompression */
addl BP_init_size(%esi), %ebx
- subl $_end, %ebx
+ subl $ la(_end), %ebx

/*
* Prepare for entering 64 bit mode
@@ -151,19 +168,19 @@ SYM_FUNC_START(startup_32)
1:

/* Initialize Page tables to 0 */
- leal pgtable(%ebx), %edi
+ leal la(pgtable)(%ebx), %edi
xorl %eax, %eax
movl $(BOOT_INIT_PGT_SIZE/4), %ecx
rep stosl

/* Build Level 4 */
- leal pgtable + 0(%ebx), %edi
+ leal la(pgtable + 0)(%ebx), %edi
leal 0x1007 (%edi), %eax
movl %eax, 0(%edi)
addl %edx, 4(%edi)

/* Build Level 3 */
- leal pgtable + 0x1000(%ebx), %edi
+ leal la(pgtable + 0x1000)(%ebx), %edi
leal 0x1007(%edi), %eax
movl $4, %ecx
1: movl %eax, 0x00(%edi)
@@ -174,7 +191,7 @@ SYM_FUNC_START(startup_32)
jnz 1b

/* Build Level 2 */
- leal pgtable + 0x2000(%ebx), %edi
+ leal la(pgtable + 0x2000)(%ebx), %edi
movl $0x00000183, %eax
movl $2048, %ecx
1: movl %eax, 0(%edi)
@@ -185,7 +202,7 @@ SYM_FUNC_START(startup_32)
jnz 1b

/* Enable the boot page tables */
- leal pgtable(%ebx), %eax
+ leal la(pgtable)(%ebx), %eax
movl %eax, %cr3

/* Enable Long mode in EFER (Extended Feature Enable Register) */
@@ -211,17 +228,17 @@ SYM_FUNC_START(startup_32)
* used to perform that far jump.
*/
pushl $__KERNEL_CS
- leal startup_64(%ebp), %eax
+ leal la(startup_64)(%ebp), %eax
#ifdef CONFIG_EFI_MIXED
- movl efi32_boot_args(%ebp), %edi
+ movl la(efi32_boot_args)(%ebp), %edi
cmp $0, %edi
jz 1f
- leal efi64_stub_entry(%ebp), %eax
- movl efi32_boot_args+4(%ebp), %esi
- movl efi32_boot_args+8(%ebp), %edx // saved bootparams pointer
+ leal la(efi64_stub_entry)(%ebp), %eax
+ movl la(efi32_boot_args+4)(%ebp), %esi
+ movl la(efi32_boot_args+8)(%ebp), %edx // saved bootparams pointer
cmpl $0, %edx
jnz 1f
- leal efi_pe_entry(%ebp), %eax
+ leal la(efi_pe_entry)(%ebp), %eax
movl %edi, %ecx // MS calling convention
movl %esi, %edx
1:
@@ -246,18 +263,18 @@ SYM_FUNC_START(efi32_stub_entry)

call 1f
1: pop %ebp
- subl $1b, %ebp
+ subl $ la(1b), %ebp

- movl %esi, efi32_boot_args+8(%ebp)
+ movl %esi, la(efi32_boot_args+8)(%ebp)
SYM_INNER_LABEL(efi32_pe_stub_entry, SYM_L_LOCAL)
- movl %ecx, efi32_boot_args(%ebp)
- movl %edx, efi32_boot_args+4(%ebp)
- movb $0, efi_is64(%ebp)
+ movl %ecx, la(efi32_boot_args)(%ebp)
+ movl %edx, la(efi32_boot_args+4)(%ebp)
+ movb $0, la(efi_is64)(%ebp)

/* Save firmware GDTR and code/data selectors */
- sgdtl efi32_boot_gdt(%ebp)
- movw %cs, efi32_boot_cs(%ebp)
- movw %ds, efi32_boot_ds(%ebp)
+ sgdtl la(efi32_boot_gdt)(%ebp)
+ movw %cs, la(efi32_boot_cs)(%ebp)
+ movw %ds, la(efi32_boot_ds)(%ebp)

/* Disable paging */
movl %cr0, %eax
@@ -336,11 +353,11 @@ SYM_CODE_START(startup_64)

/* Target address to relocate to for decompression */
movl BP_init_size(%rsi), %ebx
- subl $_end, %ebx
+ subl $ la(_end), %ebx
addq %rbp, %rbx

/* Set up the stack */
- leaq boot_stack_end(%rbx), %rsp
+ leaq la(boot_stack_end)(%rbx), %rsp

/*
* At this point we are in long mode with 4-level paging enabled,
@@ -406,7 +423,7 @@ SYM_CODE_START(startup_64)
lretq
trampoline_return:
/* Restore the stack, the 32-bit trampoline uses its own stack */
- leaq boot_stack_end(%rbx), %rsp
+ leaq la(boot_stack_end)(%rbx), %rsp

/*
* cleanup_trampoline() would restore trampoline memory.
@@ -418,7 +435,7 @@ trampoline_return:
* this function call.
*/
pushq %rsi
- leaq top_pgtable(%rbx), %rdi
+ leaq la(top_pgtable)(%rbx), %rdi
call cleanup_trampoline
popq %rsi

@@ -432,9 +449,9 @@ trampoline_return:
*/
pushq %rsi
leaq (_bss-8)(%rip), %rsi
- leaq (_bss-8)(%rbx), %rdi
- movq $_bss /* - $startup_32 */, %rcx
- shrq $3, %rcx
+ leaq la(_bss-8)(%rbx), %rdi
+ movl $(_bss - startup_32), %ecx
+ shrl $3, %ecx
std
rep movsq
cld
@@ -445,15 +462,15 @@ trampoline_return:
* during extract_kernel below. To avoid any issues, repoint the GDTR
* to the new copy of the GDT.
*/
- leaq gdt64(%rbx), %rax
- leaq gdt(%rbx), %rdx
+ leaq la(gdt64)(%rbx), %rax
+ leaq la(gdt)(%rbx), %rdx
movq %rdx, 2(%rax)
lgdt (%rax)

/*
* Jump to the relocated address.
*/
- leaq .Lrelocated(%rbx), %rax
+ leaq la(.Lrelocated)(%rbx), %rax
jmp *%rax
SYM_CODE_END(startup_64)

@@ -465,7 +482,7 @@ SYM_FUNC_START_ALIAS(efi_stub_entry)
movq %rdx, %rbx /* save boot_params pointer */
call efi_main
movq %rbx,%rsi
- leaq startup_64(%rax), %rax
+ leaq la(startup_64)(%rax), %rax
jmp *%rax
SYM_FUNC_END(efi64_stub_entry)
SYM_FUNC_END_ALIAS(efi_stub_entry)
@@ -628,7 +645,7 @@ SYM_DATA(efi_is64, .byte 1)
#define BS32_handle_protocol 88 // offsetof(efi_boot_services_32_t, handle_protocol)
#define LI32_image_base 32 // offsetof(efi_loaded_image_32_t, image_base)

- .text
+ __HEAD
.code32
SYM_FUNC_START(efi32_pe_entry)
/*
@@ -650,12 +667,12 @@ SYM_FUNC_START(efi32_pe_entry)

call 1f
1: pop %ebx
- subl $1b, %ebx
+ subl $ la(1b), %ebx

/* Get the loaded image protocol pointer from the image handle */
leal -4(%ebp), %eax
pushl %eax // &loaded_image
- leal loaded_image_proto(%ebx), %eax
+ leal la(loaded_image_proto)(%ebx), %eax
pushl %eax // pass the GUID address
pushl 8(%ebp) // pass the image handle

@@ -690,7 +707,7 @@ SYM_FUNC_START(efi32_pe_entry)
* use it before we get to the 64-bit efi_pe_entry() in C code.
*/
subl %esi, %ebx
- movl %ebx, image_offset(%ebp) // save image_offset
+ movl %ebx, la(image_offset)(%ebp) // save image_offset
jmp efi32_pe_stub_entry

2: popl %edi // restore callee-save registers
--
2.26.2

Arvind Sankar

unread,
May 24, 2020, 5:28:21 PM5/24/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
The BFD linker generates runtime relocations for z_input_len and
z_output_len, even though they are absolute symbols.

Work around this by defining two variables input_len and output_len in
addition to the symbols, and use them via position-independent
references.

This eliminates the last two runtime relocations in the head code and
allows us to drop the -z noreloc-overflow flag to the linker.

Signed-off-by: Arvind Sankar <nive...@alum.mit.edu>
---
arch/x86/boot/compressed/Makefile | 8 --------
arch/x86/boot/compressed/head_32.S | 17 ++++++++---------
arch/x86/boot/compressed/head_64.S | 4 ++--
arch/x86/boot/compressed/mkpiggy.c | 6 ++++++
4 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index aa9ed814e5fa..d3e882e855ee 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -49,15 +49,7 @@ UBSAN_SANITIZE :=n
KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
# Compressed kernel should be built as PIE since it may be loaded at any
# address by the bootloader.
-ifeq ($(CONFIG_X86_32),y)
KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
-else
-# To build 64-bit compressed kernel as PIE, we disable relocation
-# overflow check to avoid relocation overflow error with a new linker
-# command-line option, -z noreloc-overflow.
-KBUILD_LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
- && echo "-z noreloc-overflow -pie --no-dynamic-linker")
-endif
LDFLAGS_vmlinux := -T

hostprogs := mkpiggy
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 66657bb99aae..064e895bad92 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -193,18 +193,17 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
/*
* Do the extraction, and jump to the new kernel..
*/
- /* push arguments for extract_kernel: */
- pushl $z_output_len /* decompressed length, end of relocs */
+ /* push arguments for extract_kernel: */

- pushl %ebp /* output address */
-
- pushl $z_input_len /* input_len */
+ pushl output_len@GOTOFF(%ebx) /* decompressed length, end of relocs */
+ pushl %ebp /* output address */
+ pushl input_len@GOTOFF(%ebx) /* input_len */
leal input_data@GOTOFF(%ebx), %eax
- pushl %eax /* input_data */
+ pushl %eax /* input_data */
leal boot_heap@GOTOFF(%ebx), %eax
- pushl %eax /* heap area */
- pushl %esi /* real mode pointer */
- call extract_kernel /* returns kernel location in %eax */
+ pushl %eax /* heap area */
+ pushl %esi /* real mode pointer */
+ call extract_kernel /* returns kernel location in %eax */
addl $24, %esp

/*
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index f6ba32cd5702..6e4704b6a94e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -508,9 +508,9 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
movq %rsi, %rdi /* real mode address */
leaq boot_heap(%rip), %rsi /* malloc area for uncompression */
leaq input_data(%rip), %rdx /* input_data */
- movl $z_input_len, %ecx /* input_len */
+ movl input_len(%rip), %ecx /* input_len */
movq %rbp, %r8 /* output target address */
- movl $z_output_len, %r9d /* decompressed length, end of relocs */
+ movl output_len(%rip), %r9d /* decompressed length, end of relocs */
call extract_kernel /* returns kernel location in %rax */
popq %rsi

diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index 7e01248765b2..52aa56cdbacc 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -60,6 +60,12 @@ int main(int argc, char *argv[])
printf(".incbin \"%s\"\n", argv[1]);
printf("input_data_end:\n");

+ printf(".section \".rodata\",\"a\",@progbits\n");
+ printf(".globl input_len\n");
+ printf("input_len:\n\t.long %lu\n", ilen);
+ printf(".globl output_len\n");
+ printf("output_len:\n\t.long %lu\n", (unsigned long)olen);
+
retval = 0;
bail:
if (f)
--
2.26.2

Arvind Sankar

unread,
May 24, 2020, 5:28:22 PM5/24/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
Add a linker script check that there are no runtime relocations, and
remove the old one that tries to check via looking for specially-named
sections in the object files.

Drop the tests for -fPIE compiler option and -pie linker option, as they
are available in all supported gcc and binutils versions (as well as
clang and lld).

Signed-off-by: Arvind Sankar <nive...@alum.mit.edu>
---
arch/x86/boot/compressed/Makefile | 28 +++-----------------------
arch/x86/boot/compressed/vmlinux.lds.S | 11 ++++++++++
2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index d3e882e855ee..679a2b383bfe 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -27,7 +27,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4

KBUILD_CFLAGS := -m$(BITS) -O2
-KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC)
+KBUILD_CFLAGS += -fno-strict-aliasing -fPIE
KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
cflags-$(CONFIG_X86_32) := -march=i386
cflags-$(CONFIG_X86_64) := -mcmodel=small
@@ -49,7 +49,7 @@ UBSAN_SANITIZE :=n
KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
# Compressed kernel should be built as PIE since it may be loaded at any
# address by the bootloader.
-KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
+KBUILD_LDFLAGS += -pie $(call ld-option, --no-dynamic-linker)
LDFLAGS_vmlinux := -T

hostprogs := mkpiggy
@@ -84,30 +84,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o

-# The compressed kernel is built with -fPIC/-fPIE so that a boot loader
-# can place it anywhere in memory and it will still run. However, since
-# it is executed as-is without any ELF relocation processing performed
-# (and has already had all relocation sections stripped from the binary),
-# none of the code can use data relocations (e.g. static assignments of
-# pointer values), since they will be meaningless at runtime. This check
-# will refuse to link the vmlinux if any of these relocations are found.
-quiet_cmd_check_data_rel = DATAREL $@
-define cmd_check_data_rel
- for obj in $(filter %.o,$^); do \
- $(READELF) -S $$obj | grep -qF .rel.local && { \
- echo "error: $$obj has data relocations!" >&2; \
- exit 1; \
- } || true; \
- done
-endef
-
-# We need to run two commands under "if_changed", so merge them into a
-# single invocation.
-quiet_cmd_check-and-link-vmlinux = LD $@
- cmd_check-and-link-vmlinux = $(cmd_check_data_rel); $(cmd_ld)
-
$(obj)/vmlinux: $(vmlinux-objs-y) FORCE
- $(call if_changed,check-and-link-vmlinux)
+ $(call if_changed,ld)

OBJCOPYFLAGS_vmlinux.bin := -R .comment -S
$(obj)/vmlinux.bin: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index d826ab38a8f9..0ac14feacb24 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -11,9 +11,15 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
#ifdef CONFIG_X86_64
OUTPUT_ARCH(i386:x86-64)
ENTRY(startup_64)
+
+#define REL .rela
+
#else
OUTPUT_ARCH(i386)
ENTRY(startup_32)
+
+#define REL .rel
+
#endif

SECTIONS
@@ -42,6 +48,9 @@ SECTIONS
*(.rodata.*)
_erodata = . ;
}
+ REL.dyn : {
+ *(REL.*)
+ }
.got : {
*(.got)
}
@@ -83,3 +92,5 @@ ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18, "Unexpected GOT/PLT en
#else
ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc, "Unexpected GOT/PLT entries detected!")
#endif
+
+ASSERT(SIZEOF(REL.dyn) == 0, "Unexpected runtime relocations detected!")
--
2.26.2

Fangrui Song

unread,
May 24, 2020, 6:13:32 PM5/24/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
Should .text.startup* be used instead? If -ffunction-sections is used,

// a.c
int main() {}

gcc -O2 a.c # .text.startup
gcc -Os a.c # .text.startup

gcc -O2 -ffunction-sections a.c # .text.startup.main
gcc -Os -ffunction-sections a.c # .text.startup.main

-----

In case anyone wants to CC a GCC dev for the citation that
main compiles to `.text.startup` in -Os or -O2 mode, I have a small request
that `.text.startup.` probably makes more sense. See

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

I made an llvm change recently https://reviews.llvm.org/D79600

Brian Gerst

unread,
May 24, 2020, 6:48:39 PM5/24/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-built-linux, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
On Sun, May 24, 2020 at 5:32 PM Arvind Sankar <nive...@alum.mit.edu> wrote:
>
> gcc puts the main function into .text.startup when compiled with -Os (or
> -O2). This results in arch/x86/boot/main.c having a .text.startup
> section which is currently not included explicitly in the linker script
> setup.ld in the same directory.

If the compiler is making assumptions based on the function name
"main" wouldn't it be simpler just to rename the function?

--
Brian Gerst

Fangrui Song

unread,
May 24, 2020, 6:54:05 PM5/24/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
On 2020-05-24, Arvind Sankar wrote:
>The assembly code in head_{32,64}.S, while meant to be
>position-independent, generates run-time relocations because it uses
>instructions such as
> leal gdt(%edx), %eax
>which make the assembler and linker think that the code is using %edx as
>an index into gdt, and hence gdt needs to be relocated to its run-time
>address.
>
>With the BFD linker, this generates a warning during the build:
> LD arch/x86/boot/compressed/vmlinux
>ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
>ld: warning: creating a DT_TEXTREL in object

Interesting. How does the build generate a warning by default?
Do you use Gentoo Linux which appears to ship a --warn-shared-textrel
enabled-by-default patch? (https://bugs.gentoo.org/700488)

% cat a.s
leal gdt(%edx), %eax
% as --32 a.s -o a.o
% ld.bfd -m elf_i386 -shared a.o -z notext # DT_TEXTREL is set. R_386_32

% ld.bfd -m elf_i386 -shared a.o # on-demand text relocations. DT_TEXTREL is set. R_386_32

% ld.bfd -m elf_i386 -shared a.o --warn-shared-textrel
ld.bfd: a.o: warning: relocation against `gdt' in read-only section `.text'
ld.bfd: warning: creating a DT_TEXTREL in a shared object

% ld.bfd -m elf_i386 -shared a.o -z text
ld.bfd: a.o: warning: relocation against `gdt' in read-only section `.text'
ld.bfd: read-only segment has dynamic relocations
## The above is an error. Output is suppressed

lld has only two modes: -z text (default) and -z notext. There is no
on-demand state. By default it will error.

There are feature requests to make -z text default, at least for some
architectures. I just found
https://sourceware.org/bugzilla/show_bug.cgi?id=20824

>With lld, Dmitry Golovin reports that this results in a link-time error
>with default options (i.e. unless -z notext is explicitly passed):
> LD arch/x86/boot/compressed/vmlinux
>ld.lld: error: can't create dynamic relocation R_386_32 against local
>symbol in readonly segment; recompile object files with -fPIC or pass
>'-Wl,-z,notext' to allow text relocations in the output
>
>Start fixing this by removing relocations from .head.text:
>- On 32-bit, use a base register that holds the address of the GOT and
> reference symbol addresses using @GOTOFF, i.e.
> leal gdt@GOTOFF(%edx), %eax

Looks good to me.

>- On 64-bit, most of the code can (and already does) use %rip-relative
> addressing, however the .code32 bits can't, and the 64-bit code also
> needs to reference symbol addresses as they will be after moving the
> compressed kernel to the end of the decompression buffer.
> For these cases, reference the symbols as an offset to startup_32 to
> avoid creating relocations, i.e.
> leal (gdt-startup_32)(%bp), %eax
> This only works in .head.text as the subtraction cannot be represented
> as a PC-relative relocation unless startup_32 is in the same section
> as the code. Move efi32_pe_entry into .head.text so that it can use
> the same method to avoid relocations.

I have a nit about the startup_32 comment. See below.
IIRC, %ebp contains the address of startup_32. la(X) references X
relative to startup_32. The fixup (in GNU as and clang integrated
assembler's term) is a constant which is resolved by the assembler.

There is no R_386_32 or R_386_PC32 for the linker to resolve.

Not very sure stating that "since startup_32 has link address 0" is very
appropriate here (probably because I did't see the term "link address"
before). If my understanding above is correct, I think you can just
reword the comment to express that X is referenced relative to
startup_32, which is stored in %ebp.

Arvind Sankar

unread,
May 24, 2020, 7:00:57 PM5/24/20
to Fangrui Song, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
It's probably unlikely we'll use function-sections on the setup code,
but *(.text.*) might be more future-proof, since gcc/clang might grow
the ability to stick code into .text.hot or .text.unlikely etc
automatically.

Fangrui Song

unread,
May 24, 2020, 7:22:20 PM5/24/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org

On 2020-05-24, Arvind Sankar wrote:
Probably worth mentioning that this works around GNU ld<2.35


This commit fixing https://sourceware.org/bugzilla/show_bug.cgi?id=25754
also fixed the bug. (Just verified that both 2.24 and 2.34 have the bug. binutils-gdb HEAD (future 2.35) is good.)

% cat a.s
pushl $z_input_len
% cat b.s
.globl z_input_len
z_input_len = 0xb612
% gcc -m32 -c a.s b.s
% ld.bfd -m elf_i386 -pie a.o b.o # has an incorrect R_386_RELATIVE before binutils 2.35


Reviewed-by: Fangrui Song <mas...@google.com>

Fangrui Song

unread,
May 24, 2020, 7:36:13 PM5/24/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, Kees Cook, linux-...@vger.kernel.org
On 2020-05-24, Arvind Sankar wrote:
`grep -qF .rel.local` from 98f78525371b55ccd1c480207ce10296c72fa340
may be incorrect.. None of these synthesized dynamic relocation sections is
called *.rel.local* ...
(it probably wanted to name .rel.data.rel.ro or .rel.data)


Reviewed-by: Fangrui Song <mas...@google.com>

Arvind Sankar

unread,
May 24, 2020, 7:44:06 PM5/24/20
to Fangrui Song, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
On Sun, May 24, 2020 at 03:53:59PM -0700, Fangrui Song wrote:
> On 2020-05-24, Arvind Sankar wrote:
> >The assembly code in head_{32,64}.S, while meant to be
> >position-independent, generates run-time relocations because it uses
> >instructions such as
> > leal gdt(%edx), %eax
> >which make the assembler and linker think that the code is using %edx as
> >an index into gdt, and hence gdt needs to be relocated to its run-time
> >address.
> >
> >With the BFD linker, this generates a warning during the build:
> > LD arch/x86/boot/compressed/vmlinux
> >ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
> >ld: warning: creating a DT_TEXTREL in object
>
> Interesting. How does the build generate a warning by default?
> Do you use Gentoo Linux which appears to ship a --warn-shared-textrel
> enabled-by-default patch? (https://bugs.gentoo.org/700488)

Ah, yes I am using gentoo. I didn't realize that was a distro
modification.

> >+
> >+/*
> >+ * This macro gives the link address of X. It's the same as X, since startup_32
> >+ * has link address 0, but defining it this way tells the assembler/linker that
> >+ * we want the link address, and not the run-time address of X. This prevents
> >+ * the linker from creating a run-time relocation entry for this reference.
> >+ * The macro should be used as a displacement with a base register containing
> >+ * the run-time address of startup_32 [i.e. la(X)(%reg)], or as an
> >+ * immediate [$ la(X)].
> >+ *
> >+ * This macro can only be used from within the .head.text section, since the
> >+ * expression requires startup_32 to be in the same section as the code being
> >+ * assembled.
> >+ */
> >+#define la(X) ((X) - startup_32)
> >+
>
> IIRC, %ebp contains the address of startup_32. la(X) references X
> relative to startup_32. The fixup (in GNU as and clang integrated
> assembler's term) is a constant which is resolved by the assembler.
>
> There is no R_386_32 or R_386_PC32 for the linker to resolve.

This is incorrect (or maybe I'm not understanding you correctly). X is a
symbol whose final location relative to startup_32 is in most cases not
known to the assembler (there are a couple of cases where X is a label
within .head.text which do get completely resolved by the assembler).

For example, taking the instruction loading the gdt address, this is
what we get from the assembler:
lea la(gdt)(%ebp), %eax
becomes in the object file:
11: 8d 85 00 00 00 00 lea 0x0(%ebp),%eax
13: R_X86_64_PC32 .data+0x23
or a cleaner example using a global symbol:
subl la(image_offset)(%ebp), %ebx
becomes
41: 2b 9d 00 00 00 00 sub 0x0(%ebp),%ebx
43: R_X86_64_PC32 image_offset+0x43

So in general you get PC32 relocations, with the addend being set by the
assembler to .-startup_32, modulo the adjustment for where within the
instruction the displacement needs to be. These relocations are resolved
by the static linker to produce constants in the final executable.


>
> Not very sure stating that "since startup_32 has link address 0" is very
> appropriate here (probably because I did't see the term "link address"
> before). If my understanding above is correct, I think you can just
> reword the comment to express that X is referenced relative to
> startup_32, which is stored in %ebp.
>

Yeah, the more standard term is virtual address/VMA, but that sounds
confusing to me with PIE code since the _actual_ virtual address at
which this code is going to run isn't 0, that's just the address assumed
for linking. I can reword it to avoid referencing "link address" but
then it's not obvious why the macro is named "la" :) I'm open to
suggestions on a better name, I could use offset but that's a bit
long-winded. I could just use vma() if nobody else finds it confusing.

Thanks.

Fangrui Song

unread,
May 24, 2020, 7:49:36 PM5/24/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
*(.text.*) looks good to me. When you send PATCH v2, feel free to add

(There is indeed no guarantee that future clang FDO will not place it .text.unknown. :)

Reviewed-by: Fangrui Song <mas...@google.com>

Arvind Sankar

unread,
May 24, 2020, 7:57:51 PM5/24/20
to Fangrui Song, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, Kees Cook, linux-...@vger.kernel.org
On Sun, May 24, 2020 at 04:36:07PM -0700, Fangrui Song wrote:
>
> `grep -qF .rel.local` from 98f78525371b55ccd1c480207ce10296c72fa340
> may be incorrect.. None of these synthesized dynamic relocation sections is
> called *.rel.local* ...
> (it probably wanted to name .rel.data.rel.ro or .rel.data)
>
>
> Reviewed-by: Fangrui Song <mas...@google.com>

At least from gcc you get .data.rel.local sections if you have, for eg:
struct { void *p; } foo = { .p = &bar };
where bar is defined in the same file. These aren't relocation sections,
foo is actually placed in the .data.rel.local section instead of .data.

But yeah, it's incomplete, you wouldn't catch it if bar was external
(foo goes in .data.rel) or foo was const (foo goes in .data.rel.ro*).

Arvind Sankar

unread,
May 24, 2020, 7:58:59 PM5/24/20
to Fangrui Song, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
On Sun, May 24, 2020 at 04:22:14PM -0700, Fangrui Song wrote:
>
> Probably worth mentioning that this works around GNU ld<2.35

Thanks, I'll add that in v2.

Fangrui Song

unread,
May 24, 2020, 8:55:18 PM5/24/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
You are right. I am not familiar with the code and only looked at 1b.

Just preprocessed head_64.S and verified many target symbols are in
.data and .pgtable The assembler converts an expression `foo - symbol_defined_in_same_section`
to be `foo - . + offset` which can be encoded as an R_X86_64_PC32 (or
resolved the fixup if it is a constant, e.g. `1b - startup_32`)

>>
>> Not very sure stating that "since startup_32 has link address 0" is very
>> appropriate here (probably because I did't see the term "link address"
>> before). If my understanding above is correct, I think you can just
>> reword the comment to express that X is referenced relative to
>> startup_32, which is stored in %ebp.
>>
>
>Yeah, the more standard term is virtual address/VMA, but that sounds
>confusing to me with PIE code since the _actual_ virtual address at
>which this code is going to run isn't 0, that's just the address assumed
>for linking. I can reword it to avoid referencing "link address" but
>then it's not obvious why the macro is named "la" :) I'm open to
>suggestions on a better name, I could use offset but that's a bit
>long-winded. I could just use vma() if nobody else finds it confusing.
>
>Thanks.

With your approach, the important property is that "the distance between
startup_32 and the target symbol is a constant", not that "startup_32
has link address 0". `ra`, `rva`, `rvma` or any other term which expresses "relative"
should work. Hope someone can come up with a good suggestion:)

Ard Biesheuvel

unread,
May 25, 2020, 2:10:32 AM5/25/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
Do we really need the macro here? Could we just do

.rel.dyn : { *(.rel.* .rela.*) }

(or even

.rel.dyn : { *(.rel.* }
.rela.dyn : { *(.rela.*) }

if the output section name matters, and always assert that both are empty)?

Ard Biesheuvel

unread,
May 25, 2020, 3:10:36 AM5/25/20
to Arvind Sankar, Ingo Molnar, Borislav Petkov, Thomas Gleixner, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
On Sun, 24 May 2020 at 23:28, Arvind Sankar <nive...@alum.mit.edu> wrote:
>
> The compressed kernel currently contains bogus runtime relocations in
> the startup code in head_{32,64}.S, which are generated by the linker,
> but must not actually be processed at runtime.
>
> This generates warnings when linking with the BFD linker, and errors
> with LLD, which defaults to erroring on runtime relocations in read-only
> sections. It also requires the -z noreloc-overflow hack for the 64-bit
> kernel, which prevents us from linking it as -pie on an older BFD linker
> (<= 2.26) or on LLD, because the locations that are to be apparently
> relocated are only 32-bits in size and so cannot normally have
> R_X86_64_RELATIVE relocations.
>
> This series aims to get rid of these relocations. It is based on
> efi/next (efi-changes-for-v5.8), where the latest patches touch the
> head code to eliminate the global offset table.
>

Note: I dropped my decompressor linker script changes from that tag,
but they are still at the top of the efi/next branch.

Given these changes to go on top, I think it is better to merge all of
them separately, and let the x86 maintainers decide how and when.
(I can prepare a branch and a separate PR if desired)

For the series (modulo one nit in a separate reply)

Reviewed-by: Ard Biesheuvel <ar...@kernel.org>

Fangrui Song

unread,
May 25, 2020, 12:26:32 PM5/25/20
to Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
The output section name does not matter: it will be discarded by the linker.

>.rel.dyn : { *(.rel.* .rela.*) }

If for some reasons there is at least one SHT_REL and at least one
SHT_RELA, LLD will error "section type mismatch for .rel.dyn", while the
intended diagnostic is the assert below.

>(or even
>
>.rel.dyn : { *(.rel.* }
>.rela.dyn : { *(.rela.*) }
>
>if the output section name matters, and always assert that both are empty)?

.rel.dyn : { *(.rel.* }
.rela.dyn : { *(.rela.*) }

looks good to me.


FWIW I intend to add -z rel and -z rela to LLD: https://reviews.llvm.org/D80496#inline-738804
(binutils thread https://sourceware.org/pipermail/binutils/2020-May/111244.html)

In case someone builds the x86-64 kernel with -z rel, your suggested
input section description will work out of the box...

Arvind Sankar

unread,
May 25, 2020, 3:22:24 PM5/25/20
to Fangrui Song, Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
Ok with me.

Arvind Sankar

unread,
May 25, 2020, 6:59:21 PM5/25/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
The compressed kernel currently contains bogus runtime relocations in
the startup code in head_{32,64}.S, which are generated by the linker,
but must not actually be processed at runtime.

This generates warnings when linking with the BFD linker, and errors
with LLD, which defaults to erroring on runtime relocations in read-only
sections. It also requires the -z noreloc-overflow hack for the 64-bit
kernel, which prevents us from linking it as -pie on an older BFD linker
(<= 2.26) or on LLD, because the locations that are to be apparently
relocated are only 32-bits in size and so cannot normally have
R_X86_64_RELATIVE relocations.

This series aims to get rid of these relocations. It is based on
efi/next, where the latest patches touch the head code to eliminate the
global offset table.

The first patch is an independent fix for LLD, to avoid an orphan
section in arch/x86/boot/setup.elf.

The second patch gets rid of almost all the relocations. It uses
standard PIC addressing technique for 32-bit, i.e. loading a register
with the address of _GLOBAL_OFFSET_TABLE_ and then using GOTOFF
references to access variables. For 64-bit, there is 32-bit code that
cannot use RIP-relative addressing, and also cannot use the 32-bit
method, since GOTOFF references are 64-bit only. This is instead handled
using a macro to replace a reference like gdt with (gdt-startup_32)
instead. The assembler will generate a PC32 relocation entry, with
addend set to (.-startup_32), and these will be replaced with constants
at link time. This works as long as all the code using such references
lives in the same section as startup_32, i.e. in .head.text.

The third patch addresses a remaining issue with the BFD linker, which
insists on generating runtime relocations for absolute symbols. We use
z_input_len and z_output_len, defined in the generated piggy.S file, as
symbols whose absolute "addresses" are actually the size of the
compressed payload and the size of the decompressed kernel image
respectively. LLD does not generate relocations for these two symbols,
but the BFD linker does, prior to the upcoming 2.35. To get around this,
piggy.S is extended to also define two u32 variables (in .rodata) with
the lengths, and the head code is modified to use those instead of the
symbol addresses.

An alternative way to handle z_input_len/z_output_len would be to just
include piggy.S in head_{32,64}.S instead of as a separate object file,
since the GNU assembler doesn't generate relocations for symbols set to
constants.

The last patch adds a check in the linker script to ensure that no
runtime relocations get reintroduced. Since the GOT has been eliminated
as well, the compressed kernel has no runtime relocations whatsoever any
more.

Changes from v1:
- Add .text.* to setup.ld instead of just .text.startup
- Rename the la() macro introduced in the second patch for 64-bit to
rva(), and rework the explanatory comment.
- In the last patch, check both .rel.dyn and .rela.dyn, instead of just
one per arch.

Arvind Sankar (4):
x86/boot: Add .text.* to setup.ld
x86/boot: Remove run-time relocations from .head.text code
x86/boot: Remove runtime relocations from head_{32,64}.S
x86/boot: Check that there are no runtime relocations

arch/x86/boot/compressed/Makefile | 36 +--------
arch/x86/boot/compressed/head_32.S | 59 +++++++-------
arch/x86/boot/compressed/head_64.S | 108 +++++++++++++++----------
arch/x86/boot/compressed/mkpiggy.c | 6 ++
arch/x86/boot/compressed/vmlinux.lds.S | 8 ++
arch/x86/boot/setup.ld | 2 +-
6 files changed, 115 insertions(+), 104 deletions(-)

--
2.26.2

Arvind Sankar

unread,
May 25, 2020, 6:59:21 PM5/25/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
Add .text.* to the .text output section to fix this, and also prevent
any future surprises if the compiler decides to create other such
sections.

[1] https://reviews.llvm.org/D75225

Signed-off-by: Arvind Sankar <nive...@alum.mit.edu>
---
arch/x86/boot/setup.ld | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 24c95522f231..49546c247ae2 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -20,7 +20,7 @@ SECTIONS
.initdata : { *(.initdata) }
__end_init = .;

- .text : { *(.text) }
+ .text : { *(.text .text.*) }

Arvind Sankar

unread,
May 25, 2020, 6:59:22 PM5/25/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
The assembly code in head_{32,64}.S, while meant to be
position-independent, generates run-time relocations because it uses
instructions such as
leal gdt(%edx), %eax
which make the assembler and linker think that the code is using %edx as
an index into gdt, and hence gdt needs to be relocated to its run-time
address.

On 32-bit, with lld Dmitry Golovin reports that this results in a
link-time error with default options (i.e. unless -z notext is
explicitly passed):
LD arch/x86/boot/compressed/vmlinux
ld.lld: error: can't create dynamic relocation R_386_32 against local
symbol in readonly segment; recompile object files with -fPIC or pass
'-Wl,-z,notext' to allow text relocations in the output

With the BFD linker, this generates a warning during the build, if
--warn-shared-textrel is enabled, which at least Gentoo enables by
default:
LD arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating a DT_TEXTREL in object

On 64-bit, it is not possible to link the kernel as -pie with lld, and
it is only possible with a BFD linker that supports -z noreloc-overflow,
i.e. versions >2.26. This is because these instructions cannot really be
relocated: the displacement field is only 32-bits wide, and thus cannot
be relocated for a 64-bit load address. The -z noreloc-overflow option
simply overrides the linker error, and results in R_X86_64_RELATIVE
relocations that apply a 64-bit relocation to a 32-bit field anyway.
This happens to work because nothing will process these run-time
relocations.

Start fixing this by removing relocations from .head.text:
- On 32-bit, use a base register that holds the address of the GOT and
reference symbol addresses using @GOTOFF, i.e.
leal gdt@GOTOFF(%edx), %eax
- On 64-bit, most of the code can (and already does) use %rip-relative
addressing, however the .code32 bits can't, and the 64-bit code also
needs to reference symbol addresses as they will be after moving the
compressed kernel to the end of the decompression buffer.
For these cases, reference the symbols as an offset to startup_32 to
avoid creating relocations, i.e.
leal (gdt-startup_32)(%bp), %eax
This only works in .head.text as the subtraction cannot be represented
as a PC-relative relocation unless startup_32 is in the same section
as the code. Move efi32_pe_entry into .head.text so that it can use
the same method to avoid relocations.

Signed-off-by: Arvind Sankar <nive...@alum.mit.edu>
---
arch/x86/boot/compressed/head_32.S | 40 +++++------
arch/x86/boot/compressed/head_64.S | 104 ++++++++++++++++++-----------
2 files changed, 86 insertions(+), 58 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index dfa4131c65df..66657bb99aae 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
pushl %ebp /* output address */

pushl $z_input_len /* input_len */
- leal input_data(%ebx), %eax
+ leal input_data@GOTOFF(%ebx), %eax
pushl %eax /* input_data */
- leal boot_heap(%ebx), %eax
+ leal boot_heap@GOTOFF(%ebx), %eax
pushl %eax /* heap area */
pushl %esi /* real mode pointer */
call extract_kernel /* returns kernel location in %eax */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 706fbf6eef53..18b0edcb23d2 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -42,6 +42,32 @@
.hidden _ebss

__HEAD
+
+/*
+ * This macro gives the relative virtual address of X, i.e. the offset of X
+ * from startup_32. This is the same as the link-time virtual address of X,
+ * since startup_32 is at 0, but defining it this way tells the
+ * assembler/linker that we do not want the actual run-time address of X. This
+ * prevents the linker from trying to create unwanted run-time relocation
+ * entries for the reference when the compressed kernel is linked as PIE.
+ *
+ * A reference X(%reg) will result in the link-time VA of X being stored with
+ * the instruction, and a run-time R_X86_64_RELATIVE relocation entry that
+ * adds the 64-bit base address where the kernel is loaded.
+ *
+ * Replacing it with (X-startup_32)(%reg) results in the offset being stored,
+ * and no run-time relocation.
+ *
+ * The macro should be used as a displacement with a base register containing
+ * the run-time address of startup_32 [i.e. rva(X)(%reg)], or as an immediate
+ * [$ rva(X)].
+ *
+ * This macro can only be used from within the .head.text section, since the
+ * expression requires startup_32 to be in the same section as the code being
+ * assembled.
+ */
+#define rva(X) ((X) - startup_32)
+
.code32
SYM_FUNC_START(startup_32)
/*
@@ -64,10 +90,10 @@ SYM_FUNC_START(startup_32)
leal (BP_scratch+4)(%esi), %esp
call 1f
1: popl %ebp
- subl $1b, %ebp
+ subl $ rva(1b), %ebp

/* Load new GDT with the 64bit segments using 32bit descriptor */
- leal gdt(%ebp), %eax
+ leal rva(gdt)(%ebp), %eax
movl %eax, 2(%eax)
lgdt (%eax)

@@ -80,7 +106,7 @@ SYM_FUNC_START(startup_32)
movl %eax, %ss

/* setup a stack and make sure cpu supports long mode. */
- leal boot_stack_end(%ebp), %esp
+ leal rva(boot_stack_end)(%ebp), %esp

call verify_cpu
testl %eax, %eax
@@ -107,7 +133,7 @@ SYM_FUNC_START(startup_32)
* image_offset = startup_32 - image_base
* Otherwise image_offset will be zero and has no effect on the calculations.
*/
- subl image_offset(%ebp), %ebx
+ subl rva(image_offset)(%ebp), %ebx
#endif

movl BP_kernel_alignment(%esi), %eax
@@ -123,7 +149,7 @@ SYM_FUNC_START(startup_32)

/* Target address to relocate to for decompression */
addl BP_init_size(%esi), %ebx
- subl $_end, %ebx
+ subl $ rva(_end), %ebx

/*
* Prepare for entering 64 bit mode
@@ -151,19 +177,19 @@ SYM_FUNC_START(startup_32)
1:

/* Initialize Page tables to 0 */
- leal pgtable(%ebx), %edi
+ leal rva(pgtable)(%ebx), %edi
xorl %eax, %eax
movl $(BOOT_INIT_PGT_SIZE/4), %ecx
rep stosl

/* Build Level 4 */
- leal pgtable + 0(%ebx), %edi
+ leal rva(pgtable + 0)(%ebx), %edi
leal 0x1007 (%edi), %eax
movl %eax, 0(%edi)
addl %edx, 4(%edi)

/* Build Level 3 */
- leal pgtable + 0x1000(%ebx), %edi
+ leal rva(pgtable + 0x1000)(%ebx), %edi
leal 0x1007(%edi), %eax
movl $4, %ecx
1: movl %eax, 0x00(%edi)
@@ -174,7 +200,7 @@ SYM_FUNC_START(startup_32)
jnz 1b

/* Build Level 2 */
- leal pgtable + 0x2000(%ebx), %edi
+ leal rva(pgtable + 0x2000)(%ebx), %edi
movl $0x00000183, %eax
movl $2048, %ecx
1: movl %eax, 0(%edi)
@@ -185,7 +211,7 @@ SYM_FUNC_START(startup_32)
jnz 1b

/* Enable the boot page tables */
- leal pgtable(%ebx), %eax
+ leal rva(pgtable)(%ebx), %eax
movl %eax, %cr3

/* Enable Long mode in EFER (Extended Feature Enable Register) */
@@ -211,17 +237,17 @@ SYM_FUNC_START(startup_32)
* used to perform that far jump.
*/
pushl $__KERNEL_CS
- leal startup_64(%ebp), %eax
+ leal rva(startup_64)(%ebp), %eax
#ifdef CONFIG_EFI_MIXED
- movl efi32_boot_args(%ebp), %edi
+ movl rva(efi32_boot_args)(%ebp), %edi
cmp $0, %edi
jz 1f
- leal efi64_stub_entry(%ebp), %eax
- movl efi32_boot_args+4(%ebp), %esi
- movl efi32_boot_args+8(%ebp), %edx // saved bootparams pointer
+ leal rva(efi64_stub_entry)(%ebp), %eax
+ movl rva(efi32_boot_args+4)(%ebp), %esi
+ movl rva(efi32_boot_args+8)(%ebp), %edx // saved bootparams pointer
cmpl $0, %edx
jnz 1f
- leal efi_pe_entry(%ebp), %eax
+ leal rva(efi_pe_entry)(%ebp), %eax
movl %edi, %ecx // MS calling convention
movl %esi, %edx
1:
@@ -246,18 +272,18 @@ SYM_FUNC_START(efi32_stub_entry)

call 1f
1: pop %ebp
- subl $1b, %ebp
+ subl $ rva(1b), %ebp

- movl %esi, efi32_boot_args+8(%ebp)
+ movl %esi, rva(efi32_boot_args+8)(%ebp)
SYM_INNER_LABEL(efi32_pe_stub_entry, SYM_L_LOCAL)
- movl %ecx, efi32_boot_args(%ebp)
- movl %edx, efi32_boot_args+4(%ebp)
- movb $0, efi_is64(%ebp)
+ movl %ecx, rva(efi32_boot_args)(%ebp)
+ movl %edx, rva(efi32_boot_args+4)(%ebp)
+ movb $0, rva(efi_is64)(%ebp)

/* Save firmware GDTR and code/data selectors */
- sgdtl efi32_boot_gdt(%ebp)
- movw %cs, efi32_boot_cs(%ebp)
- movw %ds, efi32_boot_ds(%ebp)
+ sgdtl rva(efi32_boot_gdt)(%ebp)
+ movw %cs, rva(efi32_boot_cs)(%ebp)
+ movw %ds, rva(efi32_boot_ds)(%ebp)

/* Disable paging */
movl %cr0, %eax
@@ -336,11 +362,11 @@ SYM_CODE_START(startup_64)

/* Target address to relocate to for decompression */
movl BP_init_size(%rsi), %ebx
- subl $_end, %ebx
+ subl $ rva(_end), %ebx
addq %rbp, %rbx

/* Set up the stack */
- leaq boot_stack_end(%rbx), %rsp
+ leaq rva(boot_stack_end)(%rbx), %rsp

/*
* At this point we are in long mode with 4-level paging enabled,
@@ -406,7 +432,7 @@ SYM_CODE_START(startup_64)
lretq
trampoline_return:
/* Restore the stack, the 32-bit trampoline uses its own stack */
- leaq boot_stack_end(%rbx), %rsp
+ leaq rva(boot_stack_end)(%rbx), %rsp

/*
* cleanup_trampoline() would restore trampoline memory.
@@ -418,7 +444,7 @@ trampoline_return:
* this function call.
*/
pushq %rsi
- leaq top_pgtable(%rbx), %rdi
+ leaq rva(top_pgtable)(%rbx), %rdi
call cleanup_trampoline
popq %rsi

@@ -432,9 +458,9 @@ trampoline_return:
*/
pushq %rsi
leaq (_bss-8)(%rip), %rsi
- leaq (_bss-8)(%rbx), %rdi
- movq $_bss /* - $startup_32 */, %rcx
- shrq $3, %rcx
+ leaq rva(_bss-8)(%rbx), %rdi
+ movl $(_bss - startup_32), %ecx
+ shrl $3, %ecx
std
rep movsq
cld
@@ -445,15 +471,15 @@ trampoline_return:
* during extract_kernel below. To avoid any issues, repoint the GDTR
* to the new copy of the GDT.
*/
- leaq gdt64(%rbx), %rax
- leaq gdt(%rbx), %rdx
+ leaq rva(gdt64)(%rbx), %rax
+ leaq rva(gdt)(%rbx), %rdx
movq %rdx, 2(%rax)
lgdt (%rax)

/*
* Jump to the relocated address.
*/
- leaq .Lrelocated(%rbx), %rax
+ leaq rva(.Lrelocated)(%rbx), %rax
jmp *%rax
SYM_CODE_END(startup_64)

@@ -465,7 +491,7 @@ SYM_FUNC_START_ALIAS(efi_stub_entry)
movq %rdx, %rbx /* save boot_params pointer */
call efi_main
movq %rbx,%rsi
- leaq startup_64(%rax), %rax
+ leaq rva(startup_64)(%rax), %rax
jmp *%rax
SYM_FUNC_END(efi64_stub_entry)
SYM_FUNC_END_ALIAS(efi_stub_entry)
@@ -628,7 +654,7 @@ SYM_DATA(efi_is64, .byte 1)
#define BS32_handle_protocol 88 // offsetof(efi_boot_services_32_t, handle_protocol)
#define LI32_image_base 32 // offsetof(efi_loaded_image_32_t, image_base)

- .text
+ __HEAD
.code32
SYM_FUNC_START(efi32_pe_entry)
/*
@@ -650,12 +676,12 @@ SYM_FUNC_START(efi32_pe_entry)

call 1f
1: pop %ebx
- subl $1b, %ebx
+ subl $ rva(1b), %ebx

/* Get the loaded image protocol pointer from the image handle */
leal -4(%ebp), %eax
pushl %eax // &loaded_image
- leal loaded_image_proto(%ebx), %eax
+ leal rva(loaded_image_proto)(%ebx), %eax
pushl %eax // pass the GUID address
pushl 8(%ebp) // pass the image handle

@@ -690,7 +716,7 @@ SYM_FUNC_START(efi32_pe_entry)
* use it before we get to the 64-bit efi_pe_entry() in C code.
*/
subl %esi, %ebx
- movl %ebx, image_offset(%ebp) // save image_offset
+ movl %ebx, rva(image_offset)(%ebp) // save image_offset

Arvind Sankar

unread,
May 25, 2020, 6:59:23 PM5/25/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
The BFD linker generates runtime relocations for z_input_len and
z_output_len, even though they are absolute symbols.

This is fixed for binutils-2.35 [1]. Work around this for earlier
versions by defining two variables input_len and output_len in addition
to the symbols, and use them via position-independent references.

This eliminates the last two runtime relocations in the head code and
allows us to drop the -z noreloc-overflow flag to the linker.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=25754

Signed-off-by: Arvind Sankar <nive...@alum.mit.edu>
---
arch/x86/boot/compressed/Makefile | 8 --------
arch/x86/boot/compressed/head_32.S | 17 ++++++++---------
arch/x86/boot/compressed/head_64.S | 4 ++--
arch/x86/boot/compressed/mkpiggy.c | 6 ++++++
4 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index aa9ed814e5fa..d3e882e855ee 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -49,15 +49,7 @@ UBSAN_SANITIZE :=n
KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
# Compressed kernel should be built as PIE since it may be loaded at any
# address by the bootloader.
-ifeq ($(CONFIG_X86_32),y)
KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
-else
-# To build 64-bit compressed kernel as PIE, we disable relocation
-# overflow check to avoid relocation overflow error with a new linker
-# command-line option, -z noreloc-overflow.
-KBUILD_LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
- && echo "-z noreloc-overflow -pie --no-dynamic-linker")
-endif
LDFLAGS_vmlinux := -T

hostprogs := mkpiggy
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 66657bb99aae..064e895bad92 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -193,18 +193,17 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
/*
* Do the extraction, and jump to the new kernel..
*/
- /* push arguments for extract_kernel: */
- pushl $z_output_len /* decompressed length, end of relocs */
+ /* push arguments for extract_kernel: */

- pushl %ebp /* output address */
-
- pushl $z_input_len /* input_len */
+ pushl output_len@GOTOFF(%ebx) /* decompressed length, end of relocs */
+ pushl %ebp /* output address */
+ pushl input_len@GOTOFF(%ebx) /* input_len */
leal input_data@GOTOFF(%ebx), %eax
- pushl %eax /* input_data */
+ pushl %eax /* input_data */
leal boot_heap@GOTOFF(%ebx), %eax
- pushl %eax /* heap area */
- pushl %esi /* real mode pointer */
- call extract_kernel /* returns kernel location in %eax */
+ pushl %eax /* heap area */
+ pushl %esi /* real mode pointer */
+ call extract_kernel /* returns kernel location in %eax */
addl $24, %esp

/*
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 18b0edcb23d2..4b7ad1dfbea6 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -517,9 +517,9 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)

Arvind Sankar

unread,
May 25, 2020, 6:59:24 PM5/25/20
to Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
Add a linker script check that there are no runtime relocations, and
remove the old one that tries to check via looking for specially-named
sections in the object files.

Drop the tests for -fPIE compiler option and -pie linker option, as they
are available in all supported gcc and binutils versions (as well as
clang and lld).

Signed-off-by: Arvind Sankar <nive...@alum.mit.edu>
---
arch/x86/boot/compressed/Makefile | 28 +++-----------------------
arch/x86/boot/compressed/vmlinux.lds.S | 8 ++++++++
2 files changed, 11 insertions(+), 25 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index d3e882e855ee..679a2b383bfe 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -27,7 +27,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4

KBUILD_CFLAGS := -m$(BITS) -O2
-KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC)
+KBUILD_CFLAGS += -fno-strict-aliasing -fPIE
KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
cflags-$(CONFIG_X86_32) := -march=i386
cflags-$(CONFIG_X86_64) := -mcmodel=small
@@ -49,7 +49,7 @@ UBSAN_SANITIZE :=n
KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
# Compressed kernel should be built as PIE since it may be loaded at any
# address by the bootloader.
-KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
+KBUILD_LDFLAGS += -pie $(call ld-option, --no-dynamic-linker)
LDFLAGS_vmlinux := -T

hostprogs := mkpiggy
index d826ab38a8f9..f9902c6ffe29 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -42,6 +42,12 @@ SECTIONS
*(.rodata.*)
_erodata = . ;
}
+ .rel.dyn : {
+ *(.rel.*)
+ }
+ .rela.dyn : {
+ *(.rela.*)
+ }
.got : {
*(.got)
}
@@ -83,3 +89,5 @@ ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18, "Unexpected GOT/PLT en
#else
ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc, "Unexpected GOT/PLT entries detected!")
#endif
+
+ASSERT(SIZEOF(.rel.dyn) == 0 && SIZEOF(.rela.dyn) == 0, "Unexpected runtime relocations detected!")
--
2.26.2

Fangrui Song

unread,
May 25, 2020, 8:37:11 PM5/25/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Dmitry Golovin, clang-bu...@googlegroups.com, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
On 2020-05-24, Arvind Sankar wrote:
All 4 commits look good.

Reviewed-by: Fangrui Song <mas...@google.com>

Ard Biesheuvel

unread,
May 26, 2020, 2:12:09 AM5/26/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
Do we still need -pie linking with these changes applied?

Sedat Dilek

unread,
May 26, 2020, 8:29:40 AM5/26/20
to Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, x...@kernel.org, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Ard Biesheuvel, Masahiro Yamada, Daniel Kiper, linux-...@vger.kernel.org
Hi,

I would like to test this patchset v2 on top of Linux v5.7-rc7 together with:

[1] x86/boot: Discard .discard.unreachable for arch/x86/boot/compressed/vmlinux
[2] x86/boot: Correct relocation destination on old linkers

I tried to pull efi/next on top of Linux v5.7-rc7 and cleaned up the
merge problems, but I am not sure I did it correctly.
So, which patches are really relevant from efi/next?

What's your suggestions?

Regards,
- Sedat -

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=d6ee6529436a15a0541aff6e1697989ee7dc2c44
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=5214028dd89e49ba27007c3ee475279e584261f0

> Arvind Sankar (4):
> x86/boot: Add .text.* to setup.ld
> x86/boot: Remove run-time relocations from .head.text code
> x86/boot: Remove runtime relocations from head_{32,64}.S
> x86/boot: Check that there are no runtime relocations
>
> arch/x86/boot/compressed/Makefile | 36 +--------
> arch/x86/boot/compressed/head_32.S | 59 +++++++-------
> arch/x86/boot/compressed/head_64.S | 108 +++++++++++++++----------
> arch/x86/boot/compressed/mkpiggy.c | 6 ++
> arch/x86/boot/compressed/vmlinux.lds.S | 8 ++
> arch/x86/boot/setup.ld | 2 +-
> 6 files changed, 115 insertions(+), 104 deletions(-)
>
> --
> 2.26.2
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-li...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20200525225918.1624470-1-nivedita%40alum.mit.edu.

Ard Biesheuvel

unread,
May 26, 2020, 8:31:00 AM5/26/20
to sedat...@gmail.com, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
efi/next is here:

https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git/log/?h=next

You'll need the top 3 patches.

Sedat Dilek

unread,
May 26, 2020, 8:34:07 AM5/26/20
to Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
Thanks /o\.

- Sedat -

Sedat Dilek

unread,
May 26, 2020, 8:44:41 AM5/26/20
to Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
Are those diffs correct when using "x86/boot: Correct relocation
destination on old linkers"?

$ cat ../head_32_S.diff
diff --cc arch/x86/boot/compressed/head_32.S
index 064e895bad92,03557f2174bf..000000000000
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@@ -49,13 -49,17 +49,14 @@@
* Position Independent Executable (PIE) so that linker won't optimize
* R_386_GOT32X relocation to its fixed symbol address. Older
* linkers generate R_386_32 relocations against locally defined symbols,
- * _bss, _ebss, in PIE. It isn't wrong, just suboptimal compared
- * _bss, _ebss, _got, _egot and _end, in PIE. It isn't wrong, just less
- * optimal than R_386_RELATIVE. But the x86 kernel fails to properly handle
++ * _bss, _ebss, _end in PIE. It isn't wrong, just suboptimal compared
+ * to R_386_RELATIVE. But the x86 kernel fails to properly handle
* R_386_32 relocations when relocating the kernel. To generate
- * R_386_RELATIVE relocations, we mark _bss and _ebss as hidden:
- * R_386_RELATIVE relocations, we mark _bss, _ebss, _got, _egot and _end as
- * hidden:
++ * R_386_RELATIVE relocations, we mark _bss, _ebss and _end as hidden:
*/
.hidden _bss
.hidden _ebss
- .hidden _got
- .hidden _egot
+ .hidden _end

__HEAD
SYM_FUNC_START(startup_32)

$ cat ../head_64_S.diff
diff --cc arch/x86/boot/compressed/head_64.S
index 4b7ad1dfbea6,76d1d64d51e3..000000000000
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@@ -40,34 -40,11 +40,35 @@@
*/
.hidden _bss
.hidden _ebss
- .hidden _got
- .hidden _egot
+ .hidden _end
Thanks.

- Sedat -

Arvind Sankar

unread,
May 26, 2020, 10:47:52 AM5/26/20
to Sedat Dilek, Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
On Tue, May 26, 2020 at 02:44:29PM +0200, Sedat Dilek wrote:
>
> Are those diffs correct when using "x86/boot: Correct relocation
> destination on old linkers"?
>

It looks ok, but that patch (and even marking the other symbols .hidden)
should be unnecessary after this series.

Sedat Dilek

unread,
May 26, 2020, 10:49:10 AM5/26/20
to Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
With LLVM/Clang/LLD I see:

mycompiler -Wp,-MD,arch/x86/boot/compressed/.kernel_info.o.d
-nostdinc -isystem
/home/dileks/src/llvm-toolchain/install/lib/clang/10.0.1rc1/include
-I./arch/x86/include -I./arch/x86/include/generated -I./include
-I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include
./include/linux/kconfig.h -D__KERNEL__ -Qunused-arguments -m64 -O2
-fno-strict-aliasing -fPIE -DDISABLE_BRANCH_PROFILING -mcmodel=small
-mno-mmx -mno-sse -ffreestanding -fno-stack-protector
-Wno-address-of-packed-member -Wno-gnu -Wno-pointer-sign
-fmacro-prefix-map=./= -fno-asynchronous-unwind-tables -include
hidden.h -D__ASSEMBLY__ -c -o
arch/x86/boot/compressed/kernel_info.o
arch/x86/boot/compressed/kernel_info.S
<built-in>:345:10: fatal error: 'hidden.h' file not found
#include "hidden.h"
^~~~~~~~~~
1 error generated.
make[5]: *** [scripts/Makefile.build:349:
arch/x86/boot/compressed/kernel_info.o] Error 1
make[4]: *** [arch/x86/boot/Makefile:114:
arch/x86/boot/compressed/vmlinux] Error 2
make[4]: *** Waiting for unfinished jobs....

mycompiler is a wrapper-script to use ccache * clang-10 as compiler.

patchset to previous build:

$ git log --no-merges --oneline 5.7.0-rc7-1-amd64-clang..5.7.0-rc7-2-amd64-clang
8b74901cb9e5 (for-5.7/x86-boot-remove-runtime-relocations-from-compressed-kernel-v2-nivedita76)
x86/boot: Check that there are no runtime relocations
83fb1bc3b076 x86/boot: Remove runtime relocations from head_{32,64}.S
fede23dacbbd x86/boot: Remove run-time relocations from .head.text code
3e5ea481b8fb x86/boot: Add .text.* to setup.ld
bec910ba3d67 x86/boot/compressed: Get rid of GOT fixup code
be834cee6f39 x86/boot/compressed: Force hidden visibility for all
symbol references
9b7c7d8d2d7b x86/boot/compressed: Move .got.plt entries out of the .got section
ba6a49f806a0 (for-5.7/kbuild-compressed-debug-info) Makefile: support
compressed debug info
a1fb11944d11 (for-5.7/x86-boot-nivedita76) x86/boot: Correct
relocation destination on old linkers
c70e3890058f (for-5.7/x86-build-maskray) x86/boot: Discard
.discard.unreachable for arch/x86/boot/compressed/vmlinux

$ find ./ -name hidden.h
./drivers/firmware/efi/libstub/hidden.h
./arch/x86/boot/compressed/hidden.h

Any thoughts?

- Sedat -

Sedat Dilek

unread,
May 26, 2020, 10:50:49 AM5/26/20
to Arvind Sankar, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
You mean _bss, _ebss and _end?

- Sedat -

Sedat Dilek

unread,
May 26, 2020, 10:55:25 AM5/26/20
to Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
Maybe this should be:

[ arch/x86/boot/compressed/Makefile ]

-KBUILD_CFLAGS += -include hidden.h
+KBUILD_CFLAGS += -include ./hidden.h

- Sedat -

Sedat Dilek

unread,
May 26, 2020, 11:07:36 AM5/26/20
to Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
NOPE.

This works:

[ arch/x86/boot/compressed/Makefile ]

-KBUILD_CFLAGS += -include hidden.h
+KBUILD_CFLAGS += -include ./arch/x86/boot/compressed/hidden.h

$ ll arch/x86/boot/bzImage arch/x86/boot/compressed/vmlinux
-rw-r--r-- 1 dileks dileks 6,5M Mai 26 17:05 arch/x86/boot/bzImage
-rwxr-xr-x 1 dileks dileks 6,5M Mai 26 17:05 arch/x86/boot/compressed/vmlinux

- Sedat -

Arvind Sankar

unread,
May 26, 2020, 11:16:26 AM5/26/20
to Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
On Tue, May 26, 2020 at 08:11:56AM +0200, Ard Biesheuvel wrote:
> On Tue, 26 May 2020 at 00:59, Arvind Sankar <nive...@alum.mit.edu> wrote:
> > # Compressed kernel should be built as PIE since it may be loaded at any
> > # address by the bootloader.
> > -KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
> > +KBUILD_LDFLAGS += -pie $(call ld-option, --no-dynamic-linker)
>
> Do we still need -pie linking with these changes applied?
>

I think it's currently not strictly necessary -- eg the 64bit kernel
doesn't get linked as pie right now with LLD or old binutils. However,
it is safer to do so to ensure that the result remains PIC with future
versions of the linker. There are linker optimizations that can convert
certain PIC instructions when PIE is disabled. While I think they
currently all focus on eliminating indirection through the GOT (and thus
wouldn't be applicable any more), it's easy to imagine that they could
get extended to, for eg, convert
leaq foo(%rip), %rax
to
movl $foo, %eax
with some nop padding, etc.

Also, the relocation check that's being added here would only work with
PIE linking.

Arvind Sankar

unread,
May 26, 2020, 11:31:07 AM5/26/20
to Sedat Dilek, Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
On Tue, May 26, 2020 at 05:07:24PM +0200, Sedat Dilek wrote:
> > >
> >
> > Maybe this should be:
> >
> > [ arch/x86/boot/compressed/Makefile ]
> >
> > -KBUILD_CFLAGS += -include hidden.h
> > +KBUILD_CFLAGS += -include ./hidden.h
> >
>
> NOPE.
>
> This works:
>
> [ arch/x86/boot/compressed/Makefile ]
>
> -KBUILD_CFLAGS += -include hidden.h
> +KBUILD_CFLAGS += -include ./arch/x86/boot/compressed/hidden.h
>
> $ ll arch/x86/boot/bzImage arch/x86/boot/compressed/vmlinux
> -rw-r--r-- 1 dileks dileks 6,5M Mai 26 17:05 arch/x86/boot/bzImage
> -rwxr-xr-x 1 dileks dileks 6,5M Mai 26 17:05 arch/x86/boot/compressed/vmlinux
>
> - Sedat -

It needs to either be $(srctree)/$(src)/hidden.h, or we should add
-I $(srctree)/$(src) to the KBUILD_CFLAGS. The latter option is added
automatically when building in a separate builddir with O=${KOBJ} (which
is how I, and I assume Ard, was testing), but for some reason is not
added when building in-tree. The -include option doesn't automatically
search the directory of the source file.

-include file Process file as if "#include "file"" appeared as the first
line of the primary source file. However, the first directory searched
for file is the preprocessor's working directory instead of the
directory containing the main source file. If not found there, it is
searched for in the remainder of the "#include "..."" search chain as
normal.

Arvind Sankar

unread,
May 26, 2020, 11:36:40 AM5/26/20
to Sedat Dilek, Arvind Sankar, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
Yes. Those .hidden markings are there to ensure that when relocations
are generated (as they are currently), they're generated as
R_386_RELATIVE (which uses B+A calculation, with A being the link-time
virtual address of the symbol, and stored in the relocation field)
rather than R_386_32 (which uses S+A calculation, and so doesn't work
without runtime processing). After this patchset there aren't any
relocations, so while the .hidden markings won't hurt, they won't be
necessary either.

Sedat Dilek

unread,
May 26, 2020, 11:39:02 AM5/26/20
to Arvind Sankar, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
So, I am here on Debian/testing AMD64.

How can I test the patchset worked correctly?

- Sedat -

Sedat Dilek

unread,
May 26, 2020, 12:18:19 PM5/26/20
to Ard Biesheuvel, Arvind Sankar, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
On Tue, May 26, 2020 at 5:07 PM Sedat Dilek <sedat...@gmail.com> wrote:

...

> > > patchset to previous build:
> > >
> > > $ git log --no-merges --oneline 5.7.0-rc7-1-amd64-clang..5.7.0-rc7-2-amd64-clang
> > > 8b74901cb9e5 (for-5.7/x86-boot-remove-runtime-relocations-from-compressed-kernel-v2-nivedita76)
> > > x86/boot: Check that there are no runtime relocations
> > > 83fb1bc3b076 x86/boot: Remove runtime relocations from head_{32,64}.S
> > > fede23dacbbd x86/boot: Remove run-time relocations from .head.text code
> > > 3e5ea481b8fb x86/boot: Add .text.* to setup.ld
> > > bec910ba3d67 x86/boot/compressed: Get rid of GOT fixup code
> > > be834cee6f39 x86/boot/compressed: Force hidden visibility for all
> > > symbol references
> > > 9b7c7d8d2d7b x86/boot/compressed: Move .got.plt entries out of the .got section
> > > ba6a49f806a0 (for-5.7/kbuild-compressed-debug-info) Makefile: support
> > > compressed debug info
> > > a1fb11944d11 (for-5.7/x86-boot-nivedita76) x86/boot: Correct
> > > relocation destination on old linkers
> > > c70e3890058f (for-5.7/x86-build-maskray) x86/boot: Discard
> > > .discard.unreachable for arch/x86/boot/compressed/vmlinux

...

> This works:
>
> [ arch/x86/boot/compressed/Makefile ]
>
> -KBUILD_CFLAGS += -include hidden.h
> +KBUILD_CFLAGS += -include ./arch/x86/boot/compressed/hidden.h
>
> $ ll arch/x86/boot/bzImage arch/x86/boot/compressed/vmlinux
> -rw-r--r-- 1 dileks dileks 6,5M Mai 26 17:05 arch/x86/boot/bzImage
> -rwxr-xr-x 1 dileks dileks 6,5M Mai 26 17:05 arch/x86/boot/compressed/vmlinux
>

I was able to build/link and boot on bare metal.

root@iniza:~# cat /proc/version
Linux version 5.7.0-rc7-2-amd64-clang (sedat...@gmail.com@iniza)
(clang version 10.0.1rc1, LLD 10.0.1rc1) #2~bullseye+dileks1 SMP
2020-05-26

- Sedat -

Fangrui Song

unread,
May 26, 2020, 1:13:47 PM5/26/20
to Arvind Sankar, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List

On 2020-05-26, Arvind Sankar wrote:
>On Tue, May 26, 2020 at 08:11:56AM +0200, Ard Biesheuvel wrote:
>> On Tue, 26 May 2020 at 00:59, Arvind Sankar <nive...@alum.mit.edu> wrote:
>> > # Compressed kernel should be built as PIE since it may be loaded at any
>> > # address by the bootloader.
>> > -KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
>> > +KBUILD_LDFLAGS += -pie $(call ld-option, --no-dynamic-linker)
>>
>> Do we still need -pie linking with these changes applied?
>>
>
>I think it's currently not strictly necessary -- eg the 64bit kernel
>doesn't get linked as pie right now with LLD or old binutils. However,
>it is safer to do so to ensure that the result remains PIC with future
>versions of the linker. There are linker optimizations that can convert
>certain PIC instructions when PIE is disabled. While I think they
>currently all focus on eliminating indirection through the GOT (and thus
>wouldn't be applicable any more),

There are 3 forms described by x86-64 psABI B.2 Optimize GOTPCRELX Relocations

(1) movq foo@GOTPCREL(%rip), %reg -> leaq foo(%rip), %reg
(2) call *foo@GOTPCREL(%rip) -> nop; call foo
(3) jmp *foo@GOTPCREL(%rip) -> jmp foo; nop

ld.bfd and gold perform (1) even for R_X86_64_GOTPCREL. LLD requires R_X86_64_[REX_]GOTPCRELX

>it's easy to imagine that they could
>get extended to, for eg, convert
> leaq foo(%rip), %rax
>to
> movl $foo, %eax
>with some nop padding, etc.

Not with NOP padding, but probably with instruction prefixes. It is
unclear the rewriting will be beneficial. Rewriting instructions definitely requires a
dedicated relocation type like R_X86_64_[REX_]GOTPCRELX.

Arvind Sankar

unread,
May 26, 2020, 3:14:15 PM5/26/20
to Fangrui Song, Arvind Sankar, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
On Tue, May 26, 2020 at 10:13:40AM -0700, Fangrui Song wrote:
>
> On 2020-05-26, Arvind Sankar wrote:
> >On Tue, May 26, 2020 at 08:11:56AM +0200, Ard Biesheuvel wrote:
> >> On Tue, 26 May 2020 at 00:59, Arvind Sankar <nive...@alum.mit.edu> wrote:
> >> > # Compressed kernel should be built as PIE since it may be loaded at any
> >> > # address by the bootloader.
> >> > -KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
> >> > +KBUILD_LDFLAGS += -pie $(call ld-option, --no-dynamic-linker)
> >>
> >> Do we still need -pie linking with these changes applied?
> >>
> >
> >I think it's currently not strictly necessary -- eg the 64bit kernel
> >doesn't get linked as pie right now with LLD or old binutils. However,
> >it is safer to do so to ensure that the result remains PIC with future
> >versions of the linker. There are linker optimizations that can convert
> >certain PIC instructions when PIE is disabled. While I think they
> >currently all focus on eliminating indirection through the GOT (and thus
> >wouldn't be applicable any more),
>
> There are 3 forms described by x86-64 psABI B.2 Optimize GOTPCRELX Relocations
>
> (1) movq foo@GOTPCREL(%rip), %reg -> leaq foo(%rip), %reg
> (2) call *foo@GOTPCREL(%rip) -> nop; call foo
> (3) jmp *foo@GOTPCREL(%rip) -> jmp foo; nop
>
> ld.bfd and gold perform (1) even for R_X86_64_GOTPCREL. LLD requires R_X86_64_[REX_]GOTPCRELX

The psABI says (1) can be relaxed into mov $foo, %reg if PIC is disabled
and foo lives below 4Gb. Similarly for the "test and binop" cases. Such
a relaxation would produce code that's not PIC any more, and wouldn't
boot.

>
> >it's easy to imagine that they could
> >get extended to, for eg, convert
> > leaq foo(%rip), %rax
> >to
> > movl $foo, %eax
> >with some nop padding, etc.
>
> Not with NOP padding, but probably with instruction prefixes. It is
> unclear the rewriting will be beneficial. Rewriting instructions definitely requires a
> dedicated relocation type like R_X86_64_[REX_]GOTPCRELX.

It ought to be faster: according to Agner Fog's tables, upto 4x higher
throughput than the RIP-relative LEA, and movq $foo, %rax is actually
the same size.

To take a step back, there isn't any *point* in not specifying -pie
after these changes: it would be lying to the toolchain just for the
sake of lying. It is inherently fragile, and would work only because the
toolchain isn't sophisticated enough to do some optimizations.

Eg, consider that if you ask for the address of an external function,
the compiler will generate
movq f@GOTPCREL(%rip), %reg
if f has default visibility, and
leaq f(%rip), %reg
if f has hidden visibility.

If you then link without -pie, the former gets relaxed into the non-PIC
movq $f, %reg
by the BFD linker, but the latter isn't relaxed. This is a missed
optimization, which happens because there's the GOTPCRELX to tell the
linker that the first form can be relaxed, and there's no special
relaxable relocation type for the second form.

The 64-bit kernel actually contains one of these relocations, prior to
Ard's patches to add hiddden visibility for everything. It currently
works with LLD (which can't use -pie) only because LLD doesn't appear to
perform the relaxation of
movq f@GOTPCREL(%rip), %reg
all the way to
movq $f, %reg
Binutils-2.34, which does do that relaxation, produces an unbootable
kernel if you leave out the -pie.

Sedat Dilek

unread,
May 27, 2020, 2:24:36 AM5/27/20
to Arvind Sankar, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
Will you send a follow-up or a v3 for this?

- Sedat -

Sedat Dilek

unread,
May 27, 2020, 2:27:00 AM5/27/20
to Arvind Sankar, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Fangrui Song, Dmitry Golovin, Clang-Built-Linux ML, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
Do you plan a change on this?

- Sedat -

Andy Shevchenko

unread,
Aug 6, 2020, 7:20:05 AM8/6/20
to Arvind Sankar, Fangrui Song, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
On Tue, May 26, 2020 at 03:14:11PM -0400, Arvind Sankar wrote:

Side question: are you going to submit a v3 of this?
Or i.o.w. what is the status of this series?

--
With Best Regards,
Andy Shevchenko


Arvind Sankar

unread,
Aug 6, 2020, 12:12:21 PM8/6/20
to Andy Shevchenko, Arvind Sankar, Fangrui Song, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin, X86 ML, Nick Desaulniers, Dmitry Golovin, clang-built-linux, Masahiro Yamada, Daniel Kiper, Linux Kernel Mailing List
The latest is v6 [0], which has some minor changes over this version and
is rebased on 5.8-rc7.

I will send out another rebase once the merge window closes.

Thanks.

[0] https://lore.kernel.org/lkml/20200731202738.2...@alum.mit.edu/
Reply all
Reply to author
Forward
0 new messages