[PATCH 61/74] Kbuild, lto: Drop .number postfixes in modpost

Aug 18, 2012, 11:10:03 PM

This rather large patchkit enables gcc Link Time Optimization (LTO)
support for the kernel.

With LTO gcc will do whole program optimizations for
the whole kernel and each module. This increases compile time,
but can generate faster code.

LTO allows gcc to inline functions across different files and
do various other optimizations across the whole binary.

It might also trigger bugs due to more aggressive optimization.
It allows gcc to drop unused code. It also allows it to check
types over the whole program.

The build slowdown is currently between 2x and 4x (with larger binaries
taking longer). Typical configs with a reasonably sized vmlinux
compile with less than 4GB of memory, but very large setups (like
allyesconfig) need up to 9GB.

You probably wouldn't use it for development, but it may become
a useful option in the future for release builds.

We see speedups in various benchmarks, but also still a few minor
regressions. There is still some outstanding tuning, both to reduce compile
time and to allow gcc to optimize even better. Also, the kernel currently
triggers some slow behaviour in gcc, which will hopefully improve
in future gcc versions, allowing faster LTO builds.

The kit contains workarounds for various toolchain problems with gcc 4.7.
Some of them will hopefully be removed by upcoming changes.

Currently a special toolchain setup is needed for LTO, with
gcc 4.7 and HJ Lu's Linux binutils. Please see Documentation/lto-build
for more details on how to install the right versions with the right setup.
The LTO code disables itself if it doesn't find the right toolchain
(however, it may not be able to detect all misconfigurations).

This is in the RFC stage at this point. I have only tested it on 32-bit
and 64-bit x86. Other architectures will undoubtedly need more
changes. I would be interested in any testing, benchmarking and
review.

Some options are currently disabled with LTO. MODVERSIONS I plan
to fix. Some others, like FUNCTION_TRACER (which relies on
different options for specific files), may need compiler changes.

This patchkit relies on the separately posted const-sections patchkit;
with LTO, gcc insists on correct section attributes.

Available from

git://github.com/andikleen/linux-misc lto-3.6 (or -3.5 and -3.7 in the future)

Note the tree is frequently rebased.

Thanks to HJ Lu, Joe Mario, Honza Hubicka, Richard Guenther,
Don Zickus and Changlong Xie, who helped with this project
(and probably some others I forgot, sorry).

-Andi

Andi Kleen

Aug 18, 2012, 11:10:03 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

arch/x86/kernel/cpu/common.c | 4 ++--
arch/x86/kernel/process_64.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 46d8786..8f12e8c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1075,7 +1075,7 @@ EXPORT_PER_CPU_SYMBOL(kernel_stack);
DEFINE_PER_CPU(char *, irq_stack_ptr) =
init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE - 64;

-DEFINE_PER_CPU(unsigned int, irq_count) = -1;
+DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;

DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);

@@ -1114,7 +1114,7 @@ void syscall_init(void)
X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|X86_EFLAGS_IOPL);
}

-unsigned long kernel_eflags;
+unsigned long kernel_eflags __visible;

/*
* Copies of the original ist values from the tss are only accessed during
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index a5720ed..34435e2 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -52,7 +52,7 @@

asmlinkage extern void ret_from_fork(void);

-DEFINE_PER_CPU(unsigned long, old_rsp);
+asmlinkage DEFINE_PER_CPU(unsigned long, old_rsp);

/* Prints also some state that isn't saved in the pt_regs */
void __show_regs(struct pt_regs *regs, int all)
--
1.7.7.6

Andi Kleen

Aug 18, 2012, 11:10:03 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

arch/x86/platform/efi/efi.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 2dc29f5..02bc41a 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -53,7 +53,7 @@
int efi_enabled;
EXPORT_SYMBOL(efi_enabled);

-struct efi __read_mostly efi = {
+struct efi __visible __read_mostly efi = {
.mps = EFI_INVALID_TABLE_ADDR,
.acpi = EFI_INVALID_TABLE_ADDR,
.acpi20 = EFI_INVALID_TABLE_ADDR,
--
1.7.7.6

Andi Kleen

Aug 18, 2012, 11:10:03 PM

From: Andi Kleen <a...@linux.intel.com>

On a 32-bit build, gcc 4.7 with LTO decides to clobber the 6th argument on the
stack. Unfortunately this corrupts the user EBP and leads to later crashes.
For now, mark do_futex noinline to prevent this.

I wish there were a generic way to handle this. It seems like a ticking
time bomb.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

kernel/futex.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 3717e7b..48b5a07 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2620,7 +2620,7 @@ void exit_robust_list(struct task_struct *curr)
curr, pip);
}

-long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
+noinline long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
int cmd = op & FUTEX_CMD_MASK;
--
1.7.7.6
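For context on the futex workaround: in the kernel, `noinline` is assumed here to expand to gcc's `noinline` function attribute, which forces the function to stay out of line, so not even whole-program (LTO) optimization can fold its body into a caller. A minimal user-space sketch under that assumption; `demo_do_futex` is a hypothetical stand-in with the same seven-parameter shape as `do_futex`, not the real implementation.

```c
#include <stddef.h>

/* Assumption: same definition the kernel's gcc compiler header uses. */
#ifndef noinline
#define noinline __attribute__((noinline))
#endif

/*
 * Hypothetical stand-in with do_futex()'s seven-parameter shape.
 * noinline keeps it a separate out-of-line function, so the optimizer
 * (including an LTO link) cannot merge its body into a caller.
 */
noinline long demo_do_futex(unsigned int *uaddr, int op, unsigned int val,
                            void *timeout, unsigned int *uaddr2,
                            unsigned int val2, unsigned int val3)
{
    (void)uaddr;
    (void)timeout;
    (void)uaddr2;
    return (long)op + (long)val + (long)val2 + (long)val3;
}
```

Calling `demo_do_futex(NULL, 1, 2, NULL, NULL, 3, 4)` simply returns 10; the point is only the annotation, which trades a little call overhead for a stable function boundary.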

Andi Kleen

Aug 18, 2012, 11:10:04 PM

From: Andi Kleen <a...@linux.intel.com>

Making this visible fixes some missing symbols with gcc 4.7 LTO.
This is a workaround for a compiler problem.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

drivers/media/video/pvrusb2/pvrusb2-audio.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/media/video/pvrusb2/pvrusb2-audio.c b/drivers/media/video/pvrusb2/pvrusb2-audio.c
index cc06d5e..aaa6420 100644
--- a/drivers/media/video/pvrusb2/pvrusb2-audio.c
+++ b/drivers/media/video/pvrusb2/pvrusb2-audio.c
@@ -32,7 +32,7 @@ struct routing_scheme {
unsigned int cnt;
};

-static const int routing_scheme0[] = {
+__visible const int pvrusb2_routing_scheme0[] = {
[PVR2_CVAL_INPUT_TV] = MSP_INPUT_DEFAULT,
[PVR2_CVAL_INPUT_RADIO] = MSP_INPUT(MSP_IN_SCART2,
MSP_IN_TUNER1,
@@ -49,8 +49,8 @@ static const int routing_scheme0[] = {
};

static const struct routing_scheme routing_def0 = {
- .def = routing_scheme0,
- .cnt = ARRAY_SIZE(routing_scheme0),
+ .def = pvrusb2_routing_scheme0,
+ .cnt = ARRAY_SIZE(pvrusb2_routing_scheme0),
};

static const struct routing_scheme *routing_schemes[] = {
--
1.7.7.6
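For context on `__visible`: with gcc 4.6 and later it is assumed here to expand to `__attribute__((externally_visible))`, which tells the whole-program optimizer that code it cannot see (for example, assembler) may reference the symbol, so it must stay public under its own name. gcc only honours that attribute on non-static objects, which is why the pvrusb2 change also drops `static` and adds a driver prefix. A minimal sketch; the table contents and all `demo_` names are hypothetical.

```c
/* Assumption: gcc >= 4.6 meaning of __visible; a no-op elsewhere. */
#ifndef __visible
#if defined(__GNUC__) && !defined(__clang__)
#define __visible __attribute__((externally_visible))
#else
#define __visible
#endif
#endif

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table: non-static (the attribute only affects public
 * objects), prefixed to avoid global name clashes, and __visible. */
__visible const int demo_routing_scheme0[] = { 0, 3, 5, 7 };

struct demo_routing_scheme {
    const int *def;
    unsigned int cnt;
};

/* Mirrors routing_def0: the static wrapper refers to the public table. */
static const struct demo_routing_scheme demo_routing_def0 = {
    .def = demo_routing_scheme0,
    .cnt = ARRAY_SIZE(demo_routing_scheme0),
};

unsigned int demo_routing_count(void)
{
    return demo_routing_def0.cnt;
}
```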

Aug 18, 2012, 11:10:04 PM

From: Joe Mario <jma...@redhat.com>

With the numeric postfixes that LTO appends to local
symbols, the longest name in the kernel overflows
the namebuf[KSYM_NAME_LEN] array by two bytes. That name is:
__pci_fixup_resumePCI_VENDOR_ID_SERVERWORKSPCI_DEVICE_ID_SERVERWORKS_HT1000SBquirk_disable_broadcom_boot_interrupt.1488004.672802

Double the max symbol name length.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

include/linux/kallsyms.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
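The two-byte figure can be checked directly. This sketch assumes `KSYM_NAME_LEN` was 128 before the patch, which the numbers bear out: the quoted name is 129 characters, so 130 bytes with its terminating NUL. The `demo_` names are hypothetical.

```c
#include <string.h>

/* Assumption: KSYM_NAME_LEN was 128 before this patch doubled it. */
#define DEMO_KSYM_NAME_LEN_OLD 128
#define DEMO_KSYM_NAME_LEN_NEW (2 * DEMO_KSYM_NAME_LEN_OLD)

/* The symbol quoted in the patch, including both numeric LTO postfixes. */
const char demo_longest_sym[] =
    "__pci_fixup_resumePCI_VENDOR_ID_SERVERWORKSPCI_DEVICE_ID_SERVERWORKS_"
    "HT1000SBquirk_disable_broadcom_boot_interrupt.1488004.672802";

/* Bytes by which name (plus its terminating NUL) overflows a buffer of
 * buf_len bytes; 0 if it fits. */
size_t demo_ksym_overflow(const char *name, size_t buf_len)
{
    size_t need = strlen(name) + 1;

    return need > buf_len ? need - buf_len : 0;
}
```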

Andi Kleen

Aug 18, 2012, 11:10:05 PM

From: Andi Kleen <a...@linux.intel.com>

The paravirt patching code assumes that it can reference a
local assembler label between two different top level assembler
statements. This does not work with some experimental gcc builds,
where the assembler code may end up in different assembler files.

Replace them with global (.globl) assembler labels, declared extern and __visible on the C side.

This also removes one redundant copy of the macro.

Cc: jer...@goop.org

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

arch/x86/include/asm/paravirt_types.h | 9 +++++----
arch/x86/kernel/paravirt.c | 5 -----
2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 4f262bc..6a464ba 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -385,10 +385,11 @@ extern struct pv_lock_ops pv_lock_ops;
_paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")

/* Simple instruction patching code. */
-#define DEF_NATIVE(ops, name, code) \
- extern const char start_##ops##_##name[] __visible, \
- end_##ops##_##name[] __visible; \
- asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
+#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
+
+#define DEF_NATIVE(ops, name, code) \
+ __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
+ asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))

unsigned paravirt_patch_nop(void);
unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..947255e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -62,11 +62,6 @@ void __init default_banner(void)
pv_info.name);
}

-/* Simple instruction patching code. */
-#define DEF_NATIVE(ops, name, code) \
- extern const char start_##ops##_##name[], end_##ops##_##name[]; \
- asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
-
/* Undefined instruction for dealing with missing ops pointers. */
static const unsigned char ud2a[] = { 0x0f, 0x0b };

--
1.7.7.6
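To see what the new macro produces, here is the string it builds for one patch site, taking `DEF_NATIVE(pv_irq_ops, irq_disable, "cli")` as an example use. Nothing is assembled; the sketch only checks the text that would be handed to `asm()`.

```c
#include <string.h>

/* Copied verbatim from the paravirt_types.h hunk in the patch. */
#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"

/*
 * The string DEF_NATIVE(pv_irq_ops, irq_disable, "cli") would pass to asm():
 * each label is preceded by a .globl directive, so it becomes a real symbol
 * that the extern declarations in C can resolve from any assembler file.
 */
const char demo_native_irq_disable[] =
    NATIVE_LABEL("start_", pv_irq_ops, irq_disable)
    "cli"
    NATIVE_LABEL("end_", pv_irq_ops, irq_disable);
```

The old macro emitted plain `start_...:` labels, which are local to whichever assembler file they land in; the `.globl` lines are what make them reachable once LTO splits the output.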

Andi Kleen

Aug 18, 2012, 11:10:05 PM

From: Andi Kleen <a...@linux.intel.com>

VDSO does not play well with LTO, so just disable it.

(Note that powerpc will likely need more changes for LTO; this one was
just found with grep.)

Cc: be...@kernel.crashing.org

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

arch/powerpc/kernel/vdso32/Makefile | 2 +-
arch/powerpc/kernel/vdso64/Makefile | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 53e6c9b..8cc88bf 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -16,7 +16,7 @@ obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))

GCOV_PROFILE := n

-ccflags-y := -shared -fno-common -fno-builtin
+ccflags-y := -shared -fno-common -fno-builtin $(DISABLE_LTO)
ccflags-y += -nostdlib -Wl,-soname=linux-vdso32.so.1 \
$(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
asflags-y := -D__VDSO32__ -s
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index effca94..5bca644 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -9,7 +9,7 @@ obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))

GCOV_PROFILE := n

-ccflags-y := -shared -fno-common -fno-builtin
+ccflags-y := -shared -fno-common -fno-builtin $(DISABLE_LTO)
ccflags-y += -nostdlib -Wl,-soname=linux-vdso64.so.1 \
$(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
asflags-y := -D__VDSO64__ -s
--
1.7.7.6

Andi Kleen

Aug 18, 2012, 11:10:05 PM

From: Andi Kleen <a...@linux.intel.com>

We cannot assume that the inline assembler code always ends up
in the same file as the original C file. So make any assembler labels
that C code references through "extern" declarations global.

Cc: w...@iguana.be

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

drivers/watchdog/hpwdt.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 1eff743..68bda60 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -161,7 +161,8 @@ extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
#define HPWDT_ARCH 32

asm(".text \n\t"
- ".align 4 \n"
+ ".align 4 \n\t"
+ ".globl asminline_call \n"
"asminline_call: \n\t"
"pushl %ebp \n\t"
"movl %esp, %ebp \n\t"
@@ -351,7 +352,8 @@ static int __devinit detect_cru_service(void)
#define HPWDT_ARCH 64

asm(".text \n\t"
- ".align 4 \n"
+ ".align 4 \n\t"
+ ".globl asminline_call \n"
"asminline_call: \n\t"
"pushq %rbp \n\t"
"movq %rsp, %rbp \n\t"
--
1.7.7.6
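A minimal sketch of the same fix, assuming an x86-64 ELF target: a top-level `asm()` block defines a function whose label is made global with `.globl`, and C reaches it only through an extern declaration. `demo_asm_answer` is hypothetical. `.pushsection`/`.popsection` are used instead of a bare `.text` so the compiler's own section state is left untouched, and other targets fall back to plain C.

```c
#if defined(__x86_64__) && defined(__ELF__)
/* Top-level asm in the style of hpwdt.c; .globl makes the label a real
 * symbol, so it still resolves if this asm lands in another object file. */
__asm__(".pushsection .text\n\t"
        ".align 4\n\t"
        ".globl demo_asm_answer\n\t"
        ".type demo_asm_answer, @function\n"
        "demo_asm_answer:\n\t"
        "movl $42, %eax\n\t"
        "ret\n\t"
        ".size demo_asm_answer, .-demo_asm_answer\n\t"
        ".popsection\n");

/* C only knows the function through this extern declaration. */
extern int demo_asm_answer(void);
#else
/* Fallback for targets where the assembly above does not apply. */
int demo_asm_answer(void)
{
    return 42;
}
#endif
```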

Andi Kleen

Aug 18, 2012, 11:10:05 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

include/linux/jump_label.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 0976fc4..a39e8e3 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -106,8 +106,8 @@ static __always_inline bool static_key_true(struct static_key *key)
return !static_key_false(key);
}

-extern struct jump_entry __start___jump_table[];
-extern struct jump_entry __stop___jump_table[];
+extern __visible struct jump_entry __start___jump_table[];
+extern __visible struct jump_entry __stop___jump_table[];

extern void jump_label_init(void);
extern void jump_label_lock(void);
--
1.7.7.6
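Symbols such as `__start___jump_table` are not defined in any C file; the linker supplies them as section bounds, which is why they need to stay visible under LTO. A user-space sketch of the pattern, assuming GNU ld (or a compatible linker) that auto-defines `__start_SECNAME` and `__stop_SECNAME` for sections whose names are valid C identifiers. The kernel defines its bounds in the linker script instead, and `demo_table`/`struct demo_entry` are hypothetical.

```c
#include <stddef.h>

/* Fixed 16-byte alignment keeps entries contiguous with no gaps. */
struct demo_entry {
    const char *name;
    int value;
} __attribute__((aligned(16)));

/* Place each entry in the "demo_table" section; "used" keeps it even though
 * nothing references the variable by name. */
#define DEMO_ENTRY(var, n, v) \
    static const struct demo_entry var \
    __attribute__((used, section("demo_table"))) = { n, v }

DEMO_ENTRY(demo_a, "a", 1);
DEMO_ENTRY(demo_b, "b", 2);
DEMO_ENTRY(demo_c, "c", 3);

/* Section bounds provided by the linker, not by any C definition. */
extern const struct demo_entry __start_demo_table[];
extern const struct demo_entry __stop_demo_table[];

/* Walk the table between the bounds; entry order is not guaranteed. */
int demo_table_sum(void)
{
    int sum = 0;

    for (const struct demo_entry *e = __start_demo_table;
         e < __stop_demo_table; e++)
        sum += e->value;
    return sum;
}

size_t demo_table_count(void)
{
    return (size_t)(__stop_demo_table - __start_demo_table);
}
```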

Andi Kleen

Aug 18, 2012, 11:10:06 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

lib/bug.c | 2 +-
lib/dynamic_debug.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/bug.c b/lib/bug.c
index a28c141..f81e1a6 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -43,7 +43,7 @@
#include <linux/bug.h>
#include <linux/sched.h>

-extern const struct bug_entry __start___bug_table[], __stop___bug_table[];
+extern __visible const struct bug_entry __start___bug_table[], __stop___bug_table[];

static inline unsigned long bug_addr(const struct bug_entry *bug)
{
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 7ca29a0..1760a71 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -34,8 +34,8 @@
#include <linux/device.h>
#include <linux/netdevice.h>

-extern struct _ddebug __start___verbose[];
-extern struct _ddebug __stop___verbose[];
+extern __visible struct _ddebug __start___verbose[];
+extern __visible struct _ddebug __stop___verbose[];

struct ddebug_table {
struct list_head link;
--
1.7.7.6

Andi Kleen

Aug 18, 2012, 11:10:06 PM

Aug 18, 2012, 11:10:05 PM

Aug 18, 2012, 11:20:02 PM

From: Andi Kleen <a...@linux.intel.com>

Make the sys_call_table type declared in asm/syscall.h match
the definition in syscall_64.c.

v2: Include asm/syscall.h in syscall_64.c too. I left uml alone
because it doesn't have a syscall.h of its own, and including
the native one leads to other errors.
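Before this change the header declared the table as `const unsigned long[]` while syscall_64.c defined it as an array of function pointers. Separate compilation never compares the two, but an LTO link sees both and can flag the mismatch. A hypothetical miniature of the fix, assuming a single translation unit stands in for the header plus the .c file; all `demo_` names are illustrative.

```c
/* The typedef lives in one shared place (standing in for asm/syscall.h). */
typedef void (*sys_call_ptr_t)(void);

/* What the header declares ... */
extern const sys_call_ptr_t demo_sys_call_table[];

static int demo_calls;

static void demo_sys_ni_syscall(void) { demo_calls += 100; }
static void demo_sys_getpid(void) { demo_calls += 1; }

/* ... and the definition, now with exactly the same element type. */
const sys_call_ptr_t demo_sys_call_table[] = {
    demo_sys_ni_syscall,
    demo_sys_getpid,
};

/* Call entry nr through the table and report which handler ran. */
int demo_dispatch(unsigned int nr)
{
    demo_calls = 0;
    demo_sys_call_table[nr]();
    return demo_calls;
}
```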
Cc: x...@kernel.org

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

arch/x86/include/asm/syscall.h | 3 ++-
arch/x86/kernel/syscall_64.c | 3 +--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 1ace47b..c36962d 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -20,7 +20,8 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>

-extern const unsigned long sys_call_table[];
+typedef void (*sys_call_ptr_t)(void);
+extern const sys_call_ptr_t sys_call_table[];

/*
* Only the low 32 bits of orig_ax are meaningful, so we return int.
diff --git a/arch/x86/kernel/syscall_64.c b/arch/x86/kernel/syscall_64.c
index 3967318..4ac730b 100644
--- a/arch/x86/kernel/syscall_64.c
+++ b/arch/x86/kernel/syscall_64.c
@@ -4,6 +4,7 @@
#include <linux/sys.h>
#include <linux/cache.h>
#include <asm/asm-offsets.h>
+#include <asm/syscall.h>

#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)

@@ -19,8 +20,6 @@

#define __SYSCALL_64(nr, sym, compat) [nr] = sym,

-typedef void (*sys_call_ptr_t)(void);
-
extern void sys_ni_syscall(void);

asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {

Andi Kleen

Aug 18, 2012, 11:20:02 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

init/do_mounts_initrd.c | 2 +-
init/initramfs.c | 4 ++--
init/main.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 135959a2..71a625e 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -36,7 +36,7 @@ __setup("noinitrd", no_initrd);
static int __init do_linuxrc(void *_shell)
{
static const char *argv[] = { "linuxrc", NULL, };
- extern const char *envp_init[];
+ extern __visible const char *envp_init[];
const char *shell = _shell;

sys_close(old_fd);sys_close(root_fd);
diff --git a/init/initramfs.c b/init/initramfs.c
index 84c6bf1..8a1fd07 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -493,8 +493,8 @@ static int __init retain_initrd_param(char *str)
}
__setup("retain_initrd", retain_initrd_param);

-extern char __initramfs_start[];
-extern unsigned long __initramfs_size;
+extern __visible char __initramfs_start[];
+extern __visible unsigned long __initramfs_size;
#include <linux/initrd.h>
#include <linux/kexec.h>

diff --git a/init/main.c b/init/main.c
index e60679d..6438ffd 100644
--- a/init/main.c
+++ b/init/main.c
@@ -470,7 +470,7 @@ static void __init mm_init(void)
asmlinkage void __init start_kernel(void)
{
char * command_line;
- extern const struct kernel_param __start___param[], __stop___param[];
+ extern __visible const struct kernel_param __start___param[], __stop___param[];

/*
* Need to run as early as possible, to initialize the

Andi Kleen

Aug 18, 2012, 11:20:02 PM

From: Andi Kleen <a...@linux.intel.com>

- Make the C code used by the paravirt stubs visible
- Since they have to be global now, give them a more unique
name.

Cc: ru...@rustcorp.com.au

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

arch/x86/lguest/boot.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 642d880..dd167d2 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -234,13 +234,13 @@ static void lguest_end_context_switch(struct task_struct *next)
* flags word contains all kind of stuff, but in practice Linux only cares
* about the interrupt flag. Our "save_flags()" just returns that.
*/
-static unsigned long save_fl(void)
+asmlinkage unsigned long lguest_save_fl(void)
{
return lguest_data.irq_enabled;
}

/* Interrupts go off... */
-static void irq_disable(void)
+asmlinkage void lguest_irq_disable(void)
{
lguest_data.irq_enabled = 0;
}
@@ -254,8 +254,8 @@ static void irq_disable(void)
* PV_CALLEE_SAVE_REGS_THUNK(), which pushes %eax onto the stack, calls the
* C function, then restores it.
*/
-PV_CALLEE_SAVE_REGS_THUNK(save_fl);
-PV_CALLEE_SAVE_REGS_THUNK(irq_disable);
+PV_CALLEE_SAVE_REGS_THUNK(lguest_save_fl);
+PV_CALLEE_SAVE_REGS_THUNK(lguest_irq_disable);
/*:*/

/* These are in i386_head.S */
@@ -1285,9 +1285,9 @@ __init void lguest_init(void)
*/

/* Interrupt-related operations */
- pv_irq_ops.save_fl = PV_CALLEE_SAVE(save_fl);
+ pv_irq_ops.save_fl = PV_CALLEE_SAVE(lguest_save_fl);
pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(lg_restore_fl);
- pv_irq_ops.irq_disable = PV_CALLEE_SAVE(irq_disable);
+ pv_irq_ops.irq_disable = PV_CALLEE_SAVE(lguest_irq_disable);
pv_irq_ops.irq_enable = __PV_IS_CALLEE_SAVE(lg_irq_enable);
pv_irq_ops.safe_halt = lguest_safe_halt;

Andi Kleen

Aug 18, 2012, 11:20:02 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

kernel/kallsyms.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 2169fee..1b40cb7 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -36,20 +36,20 @@
* These will be re-linked against their real values
* during the second link stage.
*/
-extern const unsigned long kallsyms_addresses[] __attribute__((weak));
-extern const u8 kallsyms_names[] __attribute__((weak));
+extern __visible const unsigned long kallsyms_addresses[] __attribute__((weak));
+extern __visible const u8 kallsyms_names[] __attribute__((weak));

/*
* Tell the compiler that the count isn't in the small data section if the arch
* has one (eg: FRV).
*/
-extern const unsigned long kallsyms_num_syms
+extern __visible const unsigned long kallsyms_num_syms
__attribute__((weak, section(".rodata")));

-extern const u8 kallsyms_token_table[] __attribute__((weak));
-extern const u16 kallsyms_token_index[] __attribute__((weak));
+extern __visible const u8 kallsyms_token_table[] __attribute__((weak));
+extern __visible const u16 kallsyms_token_index[] __attribute__((weak));

-extern const unsigned long kallsyms_markers[] __attribute__((weak));
+extern __visible const unsigned long kallsyms_markers[] __attribute__((weak));

static inline int is_kernel_inittext(unsigned long addr)
{

Andi Kleen

Aug 18, 2012, 11:20:02 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

arch/x86/include/asm/vvar.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

Aug 18, 2012, 11:20:03 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

kernel/trace/ftrace.c | 4 ++--
kernel/trace/trace.h | 4 ++--
kernel/trace/trace_branch.c | 8 ++++----
kernel/trace/trace_events.c | 4 ++--
kernel/trace/trace_syscalls.c | 4 ++--
kernel/tracepoint.c | 4 ++--
6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b4f20fb..5028bd3 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3866,8 +3866,8 @@ struct notifier_block ftrace_module_nb = {
.priority = 0,
};

-extern unsigned long __start_mcount_loc[];
-extern unsigned long __stop_mcount_loc[];
+extern __visible unsigned long __start_mcount_loc[];
+extern __visible unsigned long __stop_mcount_loc[];

void __init ftrace_init(void)
{
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..8c063e7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -836,8 +836,8 @@ extern void trace_event_enable_cmd_record(bool enable);
extern struct mutex event_mutex;
extern struct list_head ftrace_events;

-extern const char *__start___trace_bprintk_fmt[];
-extern const char *__stop___trace_bprintk_fmt[];
+extern __visible const char *__start___trace_bprintk_fmt[];
+extern __visible const char *__stop___trace_bprintk_fmt[];

void trace_printk_init_buffers(void);

diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
index 8d3538b..5be6217 100644
--- a/kernel/trace/trace_branch.c
+++ b/kernel/trace/trace_branch.c
@@ -226,8 +226,8 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect)
}
EXPORT_SYMBOL(ftrace_likely_update);

-extern unsigned long __start_annotated_branch_profile[];
-extern unsigned long __stop_annotated_branch_profile[];
+extern __visible unsigned long __start_annotated_branch_profile[];
+extern __visible unsigned long __stop_annotated_branch_profile[];

static int annotated_branch_stat_headers(struct seq_file *m)
{
@@ -355,8 +355,8 @@ fs_initcall(init_annotated_branch_stats);

#ifdef CONFIG_PROFILE_ALL_BRANCHES

-extern unsigned long __start_branch_profile[];
-extern unsigned long __stop_branch_profile[];
+extern __visible unsigned long __start_branch_profile[];
+extern __visible unsigned long __stop_branch_profile[];

static int all_branch_stat_headers(struct seq_file *m)
{
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 29111da..325c9f0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1435,8 +1435,8 @@ static struct notifier_block trace_module_nb = {
.priority = 0,
};

-extern struct ftrace_event_call *__start_ftrace_events[];
-extern struct ftrace_event_call *__stop_ftrace_events[];
+extern __visible struct ftrace_event_call *__start_ftrace_events[];
+extern __visible struct ftrace_event_call *__stop_ftrace_events[];

static char bootup_event_buf[COMMAND_LINE_SIZE] __initdata;

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 60e4d78..52f3e15 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -56,8 +56,8 @@ struct ftrace_event_class event_class_syscall_exit = {
.raw_init = init_syscall_trace,
};

-extern struct syscall_metadata *__start_syscalls_metadata[];
-extern struct syscall_metadata *__stop_syscalls_metadata[];
+extern __visible struct syscall_metadata *__start_syscalls_metadata[];
+extern __visible struct syscall_metadata *__stop_syscalls_metadata[];

static struct syscall_metadata **syscalls_metadata;

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index d96ba22..ddae1de 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -27,8 +27,8 @@
#include <linux/sched.h>
#include <linux/static_key.h>

-extern struct tracepoint * const __start___tracepoints_ptrs[];
-extern struct tracepoint * const __stop___tracepoints_ptrs[];
+extern __visible struct tracepoint * const __start___tracepoints_ptrs[];
+extern __visible struct tracepoint * const __stop___tracepoints_ptrs[];

/* Set to 1 to enable tracepoint debug output */
static const int tracepoint_debug;

Andi Kleen

Aug 18, 2012, 11:20:03 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

arch/x86/kernel/alternative.c | 4 ++--
arch/x86/kernel/vsyscall_64.c | 4 ++--
arch/x86/power/hibernate_32.c | 2 +-
arch/x86/um/vdso/vma.c | 2 +-
arch/x86/vdso/vma.c | 10 +++++-----
5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index afb7ff7..27ae345 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -252,8 +252,8 @@ static void __init_or_module add_nops(void *insns, unsigned int len)
}
}

-extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
-extern s32 __smp_locks[], __smp_locks_end[];
+extern __visible struct alt_instr __alt_instructions[], __alt_instructions_end[];
+extern __visible s32 __smp_locks[], __smp_locks_end[];
void *text_poke_early(void *addr, const void *opcode, size_t len);

/* Replace instructions with better alternatives for this CPU type.
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 8d141b3..70f25f2 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -355,9 +355,9 @@ cpu_vsyscall_notifier(struct notifier_block *n, unsigned long action, void *arg)

void __init map_vsyscall(void)
{
- extern char __vsyscall_page;
+ extern __visible char __vsyscall_page;
unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
- extern char __vvar_page;
+ extern __visible char __vvar_page;
unsigned long physaddr_vvar_page = __pa_symbol(&__vvar_page);

__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_vsyscall,
diff --git a/arch/x86/power/hibernate_32.c b/arch/x86/power/hibernate_32.c
index 74202c1..7b8d7df 100644
--- a/arch/x86/power/hibernate_32.c
+++ b/arch/x86/power/hibernate_32.c
@@ -18,7 +18,7 @@
extern int restore_image(void);

/* References to section boundaries */
-extern const void __nosave_begin, __nosave_end;
+extern __visible const void __nosave_begin, __nosave_end;

/* Pointer to the temporary resume page tables */
pgd_t *resume_pg_dir;
diff --git a/arch/x86/um/vdso/vma.c b/arch/x86/um/vdso/vma.c
index af91901..a09f903 100644
--- a/arch/x86/um/vdso/vma.c
+++ b/arch/x86/um/vdso/vma.c
@@ -16,7 +16,7 @@ unsigned int __read_mostly vdso_enabled = 1;
unsigned long um_vdso_addr;

extern unsigned long task_size;
-extern char vdso_start[], vdso_end[];
+extern __visible char vdso_start[], vdso_end[];

static struct page **vdsop;

diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 00aaf04..fe08e2b 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -18,15 +18,15 @@

unsigned int __read_mostly vdso_enabled = 1;

-extern char vdso_start[], vdso_end[];
-extern unsigned short vdso_sync_cpuid;
+extern __visible char vdso_start[], vdso_end[];
+extern __visible unsigned short vdso_sync_cpuid;

-extern struct page *vdso_pages[];
+extern __visible struct page *vdso_pages[];
static unsigned vdso_size;

#ifdef CONFIG_X86_X32_ABI
-extern char vdsox32_start[], vdsox32_end[];
-extern struct page *vdsox32_pages[];
+extern __visible char vdsox32_start[], vdsox32_end[];
+extern __visible struct page *vdsox32_pages[];
static unsigned vdsox32_size;

static void __init patch_vdsox32(void *vdso, size_t len)

Aug 18, 2012, 11:20:03 PM

From: Andi Kleen <a...@linux.intel.com>

When a __gnu_lto_* symbol is present, the module has not been
link-time optimized yet. Print a message in that case.
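The check in simplify_symbols() is a plain prefix match on the symbol name. A minimal sketch of the same test applied to a list of names; the marker names used in the usage below (`__gnu_lto_v1`, `__gnu_lto_slim`) are assumptions about what gcc leaves in objects that still carry LTO bytecode, and `demo_count_lto_markers` is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

/* Same prefix test as the patch: strlen("__gnu_lto") == 9. */
static bool demo_is_gnu_lto_marker(const char *name)
{
    return strncmp(name, "__gnu_lto", 9) == 0;
}

/* Hypothetical helper: count LTO marker symbols in a module's name list.
 * A nonzero result means the module was not link-time optimized. */
size_t demo_count_lto_markers(const char *const *names, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (demo_is_gnu_lto_marker(names[i]))
            count++;
    return count;
}
```

For example, the list `{ "__gnu_lto_v1", "init_module", "__gnu_lto_slim", "cleanup_module" }` yields 2.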

Cc: ru...@rustcorp.com.au

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

kernel/module.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 2cbbae3..a8a29c4 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1905,8 +1905,11 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)

switch (sym[i].st_shndx) {
case SHN_COMMON:
/* Ignore common symbols */
- if (!strncmp(name, "__gnu_lto", 9))
+ if (!strncmp(name, "__gnu_lto", 9)) {
+ printk("%s: module not link time optimized\n",
+ mod->name);
break;
+ }

/* We compiled with -fno-common. These are not
supposed to happen. */

Andi Kleen

Aug 18, 2012, 11:20:03 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

include/asm-generic/sections.h | 24 ++++++++++++------------
1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index c1a1216..eab95aa 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -3,20 +3,20 @@

/* References to section boundaries */

-extern char _text[], _stext[], _etext[];
-extern char _data[], _sdata[], _edata[];
-extern char __bss_start[], __bss_stop[];
-extern char __init_begin[], __init_end[];
-extern char _sinittext[], _einittext[];
-extern char _end[];
-extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
-extern char __kprobes_text_start[], __kprobes_text_end[];
-extern char __entry_text_start[], __entry_text_end[];
-extern char __initdata_begin[], __initdata_end[];
-extern char __start_rodata[], __end_rodata[];
+extern __visible char _text[], _stext[], _etext[];
+extern __visible char _data[], _sdata[], _edata[];
+extern __visible char __bss_start[], __bss_stop[];
+extern __visible char __init_begin[], __init_end[];
+extern __visible char _sinittext[], _einittext[];
+extern __visible char _end[];
+extern __visible char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
+extern __visible char __kprobes_text_start[], __kprobes_text_end[];
+extern __visible char __entry_text_start[], __entry_text_end[];
+extern __visible char __initdata_begin[], __initdata_end[];
+extern __visible char __start_rodata[], __end_rodata[];

/* Start and end of .ctors section - used for constructor calls. */
-extern char __ctors_start[], __ctors_end[];
+extern __visible char __ctors_start[], __ctors_end[];

/* function descriptor handling (if any). Override
* in asm/sections.h */

Andi Kleen

Aug 18, 2012, 11:20:03 PM

From: Andi Kleen <a...@linux.intel.com>

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---

drivers/base/firmware_class.c | 4 ++--
drivers/base/power/trace.c | 2 +-
drivers/pci/quirks.c | 28 ++++++++++++++--------------
3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 803cfc1..618ca735 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -30,8 +30,8 @@ MODULE_LICENSE("GPL");

#ifdef CONFIG_FW_LOADER

-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern __visible struct builtin_fw __start_builtin_fw[];
+extern __visible struct builtin_fw __end_builtin_fw[];

static bool fw_get_builtin_firmware(struct firmware *fw, const char *name)
{
diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
index d94a1f5..3048afa 100644
--- a/drivers/base/power/trace.c
+++ b/drivers/base/power/trace.c
@@ -166,7 +166,7 @@ void generate_resume_trace(const void *tracedata, unsigned int user)
}
EXPORT_SYMBOL(generate_resume_trace);

-extern char __tracedata_start, __tracedata_end;
+extern __visible char __tracedata_start, __tracedata_end;
static int show_file_hash(unsigned int value)
{
int match;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 5155317..d18ea93 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2941,20 +2941,20 @@ static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
}
}

-extern struct pci_fixup __start_pci_fixups_early[];
-extern struct pci_fixup __end_pci_fixups_early[];
-extern struct pci_fixup __start_pci_fixups_header[];
-extern struct pci_fixup __end_pci_fixups_header[];
-extern struct pci_fixup __start_pci_fixups_final[];
-extern struct pci_fixup __end_pci_fixups_final[];
-extern struct pci_fixup __start_pci_fixups_enable[];
-extern struct pci_fixup __end_pci_fixups_enable[];
-extern struct pci_fixup __start_pci_fixups_resume[];
-extern struct pci_fixup __end_pci_fixups_resume[];
-extern struct pci_fixup __start_pci_fixups_resume_early[];
-extern struct pci_fixup __end_pci_fixups_resume_early[];
-extern struct pci_fixup __start_pci_fixups_suspend[];
-extern struct pci_fixup __end_pci_fixups_suspend[];
+extern __visible struct pci_fixup __start_pci_fixups_early[];
+extern __visible struct pci_fixup __end_pci_fixups_early[];
+extern __visible struct pci_fixup __start_pci_fixups_header[];
+extern __visible struct pci_fixup __end_pci_fixups_header[];
+extern __visible struct pci_fixup __start_pci_fixups_final[];
+extern __visible struct pci_fixup __end_pci_fixups_final[];
+extern __visible struct pci_fixup __start_pci_fixups_enable[];
+extern __visible struct pci_fixup __end_pci_fixups_enable[];
+extern __visible struct pci_fixup __start_pci_fixups_resume[];
+extern __visible struct pci_fixup __end_pci_fixups_resume[];
+extern __visible struct pci_fixup __start_pci_fixups_resume_early[];
+extern __visible struct pci_fixup __end_pci_fixups_resume_early[];
+extern __visible struct pci_fixup __start_pci_fixups_suspend[];
+extern __visible struct pci_fixup __end_pci_fixups_suspend[];

static bool pci_apply_fixup_final_quirks;
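The `__start_*`/`__end_*` table boundaries above are defined by the linker script, not by any C file. The patch marks them `__visible`, presumably so that LTO makes no whole-program assumptions about symbols only the linker provides. Below is a minimal userspace sketch of the same table pattern. It makes a few assumptions: GNU ld stands in for the kernel's linker script (ld synthesizes `__start_SEC`/`__stop_SEC` for any section whose name is a valid C identifier), and `__visible` expands to GCC's `externally_visible` attribute, as in this series. `struct demo_fixup`, `DEMO_FIXUP` and the `demo_*` functions are hypothetical stand-ins loosely modelled on `struct pci_fixup` and `pci_do_fixups()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: __visible is GCC's externally_visible attribute. */
#define __visible __attribute__((externally_visible))

/* Hypothetical entry type, loosely modelled on struct pci_fixup. */
struct demo_fixup {
	unsigned short vendor;
	unsigned short device;
	void (*hook)(int *state);
};

/*
 * Put one entry into the "demo_fixups" section. "used" keeps GCC from
 * dropping the unreferenced static; the explicit alignment stops GCC
 * from over-aligning entries and leaving gaps in the table.
 */
#define DEMO_FIXUP(name, v, d, fn)					\
	static struct demo_fixup name					\
	__attribute__((used, section("demo_fixups"),			\
		       aligned(sizeof(void *)))) = { (v), (d), (fn) }

static void demo_add_one(int *state) { *state += 1; }
static void demo_add_ten(int *state) { *state += 10; }

DEMO_FIXUP(fixup_a, 0x8086, 0x1234, demo_add_one);
DEMO_FIXUP(fixup_b, 0x8086, 0x5678, demo_add_ten);
DEMO_FIXUP(fixup_c, 0x10de, 0x1234, demo_add_one);

/* Defined by GNU ld, not by any C code: the table boundaries. */
extern __visible struct demo_fixup __start_demo_fixups[];
extern __visible struct demo_fixup __stop_demo_fixups[];

size_t demo_count_fixups(void)
{
	return ((uintptr_t)__stop_demo_fixups - (uintptr_t)__start_demo_fixups) /
	       sizeof(struct demo_fixup);
}

/* Run every hook whose IDs match, similar in spirit to pci_do_fixups(). */
int demo_apply_fixups(unsigned short vendor, unsigned short device)
{
	size_t i, n = demo_count_fixups();
	int state = 0;

	for (i = 0; i < n; i++) {
		const struct demo_fixup *f = &__start_demo_fixups[i];

		if (f->vendor == vendor && f->device == device)
			f->hook(&state);
	}
	return state;
}
```

Entry order inside the section is not guaranteed, which is why the sketch only counts and matches entries rather than relying on their position.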

Jan Beulich

Aug 19, 2012, 4:00:01 AM

>>> Andi Kleen <an...@firstfloor.org> 08/19/12 5:02 AM >>>

>-extern const unsigned long kallsyms_addresses[] __attribute__((weak));
>-extern const u8 kallsyms_names[] __attribute__((weak));
>+extern __visible const unsigned long kallsyms_addresses[] __attribute__((weak));
>+extern __visible const u8 kallsyms_names[] __attribute__((weak));

Shouldn't we minimally aim at consistency here:
- all attributes in one place (I personally prefer the placement between type
and name, for compatibility with other compilers, but there are rare cases -
iirc not on declarations though - where gcc doesn't allow this)
- not using open coded __attribute__(()) when a definition (here: __weak) is
available, or alternatively open coding all of them (__attribute__((weak, ...)))?

Jan
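Jan's consistency point can be illustrated with a small sketch that spells every attribute through its macro and keeps them together. It assumes `__visible` and `__weak` expand to GCC's `externally_visible` and `weak` attributes. The kernel's compiler headers define `__weak` this way, and `__visible` is the macro introduced by this series. The table names are hypothetical stand-ins for `kallsyms_addresses`/`kallsyms_names`. The sketch places the attributes after the declarator. Jan prefers placing them between type and name, and either placement works as long as it is used consistently.

```c
#include <stddef.h>

/* Assumed expansions of the kernel's attribute macros. */
#define __visible __attribute__((externally_visible))
#define __weak    __attribute__((weak))

/* Every attribute via its macro, all in one place (hypothetical names). */
extern const unsigned long demo_addresses[] __visible __weak;
extern const unsigned char demo_names[] __visible __weak;

/*
 * A weak reference that nothing defines resolves to address 0. This is
 * presumably what lets the first kallsyms link stage succeed before the
 * real tables exist.
 */
int demo_tables_present(void)
{
	return demo_addresses != NULL && demo_names != NULL;
}
```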

Jan Beulich

Aug 19, 2012, 4:30:01 AM

>>> Andi Kleen <an...@firstfloor.org> 08/19/12 4:58 AM >>>
>--- a/arch/x86/Kconfig
>+++ b/arch/x86/Kconfig
>@@ -224,8 +224,9 @@ config X86_32_LAZY_GS
>
>config ARCH_HWEIGHT_CFLAGS
> string
>- default "-fcall-saved-ecx -fcall-saved-edx" if X86_32
>- default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64
>+ default "-fcall-saved-ecx -fcall-saved-edx" if X86_32 && !LTO
>+ default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64 && !LTO
>+ default "" if LTO

By moving this last line first you can avoid modifying the other two lines.

>--- a/arch/x86/include/asm/arch_hweight.h
>+++ b/arch/x86/include/asm/arch_hweight.h
>@@ -25,9 +25,14 @@ static inline unsigned int __arch_hweight32(unsigned int w)
>{
> unsigned int res = 0;
>
>+#ifdef CONFIG_LTO
>+ res = __sw_hweight32(w);
>+#else
>+
> asm (ALTERNATIVE("call __sw_hweight32", POPCNT32, X86_FEATURE_POPCNT)
> : "="REG_OUT (res)
> : REG_IN (w));
>+#endif

Isn't this a little too harsh? Rather than not using popcnt at all, why don't you just add
the necessary clobbers to the asm() in the LTO case?
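The `call __sw_hweight32` inside the `ALTERNATIVE()` asm declares no clobbers. That is only safe because the file providing `__sw_hweight32` is built with the `-fcall-saved-*` flags from `ARCH_HWEIGHT_CFLAGS`, so the callee preserves registers that the ABI normally lets it trash. Per-file flags like these do not survive LTO. That is why the patch falls back to a plain C call, and why Jan suggests listing the clobbers instead. For reference, the sketch below is a generic bit-parallel (SWAR) population count of the kind such a software fallback uses. It is not a copy of the kernel's `lib/hweight.c`, and `demo_sw_hweight32` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical name: classic SWAR population count of a 32-bit word. */
unsigned int demo_sw_hweight32(uint32_t w)
{
	w = w - ((w >> 1) & 0x55555555u);                 /* 2-bit sums */
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u); /* 4-bit sums */
	w = (w + (w >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
	return (w * 0x01010101u) >> 24;                   /* add the 4 bytes */
}
```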

Jeremy Fitzhardinge

Aug 19, 2012, 4:30:02 AM

On 08/18/2012 07:56 PM, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>

> The paravirt thunks use a hack of using a static reference to a static
> function to reference that function from the top level statement.
>
> This assumes that gcc always generates static function names in a specific
> format, which is not necessarily true.
>
> Simply make these functions global and asmlinkage. This way the
> static __used variables are not needed and everything works.

I'm not a huge fan of unstaticing all this stuff, but it doesn't
surprise me that the current code is brittle in the face of gcc changes.

J

>
> Changed in paravirt and in all users (Xen and vsmp)
>
> Cc: jer...@goop.org

> Signed-off-by: Andi Kleen <a...@linux.intel.com>
> ---

> arch/x86/include/asm/paravirt.h | 2 +-
> arch/x86/kernel/vsmp_64.c | 8 ++++----
> arch/x86/xen/irq.c | 8 ++++----
> arch/x86/xen/mmu.c | 16 ++++++++--------
> 4 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index a0facf3..cc733a6 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -804,9 +804,9 @@ static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
> */
> #define PV_CALLEE_SAVE_REGS_THUNK(func) \
> extern typeof(func) __raw_callee_save_##func; \
> - static void *__##func##__ __used = func; \
> \
> asm(".pushsection .text;" \
> + ".globl __raw_callee_save_" #func " ; " \
> "__raw_callee_save_" #func ": " \
> PV_SAVE_ALL_CALLER_REGS \
> "call " #func ";" \
> diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
> index 992f890..f393d6d 100644
> --- a/arch/x86/kernel/vsmp_64.c
> +++ b/arch/x86/kernel/vsmp_64.c
> @@ -33,7 +33,7 @@
> * and vice versa.
> */
>
> -static unsigned long vsmp_save_fl(void)
> +asmlinkage unsigned long vsmp_save_fl(void)
> {
> unsigned long flags = native_save_fl();
>
> @@ -43,7 +43,7 @@ static unsigned long vsmp_save_fl(void)
> }
> PV_CALLEE_SAVE_REGS_THUNK(vsmp_save_fl);
>
> -static void vsmp_restore_fl(unsigned long flags)
> +asmlinkage void vsmp_restore_fl(unsigned long flags)
> {
> if (flags & X86_EFLAGS_IF)
> flags &= ~X86_EFLAGS_AC;
> @@ -53,7 +53,7 @@ static void vsmp_restore_fl(unsigned long flags)
> }
> PV_CALLEE_SAVE_REGS_THUNK(vsmp_restore_fl);
>
> -static void vsmp_irq_disable(void)
> +asmlinkage void vsmp_irq_disable(void)
> {
> unsigned long flags = native_save_fl();
>
> @@ -61,7 +61,7 @@ static void vsmp_irq_disable(void)
> }
> PV_CALLEE_SAVE_REGS_THUNK(vsmp_irq_disable);
>
> -static void vsmp_irq_enable(void)
> +asmlinkage void vsmp_irq_enable(void)
> {
> unsigned long flags = native_save_fl();
>
> diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
> index 1573376..3dd8831 100644
> --- a/arch/x86/xen/irq.c
> +++ b/arch/x86/xen/irq.c
> @@ -21,7 +21,7 @@ void xen_force_evtchn_callback(void)
> (void)HYPERVISOR_xen_version(0, NULL);
> }
>
> -static unsigned long xen_save_fl(void)
> +asmlinkage unsigned long xen_save_fl(void)
> {
> struct vcpu_info *vcpu;
> unsigned long flags;
> @@ -39,7 +39,7 @@ static unsigned long xen_save_fl(void)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_save_fl);
>
> -static void xen_restore_fl(unsigned long flags)
> +asmlinkage void xen_restore_fl(unsigned long flags)
> {
> struct vcpu_info *vcpu;
>
> @@ -66,7 +66,7 @@ static void xen_restore_fl(unsigned long flags)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_restore_fl);
>
> -static void xen_irq_disable(void)
> +asmlinkage void xen_irq_disable(void)
> {
> /* There's a one instruction preempt window here. We need to
> make sure we're don't switch CPUs between getting the vcpu
> @@ -77,7 +77,7 @@ static void xen_irq_disable(void)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable);
>
> -static void xen_irq_enable(void)
> +asmlinkage void xen_irq_enable(void)
> {
> struct vcpu_info *vcpu;
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index b65a761..9f82443 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -429,7 +429,7 @@ static pteval_t iomap_pte(pteval_t val)
> return val;
> }
>
> -static pteval_t xen_pte_val(pte_t pte)
> +asmlinkage pteval_t xen_pte_val(pte_t pte)
> {
> pteval_t pteval = pte.pte;
> #if 0
> @@ -446,7 +446,7 @@ static pteval_t xen_pte_val(pte_t pte)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_pte_val);
>
> -static pgdval_t xen_pgd_val(pgd_t pgd)
> +asmlinkage pgdval_t xen_pgd_val(pgd_t pgd)
> {
> return pte_mfn_to_pfn(pgd.pgd);
> }
> @@ -477,7 +477,7 @@ void xen_set_pat(u64 pat)
> WARN_ON(pat != 0x0007010600070106ull);
> }
>
> -static pte_t xen_make_pte(pteval_t pte)
> +asmlinkage pte_t xen_make_pte(pteval_t pte)
> {
> phys_addr_t addr = (pte & PTE_PFN_MASK);
> #if 0
> @@ -512,14 +512,14 @@ static pte_t xen_make_pte(pteval_t pte)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte);
>
> -static pgd_t xen_make_pgd(pgdval_t pgd)
> +asmlinkage pgd_t xen_make_pgd(pgdval_t pgd)
> {
> pgd = pte_pfn_to_mfn(pgd);
> return native_make_pgd(pgd);
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_make_pgd);
>
> -static pmdval_t xen_pmd_val(pmd_t pmd)
> +asmlinkage pmdval_t xen_pmd_val(pmd_t pmd)
> {
> return pte_mfn_to_pfn(pmd.pmd);
> }
> @@ -578,7 +578,7 @@ static void xen_pmd_clear(pmd_t *pmdp)
> }
> #endif /* CONFIG_X86_PAE */
>
> -static pmd_t xen_make_pmd(pmdval_t pmd)
> +asmlinkage pmd_t xen_make_pmd(pmdval_t pmd)
> {
> pmd = pte_pfn_to_mfn(pmd);
> return native_make_pmd(pmd);
> @@ -586,13 +586,13 @@ static pmd_t xen_make_pmd(pmdval_t pmd)
> PV_CALLEE_SAVE_REGS_THUNK(xen_make_pmd);
>
> #if PAGETABLE_LEVELS == 4
> -static pudval_t xen_pud_val(pud_t pud)
> +asmlinkage pudval_t xen_pud_val(pud_t pud)
> {
> return pte_mfn_to_pfn(pud.pud);
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_pud_val);
>
> -static pud_t xen_make_pud(pudval_t pud)
> +asmlinkage pud_t xen_make_pud(pudval_t pud)
> {
> pud = pte_pfn_to_mfn(pud);

Jeremy Fitzhardinge

Aug 19, 2012, 4:30:02 AM

On 08/18/2012 07:56 PM, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>

> The paravirt patching code assumes that it can reference a
> local assembler label between two different top level assembler
> statements. This does not work with some experimental gcc builds,
> where the assembler code may end up in different assembler files.

Egad, what are those zany gcc chaps up to now?

J

>
> Replace it with extern / global /asm linkage labels.
>
> This also removes one redundant copy of the macro.

>
> Cc: jer...@goop.org
> Signed-off-by: Andi Kleen <a...@linux.intel.com>
> ---

> arch/x86/include/asm/paravirt_types.h | 9 +++++----
> arch/x86/kernel/paravirt.c | 5 -----
> 2 files changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 4f262bc..6a464ba 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -385,10 +385,11 @@ extern struct pv_lock_ops pv_lock_ops;
> _paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
>
> /* Simple instruction patching code. */
> -#define DEF_NATIVE(ops, name, code) \
> - extern const char start_##ops##_##name[] __visible, \
> - end_##ops##_##name[] __visible; \
> - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
> +#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
> +
> +#define DEF_NATIVE(ops, name, code) \
> + __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> + asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))
>
> unsigned paravirt_patch_nop(void);
> unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 17fff18..947255e 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -62,11 +62,6 @@ void __init default_banner(void)
> pv_info.name);
> }
>
> -/* Simple instruction patching code. */
> -#define DEF_NATIVE(ops, name, code) \
> - extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
> -
> /* Undefined instruction for dealing with missing ops pointers. */
> static const unsigned char ud2a[] = { 0x0f, 0x0b };
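The new `NATIVE_LABEL()` works because each label is emitted with `.globl` and the C side reaches it through an `extern` array declaration. The reference therefore no longer depends on both statements landing in the same assembler file. The sketch below reuses the macro as written in the patch. For portability it assumes raw bytes emitted with `.byte` into `.rodata`, instead of real x86 instructions. `DEF_NATIVE_BYTES`, `demo_patch_insns` and the `demo`/`ud2` names are hypothetical, and the copy helper is only loosely modelled on the kernel's patching code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* As in the patch: "\n\t.globl <a><x>_<b>\n<a><x>_<b>:\n\t" */
#define NATIVE_LABEL(a, x, b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"

/* Hypothetical variant: raw bytes in .rodata, so any ELF target assembles it. */
#define DEF_NATIVE_BYTES(ops, name, bytes)				\
	extern const char start_##ops##_##name[], end_##ops##_##name[];	\
	__asm__(".pushsection .rodata"					\
		NATIVE_LABEL("start_", ops, name)			\
		".byte " bytes						\
		NATIVE_LABEL("end_", ops, name)				\
		".popsection")

/* Two bytes, 0x0f 0x0b: the same encoding as ud2a[] above. */
DEF_NATIVE_BYTES(demo, ud2, "0x0f, 0x0b");

/* Copy [start, end) into buf if it fits; return bytes copied, 0 if not. */
size_t demo_patch_insns(void *buf, size_t len, const char *start, const char *end)
{
	size_t insn_len = (size_t)((uintptr_t)end - (uintptr_t)start);

	if (insn_len > len)
		return 0;
	memcpy(buf, start, insn_len);
	return insn_len;
}
```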

Jan Beulich

Aug 19, 2012, 4:40:02 AM

>>> Andi Kleen <an...@firstfloor.org> 08/19/12 4:59 AM >>>

>I verified this generates the same binary (on 64bit) as the original
>register variable.

This isn't very surprising given that the modified code is inside a
CONFIG_X86_32 conditional (as ought to be obvious from the code using
%%esp). Given that it's being used as operand to a binary &, the resulting
code - if the compiler handles this only halfway sensibly - can hardly be
expected to be identical.

>-register unsigned long current_stack_pointer asm("esp") __used;
>+#define current_stack_pointer ({ \
>+ unsigned long sp; \
>+ asm("mov %%esp,%0" : "=r" (sp)); \
>+ sp; \
>+})

It would get closer to the original if you used "=g" (I noticed in a few
earlier patches already that you like to use "=r" in places where a register
is not strictly required, thus reducing the flexibility the compiler has).

Also, given that this is more a workaround for a compiler deficiency,
shouldn't this be conditional upon use of LTO?

Jan
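The sketch below shows the statement-expression replacement with Jan's `"=g"` constraint, which lets GCC choose either a register or a memory slot for the result. Two parts are assumptions added so the example builds outside 32-bit x86: the 64-bit branch reading `%rsp`, and the `__builtin_frame_address(0)` fallback for other targets. `demo_stack_distance` is a hypothetical helper.

```c
#include <stdint.h>

#if defined(__i386__)
#define current_stack_pointer ({			\
	unsigned long sp;				\
	__asm__("mov %%esp,%0" : "=g" (sp));		\
	sp;						\
})
#elif defined(__x86_64__)
#define current_stack_pointer ({			\
	unsigned long sp;				\
	__asm__("mov %%rsp,%0" : "=g" (sp));		\
	sp;						\
})
#else
#define current_stack_pointer ((unsigned long)__builtin_frame_address(0))
#endif

/* Hypothetical helper: distance from the stack pointer to a local. */
unsigned long demo_stack_distance(void)
{
	volatile char marker = 0;
	unsigned long sp = current_stack_pointer;
	uintptr_t m = (uintptr_t)&marker;

	return sp > m ? sp - m : m - sp;
}
```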

Jan Beulich

Aug 19, 2012, 4:50:04 AM

>>> Andi Kleen <an...@firstfloor.org> 08/19/12 5:05 AM >>>
>Work around a LTO gcc problem: when there is no reference to a variable
>in a module it will be moved to the end of the program. This causes
>reordering of initcalls which the kernel does not like.
>Add a dummy reference function to avoid this. The function is
>deleted by the linker.

This is not even true on x86, not to speak of generally.

>+#ifdef CONFIG_LTO
>+/* Work around a LTO gcc problem: when there is no reference to a variable
>+ * in a module it will be moved to the end of the program. This causes
>+ * reordering of initcalls which the kernel does not like.
>+ * Add a dummy reference function to avoid this. The function is
>+ * deleted by the linker.
>+ */
>+#define LTO_REFERENCE_INITCALL(x) \
>+ ; /* yes this is needed */ \
>+ static __used __exit void *reference_##x(void) \

Why not put it into e.g. section .discard.text? That could be expected to be
discarded by the linker without being arch dependent, as long as all arches
use DISCARDS in their linker script.
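Jan's alternative might look like the following sketch. The dummy reference function goes into `.discard.text`, which he expects the generic `DISCARDS` linker-script block to drop on every architecture, instead of relying on `__exit` text being discarded at link time. A userspace build has no such linker script, so there the section is simply kept. The sketch only shows the macro shape and why its leading `;` is needed. `initcall_t` is redeclared locally, and `demo_initcall`/`demo_init` are hypothetical.

```c
typedef int (*initcall_t)(void);

/*
 * Variant of the patch's macro with Jan's suggested section. The leading
 * ';' terminates the initcall variable definition it is appended to.
 */
#define LTO_REFERENCE_INITCALL(x)					\
	;								\
	static __attribute__((used, section(".discard.text")))		\
	void *reference_##x(void)					\
	{								\
		return (void *)&x;					\
	}

/*
 * Hypothetical, simplified registration macro: like __define_initcall(),
 * it ends without a ';', which the caller supplies.
 */
#define demo_initcall(fn)						\
	static initcall_t __initcall_##fn				\
	__attribute__((used, section("demo_initcalls"))) = fn		\
	LTO_REFERENCE_INITCALL(__initcall_##fn)

static int demo_init(void)
{
	return 7;
}
demo_initcall(demo_init);
```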

Jan Beulich

Aug 19, 2012, 5:00:01 AM

>>> Andi Kleen <an...@firstfloor.org> 08/19/12 4:59 AM >>>

>@@ -1904,6 +1904,10 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>
> switch (sym[i].st_shndx) {
> case SHN_COMMON:
>+ /* Ignore common symbols */
>+ if (!strncmp(name, "__gnu_lto", 9))
>+ break;
>+
> /* We compiled with -fno-common. These are not
> supposed to happen. */
> pr_debug("Common symbol: %s\n", name);

I think it is dangerous to just match the start of the symbol name here -
this may in the future well lead to ignoring symbols we shouldn't be
ignoring.

Also I would think the added comment ought to say "Ignore LTO symbols."
Otherwise it's sort of contradicting the purpose of the case being handled
here.

Jan

Avi Kivity

Aug 19, 2012, 5:10:02 AM

On 08/19/2012 05:56 AM, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>

> The VMX code references a local assembler label between two inline
> assembler statements. This assumes they both end up in the same
> assembler files. In some experimental builds of gcc this is not
> necessarily true, causing linker failures.
>
> Replace the local label reference with a more traditional asmlinkage
> extern.
>
> This also eliminates one assembler statement and
> generates a bit better code on 64bit: the compiler can
> use a RIP relative LEA instead of a movabs, saving
> a few bytes.

I'm happy to see work on lto-enabling the kernel.

>
> +extern __visible unsigned long kvm_vmx_return;
> +
> /*
> * Set up the vmcs's constant host-state fields, i.e., host-state fields that
> * will not change in the lifetime of the guest.
> @@ -3753,8 +3755,7 @@ static void vmx_set_constant_host_state(void)
> native_store_idt(&dt);
> vmcs_writel(HOST_IDTR_BASE, dt.address); /* 22.2.4 */
>
> - asm("mov $.Lkvm_vmx_return, %0" : "=r"(tmpl));
> - vmcs_writel(HOST_RIP, tmpl); /* 22.2.5 */
> + vmcs_writel(HOST_RIP, (unsigned long)&kvm_vmx_return); /* 22.2.5 */
>
> rdmsr(MSR_IA32_SYSENTER_CS, low32, high32);
> vmcs_write32(HOST_IA32_SYSENTER_CS, low32);
> @@ -6305,9 +6306,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
> /* Enter guest mode */
> "jne .Llaunched \n\t"
> __ex(ASM_VMX_VMLAUNCH) "\n\t"
> - "jmp .Lkvm_vmx_return \n\t"
> + "jmp kvm_vmx_return \n\t"
> ".Llaunched: " __ex(ASM_VMX_VMRESUME) "\n\t"
> - ".Lkvm_vmx_return: "
> + ".globl kvm_vmx_return\n"
> + "kvm_vmx_return: "
> /* Save guest registers, load host registers, keep flags */
> "mov %0, %c[wordsize](%%"R"sp) \n\t"
> "pop %0 \n\t"
>

The reason we use a local label is so that the function isn't split
into two from the profiler's point of view. See cd2276a795b013d1.

One way to fix this is to have a .data variable initialized to point to
.Lkvm_vmx_return (this can be done from the same asm statement in
vmx_vcpu_run), and reference that variable in
vmx_set_constant_host_state(). If no one comes up with a better idea,
I'll write a patch doing this.

--
error compiling committee.c: too many arguments to function
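Avi's proposal can be sketched as follows. The same `asm` statement that defines the local `.L` label also emits a pointer-sized datum holding the label's address. The label therefore never enters the symbol table and never splits the function for the profiler, while C code elsewhere can read the address through an ordinary global in `.data`. All names are hypothetical. The `noinline, noclone` attributes keep exactly one copy of the asm so the local label is defined only once, presumably the same concern behind `vmx_vcpu_run()` being `__noclone`.

```c
#include <stdint.h>

/* Pick an address-sized data directive for the target. */
#if UINTPTR_MAX > 0xffffffffu
#define DEMO_PTR_DIRECTIVE ".quad"
#else
#define DEMO_PTR_DIRECTIVE ".long"
#endif

/* Defined by the asm below: a global in .data, not a label in .text. */
extern const uintptr_t demo_resume_addr;

__attribute__((noinline, noclone)) int demo_run(int x)
{
	__asm__ volatile(
		".Ldemo_resume:\n\t"		/* local label: no symbol emitted */
		".pushsection .data\n\t"
		".balign 8\n\t"
		".globl demo_resume_addr\n"
		"demo_resume_addr:\n\t"
		DEMO_PTR_DIRECTIVE " .Ldemo_resume\n\t"
		".popsection");
	return x + 1;
}
```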

Andi Kleen

Aug 19, 2012, 11:10:02 AM

On Sun, Aug 19, 2012 at 09:46:04AM +0100, Jan Beulich wrote:
> >>> Andi Kleen <an...@firstfloor.org> 08/19/12 5:05 AM >>>
> >Work around a LTO gcc problem: when there is no reference to a variable
> >in a module it will be moved to the end of the program. This causes
> >reordering of initcalls which the kernel does not like.
> >Add a dummy reference function to avoid this. The function is
> >deleted by the linker.
>
> This is not even true on x86, not to speak of generally.

Why is it not true?

__initcall is only defined for !MODULE and there __exit discards.

>
> >+#ifdef CONFIG_LTO
> >+/* Work around a LTO gcc problem: when there is no reference to a variable
> >+ * in a module it will be moved to the end of the program. This causes
> >+ * reordering of initcalls which the kernel does not like.
> >+ * Add a dummy reference function to avoid this. The function is
> >+ * deleted by the linker.
> >+ */
> >+#define LTO_REFERENCE_INITCALL(x) \
> >+ ; /* yes this is needed */ \
> >+ static __used __exit void *reference_##x(void) \
>
> Why not put it into e.g. section .discard.text? That could be expected to be
> discarded by the linker without being arch dependent, as long as all arches
> use DISCARDS in their linker script.

That's what __exit does, doesn't it?

-Andi

--
a...@linux.intel.com -- Speaking for myself only.

Andi Kleen

Aug 19, 2012, 11:20:01 AM

> By moving this last line first you can avoid modifying the other two lines.

Ok.

>
> >--- a/arch/x86/include/asm/arch_hweight.h
> >+++ b/arch/x86/include/asm/arch_hweight.h
> >@@ -25,9 +25,14 @@ static inline unsigned int __arch_hweight32(unsigned int w)
> >{
> > unsigned int res = 0;
> >
> >+#ifdef CONFIG_LTO
> >+ res = __sw_hweight32(w);
> >+#else
> >+
> > asm (ALTERNATIVE("call __sw_hweight32", POPCNT32, X86_FEATURE_POPCNT)
> > : "="REG_OUT (res)
> > : REG_IN (w));
> >+#endif
>
> Isn't this a little too harsh? Rather than not using popcnt at all, why don't you just add
> the necessary clobbers to the asm() in the LTO case?

Andi Kleen

Aug 19, 2012, 11:30:02 AM

On Sun, Aug 19, 2012 at 06:12:57PM +0300, Avi Kivity wrote:
> On 08/19/2012 06:09 PM, Andi Kleen wrote:
> >> The reason we use a local label is so that the function isn't split
> >> into two from the profiler's point of view. See cd2276a795b013d1.
> >
> > Hmm that commit message is not very enlightening.
> >
> > The goal was to force a compiler error?
>
> No, the goal was to avoid a global label in the middle of a function.
> The profiler interprets it as a new function. After your patch,

Ah, got it now. I always used to have the same problem with sys_call_return.

I wonder if there shouldn't be a way to tell perf to ignore a symbol.

> >>
> >> One way to fix this is to have a .data variable initialized to point to
> >> .Lkvm_vmx_return (this can be done from the same asm statement in
> >> vmx_vcpu_run), and reference that variable in
> >> vmx_set_constant_host_state(). If no one comes up with a better idea,
> >> I'll write a patch doing this.
> >
> > I'm not clear how that is better than my patch.
>
> My patch will not generate the artifact with kvm_vmx_return.

Ok fine for me. I'll keep this patch for now, until you have
something better.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.

Andi Kleen

Aug 19, 2012, 11:30:02 AM

On Sun, Aug 19, 2012 at 08:53:03AM +0100, Jan Beulich wrote:
> >>> Andi Kleen <an...@firstfloor.org> 08/19/12 5:02 AM >>>
> >-extern const unsigned long kallsyms_addresses[] __attribute__((weak));
> >-extern const u8 kallsyms_names[] __attribute__((weak));
> >+extern __visible const unsigned long kallsyms_addresses[] __attribute__((weak));
> >+extern __visible const u8 kallsyms_names[] __attribute__((weak));
>
> Shouldn't we minimally aim at consistency here:
> - all attributes in one place (I personally prefer the placement between type
> and name, for compatibility with other compilers, but there are rare cases -
> iirc not on declarations though - where gcc doesn't allow this)

Ok.

> - not using open coded __attribute__(()) when a definition (here: __weak) is
> available, or alternatively open coding all of them (__attribute__((weak, ...)))?

I just kept the original code. But yes it should be using __weak.
I can change that.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.

Andi Kleen

Aug 19, 2012, 11:30:02 AM

On Sun, Aug 19, 2012 at 09:53:02AM +0100, Jan Beulich wrote:
> >>> Andi Kleen <an...@firstfloor.org> 08/19/12 4:59 AM >>>
> >@@ -1904,6 +1904,10 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
> >
> > switch (sym[i].st_shndx) {
> > case SHN_COMMON:
> >+ /* Ignore common symbols */
> >+ if (!strncmp(name, "__gnu_lto", 9))
> >+ break;
> >+
> > /* We compiled with -fno-common. These are not
> > supposed to happen. */
> > pr_debug("Common symbol: %s\n", name);
>
> I think it is dangerous to just match the start of the symbol name here -
> this may in the future well lead to ignoring symbols we shouldn't be
> ignoring.
>
> Also I would think the added comment ought to say "Ignore LTO symbols."
> Otherwise it's sort of contradicting the purpose of the case being handled
> here.

OK, maybe it should error out. This case only happens with fat LTO when
the LTO step is not actually run.

It used to happen because old versions of this patchkit
didn't correctly LTO modules.

I'll change it to error out. The reason for the prefix was that
there is a __gnu_lto_vXXX and the version number could change.

Thanks for the reviews.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
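One way to address both concerns is a stricter check like the hypothetical helper below. It matches the versioned marker Andi describes without accepting any name that merely starts with `__gnu_lto`. It assumes, per Andi's description of `__gnu_lto_vXXX`, that the marker is always `__gnu_lto_v` followed only by decimal digits. Whether the module loader should then skip such a symbol or reject the module is the policy question discussed above.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only for "__gnu_lto_v" + one or more digits. */
bool demo_is_gnu_lto_marker(const char *name)
{
	static const char prefix[] = "__gnu_lto_v";
	const size_t plen = sizeof(prefix) - 1;
	const char *p;

	if (strncmp(name, prefix, plen) != 0)
		return false;

	p = name + plen;
	if (!isdigit((unsigned char)*p))
		return false;		/* the version number is required */
	while (isdigit((unsigned char)*p))
		p++;
	return *p == '\0';		/* nothing may follow the digits */
}
```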

Andi Kleen

Aug 19, 2012, 11:30:03 AM

On Sun, Aug 19, 2012 at 01:27:00AM -0700, Jeremy Fitzhardinge wrote:
> On 08/18/2012 07:56 PM, Andi Kleen wrote:
> > From: Andi Kleen <a...@linux.intel.com>
> >
> > The paravirt thunks use a hack of using a static reference to a static
> > function to reference that function from the top level statement.
> >
> > This assumes that gcc always generates static function names in a specific
> > format, which is not necessarily true.
> >
> > Simply make these functions global and asmlinkage. This way the
> > static __used variables are not needed and everything works.
>
> I'm not a huge fan of unstaticing all this stuff, but it doesn't
> surprise me that the current code is brittle in the face of gcc changes.

Hmm actually reading my own patch again it may be wrong. You need
regparm(3) here right? asmlinkage forces it to (0). I'll change it to
__visible. I think I did that earlier for all the 32bit code, but missed
this one.

-Andi

Rusty Russell

Aug 20, 2012, 3:10:01 AM

On Sat, 18 Aug 2012 19:56:23 -0700, Andi Kleen <an...@firstfloor.org> wrote:
> @@ -78,11 +78,13 @@ extern struct module __this_module;
>
> #else /* !CONFIG_MODULES... */
>
> -#define EXPORT_SYMBOL(sym)
> -#define EXPORT_SYMBOL_GPL(sym)
> -#define EXPORT_SYMBOL_GPL_FUTURE(sym)
> -#define EXPORT_UNUSED_SYMBOL(sym)
> -#define EXPORT_UNUSED_SYMBOL_GPL(sym)
> +/* Even without modules keep the __visible side effect */
> +
> +#define EXPORT_SYMBOL(sym) extern typeof(sym) sym __visible
> +#define EXPORT_SYMBOL_GPL(sym) extern typeof(sym) sym __visible
> +#define EXPORT_SYMBOL_GPL_FUTURE(sym) extern typeof(sym) sym __visible
> +#define EXPORT_UNUSED_SYMBOL(sym) extern typeof(sym) sym __visible
> +#define EXPORT_UNUSED_SYMBOL_GPL(sym) extern typeof(sym) sym __visible
>
> #endif /* CONFIG_MODULES */

Really, why? Seems like a win to have them eliminated if unused.

Naively, I would think many cases of __visible should be #ifdef
CONFIG_MODULES. What am I missing?

Thanks,
Rusty.

Ingo Molnar

Aug 20, 2012, 3:50:01 AM

* Andi Kleen <an...@firstfloor.org> wrote:

> This rather large patchkit enables gcc Link Time Optimization (LTO)
> support for the kernel.
>
> With LTO gcc will do whole program optimizations for
> the whole kernel and each module. This increases compile time,
> but can generate faster code.

By how much does it increase compile time?

How much faster does kernel code get?

Last time I checked LTO optimizations (half a year ago) it
resulted in significantly slower build times.

I tried out and measured the LTO speedups and was less than
impressed by them - a lot of build time increase for not much
increase in performance. There was also visible, ongoing
maintenance cost.

The combination of these seemed like a show-stopper.

It's obviously an optimization feature we should consider, but
we really need hard numbers to make a cost/benefit analysis.

Thanks,

Ingo

Herbert Xu

Aug 20, 2012, 4:30:02 AM

On Sat, Aug 18, 2012 at 07:56:32PM -0700, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>

> Cc: her...@gondor.apana.org.au
> Signed-off-by: Andi Kleen <a...@linux.intel.com>

Acked-by: Herbert Xu <her...@gondor.apana.org.au>
--
Email: Herbert Xu <her...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Herbert Xu

Aug 20, 2012, 4:30:02 AM

On Sat, Aug 18, 2012 at 07:56:31PM -0700, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>

> Various tables in aes_generic are accessed by assembler code.
> Mark them __visible for LTO

Takashi Iwai

Aug 20, 2012, 4:40:03 AM

At Sat, 18 Aug 2012 19:56:22 -0700,

Andi Kleen wrote:
>
> From: Andi Kleen <a...@linux.intel.com>
>

> The new LTO EXPORT_SYMBOL references symbols even without CONFIG_MODULES.
> Since these functions are macros in this case this doesn't work.
> Add an ifdef to fix the build.
>
> Cc: ti...@suse.de
> Signed-off-by: Andi Kleen <a...@linux.intel.com>

Reviewed-by: Takashi Iwai <ti...@suse.de>

I haven't seen the background, so let me ask a dumb question:
is it a 3.6 fix or for 3.7?

And shall I apply this one to sound git tree, or would you like to
apply all in a single tree?

thanks,

Takashi

> ---
> sound/core/seq/seq_device.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/sound/core/seq/seq_device.c b/sound/core/seq/seq_device.c
> index 5cf8d65..60e8fc1 100644
> --- a/sound/core/seq/seq_device.c
> +++ b/sound/core/seq/seq_device.c
> @@ -569,5 +569,7 @@ EXPORT_SYMBOL(snd_seq_device_load_drivers);
> EXPORT_SYMBOL(snd_seq_device_new);
> EXPORT_SYMBOL(snd_seq_device_register_driver);
> EXPORT_SYMBOL(snd_seq_device_unregister_driver);
> +#ifdef CONFIG_MODULES
> EXPORT_SYMBOL(snd_seq_autoload_lock);
> EXPORT_SYMBOL(snd_seq_autoload_unlock);
> +#endif
> --
> 1.7.7.6
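To see why the `#ifdef` is needed, look at the `!CONFIG_MODULES` definition from Rusty's quote. `EXPORT_SYMBOL(sym)` now expands to a redeclaration, `extern typeof(sym) sym __visible`, so `sym` must name a real declared function or object. When the symbol is a macro instead, as the commit message says the autoload helpers are in this configuration, `typeof(sym)` names an undeclared identifier and the build fails. This sketch assumes `__visible` is `externally_visible` and uses GCC's `__typeof__` spelling. `demo_helper` and `demo_stub` are hypothetical.

```c
/* Assumption: __visible is GCC's externally_visible attribute. */
#define __visible __attribute__((externally_visible))

/* The !CONFIG_MODULES variant quoted above: a redeclaration, not a no-op. */
#define EXPORT_SYMBOL(sym) extern __typeof__(sym) sym __visible

int demo_helper(int v)
{
	return v * 2;
}
EXPORT_SYMBOL(demo_helper);	/* fine: demo_helper is a real function */

/*
 * By contrast, given
 *	#define demo_stub()
 * the line EXPORT_SYMBOL(demo_stub); would not compile: a bare demo_stub
 * (no parentheses) does not expand the function-like macro and names no
 * declared entity, so __typeof__(demo_stub) is an error.
 */
```

This redeclaration is also the cost Rusty points out. Every exported symbol becomes externally visible, so LTO can no longer drop the unused ones in a `!CONFIG_MODULES` build.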

Avi Kivity

Aug 20, 2012, 5:20:02 AM

On 08/19/2012 05:56 AM, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>

> The fancy x86 hweight uses different compiler options for the
> hweight file. This does not work with LTO. Just disable the optimization
> with LTO

>
> Signed-off-by: Andi Kleen <a...@linux.intel.com>
> ---

> arch/x86/Kconfig | 5 +++--
> arch/x86/include/asm/arch_hweight.h | 9 +++++++++
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 8ec3a1a..9382b09 100644

> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -224,8 +224,9 @@ config X86_32_LAZY_GS
>
> config ARCH_HWEIGHT_CFLAGS
> string
> - default "-fcall-saved-ecx -fcall-saved-edx" if X86_32
> - default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64
> + default "-fcall-saved-ecx -fcall-saved-edx" if X86_32 && !LTO
> + default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64 && !LTO
> + default "" if LTO
>

Seems heavy handed. How about using __attribute__((optimize(...))) instead?

--
error compiling committee.c: too many arguments to function

Andi Kleen

Aug 20, 2012, 5:50:02 AM

> Really, why? Seems like a win to have them eliminated if unused.
>
> Naively, I would think many cases of __visible should be #ifdef
> CONFIG_MODULES. What am I missing?

It worked around some problem I forgot now :)

You're right it shouldn't be needed in theory for !MODULES. I'll double
check.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.

Andi Kleen

Aug 20, 2012, 5:50:02 AM

> > config ARCH_HWEIGHT_CFLAGS
> > string
> > - default "-fcall-saved-ecx -fcall-saved-edx" if X86_32
> > - default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64
> > + default "-fcall-saved-ecx -fcall-saved-edx" if X86_32 && !LTO
> > + default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64 && !LTO
> > + default "" if LTO
> >
>
> Seems heavy handed. How about using __attribute__((optimize(...))) instead?

Doesn't work for this. In fact according to the gcc developers that
attribute is mostly broken.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.