
Compiling Linux with "bdver2" gcc optimization option


Franco Martelli

Aug 13, 2019, 11:20:05 AM
Hi everybody,

To get a Linux kernel optimized for my AMD FX-8350 CPU (a Piledriver,
i.e. second-generation Bulldozer, part), I changed line 121 of
linux-source-4.19/arch/x86/Makefile from:

cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)

to:

cflags-$(CONFIG_MK8) += $(call cc-option,-march=bdver2) \
        $(call cc-option,-mtune=bdver2,$(call cc-option,-mtune=generic))
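The `$(call cc-option,...)` wrapper only passes a flag to GCC if the compiler accepts it; otherwise it substitutes the optional second argument, or nothing at all. The shell sketch below approximates that behaviour, assuming `CC` names the compiler (default `gcc`). The function name `cc_option` is hypothetical, and the real kbuild macro (built on `try-run` in scripts/Kbuild.include) also passes the current kernel CFLAGS and writes to a temporary file, so treat this as an illustration rather than kbuild's exact logic.

```shell
#!/bin/sh
# Hypothetical approximation of kbuild's cc-option: print FLAG if the
# compiler accepts it when compiling an empty C file, otherwise print
# FALLBACK (which may be empty, in which case the flag is silently dropped).
cc_option() {
  flag=$1
  fallback=${2:-}
  # GCC rejects unknown -march=/-mtune= values with an error, so a failed
  # compile of /dev/null means the flag is not supported.
  if ${CC:-gcc} "$flag" -c -x c /dev/null -o /dev/null 2>/dev/null; then
    printf '%s\n' "$flag"
  else
    printf '%s\n' "$fallback"
  fi
}

# Mirrors the modified Makefile line: -march=bdver2 with no fallback,
# -mtune=bdver2 falling back to -mtune=generic.
march=$(cc_option -march=bdver2)
mtune=$(cc_option -mtune=bdver2 "$(cc_option -mtune=generic)")
echo "extra cflags: $march $mtune"
```

One consequence of this design: with a GCC that does not know bdver2, the modified line does not break the build; the `-march` flag is simply dropped and `-mtune` falls back to `generic`.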

Up to Debian 9.x (stretch), compiling the kernel this way worked fine, but
with Debian 10 (buster) I get a lot of warning messages:

<snip>
mm/memory.o: warning: objtool: remap_pfn_range()+0xd5: unsupported intra-function call
mm/memory.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.
mm/mlock.o: warning: objtool: __munlock_isolate_lru_page()+0xd6: stack state mismatch: cfa1=7+64 cfa2=7+56
mm/mlock.o: warning: objtool: clear_page_mlock()+0x39: unsupported instruction in callable function
mm/mlock.o: warning: objtool: mlock_vma_page()+0x8e: sibling call from callable instruction with modified stack frame
mm/mlock.o: warning: objtool: munlock_vma_page()+0x163: return with modified stack frame
kernel/bpf/stackmap.o: warning: objtool: bpf_get_stack()+0x68: return with modified stack frame
kernel/bpf/sockmap.o: warning: objtool: bpf_exec_tx_verdict()+0x436: stack state mismatch: cfa1=7+168 cfa2=7+160
kernel/power/qos.o: warning: objtool: pm_qos_remove_request()+0x6d: return with modified stack frame
arch/x86/kernel/dumpstack.o: warning: objtool: __die()+0xc2: return with modified stack frame
arch/x86/kernel/cpu/amd.o: warning: objtool: bsp_init_amd()+0xb1: can't find jump dest instruction at .text+0x193
arch/x86/kvm/x86.o: warning: objtool: kvm_set_cr3()+0x18: can't find jump dest instruction at .text+0x5cc0
arch/x86/kernel/ldt.o: warning: objtool: write_ldt()+0x110: can't find jump dest instruction at .text+0x56a
mm/mmap.o: warning: objtool: init_user_reserve()+0x34: return with modified stack frame
mm/mmap.o: warning: objtool: init_admin_reserve()+0x34: return with modified stack frame
mm/mmap.o: warning: objtool: vm_brk_flags()+0x55: stack state mismatch: cfa1=7+80 cfa2=7+72
mm/mmap.o: warning: objtool: do_mmap()+0x1bf: stack state mismatch: cfa1=7+80 cfa2=7+72
mm/mprotect.o: warning: objtool: change_protection()+0x4f5: can't find jump dest instruction at .text+0x5a3
mm/mremap.o: warning: objtool: move_page_tables()+0x60: stack state mismatch: cfa1=7+144 cfa2=7+136
mm/mremap.o: warning: objtool: __se_sys_mremap()+0x12d: stack state mismatch: cfa1=7+160 cfa2=7+152
arch/x86/kernel/sys_x86_64.o: warning: objtool: get_align_mask()+0x1d: can't find jump dest instruction at .text+0x2f
mm/page_vma_mapped.o: warning: objtool: check_pte()+0x108: return with modified stack frame
mm/page_vma_mapped.o: warning: objtool: page_vma_mapped_walk()+0x15e: stack state mismatch: cfa1=7+56 cfa2=7+40
mm/pagewalk.o: warning: objtool: __walk_page_range()+0x1be: return with modified stack frame
kernel/printk/printk.o: warning: objtool: devkmsg_write.cold.15()+0x30: can't find jump dest instruction at .text.unlikely+0x2a1
arch/x86/kvm/emulate.o: warning: objtool: rsm_load_state_32()+0x2e2: can't find jump dest instruction at .text+0xfa4
arch/x86/kvm/i8254.o: warning: objtool: pit_ioport_read()+0x141: can't find jump dest instruction at .text+0x7cb
arch/x86/kvm/mmu.o: warning: objtool: kvm_calc_tdp_mmu_root_page_role()+0xb: can't find jump dest instruction at .text+0x3fd
arch/x86/kvm/lapic.o: warning: objtool: recalculate_apic_map()+0x2f6: can't find jump dest instruction at .text+0x968
mm/rmap.o: warning: objtool: try_to_unmap_one()+0x4d1: can't find jump dest instruction at .text+0x185b
arch/x86/kvm/ioapic.o: warning: objtool: rtc_irq_eoi_tracking_reset()+0x45: return with modified stack frame
arch/x86/kvm/ioapic.o: warning: objtool: ioapic_mmio_write()+0x62: stack state mismatch: cfa1=7+48 cfa2=7+40
arch/x86/kvm/ioapic.o: warning: objtool: ioapic_mmio_read()+0xe5: stack state mismatch: cfa1=7+64 cfa2=7+56
arch/x86/kvm/ioapic.o: warning: objtool: kvm_get_ioapic()+0x72: return with modified stack frame
arch/x86/kvm/ioapic.o: warning: objtool: kvm_set_ioapic()+0x105: return with modified stack frame
arch/x86/kvm/irq_comm.o: warning: objtool: kvm_set_msi_irq()+0x60: return with modified stack frame
kernel/rcu/sync.o: warning: objtool: rcu_sync_init()+0x52: return with modified stack frame
mm/vmalloc.o: warning: objtool: vmalloc_to_page()+0x150: return with modified stack frame
mm/vmalloc.o: warning: objtool: vunmap_page_range()+0x2fc: return with modified stack frame
mm/vmalloc.o: warning: objtool: vm_unmap_ram()+0x11f: sibling call from callable instruction with modified stack frame
mm/vmalloc.o: warning: objtool: vmap_page_range_noflush()+0x2ec: return with modified stack frame
mm/vmalloc.o: warning: objtool: vread()+0x1cf: stack state mismatch: cfa1=7+96 cfa2=7+88
mm/vmalloc.o: warning: objtool: vwrite()+0x176: stack state mismatch: cfa1=7+96 cfa2=7+88
arch/x86/kvm/cpuid.o: warning: objtool: do_cpuid_ent()+0x6b4: can't find jump dest instruction at .text+0xba6
arch/x86/kvm/pmu.o: warning: objtool: reprogram_fixed_counter()+0xbb: can't find jump dest instruction at .text+0x37d
kernel/rcu/srcutree.o: warning: objtool: process_srcu()+0x50: stack state mismatch: cfa1=7+128 cfa2=7+120
kernel/rcu/srcutree.o: warning: objtool: __call_srcu()+0xba: sibling call from callable instruction with modified stack frame
arch/x86/kernel/alternative.o: warning: objtool: apply_alternatives()+0x10f: stack state mismatch: cfa1=7+336 cfa2=7+328
arch/x86/kernel/alternative.o: warning: objtool: apply_paravirt()+0x118: stack state mismatch: cfa1=7+296 cfa2=7+288
arch/x86/kvm/hyperv.o: warning: objtool: kvm_hv_notify_acked_sint()+0x4a: can't find jump dest instruction at .text+0x20c
mm/madvise.o: warning: objtool: swapin_walk_pmd_entry()+0x1ec: stack state mismatch: cfa1=7+88 cfa2=7+80
mm/madvise.o: warning: objtool: madvise_free_pte_range()+0x39e: stack state mismatch: cfa1=7+136 cfa2=7+128
arch/x86/kernel/tsc_msr.o: warning: objtool: cpu_khz_from_msr()+0x99: can't find jump dest instruction at .text+0x36
kernel/rcu/tree.o: warning: objtool: rcu_exp_wait_wake()+0x224: return with modified stack frame
kernel/rcu/tree.o: warning: objtool: _synchronize_rcu_expedited.constprop.55()+0x1c7: stack state mismatch: cfa1=7+192 cfa2=7+184
arch/x86/kernel/tsc.o: warning: objtool: pit_hpet_ptimer_calibrate_cpu()+0x1c4: stack state mismatch: cfa1=7+112 cfa2=7+104
arch/x86/kernel/tsc.o: warning: objtool: tsc_refine_calibration_work()+0xd8: stack state mismatch: cfa1=7+48 cfa2=7+40
</snip>

What do they mean? Is there a way to get a kernel optimized for my CPU,
as was possible with the previous Debian versions?

Thanks for any answer
--
Franco Martelli

Franco Martelli

Aug 14, 2019, 12:10:05 PM
On 13/08/19 at 19:35, Étienne Mollier wrote:
> Hi Franco,
>
> I'm not fluent enough in GCC 8 for x86_64 to explain all the
> various warnings you listed. Some may be harmless, and some
> may eat your data. I would do a few tests in a virtual machine
> supporting bdver2 instructions before going live anyway, and
> keep backups stored far away from the machine while testing,
> ideally out of reach of that kernel.

I didn't boot that kernel; I don't trust it. I'd be grateful if you could
look into what happens during the compilation process.
>
> I also recall having had to switch from the ORC to the DWARF
> unwinder to get the build working, but that was on older OS
> releases, not on newer ones, because their libelf was too old.
>
> Some of these seem related to CPU vulnerabilities mitigations,
> and might be worth a bug report against the kernel, either
> Debian or upstream, assuming it also appears /without/ your
> -march=bdver2 flag:
>
>> mm/memory.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.

I had asked on the debian-kernel mailing list but nobody answered. Maybe
it is something related to GCC 8, since all previous Debian kernel
versions built fine with the bdver2 optimization.
>
> Note that someone from the Gentoo community has developed a set
> of patches to expand the possibilities of optimization for the
> kernel, depending on Linux and GCC versions. You may be
> interested in the following one for Buster:
>
> https://github.com/graysky2/kernel_gcc_patch/blob/master/enable_additional_cpu_optimizations_for_gcc_v8.1%2B_kernel_v4.13%2B.patch
>
> These mainly apply changes in various code sections to put the
> flags in place, and provide options through the .config file of
> the source code. I haven't tested it, but I don't believe this
> will solve your warnings, reading through the patch. Yet it
> does a bit more than just replacing the compiler flag: there is
> notably a component related to L1 cache shift which is modified
> too. That should bring an appreciable performance boost if it
> corrects cache line mismatch.

Thanks, but I don't want to patch the kernel; that change to the
Makefile was a simple enough way to get the optimization I was
looking for.
>
> Please be aware that CPU optimizations in the kernel, targeting Zen
> and Skylake in this case, seemed to be hardly detectable, or
> even counterproductive, with various computer usage patterns,
> according to measurements done by Phoronix earlier this year:
>
> https://www.phoronix.com/scan.php?page=article&item=linux-50-march&num=1
>
> Of course this may not be the case for your own typical load,
> but I would recommend taking a few measurements to assess the
> actual performance gain on your machine with, and without,
> CPU-specific compiler optimizations.

I never benchmarked with and without the bdver2 option. I assumed that,
since the kernel has an option for k8, changing it to bdver2 would be
good (I hope).

--
Franco Martelli

Franco Martelli

Aug 16, 2019, 3:40:04 PM
On 16/08/19 at 17:22, Étienne Mollier wrote:
> Bonjour,
>
> Whoops, it sounds like my wording might not have been very clear.
> That is how I would proceed if I were in your place; but I don't
> have a Piledriver CPU to do actual testing on my side. I'm still
> stuck with an old K10, not to mention my laptop, which comes with
> an old regular Atom. :)
>
> I did try to replace the k8 option with amdfam10, though. In the
> fifty thousand or so lines of logs issued by the build, I get
> something like a dozen differences between k8 and k10. There
> was a tremendous amount of warnings too, but some of the ones
> you encountered did not appear: the missing jump target, for
> instance, nor the ANNOTATE_NOSPEC_ALTERNATIVE hint about
> retpolines. I am running Debian Sid, currently shipping GCC 9,
> so this is a difference to take into account, though.
> Finally, building an upstream Linux 5.2 kernel instead of
> Buster's 4.19 does not show most of the warnings I encountered,
> as these get fixed as they come up, though probably not as
> thoroughly in LTS kernels.
>
> Doing a third run with the tuning options (-mtune) added made
> almost no difference at all, except for the build number and
> the CRC hash. It seems to me that the architecture-specific
> (-march) option already applies the proper tuning, at least for
> my architecture.
>
> My last manipulation consisted in building Linux upstream 5.2.9,
> released lately, with -march=amdfam10, and this one is running
> quite well so far:
>
> $ uname -rv
> 5.2.9-k10 #1 SMP PREEMPT Fri Aug 16 16:13:08 CEST 2019
>
> But again, no messages worth mentioning during the compilation.
>
> Do your warnings appear when your build targets k8?
> Or when building a generic x86_64 kernel?

Actually I'm running a kernel built with the "k8" option; it works fine,
and I got no warnings during the compilation.

Digging deeper into your hint about "amdfam10", I checked the GCC
options web page:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
"amdfam10" targets Family 10h CPUs, but mine is a Family 15h CPU. I
noticed that there is also a "bdver1" option for my CPU family, so I
gave it a try: I compiled the kernel source with "bdver1" and, surprise,
I got no warnings and everything worked fine :-) The command line I use
to compile is:

~/linux-source-4.19$ time make -s -j9 ; make -s -j9 modules
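Two checks related to this paragraph and to Étienne's remark that `-march` already implies the matching tuning: /proc/cpuinfo reports the CPU family in decimal (21 for Family 15h, 16 for Family 10h), and GCC can list which `-march=`/`-mtune=` values a flag resolves to with `gcc -Q --help=target`. A minimal sketch with two hypothetical helper functions, assuming a Linux /proc/cpuinfo and the option-then-value layout of GCC's `--help=target` listing:

```shell
#!/bin/sh
# Hypothetical helper: read cpuinfo-style text on stdin and print the first
# "cpu family" value in AMD's hexadecimal notation (21 -> 15h, 16 -> 10h).
cpu_family_hex() {
  awk -F: '/^cpu family/ { printf "%xh\n", $2 + 0; exit }'
}

# Hypothetical helper: keep only the -march= and -mtune= entries of the
# `gcc -Q --help=target` listing, printed as option=value.
march_mtune() {
  awk '$1 == "-march=" || $1 == "-mtune=" { print $1 $2 }'
}

if [ -r /proc/cpuinfo ]; then
  cpu_family_hex < /proc/cpuinfo
fi
if command -v gcc >/dev/null 2>&1; then
  # Compare the reported -mtune= value with the requested -march= value.
  gcc -march=bdver1 -Q --help=target 2>/dev/null | march_mtune
fi
```

If the second command reports the same CPU name for both options, adding `-mtune` explicitly to the Makefile line would be redundant with that compiler, which would be consistent with what Étienne observed with amdfam10.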

> Compilers may have good optimization routines to boost the speed
> of the code in several situations, but in other ones there are
> trade-offs to take between size and performance of the code. I
> personally prefer smaller executables (-Os): they fit in
> fewer pages, so they use less CPU cache and leave more room for my
> programs to get more of their own data in cache (or I might
> simply have spent too much time on suckless.org. ;)

Do you remember which kernel CONFIG switch enables this optimization?
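For reference, in the 4.19 tree the -Os build is selected by CONFIG_CC_OPTIMIZE_FOR_SIZE, the alternative to the default CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE (-O2) in the same Kconfig choice. A minimal sketch, with hypothetical helper names, that reports which of the two a .config uses and switches it with the kernel's own scripts/config script, assuming it is run from the top of the source tree:

```shell
#!/bin/sh
# Hypothetical helper: report which optimization level a kernel .config
# selects (4.19-era option names). Defaults to ./.config.
kernel_opt_level() {
  config=${1:-.config}
  if grep -q '^CONFIG_CC_OPTIMIZE_FOR_SIZE=y' "$config"; then
    echo "-Os (CONFIG_CC_OPTIMIZE_FOR_SIZE)"
  elif grep -q '^CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y' "$config"; then
    echo "-O2 (CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)"
  else
    echo "unknown"
  fi
}

# Hypothetical helper: switch the .config in the current source tree to
# optimize for size, then let Kconfig resolve the rest non-interactively.
kernel_opt_for_size() {
  scripts/config --disable CC_OPTIMIZE_FOR_PERFORMANCE \
                 --enable CC_OPTIMIZE_FOR_SIZE &&
    make olddefconfig
}
```

Running `kernel_opt_for_size` and then `kernel_opt_level` should report the -Os setting; `make olddefconfig` keeps the rest of the configuration and gives any unset options their defaults without prompting.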

>
> Activating CPU-specific options is interesting in some
> particular use cases, but newer instructions often require
> setting up various bits in the CPU before use, which tends to
> inflate the resulting executable. This may be interesting for
> scientific applications, or programs dealing with big data
> arrays in general. In kernel mode however, the only cases I can
> think of where CPU-specific accelerators would be beneficial are
> disk encryption and RAID arrays, for which I believe there is
> already some runtime detection of available instructions, even
> with the generic compiler options.

I have four disks in a software RAID 5 array on my system (three active
members plus one spare), managed by mdadm. This is my /proc/mdstat:

$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sdb1[1] sdd1[3](S) sdc1[2]
      1953258496 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
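Étienne's point about runtime detection can be checked on a machine like this one: the xor and raid6 libraries used by the md raid456 driver pick an implementation for the running CPU when they initialise and report it in the kernel log, independently of the -march used to build the kernel. A minimal sketch with hypothetical helper names, assuming the log lines start with the `raid6:` and `xor:` prefixes (their exact wording varies between kernel versions) and that /proc/cpuinfo lists the SIMD extensions on its `flags` line:

```shell
#!/bin/sh
# Hypothetical helper: from a kernel log on stdin, keep the lines in which
# the xor and raid6 libraries report the implementation they selected.
raid_algos() {
  grep -E '(raid6|xor): .*(using|algorithm|function)'
}

# Hypothetical helper: from cpuinfo-style text on stdin, list which of a few
# SIMD extensions the CPU advertises on its first "flags" line.
simd_flags() {
  awk -F: '/^flags/ { print $2; exit }' |
    tr ' ' '\n' |
    grep -E -x 'sse2|ssse3|avx|avx2|avx512f' |
    sort -u
}

# Reading the kernel log may require root, depending on kernel.dmesg_restrict.
dmesg 2>/dev/null | raid_algos || true
if [ -r /proc/cpuinfo ]; then
  simd_flags < /proc/cpuinfo
fi
```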

>
> To be honest, I don't believe the performance gain to get from
> the compiler is tremendous here. Figures from the author of the
> patch are there to tell us there is a gain indeed; but when you
> investigate in detail the percentage of performance brought by
> the tuning, it is only about 0.03% for the selected benchmark on
> median values. See the "Data" section at the very end of the
> README, and do your own calculations:
>
> https://github.com/graysky2/kernel_gcc_patch/blob/master/README.md
>
> The best you can do here is to do your own measures with your
> own pattern of usage. If you are a developer, you can run timed
> builds of Linux, and see the time it takes. If you are inclined
> toward image rendering speeds, there are a few demo-scenes out
> there where you might get a few figures such as the frame rate
> (careful, glxgears may get capped to 60Hz when some accelerators
> are in use, prefer fancier demos. ;)
>
> There is also this other thread dealing with kernel latency
> measures; you may find a few useful tools listed in this
> discussion:
>
> https://lists.debian.org/debian-user/2019/08/msg00851.html
>
> Or just see how your usual programs perform, and whether there are
> visible improvements.
>
> Have fun, :)
>
Yes, I agree the optimization won't affect performance in a way a human
could perceive; there are more important kernel tweaks, such as
CONFIG_HZ_1000=y. I always measure the time taken by the kernel
compilation, out of curiosity.
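For comparisons like the ones Étienne suggests (timed kernel builds, and the relative gain computed from median values as in the patch README), a minimal sketch with hypothetical helper names: `median_runtime` runs a command several times and prints the median wall-clock time, and `pct_gain` turns a baseline time and a new time into a percentage. It assumes GNU `date` for sub-second timestamps.

```shell
#!/bin/sh
# Hypothetical helper: print the median of the numeric arguments.
median() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Hypothetical helper: run a command N times and print the median wall-clock
# time in seconds. Assumes GNU date, whose %N format gives nanoseconds.
median_runtime() {
  runs=$1
  shift
  samples=""
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    samples="$samples $(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')"
    i=$((i + 1))
  done
  # $samples is left unquoted on purpose: each value becomes one argument.
  median $samples
}

# Hypothetical helper: relative gain in percent of a new time over a
# baseline time; a positive result means the new run was faster.
pct_gain() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (base - new) / base * 100 }'
}
```

For example, boot the k8 kernel and the bdver2 kernel in turn, run `median_runtime 3 sh -c 'make -s clean && make -s -j9'` in a source tree under each, then call `pct_gain` with the two medians, baseline first.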
Thanks again for the tips, best regards

--
Franco Martelli

Franco Martelli

Aug 19, 2019, 3:00:04 PM
I was thinking of submitting a bug report against the gcc-8 package. Or,
now that I have a workaround ("bdver1" compiles without warnings), should
I just leave it at that? What do you think?
Best regards

--
Franco Martelli

Franco Martelli

Aug 20, 2019, 3:20:04 PM
On 19/08/19 at 21:18, Étienne Mollier wrote:
> Franco Martelli, on 2019-08-19:
>> I was thinking of submitting a bug report against the gcc-8 package. Or,
>> now that I have a workaround ("bdver1" compiles without warnings), should
>> I just leave it at that? What do you think?
>
> I don't know; to me it sounds more like small bugs on the kernel
> side,
[ ... ]
> GCC 8, for its part, is just trying its best to help one develop
> better code. Its heuristics may not apply very well to kernel
> object code, however. If you can reproduce this issue and
> identify it as a false positive with sample code, that is
> another story of course.

You're right: I compiled tar and the hello program with the -march=bdver2
option without problems, so gcc-8 seems fine. I noticed that all the
warnings appearing during the kernel compilation come from "objtool":

mm/memory.o: warning: objtool: remap_pfn_range()+0xd5: unsupported intra-function call

which is part of the linux-kbuild-4.19 package. Maybe I should submit a bug
report against this package, or is another one a better choice?

Étienne Mollier

Aug 21, 2019, 5:00:04 AM
Franco Martelli, on 2019-08-20:
> mm/memory.o: warning: objtool: remap_pfn_range()+0xd5: unsupported intra-function call
>
> which is part of the linux-kbuild-4.19 package. Maybe I should submit a bug
> report against this package, or is another one a better choice?

Hi Franco,

If you do submit a bug report, that package might be a good target. In
the end it would amount to a bug against the kernel, although it has
more to do with the tooling around its build procedure. If you proceed,
please make sure to include the context of your build, notably the
-march=bdver2 optimization.

But before doing that, may I suggest having a look at
"Compile-time stack metadata validation", available in
tools/objtool/Documentation/stack-validation.txt? It is very
interesting (I only stumbled upon it recently) and describes the
purpose of objtool. You can read it in the Linux source tree, or
online here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/objtool/Documentation/stack-validation.txt

Furthermore, it accurately answers your original question
from the 13th of August:
> Up to Debian 9.x (stretch), compiling the kernel this way worked fine,
> but with Debian 10 (buster) I get a lot of warning messages:
>
> <snip>
> mm/memory.o: warning: objtool: remap_pfn_range()+0xd5: unsupported intra-function call
[...]
> arch/x86/kernel/tsc.o: warning: objtool: tsc_refine_calibration_work()+0xd8: stack state mismatch: cfa1=7+48 cfa2=7+40
> </snip>
>
> What do they mean?

Short answer: it means that the -march=bdver2 optimization flag
is interfering with objtool, the static stack frame analyser run at
kernel build time, probably by introducing CPU instructions into the
object code that objtool, at least, does not recognise.

Kind regards,
--
Étienne Mollier <etienne...@mailoo.org>
5ab1 4edf 63bb ccff 8b54 2fa9 59da 56fe fff3 882d
Note to myself: RTWM, Reread The Warning Message
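Since every warning in the original report comes from objtool, one way to compare builds made with different -march values (k8, bdver1, bdver2) is to count the warnings by kind rather than reading them one by one. A minimal sketch, with a hypothetical helper name, assuming the build output was saved with something like `make -j9 2>&1 | tee build.log` and that the warnings follow the `file.o: warning: objtool: function()+offset: message` format seen in the log at the top of this thread:

```shell
#!/bin/sh
# Hypothetical helper: count objtool warnings in a kernel build log (stdin)
# by kind. The pipeline keeps only objtool warnings, skips objtool's
# follow-up retpoline hint line, then strips the "function()+offset:"
# prefix, the cfa1/cfa2 details and the ".text+..." jump targets so that
# identical kinds of warning collapse into one counted line.
objtool_summary() {
  grep 'warning: objtool:' |
    grep -v 'If this is a retpoline' |
    sed -e 's/.*warning: objtool: [^ ]*: //' \
        -e 's/: cfa1=.*//' \
        -e 's/ at \.text.*//' |
    sort | uniq -c | sort -rn
}
```

Running it on the logs of two builds side by side shows at a glance whether a flag change removes whole categories of warnings, as the switch from bdver2 to bdver1 did here.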

