Hello,
I'm enabling clang LTO to reduce the code size of firmware that implements the UEFI standard (http://www.uefi.org/), specifically edk2 (https://github.com/tianocore/edk2), which is mostly C code. My project is in the llvm branch of https://github.com/shijunjing/edk2: https://github.com/shijunjing/edk2/tree/llvm. Most of my firmware modules work well after enabling LTO, but some X64 modules fail to run (e.g. they hang with a CPU exception). The same X64 modules work well when built with LTO disabled (-fno-lto).
I don't know how to debug this wrong LTO code efficiently or how to investigate whether it is a compiler bug. I would appreciate any suggestions about debugging methods, commands, or BKMs (best known methods) for clang LTO issues.
Below are my clang LTO build tools and options. I use the clang 3.8 release with the binutils 2.26 ld (I've pushed ld to support the LLVM gold plugin: https://sourceware.org/bugzilla/show_bug.cgi?id=20070). Any suggestion is welcome!
##################
# CLANGLTO38 X64 definitions
##################
*_CLANGLTO38_X64_OBJCOPY_PATH = DEF(GCC53_X64_PREFIX)objcopy
*_CLANGLTO38_X64_CC_PATH = DEF(CLANG38_X64_PREFIX)clang
*_CLANGLTO38_X64_SLINK_PATH = DEF(CLANG38_X64_PREFIX)llvm-ar
*_CLANGLTO38_X64_DLINK_PATH = DEF(CLANG38_X64_PREFIX)clang
*_CLANGLTO38_X64_ASM_PATH = DEF(CLANG38_X64_PREFIX)clang
*_CLANGLTO38_X64_PP_PATH = DEF(CLANG38_X64_PREFIX)clang
*_CLANGLTO38_X64_RC_PATH = DEF(GCC53_X64_PREFIX)objcopy
*_CLANGLTO38_X64_CC_FLAGS = -c -fshort-wchar -fno-strict-aliasing -Wall -Werror -Wno-array-bounds -Wno-empty-body -ffunction-sections -fdata-sections -include AutoGen.h -DSTRING_ARRAY_NAME=$(BASE_NAME)Strings -fno-stack-protector -fno-builtin -mms-bitfields -Wno-address -Wno-shift-negative-value -Wno-parentheses-equality -Wno-unknown-pragmas -Wno-tautological-constant-out-of-range-compare -Wno-incompatible-library-redeclaration -target x86_64-pc-linux-gnu -fno-asynchronous-unwind-tables -m64 -Wno-enum-conversion "-DEFIAPI=__attribute__((ms_abi))" -mno-red-zone -mcmodel=large -g -Os -flto
*_CLANGLTO38_X64_DLINK_FLAGS = -flto -nostdlib -Wl,-n -Wl,-q -Wl,--gc-sections -Wl,-z,common-page-size=0x40 -Wl,--entry,$(IMAGE_ENTRY_POINT) -Wl,-u,$(IMAGE_ENTRY_POINT) -Wl,-Map,$(DEST_DIR_DEBUG)/$(BASE_NAME).map -Wl,-melf_x86_64 -Wl,--oformat=elf64-x86-64
*_CLANGLTO38_X64_ASM_FLAGS = -c -x assembler -imacros $(DEST_DIR_DEBUG)/AutoGen.h -m64 -target x86_64-pc-linux-gnu
*_CLANGLTO38_X64_RC_FLAGS = -I binary -O elf64-x86-64 -B i386 --rename-section .data=.hii
*_CLANGLTO38_X64_NASM_FLAGS = -f elf64
Steven Shi
Intel\SSG\STO\UEFI Firmware
Tel: +86 021-61166522
iNet: 821-6522
The brute-force method is to get the disassembly of the function that hangs and
compare the generated code with and without LTO.
Alternatively, attach gdb and check which instruction causes the exception.
Thank you
~Umesh
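The comparison suggested above can be made less manual by diffing the two disassemblies function by function. What follows is only a sketch, not an established tool: it assumes both images were disassembled with `objdump -d --no-show-raw-insn`, and the helper names `split_functions` and `diff_functions` are made up for illustration.

```python
import difflib
import re

# "00000000004004f0 <main>:" starts a function in objdump output.
FUNC_HEADER = re.compile(r"^[0-9a-f]+ <([^>]+)>:$")
# "  4004f0:\tpush   %rbp" is one instruction (raw bytes suppressed).
INSN = re.compile(r"^\s*[0-9a-f]+:\s+(.+)$")
# Addresses move between builds, so mask them before comparing.
IMM_ADDR = re.compile(r"0x[0-9a-f]{4,}")
BRANCH_TARGET = re.compile(r"\b[0-9a-f]+(?= <)")


def split_functions(disassembly):
    """Map each function name to its list of normalized instructions."""
    functions = {}
    current = None
    for line in disassembly.splitlines():
        header = FUNC_HEADER.match(line.strip())
        if header:
            current = functions.setdefault(header.group(1), [])
            continue
        insn = INSN.match(line)
        if insn and current is not None:
            text = " ".join(insn.group(1).split())
            text = IMM_ADDR.sub("ADDR", text)
            current.append(BRANCH_TARGET.sub("ADDR", text))
    return functions


def diff_functions(without_lto, with_lto):
    """Return {function name: unified diff lines} for functions that differ."""
    before = split_functions(without_lto)
    after = split_functions(with_lto)
    report = {}
    for name in sorted(set(before) & set(after)):
        delta = list(difflib.unified_diff(
            before[name], after[name], "no-lto", "lto", lineterm=""))
        if delta:
            report[name] = delta
    return report
```

Feeding it the text of the two disassemblies gives a short list of functions to inspect by hand, which is usually a much smaller set than the whole image. Functions that exist in only one build (for example because LTO inlined them) are skipped by this sketch.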
> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Hi Umesh,
Thank you for the suggestion. I can use the brute-force method to narrow down the wrong instructions that LTO generates here and there, but I still don't know why they are generated or how to stop Clang LTO from generating them.
I suspect the wrong code is caused by a faulty LTO optimization pass, so I would like to disable all optimizations in LTO first and then enable them one by one to narrow down the root cause. But when I try to disable optimization by forcing -O0 in the LTO build, ld fails to recognize some clang bitcode libraries and fails to link.
For example, use the Clang_LTO_Fails_On_LD example attached to the bug below:
https://sourceware.org/bugzilla/show_bug.cgi?id=20070
If I force -O0 to disable optimization in LTO, ld fails to link:
~/clang38/bin/clang -o Hello.dll -flto -O0 -nostdlib -Wl,-n -Wl,-q -Wl,--gc-sections -Wl,-z,common-page-size=0x40 -Wl,--entry,_ModuleEntryPoint -Wl,-u,_ModuleEntryPoint -Wl,-Map,Hello.map -Wl,-melf_x86_64 -Wl,--oformat=elf64-x86-64 -Wl,--start-group,,@static_library_files.lst -Wl,--end-group
BaseLib.lib: error adding symbols: File format not recognized
clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)
But if I use -O1, -O2, or a higher -On, the ld link passes:
~/clang38/bin/clang -o Hello.dll -flto -O1 -nostdlib -Wl,-n -Wl,-q -Wl,--gc-sections -Wl,-z,common-page-size=0x40 -Wl,--entry,_ModuleEntryPoint -Wl,-u,_ModuleEntryPoint -Wl,-Map,Hello.map -Wl,-melf_x86_64 -Wl,--oformat=elf64-x86-64 -Wl,--start-group,,@static_library_files.lst -Wl,--end-group
How can I correctly disable all optimizations in clang LTO? How can I find out which specific optimizations clang LTO runs, and enable or disable them individually? Any suggestion is welcome!
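Until individual passes can be toggled, a coarser way to narrow things down is to bisect over translation units or libraries instead of passes. The sketch below assumes that exactly one unit misbehaves under LTO and that LTO and non-LTO objects can be mixed in one link; `works_with_lto` is a hypothetical callback that would rebuild and boot the image, and every module name other than BaseLib is invented for the example.

```python
def find_culprit(units, works_with_lto):
    """Return the single unit whose LTO build breaks the image.

    works_with_lto(subset) is a hypothetical callback: build the image with
    only `subset` compiled with -flto (everything else with -fno-lto), run it,
    and return True if it behaves correctly.
    """
    candidates = list(units)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if works_with_lto(half):
            # The first half is clean, so the culprit is in the rest.
            candidates = candidates[len(half):]
        else:
            candidates = half
    return candidates[0]


if __name__ == "__main__":
    modules = ["BaseLib", "PrintLib", "DxeCore", "CpuDxe", "UefiLib"]
    # Stand-in for a real build-and-boot cycle: pretend CpuDxe is the bad one.
    print(find_culprit(modules, lambda subset: "CpuDxe" not in subset))
```

With N units this needs roughly log2(N) build-and-boot cycles rather than N. If more than one unit is affected, the result is only one of them and the search has to be repeated.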
> From: Umesh Kalappa [mailto:umesh.k...@gmail.com]
> Sent: Saturday, May 14, 2016 2:14 AM
> To: Shi, Steven <steve...@intel.com>
> Cc: llvm-dev <llvm...@lists.llvm.org>; cfe...@lists.llvm.org
> Subject: Re: [llvm-dev] How to debug if LTO generate wrong code?
Hello,
Let me ask a simple LTO question again. For the LLVM LTO example at http://llvm.org/docs/LinkTimeOptimization.html, I use the build commands below to generate binaries at three optimization levels: -O0, -O1, and -O2. Listing the foo1~foo4 symbols with nm shows that the different optimization levels really take effect.
1. How can I find out which optimizations clang LTO applies at -O0, -O1, and -O2?
2. Are these differences produced by optimizations in the compiler (e.g. clang/llvm) or in the linker (e.g. ld)?
3. How can I explicitly enable or disable specific optimizations, other than by using -O0, -O1, or -O2?
$ clang -emit-llvm -c main.c -o main.bc
$ clang -emit-llvm -c a.c -o a.bc
$ llvm-ar cr main.lib main.bc
$ llvm-ar cr a.lib a.bc
$ clang -O0 -flto main.lib a.lib -o main0
$ clang -O1 -flto main.lib a.lib -o main1
$ clang -O2 -flto main.lib a.lib -o main2
$ nm main0
…
00000000004005a0 t foo1
0000000000400580 t foo2
00000000004005e0 t foo3
0000000000400530 t foo4
0000000000400500 t frame_dummy
…
$ nm main1
…
0000000000400550 t foo1
0000000000400580 t foo3
0000000000400530 t foo4
0000000000400500 t frame_dummy
…
$ nm main2
…
00000000004004d0 t frame_dummy
…
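The three nm listings can also be compared mechanically to see exactly which functions each optimization level removed. This is a small sketch under the assumption that the default `nm` output format is used (address, type letter, name); the helper name `text_symbols` is made up for the example.

```python
def text_symbols(nm_output):
    """Return the names of text (code) symbols listed by nm."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # A defined symbol line is "<address> <type> <name>"; t/T mark code.
        if len(fields) == 3 and fields[1] in ("t", "T"):
            names.add(fields[2])
    return names


if __name__ == "__main__":
    main0 = """00000000004005a0 t foo1
0000000000400580 t foo2
00000000004005e0 t foo3
0000000000400530 t foo4"""
    main1 = """0000000000400550 t foo1
0000000000400580 t foo3
0000000000400530 t foo4"""
    # Symbols present at -O0 but gone at -O1.
    print(sorted(text_symbols(main0) - text_symbols(main1)))
```

For the excerpts quoted above, the difference between main0 and main1 is foo2, and main2 drops foo1, foo3 and foo4 as well.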
From the verbose output below, it seems that only the linker (e.g. ld) is involved in the optimization?
$ clang -O2 -flto main.lib a.lib -o main2 -v
clang version 3.8.0 (tags/RELEASE_380/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9.3
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.3.1
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.0.0
Found candidate GCC installation: /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0
Selected GCC installation: /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
"/usr/bin/ld" -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o main2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/crtbegin.o -L/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0 -L/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/../../../../lib64 -L/usr/local/bin/../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/../../.. -L/usr/local/bin/../lib -L/lib -L/usr/lib -plugin /usr/local/bin/../lib/LLVMgold.so -plugin-opt=mcpu=x86-64 -plugin-opt=O2 main.lib a.lib -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/7.0.0/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o
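In the verbose output, the driver passes `-plugin .../LLVMgold.so` and `-plugin-opt=O2` to ld, so the optimization level reaches LLVM through the linker plugin rather than being consumed by ld alone. A quick way to see what a given driver invocation hands to the plugin is to pull those arguments out of the `-v` linker line. The helper below is an illustrative sketch; `plugin_settings` is not part of any tool.

```python
import shlex


def plugin_settings(linker_command):
    """Return (plugin path, list of plugin options) from an ld command line."""
    args = shlex.split(linker_command)
    plugin = None
    options = []
    for index, arg in enumerate(args):
        name = arg.lstrip("-")  # accept both -plugin and --plugin spellings
        if name == "plugin" and index + 1 < len(args):
            plugin = args[index + 1]
        elif name.startswith("plugin-opt="):
            options.append(name[len("plugin-opt="):])
    return plugin, options


if __name__ == "__main__":
    line = ('"/usr/bin/ld" -z relro -o main2 '
            '-plugin /usr/local/bin/../lib/LLVMgold.so '
            '-plugin-opt=mcpu=x86-64 -plugin-opt=O2 main.lib a.lib')
    print(plugin_settings(line))
```

Running the same check on the -O0 and -O1 links makes it easy to confirm that only the `O<n>` plugin option changes between them.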
As Mehdi stated, the optimization level is specific to the linker, and it
enables the interprocedural optimization passes; please refer to the function
PassManagerBuilder::addLTOOptimizationPasses() at
http://llvm.org/docs/doxygen/html/PassManagerBuilder_8cpp_source.html
As for internal options to disable them, I don't think you can do so.
Thank you
~Umesh
> _______________________________________________
> cfe-dev mailing list
> cfe...@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
To be very clear: the -O option may trigger *linker* optimizations as well, independently of LTO.
--
Mehdi
Hi Mehdi,
After deeper debugging, I found that my firmware LTO wrong-code issue occurs because the X64 code model (-mcmodel=large) is always overridden to small (-mcmodel=small) in an LTO build. I don't know how to correctly specify the large code model for my X64 firmware LTO build. I would appreciate it if you could tell me how.
Parts of my UEFI firmware (BIOS) have to be loaded and run at high addresses (above 2 GB) from the very beginning, so I need code that makes absolutely no assumptions about addresses or about the data sections. But the current LLVM LTO seems to stick to the small code model and generates a lot of code with 32-bit RIP-relative addressing, which causes CPU exceptions when run at addresses above 2 GB.
Below, I simply reuse Eli's codemodel1.c example (link: http://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models) to show the LLVM LTO code model issue.
$ clang -g -O0 codemodel1.c -mcmodel=large -o codemodel1_large.bin
$ clang -g -O0 codemodel1.c -mcmodel=small -o codemodel1_small.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto.bin
$ clang -g -O0 -flto codemodel1.c -mcmodel=small -o codemodel1_small_lto.bin
You will see that codemodel1_large_lto.bin and codemodel1_small_lto.bin are exactly the same!
And if you disassemble codemodel1_large_lto.bin, you will see that it uses small-code-model addressing (a RIP-relative call and 32-bit absolute data addresses), not large, as shown below.
$ objdump -dS codemodel1_large_lto.bin
int main(int argc, const char* argv[])
{
4004f0: 55 push %rbp
4004f1: 48 89 e5 mov %rsp,%rbp
4004f4: 48 83 ec 20 sub $0x20,%rsp
4004f8: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
4004ff: 89 7d f8 mov %edi,-0x8(%rbp)
400502: 48 89 75 f0 mov %rsi,-0x10(%rbp)
int t = global_func(argc);
400506: 8b 7d f8 mov -0x8(%rbp),%edi
400509: e8 d2 ff ff ff callq 4004e0 <global_func>
40050e: 89 45 ec mov %eax,-0x14(%rbp)
t += global_arr[7];
400511: 8b 04 25 4c 10 60 00 mov 0x60104c,%eax
400518: 03 45 ec add -0x14(%rbp),%eax
40051b: 89 45 ec mov %eax,-0x14(%rbp)
t += static_arr[7];
40051e: 8b 04 25 dc 11 60 00 mov 0x6011dc,%eax
400525: 03 45 ec add -0x14(%rbp),%eax
400528: 89 45 ec mov %eax,-0x14(%rbp)
t += global_arr_big[7];
40052b: 8b 04 25 6c 13 60 00 mov 0x60136c,%eax
400532: 03 45 ec add -0x14(%rbp),%eax
400535: 89 45 ec mov %eax,-0x14(%rbp)
t += static_arr_big[7];
400538: 8b 04 25 ac 20 63 00 mov 0x6320ac,%eax
40053f: 03 45 ec add -0x14(%rbp),%eax
400542: 89 45 ec mov %eax,-0x14(%rbp)
return t;
400545: 8b 45 ec mov -0x14(%rbp),%eax
400548: 48 83 c4 20 add $0x20,%rsp
40054c: 5d pop %rbp
40054d: c3 retq
40054e: 66 90 xchg %ax,%ax
So, does LTO support the large code model? How do I correctly specify the code model option for LTO?
> From: mehdi...@apple.com [mailto:mehdi...@apple.com]
> Sent: Wednesday, May 18, 2016 4:02 AM
> To: Umesh Kalappa <umesh.k...@gmail.com>
> Cc: Shi, Steven <steve...@intel.com>; llvm-dev <llvm...@lists.llvm.org>;
> Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
Neither does lld (yet), to the best of my knowledge.
Cheers,
--
Davide
Hi Mehdi,
GCC LTO seems to support the large code model on my side, as shown below. If the code model is linker-specific, does GCC LTO use a special linker that is different from the one in GNU Binutils?
I'm a bit surprised that neither OS X ld64 nor the gold plugin supports the large code model in LTO. Since modern systems widely use 64-bit, needing code to run at high addresses (above 2 GB) is a reasonable requirement.
$ gcc -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto_gcc.bin
$ objdump -dS codemodel1_large_lto_gcc.bin
int main(int argc, const char* argv[])
{
40048b: 55 push %rbp
40048c: 48 89 e5 mov %rsp,%rbp
40048f: 48 83 ec 20 sub $0x20,%rsp
400493: 89 7d ec mov %edi,-0x14(%rbp)
400496: 48 89 75 e0 mov %rsi,-0x20(%rbp)
int t = global_func(argc);
40049a: 8b 45 ec mov -0x14(%rbp),%eax
40049d: 89 c7 mov %eax,%edi
40049f: 48 b8 76 04 40 00 00 movabs $0x400476,%rax
4004a6: 00 00 00
4004a9: ff d0 callq *%rax
4004ab: 89 45 fc mov %eax,-0x4(%rbp)
t += global_arr[7];
4004ae: 48 b8 20 09 60 00 00 movabs $0x600920,%rax
4004b5: 00 00 00
4004b8: 8b 40 1c mov 0x1c(%rax),%eax
4004bb: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr[7];
4004be: 48 b8 c0 0a 60 00 00 movabs $0x600ac0,%rax
4004c5: 00 00 00
4004c8: 8b 40 1c mov 0x1c(%rax),%eax
4004cb: 01 45 fc add %eax,-0x4(%rbp)
t += global_arr_big[7];
4004ce: 48 b8 60 0c 60 00 00 movabs $0x600c60,%rax
4004d5: 00 00 00
4004d8: 8b 40 1c mov 0x1c(%rax),%eax
4004db: 01 45 fc add %eax,-0x4(%rbp)
t += static_arr_big[7];
4004de: 48 b8 a0 19 63 00 00 movabs $0x6319a0,%rax
4004e5: 00 00 00
4004e8: 8b 40 1c mov 0x1c(%rax),%eax
4004eb: 01 45 fc add %eax,-0x4(%rbp)
return t;
4004ee: 8b 45 fc mov -0x4(%rbp),%eax
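The visible differences between this GCC listing and the clang LTO listing earlier in the thread are `movabs` with a 64-bit immediate plus an indirect `callq *%rax` (large model) versus a direct `callq <address>` (small model). A rough way to check a whole image for these patterns is to count them in the objdump text. This is only a heuristic sketch with a made-up helper name, `code_model_hints`, and it assumes AT&T-syntax output as shown above.

```python
import re

DIRECT_CALL = re.compile(r"\bcallq?\s+[0-9a-f]+\s+<")  # callq 4004e0 <f>
INDIRECT_CALL = re.compile(r"\bcallq?\s+\*")           # callq *%rax
MOVABS = re.compile(r"\bmovabs\b")                     # 64-bit immediate


def code_model_hints(disassembly):
    """Count instruction patterns that hint at the code model in use."""
    hints = {"direct_call": 0, "indirect_call": 0, "movabs": 0}
    for line in disassembly.splitlines():
        if DIRECT_CALL.search(line):
            hints["direct_call"] += 1
        elif INDIRECT_CALL.search(line):
            hints["indirect_call"] += 1
        if MOVABS.search(line):
            hints["movabs"] += 1
    return hints


if __name__ == "__main__":
    sample = """40049f: 48 b8 76 04 40 00 00 movabs $0x400476,%rax
4004a9: ff d0 callq *%rax"""
    print(code_model_hints(sample))
```

A module that was meant to be built with -mcmodel=large but shows direct calls and no `movabs` at all is a good candidate for closer inspection. The counts are hints only, since indirect calls and `movabs` can also appear for unrelated reasons.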
From: mehdi...@apple.com [mailto:mehdi...@apple.com]
Sent: Monday, May 30, 2016 4:28 AM
To: Shi, Steven <steve...@intel.com>
Actually, given that PIC is (almost) free in terms of codegen, there is
rarely a need for the large model on AMD64. Programs with more than 2GB
of static data are moderately rare and programs with more than 2GB of
text even more so.
Joerg
> (And I doubt the GNU linker supports LTO with LLVM).
[Steven]: I've pushed GNU Binutils ld to support the LLVM gold plugin; see the details in this bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20070. The new GNU ld linker works well with LLVM/Clang LTO when building IA32 code on my side. And according to the ld owner's comments in the bug, the current X64 LLVM LTO issue is in the LLVM LTO plugin.
> The fact that we don't support it for now seems to indicate that it is not a widely requested feature, especially considering that it is really a trivial option to add.
> What is the linker you're using? Are you building your own clang?
[Steven]: I'm using the standard LLVM 3.8 with the new GNU ld linker described above. I can build my own clang on my side if needed. I'm happy to hear that it is not difficult to enable the large code model in LLVM LTO and that "it is really a trivial option to add". Could you tell me how to enable it? A lot of my work has been blocked by the large code model issue. Thank you!
Hi Mehdi,
Should I first apply your attached patch to my llvm3.8 source? Or should I use the latest llvm SVN trunk instead?
From: mehdi...@apple.com [mailto:mehdi...@apple.com]
Sent: Monday, May 30, 2016 2:13 PM
To: Shi, Steven <steve...@intel.com>
Cc: Umesh Kalappa <umesh.k...@gmail.com>; eli...@gmail.com; llvm-dev <llvm...@lists.llvm.org>; cfe...@lists.llvm.org; Rafael Espíndola <rafael.e...@gmail.com>
Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
On May 29, 2016, at 5:44 PM, Shi, Steven <steve...@intel.com> wrote:
You need to use your linker-specific way of passing the option "-lto-use-large-codemodel=..." to the plugin.
Let me know if it works for you!
--
Mehdi
Sent: Monday, May 30, 2016 8:17 AM
To: Shi, Steven <steve...@intel.com>
Cc: Umesh Kalappa <umesh.k...@gmail.com>; eli...@gmail.com; llvm-dev <llvm...@lists.llvm.org>; cfe...@lists.llvm.org; Rafael Espíndola <rafael.e...@gmail.com>
Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
On May 29, 2016, at 5:10 PM, Shi, Steven <steve...@intel.com> wrote:
> Hi Mehdi,
> GCC LTO seems to support the large code model on my side, as shown below. If the code model is linker-specific, does GCC LTO use a special linker that is different from the one in GNU Binutils?
I don't know anything about GCC.
(And I doubt the GNU linker supports LTO with LLVM).
> I'm a bit surprised that neither OS X ld64 nor the gold plugin supports the large code model in LTO. Since modern systems widely use 64-bit, needing code to run at high addresses (above 2 GB) is a reasonable requirement.
The fact that we don't support it for now seems to indicate that it is not a widely requested feature, especially considering that it is really a trivial option to add.
What is the linker you're using? Are you building your own clang?
--
Hi Mehdi,
The llvm3.8 gold-plugin.cpp is very different from the latest one on trunk. Your patch fails to compile on llvm3.8, as shown below. I will try it on the latest trunk later. Thank you for your help anyway!
Building CXX object tools/gold/CMakeFiles/LLVMgold.dir/gold-plugin.cpp.o
cd /home/jshi19/llvm38releasebuild/tools/gold && /home/jshi19/clang38/bin/clang++ -DGTEST_HAS_RTTI=0 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/jshi19/llvm38releasebuild/tools/gold -I/home/jshi19/llvm-3.8.0.src/tools/gold -I/home/jshi19/llvm38releasebuild/include -I/home/jshi19/llvm-3.8.0.src/include -I/home/jshi19/binutils-2.26/include -fPIC -fvisibility-inlines-hidden -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -std=c++11 -ffunction-sections -fdata-sections -O3 -DNDEBUG -fPIC -fno-exceptions -fno-rtti -o CMakeFiles/LLVMgold.dir/gold-plugin.cpp.o -c /home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp
/home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp:60:16: error: unknown type name 'string'; did you mean 'std::string'?
static cl::opt<string> LTOCodeModel("lto-use-large-codemodel", cl::Hidden,
^~~~~~
std::string
/usr/lib/gcc/x86_64-linux-gnu/5.3.1/../../../../include/c++/5.3.1/bits/stringfwd.h:74:33: note: 'std::string' declared here
typedef basic_string<char> string;
^
/home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp:800:9: error: no template named 'StringSwitch' in namespace 'llvm'; did you mean 'StringSet'?
llvm::StringSwitch<unsigned>(LTOCodeModel)
~~~~~~^~~~~~~~~~~~
StringSet
/home/jshi19/llvm-3.8.0.src/include/llvm/ADT/StringSet.h:23:9: note: 'StringSet' declared here
class StringSet : public llvm::StringMap<char, AllocatorTy> {
^
In file included from /home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp:16:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/DenseSet.h:17:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/DenseMap.h:17:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/DenseMapInfo.h:17:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/ArrayRef.h:13:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/ADT/Hashing.h:49:
In file included from /home/jshi19/llvm-3.8.0.src/include/llvm/Support/Host.h:17:
/home/jshi19/llvm-3.8.0.src/include/llvm/ADT/StringMap.h:228:12: error: multiple overloads of 'StringMap' instantiate to the same signature 'void (unsigned int)'
explicit StringMap(AllocatorTy A)
^
/home/jshi19/llvm-3.8.0.src/include/llvm/ADT/StringSet.h:23:28: note: in instantiation of template class 'llvm::StringMap<char, unsigned int>' requested here
class StringSet : public llvm::StringMap<char, AllocatorTy> {
^
/home/jshi19/llvm-3.8.0.src/tools/gold/gold-plugin.cpp:800:3: note: in instantiation of template class 'llvm::StringSet<unsigned int>' requested here
llvm::StringSwitch<unsigned>(LTOCodeModel)
^
/home/jshi19/llvm-3.8.0.src/include/llvm/ADT/StringMap.h:225:12: note: previous declaration is here
explicit StringMap(unsigned InitialSize)
^
3 errors generated.
tools/gold/CMakeFiles/LLVMgold.dir/build.make:65: recipe for target 'tools/gold/CMakeFiles/LLVMgold.dir/gold-plugin.cpp.o' failed
make[3]: *** [tools/gold/CMakeFiles/LLVMgold.dir/gold-plugin.cpp.o] Error 1
make[3]: Leaving directory '/home/jshi19/llvm38releasebuild'
CMakeFiles/Makefile2:17855: recipe for target 'tools/gold/CMakeFiles/LLVMgold.dir/all' failed
make[2]: *** [tools/gold/CMakeFiles/LLVMgold.dir/all] Error 2
make[2]: Leaving directory '/home/jshi19/llvm38releasebuild'
CMakeFiles/Makefile2:17867: recipe for target 'tools/gold/CMakeFiles/LLVMgold.dir/rule' failed
make[1]: *** [tools/gold/CMakeFiles/LLVMgold.dir/rule] Error 2
make[1]: Leaving directory '/home/jshi19/llvm38releasebuild'
Makefile:3944: recipe for target 'LLVMgold' failed
make: *** [LLVMgold] Error 2
It sounds more like there is confusion about which symbols are internal
and which are not. The normal code model on AMD64 requires code and data to
fit into 2GB, but non-local symbols are accessed indirectly. That
doesn't happen in your case. For functions, that's normally partially
the job of the linker (via stubs), but for data the compiler has to be
aware of it. The load address is irrelevant for PIC.
Hi Mehdi,
Your patch does not compile against gold-plugin.cpp on the latest trunk either; the failure is the same as with llvm3.8.
Cheers,
Rafael
We don't use cl::opt in gold; instead we parse the -plugin-opts that
gold passes to the plugin (see process_plugin_option).
What about that:
$ grep ParseCommandLineOptions tools/gold/gold-plugin.cpp
// ParseCommandLineOptions() expects argv[0] to be program name. Lazily
cl::ParseCommandLineOptions(NumOpts, &options::extra[0]);
That is for the options that the gold plugin itself doesn't understand
and just passes to llvm. This allows you to do things like
--plugin-opt=-debug-pass=Arguments.
Cheers,
Rafael
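The split described above, where some `-plugin-opt` values are understood by the plugin itself and everything else is forwarded to LLVM's command-line parser, can be pictured with a toy model. The code below is a simplified illustration and not the plugin's actual source: the function name `route_plugin_options` is invented, and the two "handled" forms are just the ones visible in the verbose link line earlier in the thread (`mcpu=...` and `O<digit>`).

```python
def route_plugin_options(plugin_opts):
    """Split -plugin-opt values into (handled by plugin, forwarded to LLVM)."""
    handled = {}
    forwarded = []
    for opt in plugin_opts:
        if opt.startswith("mcpu="):
            handled["mcpu"] = opt[len("mcpu="):]
        elif len(opt) == 2 and opt[0] == "O" and opt[1].isdigit():
            handled["opt_level"] = int(opt[1])
        else:
            # Anything the plugin does not recognize is kept and later
            # handed to LLVM's own option parsing.
            forwarded.append(opt)
    return handled, forwarded


if __name__ == "__main__":
    print(route_plugin_options(["mcpu=x86-64", "O2", "-debug-pass=Arguments"]))
```

Under this model, an LLVM-level option only takes effect if some LLVM library linked into the plugin defines it, which matches the point that the plugin forwards options rather than defining cl::opt entries of its own.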
> On May 30, 2016, at 4:52 PM, Rafael Espíndola <rafael.e...@gmail.com> wrote:
>
>> On 30 May 2016 at 16:56, Mehdi Amini <mehdi...@apple.com> wrote:
>>
>>
>> On 05/30/16 01:34 PM, Rafael Espíndola <rafael.e...@gmail.com> wrote:
>>
>> We don't use cl::opt in gold, instead we parse the -plugin-opts that
>> gold passes the plugin (see process_plugin_option).
>>
>> What about that:
>>
>> $ grep ParseCommandLineOptions tools/gold/gold-plugin.cpp
>>
>> // ParseCommandLineOptions() expects argv[0] to be program name.
>> Lazily
>>
>> cl::ParseCommandLineOptions(NumOpts, &options::extra[0]);
>
>
> That is for the options that the gold plugin itself doesn't understand
> and just passes to llvm. This allows you to do things like
> --plugin-opt=-debug-pass=Arguments.
This is what I expected, so my cl::opt should work, right? I don't really get your original point.
Mehdi
Just that the gold plugin itself never defines a cl::opt. It just
forwards the llvm ones.
Cheers,
Rafael
Small, but PIC.
> For example if you read a global like this the compiler will generate this code.
> int constant = 0;
>
> int get_constant(void)
> {
> return constant;
> }
Compiling for ELF with -fPIE -Os I get
get_constant: # @get_constant
# BB#0: # %entry
movl constant(%rip), %eax
retq
Which should also be able to run at any address.
Cheers,
Rafael
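The reason the `constant(%rip)` form runs at any address is plain arithmetic: the instruction encodes the distance from the next instruction to the data, and that distance does not change when the whole image is moved. The sketch below works through it with made-up offsets and load addresses; the function names are illustrative only.

```python
def rip_relative_disp(next_insn, target):
    """Displacement encoded in a RIP-relative operand (signed 32-bit)."""
    disp = target - next_insn
    if not -2**31 <= disp < 2**31:
        raise OverflowError("target is out of reach for a 32-bit displacement")
    return disp


def fits_low_2gb(address):
    """True if the address lies in the low 2 GB reachable by a 32-bit absolute operand."""
    return 0 <= address < 2**31


if __name__ == "__main__":
    code_offset, data_offset = 0x1000, 0x201000  # hypothetical image layout
    for base in (0x400000, 0x1_0000_0000):       # low load address vs. 4 GB
        disp = rip_relative_disp(base + code_offset, base + data_offset)
        print(hex(base), hex(disp), fits_low_2gb(base + data_offset))
```

The displacement is identical for both load addresses, while the absolute address of the data only fits a 32-bit operand in the low case. That is the difference between the PIC code above and the absolute `mov 0x60104c,%eax` style seen in the non-PIC small-model listing.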
Does it reproduce with clang trunk?