Description of the project: Adding Debug Info (compiling with `clang -g`) shouldn't change the generated code at all. Unfortunately we have bugs. These are usually not too hard to fix and a good way to discover new parts of the codebase! We suggest building object files both ways and disassembling the text sections, which will give cleaner diffs than comparing .s files.
Expected results: Reduced test cases, bug reports with analysis (e.g., which pass is responsible), possibly patches.
Confirmed Mentor: Paul Robinson
Desirable skills: Intermediate knowledge of C++, some familiarity with x86 or ARM instruction set.
"Debug info should have no effect on codegen" would be a fine project for you; nobody is working on it that I know of. Another way to contribute would be to go to our Bugzilla (bugs.llvm.org) and search for open bugs with the "beginner" keyword.
Regarding the "debug info has no effect on codegen" project, unfortunately I am having IT issues that keep me from providing much in the way of specific suggestions, so what follows is fairly generic.
In principle, you compile some piece of code with and without -g, and see if there is any difference in the generated instructions. My experience is that you want to compile to a .o file, and then use a disassembler to dump the text sections. This will give you a cleaner diff than using -S to generate assembler files.
I also recommend compiling with `-ffunction-sections` and probably `-fexceptions`. The former will put each compiled function into its own object-file section, so that differences in one function won't affect the disassembly of a later function. The latter option should work around one fairly intractable known difference: -g will cause the compiler to emit directives to produce call-frame information, and these tend to act as instruction-scheduling barriers. Using -fexceptions (I am 95% sure that is the correct option) should cause the non-dash-g compilation to use the same directives, and avoid that known difference.
You can repeat this experiment with different optimization levels, as differences are far more likely to show up with optimization.
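The compile, disassemble, and diff steps described above could be scripted roughly as follows. This is a minimal sketch, not a tested harness: the function names are hypothetical, it assumes `clang` and an `objdump` that accepts `-d` are on the PATH, and it drops the `file format` header line that objdump prints, since that line contains the object file name and would always differ.

```python
import difflib
import subprocess


def compile_commands(src, opt="-O2"):
    """Return (plain, debug) clang command lines differing only in -g and -o."""
    common = ["clang", "-c", opt, "-ffunction-sections", "-fexceptions", src]
    stem = src.rsplit(".", 1)[0]
    plain = common + ["-o", stem + ".o"]
    debug = common + ["-g", "-o", stem + "-g.o"]
    return plain, debug


def normalize(disassembly):
    """Drop the header line naming the object file, which always differs."""
    return [ln for ln in disassembly.splitlines() if "file format" not in ln]


def diff_disassembly(plain_text, debug_text):
    """Unified diff of two disassembly dumps; an empty list means no difference."""
    return list(difflib.unified_diff(normalize(plain_text), normalize(debug_text),
                                     "without -g", "with -g", lineterm=""))


def check(src, opt="-O2"):
    """Compile src both ways, disassemble both objects, and return the diff."""
    dumps = []
    for cmd in compile_commands(src, opt):
        subprocess.run(cmd, check=True)
        obj = cmd[-1]  # the -o operand is the last argument in both commands
        out = subprocess.run(["objdump", "-d", obj], check=True,
                             capture_output=True, text=True)
        dumps.append(out.stdout)
    return diff_disassembly(dumps[0], dumps[1])
```

Calling `check("foo.c", "-O1")`, `check("foo.c", "-O2")`, and so on covers the different optimization levels; any non-empty result is a candidate for further investigation.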
Once you find a difference, you can begin experimenting with ways to identify specific compiler passes that are contributing to the difference. A very useful tool here is the backend option `-opt-bisect-limit=N` where N is the number of passes to execute. Because it is a backend option, you would use it this way:
clang -c -O2 -mllvm -opt-bisect-limit=100 foo.c -o foo.o
clang -c -O2 -mllvm -opt-bisect-limit=100 foo.c -g -o foo-g.o
Then disassemble and diff as usual. After you have identified a problematic pass, you can try your hand at fixing it yourself, or you can file a bug (with a reduced reproducer if at all possible) and move on to another sample.
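Finding the first value of N at which the two outputs diverge can be automated with a binary search. The sketch below is an illustration under stated assumptions, not part of the advice above: `differs` is a hypothetical callback that compiles both ways with the given limit and reports whether the disassembly differs, `upper` is a limit already observed to show the difference, and the search assumes that once a difference appears it persists for every larger limit.

```python
def bisect_commands(src, limit, opt="-O2"):
    """The two compile commands from the text, parameterized on the limit."""
    stem = src.rsplit(".", 1)[0]
    base = ["clang", "-c", opt, "-mllvm", "-opt-bisect-limit=%d" % limit, src]
    return base + ["-o", stem + ".o"], base + ["-g", "-o", stem + "-g.o"]


def first_differing_limit(differs, upper):
    """Smallest limit in 1..upper for which differs(limit) is true.

    Assumes differs(upper) is true and that differs is monotonic
    (false below some threshold, true at and above it).
    """
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if differs(mid):
            hi = mid       # difference already present: look lower
        else:
            lo = mid + 1   # no difference yet: look higher
    return lo
```

Under the monotonicity assumption, the returned value is the number of the pass whose execution first makes the outputs differ. It is worth confirming the result by comparing limits N-1 and N by hand.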
Of course you will need some sample source code to run experiments on. This can be anything convenient. You could try it on any personal projects you have, or you could find a random code generator, or whatever you like. Some people have recommended LLVM's own 'test-suite' project although I have not looked at it in any detail.
Good luck, and feel free to post additional questions on llvm-dev if you run into any problems.
LLVM Developers mailing list
Your script looks OK, though you won't want to use the -opt-bisect-limit= option until you've found a case where code-generation changes. Instead, that's a tool which you could use to narrow down the pass inside LLVM which is causing the change.
The problem is that your input code is far too simple to trigger any interesting optimisations. I'd suggest starting with either some code from the LLVM test suite (https://github.com/llvm/llvm-test-suite), or some code generated by csmith (https://embed.cs.utah.edu/csmith/). The former has the advantage of being (mostly) real code people actually write, and the latter can generate a large amount of complex code without any external dependencies (so it's easy to build).
Built LLVM 8.0.1 for debug using -DCMAKE_EXPORT_COMPILE_COMMANDS=ON.
Put together a sequence using clang/utils/check_cfc on the compile list using the same compile parameters except for the -g and -o options that check_cfc provides.
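Preparing the compile list for such a run could look roughly like this. The sketch assumes the standard `compile_commands.json` layout produced by `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON` (a list of entries with a `command` string or an `arguments` list). The function names are hypothetical, and only the exact `-g` flag and the `-o <file>` pair are removed, so variants such as `-g3` or `-o<file>` would need extra handling. How the resulting commands are handed to check_cfc is left out.

```python
import json
import shlex


def strip_debug_and_output(args):
    """Return args without '-g' and without '-o' plus its following operand."""
    kept, skip_next = [], False
    for arg in args:
        if skip_next:
            skip_next = False   # this is the operand of a preceding -o
        elif arg == "-o":
            skip_next = True
        elif arg != "-g":
            kept.append(arg)
    return kept


def load_compile_list(text):
    """Parse compile_commands.json text into (directory, file, args) tuples."""
    result = []
    for entry in json.loads(text):
        args = entry.get("arguments") or shlex.split(entry["command"])
        result.append((entry["directory"], entry["file"],
                       strip_debug_and_output(args)))
    return result
```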
2506 files were successfully processed by check_cfc out of a total of 2833.
The maximum number of differences obtained was three. The possibly interesting types are listed here.
4 of this type.
- push + sh %r
- mov + v %r
- sub + b $0
3 of this type.
- pq 66 + v -0
- v -0 + v %r
- v %r + llq 6d
1 each of these types.
- pq ef + pq ea
- pq 68 + a -0
- a -0 + llq 6c
- pq 3a + vb $0
- vb $0 + a -0
- a -0 + v %r
- jmpq + mov
- mov + and
- and + mov
Regards, Neil Nelson
Nice work! Are you planning to track down where these differences come from, or do you plan to file bugs for them?