[llvm-dev] Relationship between clang, opt and llc

Peizhao Ou via llvm-dev

Apr 10, 2017, 7:57:54 PM
to llvm-dev
Hi folks,

I am wondering about the relationship between clang, opt and llc. I understand that this has been asked before, e.g., http://stackoverflow.com/questions/40350990/relationship-between-clang-opt-llc-and-llvm-linker. Sorry for posting a similar question again, but I still have something that hasn't been resolved yet.

More specifically, I am wondering about the following two approaches to compiling an optimized executable:

1. clang -O3 -c source.c -o source.o
    ...
    clang a.o b.o c.o ... -o executable

2. clang -O0 -c -emit-llvm source.c -o source.bc
    opt -O3 source.bc -o source.bc
    llc -O3 -filetype=obj source.bc -o source.o
    ...
    clang a.o b.o c.o ... -o executable

I took a look at the source code of the clang tool and the opt tool; they both seem to use the PassManagerBuilder::populateModulePassManager() and PassManagerBuilder::populateFunctionPassManager() functions to add passes to their optimization pipelines, and for the backend, clang and llc both use the addPassesToEmitFile() function to generate object code.

So presumably the two approaches above should generate the same optimized executable. However, I am seeing that the second approach is consistently around 2% slower than the first (which is the way developers usually build).

Can anyone point me to the reasons why this happens? Or even correct my wrong understanding of the relationship between these two approaches?

PS: I used the -debug-pass=Structure option to print out the passes; they seem the same except that the first approach has an extra pass called "-add-discriminator", but I don't think that's the reason.
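
For reference, a sketch of how the pass lists of the two approaches can be dumped for comparison (file names are placeholders; when going through the clang driver, -debug-pass=Structure has to be forwarded with -mllvm):

clang -O3 -mllvm -debug-pass=Structure -c source.c -o source.o   # approach 1
opt -O3 -debug-pass=Structure source.bc -o /dev/null             # approach 2, middle end
llc -O3 -debug-pass=Structure source.bc -o /dev/null             # approach 2, back end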

Peizhao

Craig Topper via llvm-dev

Apr 10, 2017, 8:21:14 PM
to Peizhao Ou, llvm-dev
clang -O0 does not disable all optimization passes that modify the IR. In fact, it causes most functions to get tagged with noinline to prevent inlining.

What you really need to do is

clang -O3 -c -emit-llvm source.c -o source.bc -v

Find the -cc1 command line in that output. Execute that command with -disable-llvm-passes added; leave the -O3 and everything else.

You should be able to feed the output from that command to opt/llc and get consistent results.
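
A sketch of that workflow with placeholder file names (the elided cc1 arguments come from your own -v output, so the -cc1 line below is only illustrative):

clang -O3 -c -emit-llvm source.c -o source.bc -v   # prints the underlying "clang -cc1 ..." line
# re-run that exact -cc1 line with -disable-llvm-passes appended, keeping -O3, e.g.:
#   clang -cc1 ... -O3 -disable-llvm-passes -emit-llvm-bc -o source.bc source.c
opt -O3 source.bc -o source.opt.bc
llc -O3 -filetype=obj source.opt.bc -o source.o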




~Craig

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Mehdi Amini via llvm-dev

Apr 11, 2017, 1:12:43 AM
to Craig Topper, llvm-dev, Peizhao Ou
On Apr 10, 2017, at 5:21 PM, Craig Topper via llvm-dev <llvm...@lists.llvm.org> wrote:

clang -O0 does not disable all optimization passes that modify the IR. In fact, it causes most functions to get tagged with noinline to prevent inlining.

It also disables lifetime intrinsics emission, TBAA, etc.



What you really need to do is

clang -O3 -c -emit-llvm source.c -o source.bc -v

Find the -cc1 command line in that output. Execute that command with -disable-llvm-passes added; leave the -O3 and everything else.

That's a bit complicated: CC1 options can be passed through with -Xclang; for example, here you can just add ` -Xclang -disable-llvm-passes` to the regular clang invocation.
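
With that, the whole pipeline from the original post becomes (a sketch, placeholder file names):

clang -O3 -Xclang -disable-llvm-passes -c -emit-llvm source.c -o source.bc
opt -O3 source.bc -o source.bc
llc -O3 -filetype=obj source.bc -o source.o
clang a.o b.o c.o ... -o executable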

Best,

— 
Mehdi

Peizhao Ou via llvm-dev

Apr 11, 2017, 12:12:31 PM
to Craig Topper, llvm-dev
Thanks, Craig! That corrects my misunderstanding about the front end. I tried it, and indeed I now see almost the same results.

Best,
Peizhao

Peizhao Ou via llvm-dev

Apr 11, 2017, 12:15:11 PM
to Mehdi Amini, llvm-dev
Thanks for pointing out the -Xclang option; it makes things much easier. I really appreciate your help!

Best,
Peizhao

toddy wang via llvm-dev

Jan 5, 2018, 4:19:26 PM
to Peizhao Ou, llvm-dev
I tried the following on LULESH1.0 serial version (https://codesign.llnl.gov/lulesh/LULESH.cc)

1. clang++ -O3 LULESH.cc; ./a.out 20
Runtime: 9.487353 seconds

2. clang++ -O0 -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
Runtime: 24.15 seconds

3. clang++ -O3 -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
Runtime: 9.53 seconds

1 and 3 have almost the same performance, while 2 is significantly worse; I expected 1, 2, and 3 to have only trivial differences.

Is this a wrong expectation?

@Peizhao, what did you try in your last post?

Michael Kruse via llvm-dev

Jan 5, 2018, 4:43:42 PM
to toddy wang, llvm-dev
2018-01-05 22:19 GMT+01:00 toddy wang via llvm-dev <llvm...@lists.llvm.org>:
> 2. clang++ -O0 -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc;
> opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o
> b.out; ./b.out 20
> Runtime: 24.15 seconds

clang -O0 adds an "optnone" attribute to each function, which causes most
optimization passes to skip that function. Avoid this with "-Xclang
-disable-O0-optnone".

Michael

Craig Topper via llvm-dev

Jan 5, 2018, 4:45:41 PM
to toddy wang, llvm-dev
If you pass -O0 to clang, most functions will be tagged with an optnone function attribute that prevents opt and llc from optimizing them, even if you pass -O3 to opt and llc. This is the most likely cause of the slowdown in 2.

You can disable the optnone function attribute behavior by passing "-Xclang -disable-O0-optnone" to clang.

~Craig

toddy wang via llvm-dev

Jan 5, 2018, 7:49:49 PM
to Craig Topper, llvm-dev, Peizhao Ou
@Peizhao, thanks for the clarification.

@Craig and @Michael, for clang 4.0.1, -Xclang -disable-O0-optnone gives the following error message. From which version is -disable-O0-optnone supported?

[twang15@c89 temp]$ clang++ -O0 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc
error: unknown argument: '-disable-O0-optnone'

[twang15@c89 temp]$ clang++ --version
clang version 4.0.1 (tags/RELEASE_401/final)
Target: x86_64-unknown-linux-gnu

Craig Topper via llvm-dev

Jan 5, 2018, 7:55:17 PM
to toddy wang, llvm-dev, Peizhao Ou
O0 didn't start applying optnone until r304127 in May 2017, which is after the 4.0 family was branched. So only 5.0, 6.0, and trunk have that behavior. Commit message copied below:

Author: Mehdi Amini <joke...@gmail.com>
Date:   Mon May 29 05:38:20 2017 +0000

    IRGen: Add optnone attribute on function during O0

    Amongst other, this will help LTO to correctly handle/honor files
    compiled with O0, helping debugging failures.
    It also seems in line with how we handle other options, like how
    -fnoinline adds the appropriate attribute as well.

    Differential Revision: https://reviews.llvm.org/D28404




~Craig

toddy wang via llvm-dev

Jan 5, 2018, 8:42:03 PM
to Craig Topper, llvm-dev, Peizhao Ou
Craig, thanks a lot!

I'm actually confused by clang optimization flags.

If I run clang -help, it shows many optimization options (denoted as set A) and non-optimization options (denoted as set B).
If I run llvm-as < /dev/null | opt -O0/1/2/3 -disable-output -debug-pass=Arguments, it also shows many optimization flags (denoted as set C).

There are many options in set C that are not in set A, and also options in set A that are not in set C.

The general question is: what is the relationship between set A and set C at the same optimization level O0/O1/O2/O3?
Another question is: how can I specify an option from set C on the clang command line, if it is not in set A?

For example, -dse is in set C but not in set A; how can I specify it as a clang option? Or is that simply not possible?






Craig Topper via llvm-dev

Jan 5, 2018, 10:00:20 PM
to toddy wang, llvm-dev, Peizhao Ou
I don't think "clang -help" prints options about optimizations. Clang itself doesn't have direct support for fine-grained optimization control, just the flags for the levels -O0/-O1/-O2/-O3. This is intended to be a simple and sufficient interface for most users who just want to compile their code. So I don't think there's a way to pass just -dse to clang.

opt, on the other hand, is more of a utility for LLVM developers that provides fine-grained control of optimizations for testing purposes.
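
For illustration, a minimal sketch of that split, assuming a placeholder source.c (-dse and -dce are the legacy opt spellings of dead-store and dead-code elimination):

clang -O1 -Xclang -disable-llvm-passes -c -emit-llvm source.c -o source.bc   # only the coarse level from clang
opt -dse -dce source.bc -o source.opt.bc                                     # fine-grained pass selection in opt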



~Craig

toddy wang via llvm-dev

Jan 5, 2018, 11:11:57 PM
to Craig Topper, llvm-dev, Peizhao Ou
After building LLVM 5.0, I found that clang-5.0 is extremely slow,
even though it is built with -DCMAKE_BUILD_TYPE=Release.

When building LULESH.cc, it gets stuck at the link stage.

I built it as instructed here: https://github.com/flang-compiler/flang

Maybe I should submit a bug. 


toddy wang via llvm-dev

Jan 5, 2018, 11:13:20 PM
to Craig Topper, llvm-dev, Peizhao Ou
[twang15@c92 temp]$ time clang++ -v -O3 LULESH.cc 
clang version 5.0.1 (https://github.com/flang-compiler/clang.git 64043d5cec9fb02d1b0fd80c9f2c4e9e4f09cf8f) (https://github.com/llvm-mirror/llvm.git 1368f4044e62cad4316da638d919a93fd3ac3fe6)
Target: x86_64-unknown-linux-gnu
...
..

real 2m21.979s
user 2m21.842s
sys 0m0.081s

toddy wang via llvm-dev

Jan 6, 2018, 2:30:51 AM
to Craig Topper, llvm-dev, Peizhao Ou
What I am trying to do is compile a program with different sets of optimization flags.
If there is no fine-grained control over clang optimization flags, it would be impossible to achieve what I intend.

Although there is fine-grained control via opt, for large-scale projects the clang-opt-llc pipeline may not be a drop-in solution.

toddy wang via llvm-dev

Jan 6, 2018, 3:25:28 PM
to Craig Topper, llvm-dev, Peizhao Ou
@Craig and @Michael

After installing clang-5.0 (downloaded from http://releases.llvm.org; it does not have the Flang build's slowdown mentioned above),

1. clang++ -O0 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
runtime: 2.354069e+01

2. clang++ -O1 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
runtime: 9.046271e+00

3. clang++ -O3 LULESH.cc
runtime: 9.118835e+00

4. clang++ -O2 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
runtime: 9.091278e+00

5. clang++ -O3 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
runtime: 9.096919e+00

Apparently, clang++ -O0 -Xclang -disable-O0-optnone does not work as expected.

The conclusion seems to be that -Xclang -disable-O0-optnone works when the clang optimization level is O1/O2/O3, not O0.

Any comments?



Craig Topper via llvm-dev

Jan 6, 2018, 3:43:22 PM
to toddy wang, llvm-dev, Peizhao Ou
-disable-O0-optnone has no effect with anything other than -O0.

-O0 being passed to clang also causes all functions to be marked noinline. I don't know if there is a command line option to turn that off.

I recommend passing "-O1 -Xclang -disable-llvm-passes" to clang. Passing -O0 very specifically means disable optimizations.
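
Putting that recommendation together with the commands used earlier in this thread, a sketch would be:

clang++ -O1 -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc   # no optnone/noinline tagging at -O1
opt -O3 a.bc -o b.bc
llc -O3 -filetype=obj b.bc -o b.o
clang++ b.o -o b.out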

~Craig

toddy wang via llvm-dev

Jan 6, 2018, 4:05:04 PM
to Craig Topper, llvm-dev, Peizhao Ou
Thanks a lot, it is clear to me now.

BTW, for Clang's slowdown, I submitted an issue here: https://github.com/flang-compiler/flang/issues/356

I have no idea about the root cause. 
Maybe it is due to debug symbols, but I already use -DCMAKE_BUILD_TYPE=Release.
Anyway, I believe there is a bug somewhere.

Craig Topper via llvm-dev

Jan 6, 2018, 4:19:31 PM
to toddy wang, llvm-dev, Peizhao Ou
Why are you using build directions from "flang", which is a Fortran compiler maintained by different people than the LLVM/clang community, but then compiling C/C++ code? Their bug database should be used for filing bugs against the Fortran compiler, not for a C/C++ compiler issue.

~Craig

toddy wang via llvm-dev

Jan 6, 2018, 4:33:03 PM
to Craig Topper, llvm-dev, Peizhao Ou
I have a code base written in Fortran but with C/C++ kernels,
so I have to use both Flang and Clang to compile it. Maybe I should use the LLVM 5.0 release for C/C++ and their Flang only for the Fortran code.
It should work, but I have not tried it yet.

Sean Silva via llvm-dev

Jan 6, 2018, 7:37:46 PM
to toddy wang, llvm-dev, Peizhao Ou


On Jan 5, 2018 11:30 PM, "toddy wang via llvm-dev" <llvm...@lists.llvm.org> wrote:
What I am trying is to compile a program with different sets of optimization flags.
If there is no fine-grained control over clang optimization flags, it would be impossible to achieve what I intend.

LLD has -lto-newpm-passes (and the corresponding -lto-newpm-aa-pipeline), which allows you to pass a custom pass pipeline with full control. At one point I was using a similar modification to clang (see https://reviews.llvm.org/D21954) that never landed.

-- Sean Silva

toddy wang via llvm-dev

Jan 6, 2018, 10:35:58 PM
to Sean Silva, llvm-dev, Peizhao Ou
@Sean, do you mean llc?
For llc 4.0 and llc 5.0, I cannot find the -lto-newpm-passes option; is it a hidden one?

Sean Silva via llvm-dev

Jan 7, 2018, 12:40:14 AM
to toddy wang, llvm-dev, Peizhao Ou
No, I meant LLD, the LLVM linker. This option for LLD is relevant for exploring different pass pipelines for link time optimization.

It is essentially equivalent to the -passes flag for 'opt'.

Such a flag doesn't make much sense for 'llc' because llc mostly runs backend passes, which are much more difficult to construct custom pipelines for (backend passes are often required for correctness or have complex ordering requirements).
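
A sketch of how this is typically driven (the -lto-newpm-passes spelling is taken from this thread; forwarding it through the clang driver with -Wl, and the 'default<O3>' pipeline string are assumptions based on opt's -passes syntax):

clang -flto -O1 -Xclang -disable-llvm-passes -c a.c b.c        # full-LTO object files are LLVM bitcode
clang -flto -fuse-ld=lld -Wl,-lto-newpm-passes='default<O3>' a.o b.o -o a.out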

-- Sean Silva

Mehdi AMINI via llvm-dev

Jan 7, 2018, 10:11:59 AM
to toddy wang, llvm-dev, Peizhao Ou
Hi,

"SetC" options are LLVM cl::opt options, they are intended for LLVM developer and experimentations. If a settings is intended to be used as a public API, there is usually a programmatic way of setting it in LLVM.
"SetA" is what clang as a C++ compiler exposes to the end-user. Internally clang will (most of the time) use one or multiple LLVM APIs to propagate a settings.

Best,

-- 
Mehdi

Peizhao Ou via llvm-dev

Jan 7, 2018, 10:19:50 AM
to toddy wang, llvm-dev
@Toddy, I think I had some misunderstanding about the Clang command line options when I posted the question.

I think pipelines 1 and 3 are supposed to have only a trivial difference, while pipeline 2 is supposed to be much slower than the other two because the "-O0" option in pipeline 2 disables some of the important passes in opt (even if you use "-O3" with opt).

I checked the IRs generated by pipelines 2 and 3 and saw that they are not the same (e.g., pipeline 3 emits IR with more alias info that can be used by opt). And what I did was exactly pipeline 2 (mistakenly thinking it would be equivalent to pipeline 1). So from my understanding, if you want to use the clang-opt-llc pipeline, you may need to stick with pipeline 3, where the "-O3 -Xclang -disable-llvm-passes" options tell clang to generate unoptimized IR that can later be fully optimized, just as with "clang -O3" directly.

toddy wang via llvm-dev

Jan 7, 2018, 11:46:56 PM
to Sean Silva, llvm-dev, Peizhao Ou
@Sean, here is my summary of several tools. 

Format: (ID,tool, input->output, timing, customization, questions)

1. llc, 1 bc -> 1 obj, back-end compile-time (code generation and machine-dependent optimizations),  Difficult to customize pipeline, N/A
2. LLD: all bc files and obj files -> 1 binary (passing -flto to clang for *.bc file generation), back-end link-time optimization, code generation, and machine-dependent optimizations, Easy to customize pipeline w/ -lto-newpm-passes, what is the connection between -lto-newpm-passes and -lto-newpm-aa-pipeline, and how to use -lto-newpm-passes to customize the pipeline?
3. gold: mixed obj files and bc files -> 1 binary (passing -flto to clang for *.bc file generation), back-end link-time optimization w/ LLVMgold.so, code generation, and machine-dependent optimizations, unaware of whether it is customizable by means of command line options, can we consider LLD a more customizable gold from the perspective of pipeline customization?
4. opt, 1 bc file -> 1 file at a time, middle-end machine-independent optimizations (maybe others?), Easy to customize pipeline by means of command line options, N/A
5. llvm-link, many bc files -> 1 bc file, link time (unknown whether there is any optimization), unknown why it exists, unknown how to do customization, N/A

With the above understanding, there are several ways to fine-tune the clang/llvm optimization pipeline:
1. clang (c/c++ to bc translation, with minimal front-end optimizations, w/ -emit-llvm -O1 -Xclang -disable-llvm-passes) --> opt (w/ customizable middle-end optimizations for each bc file independently) --> gold (un-customizable back-end link-time optimization and code generation)
2. clang (c/c++ to bc translation, with minimal front-end optimizations, w/ -flto) --> opt (same as 1) --> lld (w/ -lto-newpm-passes for link-time optimization pipeline customization, how?)
3. clang (c/c++ to bc translation and optimization, customizable by means of clang command-line options, maybe including both front-end and middle-end optimizations). Without explicitly specifying an opt pipeline, there may still be middle-end optimizations happening; and without explicitly specifying a linker, it may use GNU ld / GNU gold / lld as the linker, with whichever's default link-time optimization pipeline.

So it seems to me that 2 is the most customizable pipeline, with the middle-end and back-end pipelines customizable independently; 1 has only a customizable middle-end pipeline; and 3 offers the least control over the optimization pipeline by means of the clang command line.

Thanks for your time and welcome to any comments!

toddy wang via llvm-dev

Jan 8, 2018, 12:03:25 AM
to Mehdi AMINI, llvm-dev, Peizhao Ou
Thanks a lot, Mehdi.

For GCC, there are around 190 optimization flags exposed as command-line options.
For Clang/LLVM, the number is 40, and many important optimization parameters are not exposed at all, such as the loop unrolling factor and inline function size parameters.

I understand there are very different views on whether or not to expose many flags to the end user.
Personally, I believe it is reasonable to keep the end-user-controllable command-line options minimal for user-friendliness.
However, for users who care a lot about even a tiny bit of performance improvement, like the HPC community, it may be better to expose as many fine-grained tunables as possible in the form of command-line options. Or, at least there should be a way to achieve this fairly easily.

I am curious about which way is the best for my purpose.
Please see my latest reply for 3 possible fine-grained optimization pipelines.
Looking forward to more discussions.

Thanks a lot! 

Mehdi AMINI via llvm-dev

Jan 8, 2018, 12:56:59 AM
to toddy wang, llvm-dev, Peizhao Ou
Hi Toddy,

You can achieve what you're looking for with a pipeline based on `clang -Ox` + `opt -Ox` + `llc -Ox` (or lld instead of llc), but this won't be guaranteed to be well supported across releases of the compiler.

Otherwise, if there are some performance-related (or not...) command-line options you think clang is missing or would benefit from, I invite you to propose adding them on cfe...@lists.llvm.org and submit a patch!

Best,

-- 
Mehdi

Sean Silva via llvm-dev

Jan 8, 2018, 1:12:57 AM
to toddy wang, llvm-dev, Peizhao Ou


On Jan 7, 2018 8:46 PM, "toddy wang" <wenwan...@gmail.com> wrote:
@Sean, here is my summary of several tools. 

Format: (ID,tool, input->output, timing, customization, questions)

1. llc, 1 bc -> 1 obj, back-end compile-time (code generation and machine-dependent optimizations),  Difficult to customize pipeline, N/A
2. LLD:  all bc files and obj files -> 1 binary (passing -flto to clang for *.bc file generation),  back-end link-time optimizations and code generations and machine-dependent optimizations, Easy to customize pipeline w/ -lto-newpm-passes, what is the connection between -lto-newpm-passes and  -lto-newpm-aa-pipeline and how to use -lto-newpm-passes to customize pipeline?

You just specify the list of passes to run, as you would to opt -passes

-lto-newpm-aa-pipeline has the same function as opt's -aa-pipeline option.
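
For concreteness, the opt -passes / -aa-pipeline syntax referred to above looks roughly like this (pass and AA names are illustrative, not a recommended pipeline):

opt -passes='instcombine,gvn,dce' a.bc -o b.bc                   # an explicit list of passes
opt -passes='default<O3>' -aa-pipeline=basic-aa a.bc -o b.bc     # a predefined pipeline plus an AA pipeline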

3. gold: mixed obj files and bc files -> 1 binary (passing -flto to clang for *.bc file generation), back-end link-time optimization w/ LLVMgold.so and code generation and machine-dependent optimizations, unaware of whether it is customizable by means of command line options, can we consider LLD a more customizable gold from perspective of pipeline customization?

Gold and LLD are very similar for this purpose, and LLD has some extra goodies like -lto-newpm-passes


4. opt, 1 bc file -> 1 file at a time, middle-end machine-independent (may be others?), Easy to customize pipeline by means of command line options, N/A
5. llvm-link, many *bc file -> 1 bc file, link-time (unknown whether there is any optimization) and Unknown why it exists, unknown how to do customization, N/A

llvm-link doesn't perform optimizations.


With above understandings, there are several ways to fine-grained tune clang/llvm optimization pipeline:
1. clang (c/c++  to bc translation, with minimal front-end optimizations, w/ -emit-llvm -O1 -Xclang -disable-llvm-passes), --> opt (w/ customizable middle-end optimizations for each bc file independently) --> gold (un-customizable back-end link-time optimization and code generation)
2. clang (c/c++  to bc translation, with minimal front-end optimizations, w/ -flto) -->opt ( same as 1) --> lld (w/ -lto-newpm-passes for link-time optimization pipeline customization, how?)
3. clang (c/c++ to *bc translation and optimization, customizable by mean of clang command-line options, maybe including both front-end optimization and middle-end optimizations). W/O explicitly specifying opt optimization pipeline, there may still be middle-end optimizations happening; also w/o explicitly specifying linker, it may use GNU ld / GNU gold / lld as the linker and with whichever's default link-time optimization pipeline.

So, it seems to me that 2 is the most customizable pipeline, with customizable middle-end and back-end pipeline independently,

The thing customized by -lto-newpm-passes is actually a middle-end pipeline run during link time optimization. The backend is not very customizable.

Also, note that with a clang patch like the one I linked, you don't need opt because you can directly tell clang what fine-grained set of passes to run in the middle end.

One approach I have used in the past is to compile with -flto -O0 -disable-O0-optnone and then do all optimizations at link time. This can simplify things because you only need to re-run the link command (it still takes a long time, but with sufficient ram (and compiling without debug info) you should be able to run multiple different pass pipelines in parallel). If your only goal is to test middle-end pass pipelines (e.g. synergies of different passes) then that can be a good approach. However, keep in mind that this is really just a small part of the larger design problem of getting the best code with the best compile time. In practice, profile feedback (and making good use of it), together with accurate cost modeling (especially in the middle end for optimizations like inlining and loop unrolling), together with appropriate link-time cross-module optimization tend to matter just as much (or more) than a particularly intelligently chosen sequence of passes.
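
A sketch of that recipe, spelled with -Xclang so the driver accepts the cc1 flag (untested; adjust to your own sources and linker):

clang -flto -O0 -Xclang -disable-O0-optnone -c a.c -o a.o
clang -flto -O0 -Xclang -disable-O0-optnone -c b.c -o b.o
clang -flto -fuse-ld=lld a.o b.o -o a.out    # only this link step needs re-running to try another pipeline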

Also, keep in mind that even though we in principle have a lot of flexibility with the sequence of passes in the middle end, in practice a lot of tuning and bug fixing has been done with the default O2/O3 pipelines. If you deviate from them you may end up with pretty silly code. An example from my recent memory was that an inopportune run of GVN can cause a loop to have an unexpected set of induction variables, throwing off other optimizations.

-- Sean Silva

toddy wang via llvm-dev

Jan 8, 2018, 1:32:55 AM
to Mehdi AMINI, llvm-dev, Peizhao Ou
Hi Mehdi,

Now we have 5 pipelines. (The first 3 are described in detail above; please refer to my latest reply for details.)
1. clang + opt + gold
2. clang + opt + lld
3. clang + GNU ld/ gold /lld

4. clang + opt + llc + clang
clang -emit-llvm -O1 -Xclang -disable-llvm-passes for c/c++ to .bc generation and minimal  front-end optimization
opt for single bc file optimization
llc for single bc file to obj file generation and back-end optimization (no link-time optimization is possible, since llc works on 1 bc file at a time)
clang again for linking all obj files to generate the final executable. (Although in principle there could be link-time optimization over the object files as well, it requires a lot of work and is machine-dependent. This may also be the reason why modern compilers like LLVM/GCC/ICC perform LTO at the IR level rather than the object level. But the object level may yield extra benefit even after IR-level LTO has been applied, because the object level can see more information.)

`clang -Ox` + `opt -Ox` + `llc -Ox` is too coarse-grained.

5. Modify clang to align with GCC/ICC so that many tunables are exposed on the clang command line. Not sure how much work is needed, but it at least requires an overall understanding of compiler internals, which can be gradually figured out.

I believe 5 is interesting, but 2 may be good enough. More experiments are needed before a decision is made.

Sean Silva via llvm-dev

Jan 8, 2018, 2:02:57 AM
to toddy wang, llvm-dev, Peizhao Ou
For the types of things that you are looking for, you may just want to try a bunch of -mllvm options. You can tune the inlining and unrolling thresholds that way, for example.

toddy wang via llvm-dev

Jan 8, 2018, 2:16:36 AM
to Sean Silva, llvm-dev, Peizhao Ou
-mllvm <value>          Additional arguments to forward to LLVM's option processing

This is dumped by clang. I am not sure what I am supposed to put as value in order to tune unrolling/inlining threshold.

toddy wang via llvm-dev

Jan 8, 2018, 3:00:56 AM
to Sean Silva, llvm-dev, Peizhao Ou
Hi Sean,

Please check my inlined reply.
Looking forward to your comments.

Thanks for your time!

On Mon, Jan 8, 2018 at 1:12 AM, Sean Silva <chiso...@gmail.com> wrote:


On Jan 7, 2018 8:46 PM, "toddy wang" <wenwan...@gmail.com> wrote:
@Sean, here is my summary of several tools. 

Format: (ID,tool, input->output, timing, customization, questions)

1. llc, 1 bc -> 1 obj, back-end compile-time (code generation and machine-dependent optimizations),  Difficult to customize pipeline, N/A
2. LLD:  all bc files and obj files -> 1 binary (passing -flto to clang for *.bc file generation),  back-end link-time optimizations and code generations and machine-dependent optimizations, Easy to customize pipeline w/ -lto-newpm-passes, what is the connection between -lto-newpm-passes and  -lto-newpm-aa-pipeline and how to use -lto-newpm-passes to customize pipeline?

You just specify the list of passes to run, as you would to opt -passes

-lto-newpm-aa-pipeline has the same function as opt's -aa-pipeline option.
 
It seems to me -aa-pipeline is just for alias analysis passes.
Could you explain what opt -passes is used for, and how to use it?

My impression is that if I want to run multiple passes with opt, e.g., dce, dse, what I need to do is 
opt -dce -dse a.bc
So, what is -passes used for?



3. gold: mixed obj files and bc files -> 1 binary (passing -flto to clang for *.bc file generation), back-end link-time optimization w/ LLVMgold.so and code generation and machine-dependent optimizations, unaware of whether it is customizable by means of command line options, can we consider LLD a more customizable gold from perspective of pipeline customization?

Gold and LLD are very similar for this purpose, and LLD has some extra goodies like -lto-newpm-passes


Good to know this. LLD makes LLVM LTO easier.
 

4. opt, 1 bc file -> 1 file at a time, middle-end machine-independent (may be others?), Easy to customize pipeline by means of command line options, N/A
5. llvm-link, many *bc file -> 1 bc file, link-time (unknown whether there is any optimization) and Unknown why it exists, unknown how to do customization, N/A

llvm-link doesn't perform optimizations.

Any usage scenario for llvm-link?
 


With above understandings, there are several ways to fine-grained tune clang/llvm optimization pipeline:
1. clang (c/c++  to bc translation, with minimal front-end optimizations, w/ -emit-llvm -O1 -Xclang -disable-llvm-passes), --> opt (w/ customizable middle-end optimizations for each bc file independently) --> gold (un-customizable back-end link-time optimization and code generation)
2. clang (c/c++  to bc translation, with minimal front-end optimizations, w/ -flto) -->opt ( same as 1) --> lld (w/ -lto-newpm-passes for link-time optimization pipeline customization, how?)
3. clang (c/c++ to *bc translation and optimization, customizable by mean of clang command-line options, maybe including both front-end optimization and middle-end optimizations). W/O explicitly specifying opt optimization pipeline, there may still be middle-end optimizations happening; also w/o explicitly specifying linker, it may use GNU ld / GNU gold / lld as the linker and with whichever's default link-time optimization pipeline.

So, it seems to me that 2 is the most customizable pipeline, with customizable middle-end and back-end pipeline independently,

The thing customized by -lto-newpm-passes is actually a middle-end pipeline run during link time optimization. The backend is not very customizable.

Also, note that with a clang patch like the one I linked, you don't need opt because you can directly tell clang what fine-grained set of passes to run in the middle end.

It seems opt's functionality and -lto-newpm-passes overlap with each other. 
With -lto-newpm-passes, one does not need to specify how programs are linked, because middle-end optimizations (opt's optimizations), LTO, and linking can all be handled by LLD.
So LLD = opt + LTO + ld, right?
 
One approach I have used in the past is to compile with -flto -O0 -disable-O0-optnone and then do all optimizations at link time. This can simplify things because you only need to re-run the link command (it still takes a long time, but with sufficient ram (and compiling without debug info) you should be able to run multiple different pass pipelines in parallel).
Should "-flto -O0 -disable-O0-optnone"  be  "-flto -O1 (not O0) -Xclang -disable-llvm-passes"? I believe the purpose is to generate unoptimized .bc files.
Then, all machine-independent optimizations are done within LLD.
 
If your only goal is to test middle-end pass pipelines (e.g. synergies of different passes) then that can be a good approach. However, keep in mind that this is really just a small part of the larger design problem of getting the best code with the best compile time. In practice, profile feedback (and making good use of it), together with accurate cost modeling (especially in the middle end for optimizations like inlining and loop unrolling), together with appropriate link-time cross-module optimization tend to matter just as much (or more) than a particularly intelligently chosen sequence of passes.

This is very interesting to me. 

What you described is as follows: 
Performance (PGO + accurate cost modeling + appropriate Link-time cross-module optimization)  >= Performance (intelligently chosen sequence of passes)

Now, my questions are:
1. What do you mean by "accurate cost modeling"?
Do you mean tuning the cost model for each optimization with detailed target-machine micro-architectural information?
If so, is this process automated or manual, and how?

2. When you say "appropriate link-time cross-module optimization", what does appropriate mean?
How do you decide which modules/optimizations are appropriate?
Is this process manual or automated?

3. What does "intelligently chosen sequence of passes" mean?
In particular, how do you decide which sequence of passes is better than another? Random sampling and measuring runtime?

4. Do you have any evidence that the performance of the former is better than that of the latter?

Mehdi AMINI via llvm-dev

Jan 8, 2018, 11:12:21 AM
to toddy wang, llvm-dev, Peizhao Ou
2018-01-07 23:16 GMT-08:00 toddy wang <wenwan...@gmail.com>:
-mllvm <value>          Additional arguments to forward to LLVM's option processing

This is dumped by clang. I am not sure what I am supposed to put as value in order to tune unrolling/inlining threshold.


As the help says, this is used to pass arguments to LLVM itself. If you remember your earlier question about set A (clang options) and set C (opt options), this allows you to reach set C from the clang command line.
Any option that you see in the output of `opt --help` can be set from clang using `-mllvm`. Same caveat as I mentioned before: these aren't supposed to be end-user options.

-- 
Mehdi

toddy wang via llvm-dev

Jan 8, 2018, 11:42:14 AM
to Mehdi AMINI, llvm-dev, Peizhao Ou
Hi Mehdi,

It seems -mllvm does not work as expected. Anything wrong?

[twang15@c92 temp]$ clang++ -O3 -mllvm -deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument '-deadargelim'.  Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-regalloc'?

[twang15@c92 temp]$ clang++ -O3 -mllvm deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument 'deadargelim'.  Try: 'clang (LLVM option parsing) -help'

-Tao

Mehdi AMINI via llvm-dev

Jan 8, 2018, 11:53:41 AM
to toddy wang, llvm-dev, Peizhao Ou
2018-01-08 8:41 GMT-08:00 toddy wang <wenwan...@gmail.com>:
Hi Medhi,

It seems -mllvm does not work as expected. Anything wrong?

[twang15@c92 temp]$ clang++ -O3 -mllvm -deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument '-deadargelim'.  Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-regalloc'?

[twang15@c92 temp]$ clang++ -O3 -mllvm deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument 'deadargelim'.  Try: 'clang (LLVM option parsing) -help'

You can't schedule passes this way, only set parameters like -unroll-threshold=<uint> etc.

-- 
Mehdi

toddy wang via llvm-dev

Jan 8, 2018, 11:59:23 AM
to Mehdi AMINI, llvm-dev, Peizhao Ou
On Mon, Jan 8, 2018 at 11:53 AM, Mehdi AMINI <joke...@gmail.com> wrote:


2018-01-08 8:41 GMT-08:00 toddy wang <wenwan...@gmail.com>:
Hi Medhi,

It seems -mllvm does not work as expected. Anything wrong?

[twang15@c92 temp]$ clang++ -O3 -mllvm -deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument '-deadargelim'.  Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-regalloc'?

[twang15@c92 temp]$ clang++ -O3 -mllvm deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument 'deadargelim'.  Try: 'clang (LLVM option parsing) -help'

You can't schedule passes this way, only set parameters like -unroll-threshold=<uint> etc.

Where can I find options like  -unroll-threshold=<uint>? I cannot find it in either opt -help or clang -help.

Mehdi AMINI via llvm-dev

Jan 8, 2018, 12:48:50 PM
to toddy wang, llvm-dev, Peizhao Ou
2018-01-08 8:59 GMT-08:00 toddy wang <wenwan...@gmail.com>:


On Mon, Jan 8, 2018 at 11:53 AM, Mehdi AMINI <joke...@gmail.com> wrote:


2018-01-08 8:41 GMT-08:00 toddy wang <wenwan...@gmail.com>:
Hi Medhi,

It seems -mllvm does not work as expected. Anything wrong?

[twang15@c92 temp]$ clang++ -O3 -mllvm -deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument '-deadargelim'.  Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-regalloc'?

[twang15@c92 temp]$ clang++ -O3 -mllvm deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument 'deadargelim'.  Try: 'clang (LLVM option parsing) -help'

You can't schedule passes this way, only set parameters like -unroll-threshold=<uint> etc.

Where can I find options like  -unroll-threshold=<uint>? I cannot find it in either opt -help or clang -help.

This one shows up in `opt --help-hidden`; otherwise, look in the source code of each transformation.
(Remember when I mentioned these are intended for LLVM developers and are not end-user facing?)
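
For example, a sketch of locating such a knob and then forwarding it (the threshold value is arbitrary):

opt --help-hidden | grep -i unroll                   # find the cl::opt spelling
clang++ -O3 -mllvm -unroll-threshold=150 LULESH.cc   # forward it through the driver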

-- 
Mehdi

toddy wang via llvm-dev

Jan 9, 2018, 1:55:58 AM
to Mehdi AMINI, llvm-dev, Peizhao Ou
Mehdi, 

I found -unroll-max-count can be passed w/ -mllvm.
-dce, -adce, etc., are also listed by 'opt --help-hidden'. However, they cannot be passed w/ -mllvm.
Is this what "You can't schedule passes this way, only set parameters like -unroll-threshold=<uint> etc." means?

[twang15@c89 temp]$ clang++ -mllvm -unroll-max-count=4 -mllvm -dce -save-temps  LULESH.cc
clang (LLVM option parsing): Unknown command line argument '-dce'.  Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-mv4'?

Craig Topper via llvm-dev

Jan 9, 2018, 2:00:42 AM
to toddy wang, llvm-dev, Peizhao Ou
Yes, that is what he meant. "-dce, -adce, etc." are command-line options consumed by tools/opt/opt.cpp and given to the PassManagerBuilder that it creates. The parsing of those options doesn't exist in any of the LLVM library code that is linked into clang. Clang has its own code for populating a PassManagerBuilder in tools/clang/lib/CodeGen/BackendUtil.cpp.

~Craig

toddy wang via llvm-dev

Jan 9, 2018, 3:09:42 AM
to Craig Topper, llvm-dev, Peizhao Ou
Thanks, Craig.

So, clang -Xclang -disable-llvm-passes actually disables all the LLVM passes populated by clang, so that there is no middle-end optimization on the bc files.

clang -O2 LULESH.c  //clang is the driver, invoking cc1, cc1as, ld
                                   //options can be passed through to cc1 directly. 
                                   //maybe have different names, e.g. -fvectorize in clang driver and -vectorize-loops in clang -cc1
                                   //options are dumped by clang -help and clang --help-hidden

clang -cc1                   // c/c++ frontend is also referred as clang
                                   // this is the c/c++ frontend(preprocessor + Lexer + parser) and middle-end ( LLVM-IR optimizer + IR-assembly generator)
                                   //controlled by -Xclang <options>, Xclang options dumped by clang -cc1 -help
                                   //mllvm Options like -unroll-max count are controlled by -mllvm <options>. 
                                   //mllvm Options can be dumped by clang -v -help -mllvm and clang -v --help-hidden
                                  
                                  //Question: are all mllvm options for middle-end while Xclang options are for front-end?

clang -cc1as              // assembly-to-object assembler
ld/lld/gold                  //linker (if -flto is not provided) or link-time optimizer and linker (if -flto -fuse-ld=lld or -flto -fuse-ld=gold is provided)
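
A quick way to see that driver/cc1/cc1as/ld split in action (both flags are regular driver options; shown here only as an illustration):

clang++ -v -O2 LULESH.cc            # print each tool invocation the driver makes
clang++ -O2 -save-temps LULESH.cc   # keep the intermediate files (preprocessed source, IR, assembly, object)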

toddy wang via llvm-dev

Jan 9, 2018, 3:12:52 AM
to Craig Topper, llvm-dev, Peizhao Ou
//mllvm Options can be dumped by clang -v -help -mllvm and clang -v --help-hidden

-->

//mllvm Options can be dumped by clang -v -help -mllvm and clang -v --help-hidden -mllvm

toddy wang via llvm-dev

Jan 9, 2018, 3:16:26 AM
to Craig Topper, llvm-dev, Peizhao Ou
Sorry, 

//mllvm Options can be dumped by clang -v -help -mllvm and clang -v --help-hidden -mllvm

--> should be

//mllvm Options can be dumped by opt -help-hidden

Mehdi AMINI via llvm-dev

Jan 9, 2018, 10:55:24 AM
to toddy wang, llvm-dev, Peizhao Ou
2018-01-09 0:09 GMT-08:00 toddy wang <wenwan...@gmail.com>:
Thanks, Craig.

So, clang -Xclang -disable-llvm-passes actually disables all the LLVM passes populated by clang, so that there is no middle-end optimization on the bc files.

clang -O2 LULESH.c  //clang is the driver, invoking cc1, cc1as, ld
                                   //options can be passed through to cc1 directly. 
                                   //maybe have different names, e.g. -fvectorize in clang driver and -vectorize-loops in clang -cc1
                                   //options are dumped by clang -help and clang --help-hidden

clang -cc1                   // c/c++ frontend is also referred as clang
                                   // this is the c/c++ frontend(preprocessor + Lexer + parser) and middle-end ( LLVM-IR optimizer + IR-assembly generator)
                                   //controlled by -Xclang <options>, Xclang options dumped by clang -cc1 -help
                                   //mllvm Options like -unroll-max count are controlled by -mllvm <options>. 
                                   //mllvm Options can be dumped by clang -v -help -mllvm and clang -v --help-hidden
                                  
                                  //Question: are all mllvm options for middle-end while Xclang options are for front-end?

mllvm options are LLVM *developer* options, which cover the middle end and the backend.
Xclang options are intended to bypass the driver and pass CC1 options directly. These can also control the middle end, for example the -vectorize-loops option you mentioned above.
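
Two contrasting examples using options already named in this thread (illustrative only; the threshold value is arbitrary):

clang++ -O2 -Xclang -vectorize-loops LULESH.cc        # forward a cc1 (frontend) option
clang++ -O2 -mllvm -inline-threshold=500 LULESH.cc    # forward an LLVM cl::opt (developer) option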

-- 
Mehdi