[llvm-dev] llvm and clang are getting slower

Rafael Espíndola

Mar 8, 2016, 11:13:38 AM
to llvm-dev
I have just benchmarked building trunk llvm and clang in Debug,
Release and LTO modes (see the attached script for the cmake lines).

The compilers used were clang 3.5, 3.6, 3.7, 3.8 and trunk. In all
cases I used the system libgcc and libstdc++.

For Release builds there is a monotonic increase with each version,
from 163 minutes with 3.5 to 212 minutes with trunk. For comparison,
gcc 5.3.2 takes 205 minutes.

Debug and LTO show an improvement in 3.7, but have regressed again in 3.8.

Cheers,
Rafael
run.sh
LTO.time
Debug.time
Release.time

Mehdi Amini via llvm-dev

Mar 8, 2016, 12:41:12 PM
to Rafael Espíndola, llvm-dev, cfe-dev
Hi Rafael,

CC: cfe-dev

Thanks for sharing. We also noticed this internally, and I know that Bruno and Chris are working on some infrastructure and tooling to help track compile-time regressions more closely.

We had this conversation internally about the tradeoff between compile-time and runtime performance, and I planned to bring up the topic on the list in the coming months; this looks like a good occasion to plant the seed. Apparently in the past (years/decade ago?) the project was very conservative about adding any optimizations that would impact compile time; however, there is no explicit policy (that I know of) to address this tradeoff.
The closest I could find would be what Chandler wrote in: http://reviews.llvm.org/D12826 ; for instance for O2 he stated that "if an optimization increases compile time by 5% or increases code size by 5% for a particular benchmark, that benchmark should also be one which sees a 5% runtime improvement".

My hope is that with better tooling for tracking compile time in the future, we'll reach a state where we'll be able to consider "breaking" the compile-time regression test as important as breaking any test: i.e. the offending commit should be reverted unless it has been shown to significantly (hand wavy...) improve the runtime performance.

<troll>
With the current trend, the Polly developers don't have to worry about improving their compile time; we'll catch up with them ;)
</troll>

--
Mehdi


Jonas Paulsson via llvm-dev

Mar 8, 2016, 12:41:53 PM
to llvm...@lists.llvm.org
Hi,

There is a possibility that r259673 could play a role here.

For the buildSchedGraph() method, there is the -dag-maps-huge-region option, which has a default value of 1000. When I committed the patch, I was expecting people to lower this value as needed and also suggested this, but this has not happened. 1000 is very high, basically "unlimited".

It would be interesting to see what results you get with e.g. -mllvm -dag-maps-huge-region=50. Of course, since this is a trade-off between compile time and scheduler freedom, some care should be taken before lowering this in trunk.
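(To illustrate the trade-off, here is a self-contained toy sketch of how such a threshold typically caps the cost of a dependence scan; this is not the actual buildSchedGraph() code, and all names are made up:)

  #include <cstdio>
  #include <vector>

  struct MemOp { int Address; bool IsStore; };

  // Toy dependence scan: each memory op is compared against every op already
  // seen in the region, which is roughly quadratic in region size. A "huge
  // region" threshold caps that cost, at the price of conservatively giving
  // up on precise dependence information for very large regions.
  int countDependences(const std::vector<MemOp> &Region, unsigned HugeRegion) {
    if (Region.size() > HugeRegion)
      return (int)Region.size();  // bail out: cheap but conservative
    int Deps = 0;
    for (size_t I = 0; I < Region.size(); ++I)
      for (size_t J = 0; J < I; ++J)
        if (Region[I].Address == Region[J].Address &&
            (Region[I].IsStore || Region[J].IsStore))
          ++Deps;
    return Deps;
  }

  int main() {
    std::vector<MemOp> Region = {{0, true}, {0, false}, {4, true}, {0, true}};
    printf("%d dependences\n", countDependences(Region, /*HugeRegion=*/1000));
    return 0;
  }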

Just a thought,

Jonas

Hal Finkel via llvm-dev

Mar 8, 2016, 12:56:30 PM
to Mehdi Amini, llvm-dev, cfe-dev

My two largest pet peeves in this area are:

1. We often use functions from ValueTracking (to get known bits, the number of sign bits, etc.) as though they're low cost. They're not really low cost. The problem is that they *should* be. These functions do bottom-up walks, and could cache their results. Instead, they do a limited walk and recompute everything each time. This is expensive, and a significant amount of our InstCombine time goes to ValueTracking, and that shouldn't be the case. The more we add to InstCombine (and related passes), and the more we run InstCombine, the worse this gets. On the other hand, fixing this will help both compile time and code quality.

Furthermore, BasicAA has the same problem.
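(To make the caching idea concrete, here is a self-contained toy sketch with a made-up expression type; it is not the real ValueTracking or BasicAA interface, just the memoized bottom-up shape being suggested:)

  #include <cstdio>
  #include <cstdint>
  #include <unordered_map>

  struct Expr {                    // toy stand-in for llvm::Value
    enum Kind { Const, And, Or } K;
    uint64_t Imm;                  // payload for Const
    const Expr *L, *R;             // operands for And/Or
  };

  // Memoized bottom-up known-zero-bits computation: each node is computed
  // once, so repeated queries (as InstCombine or BasicAA would issue) become
  // hash lookups instead of fresh depth-limited walks. A real version would
  // also need to invalidate entries when the IR changes.
  struct KnownZeroCache {
    std::unordered_map<const Expr *, uint64_t> Cache;
    uint64_t get(const Expr *E) {
      if (!E) return 0;
      auto It = Cache.find(E);
      if (It != Cache.end()) return It->second;
      uint64_t R = 0;
      switch (E->K) {
      case Expr::Const: R = ~E->Imm; break;               // unset bits are known zero
      case Expr::And:   R = get(E->L) | get(E->R); break; // zero in either operand
      case Expr::Or:    R = get(E->L) & get(E->R); break; // zero in both operands
      }
      return Cache[E] = R;
    }
  };

  int main() {
    Expr A{Expr::Const, 0x0F, nullptr, nullptr};
    Expr B{Expr::Const, 0xF0, nullptr, nullptr};
    Expr E{Expr::And, 0, &A, &B};
    KnownZeroCache C;
    printf("known-zero bits of (0x0F & 0xF0): %#llx\n",
           (unsigned long long)C.get(&E));
    return 0;
  }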

2. We have "cleanup" passes in the pipeline, such as those that run after loop unrolling and/or vectorization, that run regardless of whether the preceding pass actually did anything. We've been adding more of these, and they catch important use cases, but we need a better infrastructure for this (either with the new pass manager or otherwise).

Also, I'm very hopeful that as our new MemorySSA and GVN improvements materialize, we'll see large compile-time improvements from that work. We spend a huge amount of time in GVN computing memory-dependency information (this dwarfs the time spent by GVN doing actual value numbering work by an order of magnitude or more).

-Hal

> > On Mar 8, 2016, at 8:13 AM, Rafael Espíndola via llvm-dev
> > <llvm...@lists.llvm.org> wrote:
> >
> > I have just benchmarked building trunk llvm and clang in Debug,
> > Release and LTO modes (see the attached scrip for the cmake lines).
> >
> > The compilers used were clang 3.5, 3.6, 3.7, 3.8 and trunk. In all
> > cases I used the system libgcc and libstdc++.
> >
> > For release builds there is a monotonic increase in each version.
> > From
> > 163 minutes with 3.5 to 212 minutes with trunk. For comparison, gcc
> > 5.3.2 takes 205 minutes.
> >
> > Debug and LTO show an improvement in 3.7, but have regressed again
> > in 3.8.
> >
> > Cheers,
> > Rafael

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Adam Nemet via llvm-dev

Mar 8, 2016, 1:22:52 PM
to Hal Finkel, llvm-dev, cfe-dev
A related issue is that if an analysis is not preserved by a pass, it gets invalidated *even if* the pass doesn't end up modifying the code. Because of this, for example, we invalidate SCEV's cache unnecessarily. The new pass manager should fix this.
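(A minimal sketch of that fix, using a toy pass manager rather than the real LLVM one:)

  #include <functional>
  #include <string>
  #include <unordered_map>
  #include <vector>

  struct Function {};  // toy IR unit

  // Cached analysis results are only dropped when a pass reports that it
  // actually changed the code, so a no-op pass no longer throws away e.g.
  // SCEV's cache. (The real new pass manager is finer-grained: passes return
  // a set of preserved analyses rather than a single bool.)
  struct PassManagerSketch {
    std::unordered_map<std::string, int> AnalysisCache;  // name -> result

    void run(Function &F,
             const std::vector<std::function<bool(Function &)>> &Passes) {
      for (const auto &P : Passes) {
        bool Changed = P(F);      // true only if the pass modified F
        if (Changed)
          AnalysisCache.clear();  // invalidate non-preserved analyses
        // else: every cached analysis stays valid and is not recomputed
      }
    }
  };

  int main() {
    Function F;
    PassManagerSketch PM;
    PM.AnalysisCache["SCEV"] = 42;                  // pretend SCEV already ran
    PM.run(F, {[](Function &) { return false; }});  // a pass that changes nothing
    return PM.AnalysisCache.count("SCEV") ? 0 : 1;  // the cache survived
  }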

Adam

Daniel Berlin via llvm-dev

Mar 8, 2016, 1:23:54 PM
to Hal Finkel, llvm-dev, cfe-dev
I think you hit on something that i would expand on:

We don't hold the line very well on adding little things to passes and analyses over time.
We add 1000 little walkers and pattern matchers to try to get better code, and then often add knobs to try to control their overall compile time.
At some point, these all add up. You end up with the same flat profile if you do this everywhere, but your compiler gets slower.
At some point, someone has to stop and say "well, wait a minute, are there better algorithms or architecture we should be using to do this", and either do it, or not let it get worse :) I'd suggest, in most cases, we know better ways to do almost all of these things.

Don't get me wrong, I don't believe there is any theoretically pure way to do everything that we can just implement and never have to tweak. But it's a continuum, and at some point you have to stop and re-evaluate whether the current approach is really the right one if you have to add a billion little things to it to get what you want.
We often don't do that.
We go *very* far down the path of a billion tweaks and adding knobs, and what we have now, compile time wise, is what you get when you do that :)
I suspect this is because we don't really want to try to force work on people who are just trying to get crap done.  We're all good contributors trying to do the right thing, and saying no often seems obstructionist, etc.
The problem is at some point you end up with the tragedy of the commons.

(also, not everything in the compiler has to catch every case to get good code)


 1. We often use functions from ValueTracking (to get known bits, the number of sign bits, etc.) as through they're low cost. They're not really low cost. The problem is that they *should* be. These functions do bottom-up walks, and could cache their results. Instead, they do a limited walk and recompute everything each time. This is expensive, and a significant amount of our InstCombine time goes to ValueTracking, and that shouldn't be the case. The more we add to InstCombine (and related passes), and the more we run InstCombine, the worse this gets. On the other hand, fixing this will help both compile time and code quality.

(LVI is another great example. Fun fact: If you ask for value info for everything, it's no longer lazy ....) 

  Furthermore, BasicAA has the same problem.

 2. We have "cleanup" passes in the pipeline, such as those that run after loop unrolling and/or vectorization, that run regardless of whether the preceding pass actually did anything. We've been adding more of these, and they catch important use cases, but we need a better infrastructure for this (either with the new pass manager or otherwise).

Also, I'm very hopeful that as our new MemorySSA and GVN improvements materialize, we'll see large compile-time improvements from that work. We spend a huge amount of time in GVN computing memory-dependency information (the dwarfs the time spent by GVN doing actual value numbering work by an order of magnitude or more).

I'm working on it ;)
 

Richard Smith via llvm-dev

Mar 8, 2016, 1:42:56 PM
to Rafael Espíndola, llvm-dev

I'm curious how these times divide across Clang and various parts of
LLVM; rerunning with -ftime-report and summing the numbers across all
compiles could be interesting.

Xinliang David Li via llvm-dev

Mar 8, 2016, 1:50:05 PM
to Daniel Berlin, llvm-dev, cfe-dev
Yep -- see the bug Wei is working on:  https://llvm.org/bugs/show_bug.cgi?id=10584

David
 


mats petersson via llvm-dev

Mar 8, 2016, 1:55:29 PM
to Richard Smith, llvm-dev
I have noticed that LLVM doesn't seem to "like" large functions, as a general rule. Admittedly, my experience is similar with gcc, so I'm not sure it's something that can be easily fixed. And I'm probably sounding like a broken record, because I have said this before.

My experience is that the time it takes to compile something grows more than linearly with the size of the function.

Of course, the LLVM code is growing over time, both to support more features and to support more architectures, new processor types and instruction sets, at least some of which will lead to larger functions in general [and this is the function "after inlining", so splitting small 'called once' functions out doesn't really help much].

I will have a little play to see if I can identify more of a culprit [at the very least if it's "large basic blocks" or "large functions" that is the problem] - of course, this could be unrelated and irrelevant to the problem Daniel is pointing at, and it may or may not be easily resolved...

--
Mats

Nico Weber via llvm-dev

Mar 8, 2016, 2:11:34 PM
to Xinliang David Li, llvm-dev, cfe-dev
On a somewhat smaller (but hopefully more actionable) scale, we noticed that build time regressed ~10% recently in 262315:262447. I'm still trying to repro locally (no luck so far; maybe it's a bot config thing, not a clang-side problem), but if this rings a bell to anyone, please let me know :-)

Xinliang David Li via llvm-dev

Mar 8, 2016, 2:18:52 PM
to mats petersson, llvm-dev
On Tue, Mar 8, 2016 at 10:55 AM, mats petersson via llvm-dev <llvm...@lists.llvm.org> wrote:
I have noticed that LLVM doesn't seem to "like" large functions, as a general rule. Admittedly, my experience is similar with gcc, so I'm not sure it's something that can be easily fixed. And I'm probably sounding like a broken record, because I have said this before.

My experience is that the time it takes to compile something is growing above linear with size of function.


The number of BBs -- Kostya can point you to the compile-time bug that is exposed by ASan.

David

Sean Silva via llvm-dev

Mar 8, 2016, 2:47:42 PM
to Rafael Espíndola, llvm-dev
In case someone finds it useful, this is some indication of the breakdown of where time is spent during a build of Clang.

tl;dr: in Debug+Asserts about 10% of time is spent in the backend and in Release without asserts (and without debug info IIRC) about 33% of time is spent in the backend.


These are the charts I collected a while back breaking down the time it takes clang to compile itself.
See the thread "[cfe-dev] Some DTrace probes for measuring per-file time" for how I collected this information. The raw data is basically aggregated CPU time spent textually parsing each header (and IRGen'ing them, since clang does that as it parses). There are also a couple of "phony" headers to cover stuff like the backend/optimizer.

Since there are a large number of files, the pie charts below are grouped into rough categories. E.g. the "llvm headers" category includes the time spent on include/llvm/Support/raw_ostream.h and all other headers in include/llvm. The "libc++" pie slice contains the time spent in the libc++ system headers (this data was collected on a mac, so libc++ was the C++ standard library). "system" is the C system headers.

All time spent inside the LLVM optimizer is in the "after parsing" pie slice.


Debug with asserts:


[Inline image 1: pie chart of the Debug+asserts compile-time breakdown by category]



Release without asserts (and without debug info IIRC):


[Inline image 2: pie chart of the Release compile-time breakdown by category]



-- Sean Silva


Sean Silva via llvm-dev

Mar 8, 2016, 4:09:36 PM
to Richard Smith, llvm-dev
On Tue, Mar 8, 2016 at 10:42 AM, Richard Smith via llvm-dev <llvm...@lists.llvm.org> wrote:
On Tue, Mar 8, 2016 at 8:13 AM, Rafael Espíndola
<llvm...@lists.llvm.org> wrote:
> I have just benchmarked building trunk llvm and clang in Debug,
> Release and LTO modes (see the attached scrip for the cmake lines).
>
> The compilers used were clang 3.5, 3.6, 3.7, 3.8 and trunk. In all
> cases I used the system libgcc and libstdc++.
>
> For release builds there is a monotonic increase in each version. From
> 163 minutes with 3.5 to 212 minutes with trunk. For comparison, gcc
> 5.3.2 takes 205 minutes.
>
> Debug and LTO show an improvement in 3.7, but have regressed again in 3.8.

I'm curious how these times divide across Clang and various parts of
LLVM; rerunning with -ftime-report and summing the numbers across all
compiles could be interesting.

Based on the results I posted upthread about the relative time spent in the backend for debug vs release, we can estimate this.
To summarize:
10% of time spent in LLVM for Debug
33% of time spent in LLVM for Release
(I'll abbreviate "in LLVM" as just "backend"; this is "backend" from clang's perspective)

Let's look at the difference between 3.5 and trunk.

For debug, the user time jumps from 174m50.251s to 197m9.932s.
That's {10490.3, 11829.9} seconds, respectively.
For release, the corresponding numbers are:
{9826.71, 12714.3} seconds.

debug35 = 10490.251
debugTrunk = 11829.932

debugTrunk/debug35 == 1.12771
debugRatio = 1.12771

release35 = 9826.705
releaseTrunk = 12714.288

releaseTrunk/release35 == 1.29385
releaseRatio = 1.29385

For simplicity, let's use a simple linear model for the distribution of slowdown between the frontend and backend: a constant factor slowdown for the backend, and an independent constant factor slowdown for the frontend. This gives the following linear system:
debugRatio = .1 * backendRatio + (1 - .1) * frontendRatio
releaseRatio = .33 * backendRatio + (1 - .33) * frontendRatio

Solving this linear system we find that under this simple model, the expected slowdown factors are:
backendRatio = 1.77783
frontendRatio = 1.05547

Intuitively, backendRatio comes out larger in this comparison because we see the biggest slowdown during release (1.29 vs 1.12), and during release we are spending a larger fraction of time in the backend (33% vs 10%).
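(In case anyone wants to re-run the solve with different numbers, the 2x2 system above works out to the small program below; the 10%/33% backend fractions are the estimates from upthread:)

  #include <cstdio>

  int main() {
    // Fraction of compile time spent in the backend (estimates from upthread).
    const double backendDebug = 0.10, backendRelease = 0.33;
    // Observed total slowdowns, clang 3.5 -> trunk (user time, seconds).
    const double debugRatio   = 11829.932 / 10490.251;  // ~1.128
    const double releaseRatio = 12714.288 / 9826.705;   // ~1.294
    // Solve  debugRatio   = bd*b + (1 - bd)*f
    //        releaseRatio = br*b + (1 - br)*f   for b (backend) and f (frontend).
    const double b = ((1 - backendDebug) * releaseRatio -
                      (1 - backendRelease) * debugRatio) /
                     (backendRelease - backendDebug);
    const double f = (debugRatio - backendDebug * b) / (1 - backendDebug);
    printf("backendRatio = %.5f, frontendRatio = %.5f\n", b, f);  // ~1.778, ~1.055
    return 0;
  }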

Applying this same model across Rafael's data, we find the following (numbers have been rounded for clarity):

transition       backendRatio   frontendRatio
3.5->3.6         1.08           1.03
3.6->3.7         1.30           0.95
3.7->3.8         1.34           1.07
3.8->trunk       0.98           1.02                

Note that in Rafael's measurements LTO is pretty similar to Release from a CPU time (user time) standpoint. While the final LTO link takes a large amount of real time, it is single threaded. Based on the real time numbers the LTO link was only spending about 20 minutes single-threaded (i.e. about 20 minutes CPU time), which is pretty small compared to the 300-400 minutes of total CPU time. It would be interesting to see the numbers for -O0 or -O1 per-TU together with LTO.

-- Sean Silva

Sean Silva via llvm-dev

Mar 8, 2016, 4:10:50 PM
to Richard Smith, llvm-dev
On Tue, Mar 8, 2016 at 1:09 PM, Sean Silva <chiso...@gmail.com> wrote:


On Tue, Mar 8, 2016 at 10:42 AM, Richard Smith via llvm-dev <llvm...@lists.llvm.org> wrote:
On Tue, Mar 8, 2016 at 8:13 AM, Rafael Espíndola
<llvm...@lists.llvm.org> wrote:
> I have just benchmarked building trunk llvm and clang in Debug,
> Release and LTO modes (see the attached scrip for the cmake lines).
>
> The compilers used were clang 3.5, 3.6, 3.7, 3.8 and trunk. In all
> cases I used the system libgcc and libstdc++.
>
> For release builds there is a monotonic increase in each version. From
> 163 minutes with 3.5 to 212 minutes with trunk. For comparison, gcc
> 5.3.2 takes 205 minutes.
>
> Debug and LTO show an improvement in 3.7, but have regressed again in 3.8.

I'm curious how these times divide across Clang and various parts of
LLVM; rerunning with -ftime-report and summing the numbers across all
compiles could be interesting.

Based on the results I posted upthread about the relative time spend in the backend for debug vs release, we can estimate this.
To summarize:

That is, to summarize the post upthread that I'm referring to. The summary of this post is that most of the slowdown seems to be in the backend.

-- Sean Silva

Sean Silva via llvm-dev

Mar 8, 2016, 4:13:46 PM
to Xinliang David Li, llvm-dev
On Tue, Mar 8, 2016 at 11:18 AM, Xinliang David Li via llvm-dev <llvm...@lists.llvm.org> wrote:


On Tue, Mar 8, 2016 at 10:55 AM, mats petersson via llvm-dev <llvm...@lists.llvm.org> wrote:
I have noticed that LLVM doesn't seem to "like" large functions, as a general rule. Admittedly, my experience is similar with gcc, so I'm not sure it's something that can be easily fixed. And I'm probably sounding like a broken record, because I have said this before.

My experience is that the time it takes to compile something is growing above linear with size of function.


The number of BBs -- Kosyia can point you to the compile time bug that is exposed by asan .

I believe we also have some superlinear behavior with BB size.

-- Sean Silva

Xinliang David Li via llvm-dev

Mar 8, 2016, 4:27:39 PM
to Sean Silva, llvm-dev
See https://llvm.org/bugs/show_bug.cgi?id=17409

I believe much of the compile-time trouble related to BlockFrequency computation has been fixed by Cong's recent work to convert weight-based interfaces to BranchProbability-based interfaces. The other issue, related to the Spiller, probably still remains.

David

Mehdi Amini via llvm-dev

Mar 8, 2016, 5:25:35 PM
to Sean Silva, llvm-dev
Just a note about LTO being sequential: Rafael mentioned he was "building trunk llvm and clang". By default I believe there are ~56 link targets that can be run in parallel (provided you have enough RAM to avoid swapping).

-- 
Mehdi

Rafael Espíndola

Mar 8, 2016, 5:52:22 PM
to Mehdi Amini, llvm-dev
> Just a note about LTO being sequential: Rafael mentioned he was "building
> trunk llvm and clang". By default I believe it is ~56 link targets that can
> be run in parallel (provided you have enough RAM to avoid swapping).

Correct. The machine has no swap :-)

But some targets (clang) are much larger and I have the impression
that the last minute or so of the build is just finishing that one
link.

Cheers,
Rafael

Renato Golin via llvm-dev

Mar 8, 2016, 8:15:26 PM
to Adam Nemet, LLVM Dev, cfe-dev

On 9 Mar 2016 1:22 a.m., "Adam Nemet via cfe-dev" <cfe...@lists.llvm.org> wrote:
> A related issue is that if an analysis is not preserved by a pass, it gets invalidated *even if* the pass doesn’t end up modifying the code.  Because of this for example we invalidate SCEV’s cache unnecessarily.   The new pass manager should fix this.

+1

Sean Silva via llvm-dev

Mar 9, 2016, 3:10:11 PM
to Mehdi Amini, llvm-dev
D'oh! I was looking at the data wrong since I broke my Fundamental Rule of Looking At Data, namely: don't look at raw numbers in a table since you are likely to look at things wrong or form biases based on the order in which you look at the data points; *always* visualize. There is a significant difference between release and LTO. About 2x consistently.

[Inline image 3: chart comparing Release and LTO build times]

This is actually curious because during the release build, we were spending 33% of CPU time in the backend (as clang sees it; i.e. mid-level optimizer and codegen). This data is inconsistent with LTO simply being another run through the backend (which would be just +33% CPU time at worst). There seems to be something nonlinear happening.
To make it worse, the LTO build has approximately a full Release optimization running per-TU, so the actual LTO step should be seeing inlined/"cleaned up" IR which should be much smaller than what the per-TU optimizer is seeing, so naively it should take *even less* than "another 33% CPU time" chunk.
Yet we see 1.5x-2x difference:

[Inline image 4: chart showing the 1.5x-2x Release vs. LTO difference]

-- Sean Silva
 



Xinliang David Li via llvm-dev

Mar 9, 2016, 3:39:28 PM
to Sean Silva, llvm-dev
The LTO time could be explained by second-order effects: increased dcache/dtlb pressure due to the increased memory footprint and poor locality.

David

Sean Silva via llvm-dev

Mar 9, 2016, 5:02:23 PM
to Xinliang David Li, llvm-dev
On Wed, Mar 9, 2016 at 12:38 PM, Xinliang David Li <xinli...@gmail.com> wrote:
The lto time could be explained by second order effect due to increased dcache/dtlb pressures due to increased memory footprint and poor locality.

Actually thinking more about this, I was totally wrong. Mehdi said that we LTO ~56 binaries. If we naively assume that each binary is like clang and links in "everything" and that the LTO process takes CPU time equivalent to "-O3 for every TU", then we would expect that *for each binary* we would see +33% (total increase >1800% vs Release). Clearly that is not happening since the actual overhead is only 50%-100%, so we need a more refined explanation.

There are a couple factors that I can think of.
a) there are 56 binaries being LTO'd (this will tend to increase our estimate)
b) not all 56 binaries are the size of clang (this will tend to decrease our estimate)
c) per-TU processing only is doing mid-level optimizations and no codegen (this will tend to decrease our estimate)
d) IR seen during LTO has already been "cleaned up" and has less overall size/amount of optimizations that will apply during the LTO process (this will tend to decrease our estimate)
e) comdat folding in the linker means that we only codegen one copy of each comdat function (this will tend to decrease our estimate)

Starting from a (normalized) release build with
releaseBackend = .33
releaseFrontend = .67
release = releaseBackend + releaseFrontend  = 1

Let us try to obtain
LTO = (some expression involving releaseFrontend and releaseBackend) = 1.5-2

For starters, let us apply a), with a naive assumption that for each of the numBinaries = 52 binaries we add the cost of releaseBackend (I just checked and 52 is the exact number for LLVM+Clang+LLD+clang-tools-extra, ignoring symlinks). This gives
LTO = release + 52 * releaseBackend = 21.46, which is way high.

Let us apply b). A quick check gives 371,515,392 total bytes of text in a release build across all 52 binaries (Mac, x86_64). Clang is 45,182,976 bytes of text. So using final text size in Release as an indicator of the total code seen by the LTO process, we can use a coefficient of 1/8, i.e. the average binary links in about avgTextFraction = 1/8 of "everything".
LTO = release + 52 * (.125 * releaseBackend) = 3.14

We are still high. For c), let us assume that half of releaseBackend is spent after mid-level optimizations. So let codegenFraction = .5 be the fraction of releaseBackend that is spent after mid-level optimizations. We can discount this time from the LTO build since it does not do that work per-TU.
LTO = release + 52 * (.125 * releaseBackend) - (codegenFraction * releaseBackend) = 2.98
So this is not a significant reduction.

I don't have a reasonable estimate a priori for d) or e), but altogether they reduce to a constant factor otherSavingsFraction that multiplies the second term
LTO = release + 52 * (.125 * otherSavingsFraction * releaseBackend) - (codegenFraction * releaseBackend) =? 1.5-2x

Given the empirical data, this suggests that otherSavingsFraction must have a value around 1/2, which seems reasonable.

For a moment I was rather surprised that we could have 52 binaries and it would be only 2x, but this closer examination shows that between avgTextFraction = .125 and releaseBackend = .33 the "52" is brought under control.
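(The same arithmetic written out as a small program; avgTextFraction, codegenFraction and otherSavingsFraction are the estimated/fitted values from the text, not measurements:)

  #include <cstdio>

  int main() {
    const double release = 1.0;            // normalized Release build
    const double releaseBackend = 0.33;    // ~33% of Release is backend time
    const int numBinaries = 52;            // LLVM+Clang+LLD+clang-tools-extra
    const double avgTextFraction = 0.125;  // average binary links ~1/8 of all text
    const double codegenFraction = 0.5;    // assumed share of backend time after mid-level opts
    const double otherSavingsFraction = 0.5; // d) + e), fitted to match the 1.5x-2x observation

    // b) size-adjusted estimate (~3.1x).
    double lto = release + numBinaries * (avgTextFraction * releaseBackend);
    printf("size-adjusted:        %.2f\n", lto);
    // c) discount the per-TU codegen work that the LTO build skips (~3.0x).
    lto -= codegenFraction * releaseBackend;
    printf("minus per-TU codegen: %.2f\n", lto);
    // d)+e) cleaner IR and comdat folding shrink the LTO-side work (~1.9x).
    lto = release +
          numBinaries * (avgTextFraction * otherSavingsFraction * releaseBackend) -
          codegenFraction * releaseBackend;
    printf("with other savings:   %.2f\n", lto);
    return 0;
  }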

-- Sean Silva

Kostya Serebryany via llvm-dev

Mar 9, 2016, 5:07:13 PM
to Xinliang David Li, llvm-dev
On Tue, Mar 8, 2016 at 11:18 AM, Xinliang David Li <xinli...@gmail.com> wrote:


On Tue, Mar 8, 2016 at 10:55 AM, mats petersson via llvm-dev <llvm...@lists.llvm.org> wrote:
I have noticed that LLVM doesn't seem to "like" large functions, as a general rule. Admittedly, my experience is similar with gcc, so I'm not sure it's something that can be easily fixed. And I'm probably sounding like a broken record, because I have said this before.

My experience is that the time it takes to compile something is growing above linear with size of function.


The number of BBs -- Kosyia can point you to the compile time bug that is exposed by asan .

Not just ASan; this bug reproduces in a wide range of cases.
By now I am not even sure if this is a single bug or a set of independent (but similar-looking) problems.

Xinliang David Li via llvm-dev

Mar 9, 2016, 5:32:44 PM
to Sean Silva, llvm-dev
On Wed, Mar 9, 2016 at 1:55 PM, Sean Silva <chiso...@gmail.com> wrote:


On Wed, Mar 9, 2016 at 12:38 PM, Xinliang David Li <xinli...@gmail.com> wrote:
The lto time could be explained by second order effect due to increased dcache/dtlb pressures due to increased memory footprint and poor locality.

Actually thinking more about this, I was totally wrong. Mehdi said that we LTO ~56 binaries. If we naively assume that each binary is like clang and links in "everything" and that the LTO process takes CPU time equivalent to "-O3 for every TU", then we would expect that *for each binary* we would see +33% (total increase >1800% vs Release). Clearly that is not happening since the actual overhead is only 50%-100%, so we need a more refined explanation.

There are a couple factors that I can think of.
a) there are 56 binaries being LTO'd (this will tend to increase our estimate)
b) not all 56 binaries are the size of clang (this will tend to decrease our estimate)
c) per-TU processing only is doing mid-level optimizations and no codegen (this will tend to decrease our estimate)
d) IR seen during LTO has already been "cleaned up" and has less overall size/amount of optimizations that will apply during the LTO process (this will tend to decrease our estimate)
e) comdat folding in the linker means that we only codegen (this will tend to decrease our estimate)

Starting from a (normalized) release build with
releaseBackend = .33
releaseFrontend = .67
release = releaseBackend + releaseFrontend  = 1

Let us try to obtain
LTO = (some expression involving releaseFrontend and releaseBackend) = 1.5-2

For starters, let us apply a), with a naive assumption that for each of the numBinaries = 52 binaries we add the cost of releaseBackend (I just checked and 52 is the exact number for LLVM+Clang+LLD+clang-tools-extra, ignoring symlinks). This gives
LTO = release + 52 * releaseBackend = 21.46, which is way high.

Some bitcode .o files (such as in support libs) are linked into more than one target, but not all .o files are. Suppose the average duplication factor is DupFactor; then the LTO time should be approximated by

LTO = releaseFrontend + DupFactor*ReleaseBackend

Considering comdat elimination, let DedupFactor be the ratio of the total number of unique functions to the total number of functions produced by the FE; the LTO time is then approximated by:

LTO = releaseFrontend + DupFactor*DedupFactor*ReleaseBackend
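(Plugging in purely illustrative numbers, since DupFactor and DedupFactor are not measured here, just to show the shape of the model:)

  #include <cstdio>

  int main() {
    const double releaseFrontend = 0.67, releaseBackend = 0.33;
    // Hypothetical values, for illustration only.
    const double DupFactor = 5.0;    // average number of targets linking each bitcode .o
    const double DedupFactor = 0.6;  // unique functions / functions produced by the FE
    const double LTO = releaseFrontend + DupFactor * DedupFactor * releaseBackend;
    printf("estimated LTO cost: %.2f\n", LTO);  // 1.66, inside the observed 1.5x-2x range
    return 0;
  }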

David

Duraid Madina via llvm-dev

Mar 9, 2016, 6:57:41 PM
to mats petersson, llvm-dev
A historical note:

Back in the pre-Clang LLVM 1.x dark ages you could, if you
pressed the right buttons, run LLVM as a very fast portable
codegen. MB/s was a reasonable measure as the speed was (or
could be made to be) fairly independent of the input structure.

Since ~2006, as LLVM has shifted from "awesome research
plaything" to "compiler people depend on", there has been a
focus on ensuring that typical software compiles quickly and
well. Many good things have followed as a result, but you are
certainly correct that LLVM doesn't handle large input
particularly well. Having said that, some projects (the Gambit
Scheme->C and Verilator Verilog->C compilers come to mind)
routinely see runtimes 10~100x that of GCC in typical use. So
perhaps we are thinking of different things if you're seeing
similar issues with GCC.

I suspect that despite the passage of time the problem remains
solvable - there's probably *more* work to be done now, but I
don't think there are any massively *difficult* problems to be
solved. Properly quantifying/tracking the problem would be a
good first step.

Best,
Duraid

Mehdi Amini via llvm-dev

Mar 9, 2016, 7:18:28 PM
to Duraid Madina, llvm-dev

> On Mar 9, 2016, at 3:57 PM, Duraid Madina via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> A historical note:
>
> Back in the pre-Clang LLVM 1.x dark ages you could, if you
> pressed the right buttons, run LLVM as a very fast portable
> codegen. MB/s was a reasonable measure as the speed was (or
> could be made to be) fairly independent of the input structure.
>
> Since ~2006, as LLVM has shifted from "awesome research
> plaything" to "compiler people depend on", there has been a
> focus on ensuring that typical software compiles quickly and
> well. Many good things have followed as a result, but you are
> certainly correct that LLVM doesn't handle large input
> particularly well. Having said that, some projects (the Gambit
> Scheme->C and Verilator Verilog->C compilers come to mind)
> routinely see runtimes 10~100x that of GCC in typical use. So
> perhaps we are thinking of different things if you're seeing
> similar issues with GCC.

Bug reports (with pre-processed source files preferably) are always welcome.

Collecting the test cases in a "compile time test suite" is what should follow naturally.

Best,

--
Mehdi

Bruno Cardoso Lopes via llvm-dev

Mar 14, 2016, 3:14:49 PM
to Jonas Paulsson, llvm...@lists.llvm.org
Hi,

> There is a possibility that r259673 could play a role here.
>
> For the buildSchedGraph() method, there is the -dag-maps-huge-region that
> has the default value of 1000. When I commited the patch, I was expecting
> people to lower this value as needed and also suggested this, but this has
> not happened. 1000 is very high, basically "unlimited".
>
> It would be interesting to see what results you get with e.g. -mllvm
> -dag-maps-huge-region=50. Of course, since this is a trade-off between
> compile time and scheduler freedom, some care should be taken before
> lowering this in trunk.

Indeed we hit this internally, filed a PR:
https://llvm.org/bugs/show_bug.cgi?id=26940

As a general comment on this thread and as mentioned by Mehdi, we care
a lot about compile time and we're looking forward to contributing more
in this area in the following months; by collecting compile-time
testcases into a testsuite and publicly tracking results on those we
should be able to start an RFC on a tradeoff policy.

--
Bruno Cardoso Lopes
http://www.brunocardoso.cc

Jack Howarth via llvm-dev

Mar 23, 2016, 11:39:23 AM
to Rafael Espíndola, llvm-dev
Honza recently posted some benchmarks for building libreoffice with
GCC 6 and LTO and found a similar compile time regression for recent
llvm trunk...

http://hubicka.blogspot.nl/2016/03/building-libreoffice-with-gcc-6-and-lto.html#more

Compared to llvm 3.5.0, the builds with llvm 3.9.0 svn were 24% slower.


On Tue, Mar 8, 2016 at 11:13 AM, Rafael Espíndola


<llvm...@lists.llvm.org> wrote:
> I have just benchmarked building trunk llvm and clang in Debug,
> Release and LTO modes (see the attached scrip for the cmake lines).
>
> The compilers used were clang 3.5, 3.6, 3.7, 3.8 and trunk. In all
> cases I used the system libgcc and libstdc++.
>
> For release builds there is a monotonic increase in each version. From
> 163 minutes with 3.5 to 212 minutes with trunk. For comparison, gcc
> 5.3.2 takes 205 minutes.
>
> Debug and LTO show an improvement in 3.7, but have regressed again in 3.8.
>

> Cheers,
> Rafael

Chandler Carruth via llvm-dev

Mar 31, 2016, 4:28:48 PM
to llvm-dev
LLVM has a wonderful policy regarding broken commits: we revert to green. We ask that a test case be available within a reasonable time frame (preferably before, but some exceptions can be made), but otherwise we revert the offending patch, even if it contains nice features that people want, and keep the tree green. This is an awesome policy.

I would like to suggest we adopt and follow the same policy for compile time regressions that are large, and especially for ones that are super-linear. As an example from the previous thread:

On Mon, Mar 14, 2016 at 12:14 PM Bruno Cardoso Lopes via llvm-dev <llvm...@lists.llvm.org> wrote:
> There is a possibility that r259673 could play a role here.
>
> For the buildSchedGraph() method, there is the -dag-maps-huge-region that
> has the default value of 1000. When I commited the patch, I was expecting
> people to lower this value as needed and also suggested this, but this has
> not happened. 1000 is very high, basically "unlimited".
>
> It would be interesting to see what results you get with e.g. -mllvm
> -dag-maps-huge-region=50. Of course, since this is a trade-off between
> compile time and scheduler freedom, some care should be taken before
> lowering this in trunk.

Indeed we hit this internally, filed a PR:
https://llvm.org/bugs/show_bug.cgi?id=26940

I think we should have rolled back r259673 as soon as the test case was available.

Thoughts?

Justin Bogner via llvm-dev

Mar 31, 2016, 4:36:38 PM
to Chandler Carruth via llvm-dev

+1. Reverting is easy when a commit is fresh, but gets rapidly more
difficult as other changes (related or not) come after it, whereas
re-applying a commit later is usually straightforward.

Keeping the top of tree compiler in good shape improves everyone's
lives.

Bruno Cardoso Lopes via llvm-dev

Mar 31, 2016, 4:38:42 PM
to Chandler Carruth, llvm-dev
On Thu, Mar 31, 2016 at 1:28 PM, Chandler Carruth via llvm-dev
<llvm...@lists.llvm.org> wrote:
> LLVM has a wonderful policy regarding broken commits: we revert to green. We
> ask that a test case be available within a reasonable time frame (preferably
> before, but some exceptions can be made), but otherwise we revert the
> offending patch, even if it contains nice features that people want, and
> keep the tree green. This is an awesome policy.
>
> I would like to suggest we adopt and follow the same policy for compile time
> regressions that are large, and especially for ones that are super-linear.
> As an example from the previous thread:

+1

> On Mon, Mar 14, 2016 at 12:14 PM Bruno Cardoso Lopes via llvm-dev
> <llvm...@lists.llvm.org> wrote:
>>
>> > There is a possibility that r259673 could play a role here.
>> >
>> > For the buildSchedGraph() method, there is the -dag-maps-huge-region
>> > that
>> > has the default value of 1000. When I commited the patch, I was
>> > expecting
>> > people to lower this value as needed and also suggested this, but this
>> > has
>> > not happened. 1000 is very high, basically "unlimited".
>> >
>> > It would be interesting to see what results you get with e.g. -mllvm
>> > -dag-maps-huge-region=50. Of course, since this is a trade-off between
>> > compile time and scheduler freedom, some care should be taken before
>> > lowering this in trunk.
>>
>> Indeed we hit this internally, filed a PR:
>> https://llvm.org/bugs/show_bug.cgi?id=26940
>
>
> I think we should have rolled back r259673 as soon as the test case was
> available.

I agree, but since we didn't have a policy about it, I was kind of
unsure what to do about it. Glad you began this discussion :-)

> Thoughts?

Ideally it would be good to have more compile-time-sensitive
benchmarks in the test-suite to detect those. We are working on
collecting what we have internally and upstreaming it to help track
the results in a public way.

Mehdi Amini via llvm-dev

Mar 31, 2016, 4:41:49 PM
to Chandler Carruth, llvm-dev
Hi,

TLDR: I totally support considering compile-time regressions as bugs.

I'm glad you bring this topic. Also it is worth pointing at this recent thread: http://lists.llvm.org/pipermail/llvm-dev/2016-March/096488.html
And also this blog post comparing the evolution of clang and gcc on this aspect: http://hubicka.blogspot.nl/2016/03/building-libreoffice-with-gcc-6-and-lto.html

I will repeat myself here, since we also noticed internally that compile time was slowly degrading with time. Bruno and Chris are working on some infrastructure and tooling to help tracking closely compile time regressions.

We had this conversation internally about the tradeoff between compile-time and runtime performance, and I planned to bring up the topic on the list in the coming months, but was waiting for more tooling to be ready.
Apparently in the past (years/decade ago?) the project was very conservative about adding any optimizations that would impact compile time; however, there is no explicit policy (that I know of) to address this tradeoff.
The closest I could find would be what Chandler wrote in: http://reviews.llvm.org/D12826 ; for instance for O2 he stated that "if an optimization increases compile time by 5% or increases code size by 5% for a particular benchmark, that benchmark should also be one which sees a 5% runtime improvement".

My hope is that with better tooling for tracking compile time in the future, we'll reach a state where we'll be able to consider "breaking" the compile-time regression test as important as breaking any test: i.e. the offending commit should be reverted unless it has been shown to significantly (hand wavy...) improve the runtime performance.

Since you raise the discussion now, I take the opportunity to push on the "more aggressive" side: I think the policy should be a balance between the improvement the commit brings and the compile-time slowdown it causes, something along the lines of what you wrote in my quote above.
You are referring to "large" compile time regressions (aside: what is "large"?), while Bruno has graphs showing that the compile-time regressions are mostly a lot of 1-3% regressions, spread over tens of commits.
Also (and this is where we need better tooling), *unexpected* compile-time slowdowns are what make me worried: i.e. the author of the commit adds something but didn't expect the compile time to be "significantly" impacted. This is motivated by Bruno's and Chris's data.
Tracking this more closely may also help to triage things between O2 and O3 when a commit introduces a compile-time slowdown but also brings significant enough runtime improvements.

-- 
Mehdi





Renato Golin via llvm-dev

Mar 31, 2016, 5:46:36 PM
to Mehdi Amini, llvm-dev
On 31 March 2016 at 21:41, Mehdi Amini via llvm-dev

<llvm...@lists.llvm.org> wrote:
> TLDR: I totally support considering compile time regression as bug.

Me too.

I also agree that reverting fresh and reapplying is *much* easier than
trying to revert late.

But I'd like to avoid dubious metrics.


> The closest I could find would be what Chandler wrote in:
> http://reviews.llvm.org/D12826 ; for instance for O2 he stated that "if an
> optimization increases compile time by 5% or increases code size by 5% for a
> particular benchmark, that benchmark should also be one which sees a 5%
> runtime improvement".

I think this is a bit limited and can lead to witch hunts, especially
wrt performance measurements.

Chandler's title is perfect though... Large can be vague, but
"super-linear" is not. We used to have the concept that any large
super-linear (quadratic+) compile time introductions had to be in O3
or, for really bad cases, behind additional flags. I think we should
keep that mindset.


> My hope is that with better tooling for tracking compile time in the future,
> we'll reach a state where we'll be able to consider "breaking" the
> compile-time regression test as important as breaking any test: i.e. the
> offending commit should be reverted unless it has been shown to
> significantly (hand wavy...) improve the runtime performance.

In order to have any kind of threshold, we'd have to monitor with some
accuracy the performance of both compiler and compiled code for the
main platforms. We do that to a certain extent with the test-suite bots,
but that's very far from ideal.

So, I'd recommend we steer away from any kind of percentage or ratio
and keep at least the quadratic changes and beyond on special flags
(n.logn is ok for most cases).


> Since you raise the discussion now, I take the opportunity to push on the
> "more aggressive" side: I think the policy should be a balance between the
> improvement the commit brings compared to the compile time slow down.

This is a fallacy.

Compile time often regresses across all targets, while execution
improvements are focused on specific targets and can have negative
effects on those that were not benchmarked. Overall, though,
compile time regressions dilute over the improvements, but not on a
commit-per-commit basis. That's what I meant by witch hunt.

I think we should keep an eye on those changes, ask for numbers in
code review and maybe even do some benchmarking on our own before
accepting them. Also, we should not commit code that we know hurts
performance that badly, even if we believe people will replace it in
the future. It always takes too long. I myself did that last
year, and I learnt my lesson.

Metrics are often more dangerous than helpful, as they tend to be used
as a substitute for thinking.

My tuppence.

--renato

Mehdi Amini via llvm-dev

Mar 31, 2016, 6:34:37 PM
to Renato Golin, llvm-dev
Hi Renato,

> On Mar 31, 2016, at 2:46 PM, Renato Golin <renato...@linaro.org> wrote:
>
> On 31 March 2016 at 21:41, Mehdi Amini via llvm-dev
> <llvm...@lists.llvm.org> wrote:
>> TLDR: I totally support considering compile time regression as bug.
>
> Me too.
>
> I also agree that reverting fresh and reapplying is *much* easier than
> trying to revert late.
>
> But I'd like to avoid dubious metrics.

I'm not sure about how "this commit regresses the compile time by 2%" is a dubious metric.
The metric is not dubious IMO, it is what it is: a measurement.
You just have to cast a good process around it to exploit this measurement in a useful way for the project.

>> The closest I could find would be what Chandler wrote in:
>> http://reviews.llvm.org/D12826 ; for instance for O2 he stated that "if an
>> optimization increases compile time by 5% or increases code size by 5% for a
>> particular benchmark, that benchmark should also be one which sees a 5%
>> runtime improvement".
>
> I think this is a bit limited and can lead to which hunts, especially
> wrt performance measurements.
>
> Chandler's title is perfect though... Large can be vague, but
> "super-linear" is not. We used to have the concept that any large
> super-linear (quadratic+) compile time introductions had to be in O3
> or, for really bad cases, behind additional flags. I think we should
> keep that mindset.
>
>
>> My hope is that with better tooling for tracking compile time in the future,
>> we'll reach a state where we'll be able to consider "breaking" the
>> compile-time regression test as important as breaking any test: i.e. the
>> offending commit should be reverted unless it has been shown to
>> significantly (hand wavy...) improve the runtime performance.
>
> In order to have any kind of threshold, we'd have to monitor with some
> accuracy the performance of both compiler and compiled code for the
> main platforms. We do that to certain extent with the test-suite bots,
> but that's very far from ideal.

I agree. Did you read the part where I was mentioning that we're working on the tooling part and that I was waiting for it to be done before starting this thread?

>
> So, I'd recommend we steer away from any kind of percentage or ratio
> and keep at least the quadratic changes and beyond on special flags
> (n.logn is ok for most cases).


How do you suggest we address the long trail of 1-3% slowdowns that led to the current situation (cf. the two links I posted in my previous email)?
Because there *is* a problem here, and I'd really like someone to come up with a solution for that.


>> Since you raise the discussion now, I take the opportunity to push on the
>> "more aggressive" side: I think the policy should be a balance between the
>> improvement the commit brings compared to the compile time slow down.
>
> This is a fallacy.

Not sure why or what you mean? The fact that an optimization improves only some targets does not invalidate the point.

>
> Compile time often regress across all targets, while execution
> improvements are focused on specific targets and can have negative
> effects on those that were not benchmarked on.

Yeah, as usual in LLVM: if you care about something on your platform, set up a bot and track trunk closely; otherwise you're less of a priority.

> Overall, though,
> compile time regressions dilute over the improvements, but not on a
> commit per commit basis. That's what I meant by which hunt.

There is no "witch hunt", at least that's not my objective.
I think everyone is pretty enthusiastic with every new perf improvement (I do), but just like without bot in general (and policy) we would break them all the time unintentionally.
I talking about chasing and tracking every single commit were a developer would regress compile time *without even being aware*.
I'd personally love to have a bot or someone emailing me with compile time regression I would introduce.

>
> I think we should keep an eye on those changes, ask for numbers in
> code review and even maybe do some benchmarking on our own before
> accepting it. Also, we should not commit code that we know hurts
> performance that badly, even if we believe people will replace them in
> the future. It always takes too long. I myself have done that last
> year, and I learnt my lesson.

Agree.

> Metrics are often more dangerous than helpful, as they tend to be used
> as a substitute for thinking.

I don't relate this sentence to anything concrete at stake here.
I think this list is full of people that are very good at thinking and won't substitute it :)

Best,

--
Mehdi

Renato Golin via llvm-dev

Mar 31, 2016, 7:40:54 PM
to Mehdi Amini, llvm-dev
On 31 March 2016 at 23:34, Mehdi Amini <mehdi...@apple.com> wrote:
> I'm not sure about how "this commit regress the compile time by 2%" is a dubious metric.
> The metric is not dubious IMO, it is what it is: a measurement.

Ignoring for a moment the slippery slope we recently had on compile
time performance, 2% is an acceptable regression for a change that
improves most targets around 2% execution time, more than if only one
target was affected.

Different people see performance with different eyes, and companies
have different expectations about it, too, so those percentages can
have different impact on different people for the same change.

I guess my point is that no threshold will please everybody, and
people are more likely to "abuse" the metric if the results are far
from what they see as acceptable, even if everyone else is ok with it.

My point about replacing metrics for thinking is not aimed at the lazy
programmers (of which there are very few here), but at how far the
encoded threshold falls from your own. Bias is a *very* hard thing
to remove, even for extremely smart and experienced people.

So, while "which hunt" is a very strong term for the mild bias we'll
all have personally, we have seen recently how some discussions end up
in rage when a group of people strongly disagree with the rest,
self-reinforcing their bias to levels that they would never reach
alone. In those cases, the term stops being strong, and may be
fitting... Makes sense?


> I agree. Did you read the part where I was mentioning that we're working in the tooling part and that I was waiting for it to be done to start this thread?

I did, and should have mentioned it in my reply. I think you guys (and
ARM) are doing an amazing job at quality measurement. I wasn't trying
to diminish your efforts, but IMHO, the relationship between effort and
bias removal is not linear, i.e. you'll have to improve quality
exponentially to remove bias linearly. So, the threshold at which we're
prepared to stop might not remove all the problems, and metrics could
still play a negative role.

I think I'm just asking for us to be aware of the fact, not to stop
any attempt to introduce metrics. If they remain relevant to the final
objective, and we're allowed to break them with enough arguments, it
should work fine.


> How to do you suggest we address the long trail of 1-3% slow down that lead to the current situation (cf the two links I posted in my previous email)?
> Because there *is* a problem here, and I'd really like someone to come up with a solution for that.

Indeed, we're now slower than GCC, and that's a place that looked
impossible two years ago. But I doubt reverting a few patches will
help. For this problem, we'll need a task force to hunt for all the
dragons, and surgically alter them, since at this time, all relevant
patches are too far in the past.

For the future, emailing on compile time regressions (as well as run
time) is a good thing to have and I vouch for it. But I don't want
that to become a tool that will increase stress in the community.


> Not sure why or what you mean? The fact that an optimization improves only some target does not invalidate the point.

Sorry, I seem to have misinterpreted your point.

The fallacy is about the measurement of "benefit" versus the
regression "effect". The former is very hard to measure, while the
latter is very precise. Comparisons with radically different standard
deviations can easily fall into "undefined behaviour" land, and be
the seed for rage threads.


> I talking about chasing and tracking every single commit were a developer would regress compile time *without even being aware*.

That's a goal worth pursuing, regardless of the patch's benefit, I
agree wholeheartedly. And for that, I'm very grateful of the work you
guys are doing.

cheers,
--renato

Mehdi Amini via llvm-dev

Mar 31, 2016, 8:09:36 PM
to Renato Golin, llvm-dev

> On Mar 31, 2016, at 4:40 PM, Renato Golin <renato...@linaro.org> wrote:
>
> On 31 March 2016 at 23:34, Mehdi Amini <mehdi...@apple.com> wrote:
>> I'm not sure about how "this commit regress the compile time by 2%" is a dubious metric.
>> The metric is not dubious IMO, it is what it is: a measurement.
>
> Ignoring for a moment the slippery slope we recently had on compile
> time performance, 2% is an acceptable regression for a change that
> improves most targets around 2% execution time, more than if only one
> target was affected.

Sure, I don't think I have suggested anything else; if I did, it is because I didn't express myself correctly :)
I'm excited about runtime performance, and I'm willing to spend compile-time budget to achieve these.
I'd even say that my view is that by tracking compile-time on other things, it'll help to preserve more compile-time budget for the kind of commit you mention above.

>
> Different people see performance with different eyes, and companies
> have different expectations about it, too, so those percentages can
> have different impact on different people for the same change.
>
> I guess my point is that no threshold

I don't suggest a threshold that says "a commit can't regress x%", and that would be set in stone.

What I have in mind is more: if a commit regresses the build above a threshold (1% on average, for instance), then we should be able to have a discussion about this commit to evaluate whether it belongs in O2 or should go to O3, for instance.
Also if the commit is about refactoring, or introducing a new feature, the regression might not be intended at all by the author!


> will please everybody, and
> people are more likely to "abuse" of the metric if the results are far
> from what they see as acceptable, even if everyone else is ok with it.

The metric is "the commit regressed 1%". The natural thing that follows is what usually happens in the community: we look at the data (what is the performance improvement), and decide on a case-by-case basis whether it is fine as is or not.
I feel like you're talking about the "metric" as an automatic threshold that triggers an automatic revert and blocks things; this is not the goal, and that is not what I mean when I use the word metric (but hey, I'm not a native speaker!).
As I said before, I'm mostly chasing *untracked* and *unintentional* compile-time regressions.


> My point about replacing metrics for thinking is not to the lazy
> programmers (of which there are very few here), but to how far does
> the encoded threshold fall from your own. Bias is a *very* hard thing
> to remove, even for extremely smart and experienced people.
>
> So, while "which hunt" is a very strong term for the mild bias we'll
> all have personally, we have seen recently how some discussions end up
> in rage when a group of people strongly disagree with the rest,
> self-reinforcing their bias to levels that they would never reach
> alone. In those cases, the term stops being strong, and may be
> fitting... Makes sense?
>
>
>> I agree. Did you read the part where I was mentioning that we're working in the tooling part and that I was waiting for it to be done to start this thread?
>
> I did, and should have mentioned on my reply. I think you guys (and
> ARM) are doing an amazing job at quality measurement. I wasn't trying
> to reduce your efforts, but IMHO, the relationship between effort and
> bias removal is not linear, ie. you'll have to improve quality
> exponentially to remove bias linearly. So, the threshold we're
> prepared to stop might not remove all the problems and metrics could
> still play a negative role.

I'm not sure I really totally understand everything you mean.


>
> I think I'm just asking for us to be aware of the fact, not to stop
> any attempt to introduce metrics. If they remain relevant to the final
> objective, and we're allowed to break them with enough arguments, it
> should work fine.
>
>
>> How to do you suggest we address the long trail of 1-3% slow down that lead to the current situation (cf the two links I posted in my previous email)?
>> Because there *is* a problem here, and I'd really like someone to come up with a solution for that.
>
> Indeed, we're now slower than GCC, and that's a place that looked
> impossible two years ago. But I doubt reverting a few patches will
> help. For this problem, we'll need a task force to hunt for all the
> dragons, and surgically alter them, since at this time, all relevant
> patches are too far in the past.

Obviously, my immediate concern is "what tools and process to make sure it does not get worse", and starting with "community awareness" is not bad. Improving and recovering from the current state is valuable, but orthogonal to what I'm trying to achieve.
Another thing is the complaints from multiple people who are trying to JIT using LLVM: we know LLVM is not designed in a way that helps with latency and memory consumption, but getting worse is not nice.

> For the future, emailing on compile time regressions (as well as run
> time) is a good thing to have and I vouch for it. But I don't want
> that to become a tool that will increase stress in the community.

Sure, I'm glad you stepped up to make sure it does not happen. So please continue to voice up in the future as we try to roll things out.
I hope we're on the same track past the initial misunderstanding we had with each other?

What I'd really like is to have a consensus on the goal to pursue (knowing I'm not alone in caring about compile time is a great start!), so that the tooling can be set up to serve this goal the best way possible (and decrease stress instead of increasing it).

Best,

--
Mehdi

Renato Golin via llvm-dev

Mar 31, 2016, 8:26:54 PM
to Mehdi Amini, llvm-dev
On 1 April 2016 at 01:09, Mehdi Amini <mehdi...@apple.com> wrote:
> What I have in mind is more: if a commit regress the build above a threshold (1% on average for instance), then we should be able to have a discussion about this commit to evaluate if it belongs to O2 or if it should go to O3 for instance.
> Also if the commit is about refactoring, or introducing a new feature, the regression might not be intended at all by the author!

Thresholds as a trigger for discussion are exactly what I was looking for.

But Chandler goes further (or so I gathered): some commits are
really bad and could be candidates for reversion before discussion.
Those more extreme measures may be justified if, for example, the
commit is quadratic or worse in a core part of the compiler, or doubles
the testing time, etc.

I agree with both proposals, but we have to make sure what goes where,
to avoid (unintentionally) heavy handing other people's work.


> The metric is "the commit regressed 1%". The natural thing that follows is what happens usually in the community: we look at the data (what is the performance improvement), and decide on a case by case if it is fine as is or not.
> I feel like you're talking about the "metric" like an automatic threshold that triggers an automatic revert and block things, this is not the goal and that is not what I mean when I use of the word metric (but hey, I'm not a native speaker!).

I wasn't talking about automatic reversal, but about pre-discussion
reversal, as I mentioned above.


> As I said before, I'm mostly chasing *untracked* and *unintentional* compile time regression.

That is obviously good. :)


> I'm not sure I really totally understand everything you mean.

It's about the threshold between what promotes discussion and what
promotes pre-discussion reverts. This is a hard line to draw with so
many people (and companies) involved.


> Sure, I'm glad you step up to make sure it does not happen. So please continue to voice up in the future as we try to roll thing.
> I hope we're on the same track past the initial misunderstanding we had each other?

Yes. :)

Sean Silva via llvm-dev

Mar 31, 2016, 10:14:08 PM
to Renato Golin, llvm-dev
One of my favorite quotes:
"The results are definitely numbers, but do they have very much at all to do with the problem?" - Forman Acton, "Numerical Methods that (usually) Work"

-- Sean Silva