Compile time analysis and experiment for Source/core/html

Daniel Bratell

Mar 6, 2017, 12:59:47 PM
to blink-dev
I mentioned in an earlier mail that compile times can seem a bit long at times. I analyzed a random, small file in Blink, core/html/track/VideoTrack.cpp, which on the surface should be very easy to compile and yet takes a couple of seconds.

It turns out that the whole compilation unit is about 180,000 lines after preprocessing in clang (a bit longer in gcc), and the compilation time is more or less proportional to the number of lines.





So of the 180,000 lines, all but 65 come from headers:
0.04% is the cpp file and 99.96% is headers.

There is nothing special about the headers, just the normal mix you get when you include a lot of them:



It is not very efficient to compile 180,000 lines of headers in order to compile a few dozen lines of "real" code, so there are already some ideas for how to work around this. One idea is to precompile the headers so that, while still 180,000 lines, they take less time to compile.

Another idea is to grow those 65 lines into something larger, so that instead of 0.04% it's 1%, 10% or even 60% of the code. In an existing code base like Blink you can easily do that by merging different cpp files into a wrapper file, as sketched below.
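For illustration, a minimal sketch of what such a wrapper could look like; the grouping and file names here are made up and the real PoC groups differ (see the codereview link below):

    // html_track_all.cpp -- hypothetical wrapper translation unit.
    // Each included .cpp is still ordinary C++; the shared headers now
    // only have to be parsed once for the whole group.
    #include "core/html/track/AudioTrack.cpp"
    #include "core/html/track/AudioTrackList.cpp"
    #include "core/html/track/TextTrack.cpp"
    #include "core/html/track/VideoTrack.cpp"
    #include "core/html/track/VideoTrackList.cpp"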

As a testing ground I chose the core/html folder and got these results:



CPU time went from 25.5 minutes to 1.0 minutes, a speedup of 25x. The wall-time speedup was smaller, but only because I no longer kept all cores busy. If this were applied in several places, the wall-time improvement would become similar.

I repeated the PoC change for layout, svg, dom and css, making it 5 different large files (still not enough to fill all logical cores), and got this improvement:


In CPU time it is 5 minutes instead of 76 minutes. In wall time it is 1 minute instead of 10 minutes.

For the record, here is a graph similar to the first one, for all of Source/core/html:



The downside of merging classes like this is that if you only want to compile a single class (say you just changed its cpp file), you now have to compile more than one. In the example merging 250 files into one, you might lose 40-50 seconds compared to the fastest possible turnaround. With smaller groupings the difference will of course be smaller.

I think this is something worth trying, but primarily I wanted to check how large the improvement seemed to be and how much effort was required. The effort was very small, though I handled the handful of name collisions mostly by giving one instance a dummy suffix. A proper fix would have been deduplication or better-chosen names.
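To give a made-up picture of the kind of collision: two merged files can each define a file-local helper with the same name, which is fine in separate translation units but becomes a redefinition error once they share one.

    #include <string>

    // originally in audio_track.cpp (hypothetical)
    static bool IsValidKind(const std::string& kind) { return kind == "main"; }

    // originally in video_track.cpp (hypothetical) -- without the dummy
    // suffix this would be a redefinition error once both files end up in
    // the same translation unit; deduplication or a better name would be
    // the proper fix.
    static bool IsValidKind2(const std::string& kind) { return kind == "alternative"; }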

/Daniel

[1] Proof-of-concept code is available at https://codereview.chromium.org/2732033002/

--
/* Opera Software, Linköping, Sweden: CET (UTC+1) */

Nico Weber

Mar 6, 2017, 1:27:31 PM
to Daniel Bratell, blink-dev
Any guesses on why blink files build much slower than e.g. files in content/, which also use C++ and Linux headers (the largest chunk)?

"Build time is proportional to number of lines" is surprising to me and doesn't match data I've collected in the past (http://amnoid.de/notes/2012/03/viz.html).

Jeremy Roman

Mar 6, 2017, 3:22:05 PM
to Nico Weber, Daniel Bratell, blink-dev
I wonder what the impact of this approach on the generated code is. It would seem to give the compiler more inlining/optimization opportunities (because more function definitions are available in the same TU) even before LTO. Is the resulting binary bigger and/or faster?

Fernando Serboncini

Mar 6, 2017, 4:24:21 PM
to Jeremy Roman, Nico Weber, Daniel Bratell, blink-dev
Do you have a tool to generate the header sizes and the split among modules, or did you do it by hand? If you have a script, it would be nice to share. :)

bruce...@chromium.org

Mar 6, 2017, 8:02:28 PM
to blink-dev, jbr...@chromium.org, tha...@chromium.org, bra...@opera.com, fs...@google.com
Daniel, thanks for doing these tests. I have been meaning to do this exact work and it looks like you've done a great job.

> It is not very efficient to compile 180,000 lines of headers to compile a few dozen lines of "real" code

Yep, that is the fundamental problem. Tiny .cpp files are incredibly inefficient to compile - you end up with 99.999% overhead.

> In an existing code base like Blink you can easily do that by merging different cpp files in a wrapper file.

The standard name for this technique is "Unity builds" or "Single Compilation Unit (SCU)". They are popular in game development. Some developers like to take this to an extreme and include all source files in a single source file. That tends to cause a whole host of new problems; however, less extreme versions of the technique work extremely well.

It's worth noting that if you reduce the number of compilation units you not only improve compile times, you also improve link times, because you get rid of some of the redundant information in object files. This means there is a sweet spot where combining source files improves both full build times and incremental build times. And, as Jeremy said, unity builds give a certain amount of cross-TU inlining, so non-LTO builds will often have slightly better performance.
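A tiny made-up example of that inlining point: in separate translation units only a declaration is visible at the call site, so inlining the call needs LTO; once the files are merged, the definition is visible and the compiler can inline it directly.

    // length.cpp
    int SquaredLength(int x, int y) { return x * x + y * y; }

    // hit_test.cpp
    int SquaredLength(int x, int y);  // only the declaration is visible here
    bool IsNear(int x, int y) { return SquaredLength(x, y) < 100; }

    // In a unity build both files land in one TU, so the compiler sees the
    // definition of SquaredLength at the call site and can inline it into
    // IsNear without LTO.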

There are some aspects of unity builds that I find quite distasteful. #including a .cpp file feels like a violation of the natural order and can be confusing. Taking it too far (#including hundreds of files into one) can make incremental build times slower, but the potential benefits of doing some level of unity builds are so huge that it seems well worth discussing.

The obvious alternative is to manually merge the contents of source files. This gives the same benefits as a unity build while avoiding "#include <blah.cpp>", and it ensures that the visible structure cleanly maps to how we compile the code. However, I'm not sure that we could merge files enough to really help - we'd have to at least halve the number of Blink source files to start making a difference. Ideally we want to do much more than just halve the number of source files.

Bruce

Daniel Bratell

Mar 7, 2017, 5:50:00 AM
to Nico Weber, blink-dev
I think your data supports this. Your graphs have a distinct "up and to the right" pattern, though there are large variations. In them there are no large compilation units that compile fast (pch excluded), and only a few small compilation units that compile slowly (presumably triggering some other kind of slowdown).

Looking at a random file in content, content/browser/child_process_launcher.cc: it is 110,000 lines long preprocessed, about half the size of the ones I looked at in Blink, and it pulls in about half the number of headers (370 instead of 740). It compiles roughly twice as fast as the Blink file I tested (1.0 s vs 2.3 s).

Below is the breakdown for both. They end up including about the same amount of library and system headers, but the Blink one also pulls in 70k lines from Blink's own headers.

content/browser/child_process_launcher.cc


third_party/WebKit/Source/core/html/track/VideoTrack.h



/Daniel

Daniel Bratell

Mar 7, 2017, 5:53:16 AM
to Nico Weber, Jeremy Roman, blink-dev
With LTO it doesn't matter as much, but I have looked at this in the past, and for gcc it does have an impact because of gcc's (version 4 at the time) per-compilation-unit limits. So you got less inlining, for instance, which can be both good and bad. For clang there were only small changes, but whether they were for the better or worse I couldn't say.

So it's hard to say. I would expect no visible changes except possibly in micro-benchmarks, and whether those would be faster or slower I don't want to guess.

/Daniel

Daniel Bratell

Mar 7, 2017, 7:59:05 AM
to blink-dev, bruce...@chromium.org, jbr...@chromium.org, tha...@chromium.org, fs...@google.com
On Tue, 07 Mar 2017 02:02:27 +0100, <bruce...@chromium.org> wrote:

> There are some aspects of unity builds that I find quite distasteful. #including a .cpp file feels like a violation of the natural order and can be confusing. Taking it too far (#including hundreds of files into one) can make incremental build times slower, but the potential benefits of doing some level of unity builds are so huge that it seems well worth discussing.
>
> The obvious alternative is to manually merge the contents of source files. This gives the same benefits of a unity build while avoiding "#include <blah.cpp>", and it ensures that the visible structure cleanly maps to how we compile the code. However I'm not sure that we could merge files enough to really help - we'd have to at least halve the number of Blink source files in order to start making a difference. Ideally we want to do much more than just halve the number of source files.

One approach I have considered is having this done by gn. You create source lists in gn and, through some flag, hint that they should be compiled together. Then gn creates a build step that puts all the files into the same compilation unit, inserts #line directives where needed, and adds a rule to compile that mega-file.
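For concreteness, a rough sketch of what such a generated mega-file could look like, assuming gn concatenates the sources textually (the output and source file names are invented):

    // gen/core_html_jumbo_0.cpp -- hypothetical generated file, do not edit.
    #line 1 "core/html/track/AudioTrack.cpp"
    // ...entire contents of AudioTrack.cpp pasted here...
    #line 1 "core/html/track/VideoTrack.cpp"
    // ...entire contents of VideoTrack.cpp pasted here...
    // The #line directives make diagnostics and debug info inside each
    // pasted chunk point back at the original file and line numbers.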

The only downside there would be that you wouldn't see the code that actually gets compiled, so errors can seem magical unless you know what gn does.

Of course I don't know whether this is something gn could easily do, but it seems like it would be possible, and useful for others besides Chromium.

/Daniel

Dirk Pranke

Mar 7, 2017, 11:46:56 AM
to Daniel Bratell, blink-dev, Bruce Dawson, Jeremy Roman, Nico Weber, Fernando Serboncini
Yes, having GN get involved is the first thing that crossed my mind yesterday when I read this. It seems like it'd be an interesting project for someone to work on and could make a significant difference in build time.

-- Dirk

Joe Mason

Mar 7, 2017, 11:58:58 AM
to Daniel Bratell, blink-dev, Bruce Dawson, Jeremy Roman, Nico Weber, Fernando Serboncini
If you are going to concatenate files automatically, you'd also want to translate error messages back to the filename:linenum before displaying them. Hopefully that'd take care of most of the "magic" errors.

Lucas Gadani

Mar 7, 2017, 12:41:29 PM
to bruce...@chromium.org, blink-dev, jbr...@chromium.org, tha...@chromium.org, bra...@opera.com, fs...@google.com
On Mon, Mar 6, 2017 at 8:02 PM <bruce...@chromium.org> wrote:
> The standard name for this technique is "Unity builds" or "Single Compilation Unit (SCU)". They are popular in game development. Some developers like to take this to an extreme and include all source files in a single source file. That tends to cause a whole host of new problems, however less extreme versions of this technique work extremely well.
>
> It's worth noting that if you reduce the number of compilation units you not only improve compile times, you also improve link times because you get rid of some of the redundant information in object files. This means that there is a sweet spot where combining of source files improves both full build times and incremental build times. And, as Jeremy said, unity builds give a certain amount of cross-TU inlining, so non-LTO builds will often have slightly better performance.

I've worked with a system in the past that would remove modified .cpp files from the unity build and compile just those units separately. This way you get the best of both worlds: incremental builds get better too, as long as you assume you are changing just a few .cpp files repeatedly as you work, which is very reasonable.

Hans Wennborg

Mar 7, 2017, 1:24:34 PM
to Daniel Bratell, blink-dev
On Mon, Mar 6, 2017 at 9:59 AM, Daniel Bratell <bra...@opera.com> wrote:
> I mentioned in a mail earlier that compile times can seem a bit long at times. I analyzed a random, small, file in Blink: core/html/track/VideoTrack.cpp which should on the surface be very easy to compile and yet needs a couple of seconds.
>
> Turns out that the whole compilation unit is about 180,000 lines after preprocessing in clang (a bit longer in gcc) and the compilation time is more or less proportional to the number of lines.

This is the classic C++ compile-time problem.

It has two parts:
1) Parsing.
The same large set of headers gets pre-processed and parsed in many source files. There are solutions for this, though: PCH and, even better, modules (https://clang.llvm.org/docs/Modules.html) allow the compiler to deserialize only the parts that are needed from a previously created AST. Don't we already use PCH when building WebKit? Does that not help here?

2) Codegen.
With functions defined in headers, especially templates, the same code gets codegenned multiple times, only for the linker to discard all but one copy. This is obviously a lot of redundant work, and I don't think there's a good solution besides restructuring the code.
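A minimal made-up illustration of 2): a function template defined in a shared header gets instantiated, and code generated for it, in every translation unit that uses it, and the linker later keeps just one copy.

    // clamp_util.h (hypothetical shared header)
    #pragma once
    template <typename T>
    T ClampValue(T v, T lo, T hi) { return v < lo ? lo : (v > hi ? hi : v); }

    // a.cpp
    #include "clamp_util.h"
    int ClampRed(int v) { return ClampValue(v, 0, 255); }    // emits ClampValue<int>

    // b.cpp
    #include "clamp_util.h"
    int ClampAlpha(int v) { return ClampValue(v, 0, 255); }  // emits ClampValue<int> again

    // Both a.o and b.o carry a weak/comdat copy of ClampValue<int>; the
    // linker discards all but one at link time.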

I suspect it's mostly 1) that's the problem in your example though, and in slow compile-times for Blink in general, and PCH should help.

Unity builds, or merging some source files (automatically or not), sound like a bad idea to me. Build parallelism is generally a good thing, and merging source files would reduce it.

Cheers,
Hans

Dirk Pranke

Mar 7, 2017, 1:33:32 PM
to Hans Wennborg, Daniel Bratell, blink-dev
On Tue, Mar 7, 2017 at 10:24 AM, Hans Wennborg <ha...@chromium.org> wrote:
> On Mon, Mar 6, 2017 at 9:59 AM, Daniel Bratell <bra...@opera.com> wrote:
>> I mentioned in a mail earlier that compile times can seem a bit long at times. I analyzed a random, small, file in Blink: core/html/track/VideoTrack.cpp which should on the surface be very easy to compile and yet needs a couple of seconds.
>>
>> Turns out that the whole compilation unit is about 180,000 lines after preprocessing in clang (a bit longer in gcc) and the compilation time is more or less proportional to the number of lines.
>
> This is the classic C++ compile-time problem.
>
> It has two parts:
> 1) Parsing.
> The same large set of headers gets pre-processed and parsed in many source files. There are solutions for this though; PCH, and even better, modules (https://clang.llvm.org/docs/Modules.html) allow the compiler to only deserialize the parts that are needed from a previously created AST. Don't we already use PCH when building webkit? Does that not help here?

I believe we do use PCH and it helps but it's still an issue. And, unfortunately, we don't yet have modules and I don't know when we will.

> 2) Codegen.
> With functions defined in headers, especially in templates, the same code will get codegenned multiple times, only for the linker to discard all but one copy. This is obviously a lot of redundant work and I don't think there's a good solution besides restructuring the code.
>
> I suspect it's mostly 1) that's the problem in your example though, and in slow compile-times for Blink in general, and PCH should help.
>
> Unity-builds, or merging some source files (automatically or not) sounds like a bad idea to me. Build parallelism is generally a good thing, and merging source files would reduce it.

This is a fair point. You'd have to figure out the tradeoff between compiling TUs in parallel but re-compiling all of the headers each time, and compiling a single TU. 

I expect the latter beats the former most of the time, and by a pretty wide margin. And, assuming you have one merged file per target, we still have many many more targets than cores (or even parallel goma jobs).

The numbers Daniel cites (and that Bruce and others have seen before) sound like a pretty compelling case to at least do more experimenting.

-- Dirk

Bruce Dawson

Mar 7, 2017, 2:07:36 PM
to Dirk Pranke, Hans Wennborg, Daniel Bratell, blink-dev
> Build parallelism is generally a good thing, and merging source files would reduce it.

Right now we have ~30,000 build steps in order to build Chrome. We are a long way from running out of build parallelism.

When I look at .ninja_log files (using ninjatracing), the times when I see us running out of parallelism are during linking. Conveniently enough, unity builds (or otherwise reducing the number of translation units) improve linking speeds. As long as we don't go too crazy (say, 500 translation units total) we shouldn't run out of build parallelism.


Hans Wennborg

Mar 7, 2017, 2:21:30 PM
to Bruce Dawson, Dirk Pranke, Daniel Bratell, blink-dev
On Tue, Mar 7, 2017 at 11:06 AM, Bruce Dawson <bruce...@chromium.org> wrote:
>> Build parallelism is generally a good thing, and merging source files
>> would reduce it.
>
> Right now we have ~30,000 build steps in order to build Chrome. We are a
> long way from running out of build parallelism.
>
> When I look at .ninja_log files (using ninjatracing) the times when I see us
> running out of parallelism are when linking. Conveniently enough, unity
> builds (or otherwise reducing the number of translation units) improves
> linking speeds. As long as we don't go too crazy (say, 500 translation units
> total) we shouldn't run out of build parallelism.

The ~30k build steps are for a full build though. For incremental
builds, the number of steps is often much smaller, and having some of
those steps be merged files (which would have more dependencies and be
more likely to require a rebuild) sounds like a potential loss.

Linkers are good at dealing with redundant symbol definitions (the
linker doesn't need to read a definition into memory unless it's
selected to be part of the output), so I wouldn't expect unity builds
to make a big difference.

I'm not saying this isn't worth investigating, but I'm worried about
reducing parallelism and increasing complexity.

Bruce Dawson

Mar 7, 2017, 2:28:49 PM
to Hans Wennborg, Dirk Pranke, Daniel Bratell, blink-dev
> Linkers are good at dealing with redundant symbol definitions

Linkers are effective at dealing with redundant symbol definitions, but it costs time. I have seen enormous improvements in link times from unity builds.

> For incremental builds, the number of steps is often much smaller

Agreed. If we merge too many source files together then the compilation times for individual translation units may be too great. So let's not do that.

Your concerns are legitimate, and I have seen teams go too far ("let's have one translation unit for the entire game!!!"), but the concerns are avoidable simply by not going too crazy on file merging. I suspect we could do 10:1 merging of source files without it making more than a 10-20% difference in the time to compile a single translation unit. This would give us a huge (5x?) reduction in full-build compile times, a slight reduction in link times, and it would probably be a wash for incremental builds.

100:1 merging of source files might start hitting your concerns, so let's not do that. Rather, let's measure and make sure we don't harm incremental builds - they are crucial.


Hans Wennborg

Mar 7, 2017, 2:51:01 PM
to Bruce Dawson, Dirk Pranke, Daniel Bratell, blink-dev
On Tue, Mar 7, 2017 at 11:28 AM, Bruce Dawson <bruce...@chromium.org> wrote:
>> Linkers are good at dealing with redundant symbol definitions
>
> Linkers are effective at dealing with redundant symbol definitions, but it
> costs time. I have seen enormous improvements in link times from unity
> builds
>
>> For incremental builds, the number of steps is often much smaller
>
> Agreed. If we merge too many source files together then the compilation
> times for individual translation units may be too great. So let's not do
> that.
>
> Your concerns are legitimate, and I have seen teams go too far (Let's have
> one translation unit for the entire game!!!) but the concerns are avoidable,
> simply by not going too crazy on file-merging. I suspect we could do 10:1
> merging of source files without it making more than a 10-20% difference in
> the time to compile a single translation unit. This would give us a huge
> (5x?) reduction in full-build compile times, a slight reduction in link
> times, and would probably be a wash for incremental builds.
>
> 100:1 merging of source files might start hitting your concerns, so let's
> not do that. Rather, let's measure and make sure we don't harm incremental
> builds - they are crucial.

The textual include problem is something Google's compiler team has
thought a lot about, and they're pushing Modules as the solution (with
apologies for the internal link, see go/ti-scalable-cpp-builds, esp.
slide 11 onward).

As Clang is coming to more of Blink's supported targets, Modules will
become an available option for us too.

Daniel Bratell

Mar 8, 2017, 8:29:26 AM
to 'Joe Mason' via blink-dev, Joe Mason, Bruce Dawson, Jeremy Roman, Nico Weber, Fernando Serboncini
On Tue, 07 Mar 2017 17:58:52 +0100, 'Joe Mason' via blink-dev <blin...@chromium.org> wrote:

> If you are going to concatenate files automatically, you'd also want to translate error messages back to the filename:linenum before displaying them. Hopefully that'd take care of most of the "magic" errors.

The #line directive takes care of such things, so that is no worry.