I have a proposal for improving V8 build times, which I think are
a big issue for many who want to contribute to V8. For example, if I
make a change to src/objects/objects.h it takes 18 minutes to
recompile V8 on a laptop. This is with gn, ninja, and ccache, and it
covers building all of V8 including d8, cctest, and unittests.
The only solution I know of is to compile .cc files in batches (see
below for why this helps). I have some changes to the gni files that
add a compilation flag, v8_enable_cluster_build, so that this happens
automatically. It gives me a 3x improvement in compile times on both
big desktops and smaller laptops.
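To make the idea concrete, here is a minimal sketch of what a gn-gen-time
clustering step could look like. This is not the actual code from my CLs;
the function names, file names, and batch size are made up for
illustration. The point is just that each batch of .cc files is compiled
as a single translation unit, via a generated "cluster" file that
#includes its members:

```python
# Sketch of a gn-gen-time clustering step (hypothetical; not the real CL).
# Each emitted cluster file #includes a batch of .cc files, so the whole
# batch is compiled as one translation unit and the headers are parsed
# only once per batch instead of once per file.

def make_clusters(cc_files, batch_size=8):
    """Partition cc_files into batches and return (name, contents) pairs
    for the generated cluster source files."""
    batches = [cc_files[i:i + batch_size]
               for i in range(0, len(cc_files), batch_size)]
    clusters = []
    for n, batch in enumerate(batches):
        lines = ['// Generated cluster file %d - do not edit.' % n]
        lines += ['#include "%s"' % f for f in batch]
        clusters.append(('cluster_%d.cc' % n, '\n'.join(lines) + '\n'))
    return clusters

if __name__ == '__main__':
    files = ['src/objects/objects.cc', 'src/objects/string.cc',
             'src/objects/map.cc']
    for name, text in make_clusters(files, batch_size=2):
        print('---', name)
        print(text)
```

In the real change the build system would then compile the cluster files
instead of the individual .cc files in each batch.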
Some .cc files have global name clashes with each other. I have a set
of CLs (linked from the bug at
https://issues.chromium.org/issues/483903200) that fix these name
clashes, for example
https://chromium-review.googlesource.com/c/v8/v8/+/7562658
An alternative approach, in
https://chromium-review.googlesource.com/c/v8/v8/+/7585474, is to keep
exclusion lists of such problematic .cc files and build those files in
the ordinary way. This keeps the source changes to a minimum. Even
with this approach I get close to a 3x speedup on my workstation, but
I would personally prefer to fix the .cc files.
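The exclusion-list variant can be sketched in a few lines (again
hypothetical; the list contents and names are made up): files known to
clash are left to compile individually, and everything else is
clustered as before.

```python
# Sketch of the exclusion-list variant (hypothetical names). Files on
# the exclusion list are compiled normally; the rest go into clusters.

EXCLUDED = {'src/objects/objects.cc'}  # made-up clash list for illustration

def split_sources(cc_files, excluded=EXCLUDED):
    """Return (files to cluster, files to compile individually),
    preserving the original order within each group."""
    clustered = [f for f in cc_files if f not in excluded]
    individual = [f for f in cc_files if f in excluded]
    return clustered, individual
```

This keeps the clustering fully automatic while letting a short list
absorb the name-clash problems, at the cost of a few files still paying
the full header-parsing price.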
I don't envision having a CI bot for the cluster build. Those of us
who benefit from it would maintain it, either by updating the
exclusion lists or by fixing name clashes in the .cc files. As such it
would not be much of a burden for Google if they choose not to use it.
With the change we call out to Python from the gni files (at gn gen
time). This only happens if the v8_enable_cluster_build flag is
enabled, which I don't expect to be the default. It costs a few
hundred milliseconds at gn gen time, which seems well worth it since
it only affects those using the cluster build option.
Why compiling .cc files together works:
The root of the problem is that a large number of .h files get pulled
into each .cc file. This means, for example, that each auto-generated
.cc file produced from a .tq file takes 20 seconds of CPU time to
compile, even on a fast modern core, and even if the .cc file is less
than 100 lines long. (Clang is single-threaded.)
Over the years the number of .cc files has increased dramatically. For
example, the regexp engine used to be a handful of arch-independent
.cc files plus one or two arch-dependent ones; there are now 24
arch-independent .cc files for the regexp engine.
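A back-of-the-envelope model shows why batching helps so much here. The
numbers below are made up (only in the spirit of the 20-second figure
above): if parsing the headers costs H seconds per translation unit and
a typical .cc body costs B seconds, then compiling N files individually
costs N*(H+B), while compiling them in batches of k costs roughly
(N/k)*H + N*B.

```python
# Illustrative cost model for clustered compilation. H and B below are
# made-up numbers, not measurements of V8.

def compile_cost(n_files, header_cost, body_cost, batch_size=1):
    """Total CPU seconds to compile n_files with batch_size files per
    translation unit: headers are parsed once per batch, bodies once
    per file."""
    n_batches = -(-n_files // batch_size)  # ceiling division
    return n_batches * header_cost + n_files * body_cost

individual = compile_cost(240, header_cost=18.0, body_cost=2.0, batch_size=1)
clustered = compile_cost(240, header_cost=18.0, body_cost=2.0, batch_size=8)
# With these made-up numbers the header cost dominates, and batching in
# groups of 8 cuts total CPU time by several times.
```

The model also shows the limit of the approach: once the per-file body
cost dominates, larger batches stop helping and only hurt incremental
rebuilds.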
I hear there have been some attempts to improve the situation with .h
files, so that fewer of them get pulled into a given .cc compilation.
This is complicated by the fact that some optimization decisions
depend on the compiler seeing a .h (or -inl.h) file that may not be
necessary for correctness. So reducing the number of .h files in a
compilation causes performance regressions that cannot be entirely
fixed with PGO.
I'm still in favour of fixes to the .h files to improve compile times,
but given that people have been trying to do this for some years I
don't think that should be a blocker for a different approach that
actually works now.
Let me know what you think.
--
Erik Corry