Generating compilation database from GN

Petr Hosek

May 21, 2018, 4:16:34 PM
to gn-dev
I'd like to propose generating compilation database directly from GN.

Today, the compilation database can be generated by Ninja using the compdb tool:

ninja -C out/Default -t compdb cc cxx

However, this approach has several limitations. First, the compilation database isn't automatically regenerated when GN files change, so running Ninja to rebuild the project isn't sufficient: you need to rerun the compdb tool after each GN change to update the database. Second, compdb is a fairly low-level tool that requires you to specify the names of the Ninja rules, which are generated by GN and can be non-obvious (e.g. in Fuchsia we use toolchains to handle things like shared libraries or sanitizers, and the rule names are pretty complicated).
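As a rough illustration of the manual step described above, the following Python sketch wraps the compdb invocation. The helper names `compdb_command` and `write_compdb` are hypothetical, and the rule names `cc` and `cxx` are simply the ones from the command above; a build with several toolchains would need a longer list.

```python
import subprocess
from pathlib import Path


def compdb_command(build_dir, rules):
    """Build the argv for `ninja -t compdb` with the given rule names."""
    return ["ninja", "-C", str(build_dir), "-t", "compdb", *rules]


def write_compdb(build_dir, rules):
    """Run Ninja's compdb tool and save its output as compile_commands.json.

    Nothing reruns this automatically after a GN change, which is the
    first limitation described above.
    """
    result = subprocess.run(
        compdb_command(build_dir, rules),
        check=True,
        capture_output=True,
        text=True,
    )
    out_path = Path(build_dir) / "compile_commands.json"
    out_path.write_text(result.stdout)
    return out_path
```

Calling `write_compdb("out/Default", ["cc", "cxx"])` would reproduce the command line shown earlier and store the result next to the build files.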

The idea I had is similar to CMake's CMAKE_EXPORT_COMPILE_COMMANDS option. This would be implemented as a new export_compile_commands variable in the GN dotfile:

# Enable/Disable output of compile commands during generation.
export_compile_commands = true

When enabled, GN would write compile_commands.json to root_build_dir on every gn gen run.
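For readers unfamiliar with the format, here is a minimal Python sketch of what a compile_commands.json entry looks like. The `directory`, `command` and `file` keys come from Clang's JSON Compilation Database format; the paths, the compiler invocation and the helper name `make_entry` are made up for illustration and are not necessarily what GN would emit.

```python
import json


def make_entry(build_dir, source, command):
    """Return one compilation database entry for a single source file."""
    return {
        "directory": build_dir,  # working directory for the command
        "command": command,      # full compiler command line
        "file": source,          # the translation unit being compiled
    }


# Hypothetical example values, not taken from a real build.
entries = [
    make_entry(
        "/src/out/Default",
        "../../base/foo.cc",
        "clang++ -I../.. -DNDEBUG -c ../../base/foo.cc -o obj/base/foo.o",
    ),
]

# The database is simply a JSON array of such entries.
compile_commands_json = json.dumps(entries, indent=2)
print(compile_commands_json)
```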

Brett Wilson

May 21, 2018, 4:20:06 PM
to Petr Hosek, gn-dev
What is the compilation database used for?

Brett

Roland McGrath

May 21, 2018, 4:52:58 PM
to Brett Wilson, Petr Hosek, gn-dev
It's used to drive a variety of analysis tools that need to do effectively dry runs of compiling all the code or a given source file with the right options, e.g. clang-tidy et al.
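As a sketch of how such a tool finds "the right options" for one file, the lookup below searches a loaded compilation database for a given source path. The function name `find_command` is hypothetical; the `directory`, `file`, `command` and `arguments` keys are those of the JSON Compilation Database format.

```python
import os
import shlex


def find_command(entries, source_path):
    """Return the argv recorded for source_path, or None if it is absent."""
    wanted = os.path.normpath(source_path)
    for entry in entries:
        # "file" may be relative to the entry's "directory".
        recorded = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        if recorded == wanted:
            # Entries carry either an "arguments" list or a "command" string.
            if "arguments" in entry:
                return list(entry["arguments"])
            return shlex.split(entry["command"])
    return None
```

A tool would then rerun the returned command in analysis mode instead of actually compiling the file.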

Dirk Pranke

May 21, 2018, 5:16:07 PM
to Roland McGrath, Brett Wilson, Petr Hosek, gn-dev
Can you basically make this work via the --json-ide-script arg to `gn gen`?

Roland McGrath

May 21, 2018, 5:29:42 PM
to Dirk Pranke, Brett Wilson, Petr Hosek, gn-dev
I'm not familiar with the exact details of the JSON schema either for compdb or --json-ide-script. If they contain all the same information then I'd suggest we should be converging on compdb because it's a thing supported by a variety of producers and consumers already. If the --json-ide-script output contains more information that is missing from compdb, then it might need to stay a separate format (or perhaps we can get consensus on additions to the compdb format). But they should certainly be closely related.

Peter Collingbourne

May 21, 2018, 6:09:44 PM
to pho...@chromium.org, gn-...@chromium.org
How are you planning to implement export_compile_commands=true? Probably the simplest approach would be to teach GN to collect compiler rule names and then pass them to "ninja -t compdb". (This was always the intent behind the compdb tool: a user wouldn't invoke it directly, instead the generator would invoke it via some mechanism.)

Peter

Petr Hosek

May 21, 2018, 6:29:21 PM
to Roland McGrath, Dirk Pranke, Brett Wilson, gn-dev
The compilation database is used to drive a lot of different Clang-based tools (e.g. clang-tidy, clang-include-fixer) as well as IDE/editor plugins (e.g. YCM, clangd, cquery). The --ide=json output has far more detail than compdb, which is pretty barebones. I think we could implement this via --json-ide-script, i.e. have a Python script that consumes the --ide=json output and transforms it into compile_commands.json. There are two issues: (1) it means passing extra arguments every time you invoke gn gen, which can be annoying, and (2) you cannot use --ide=json with any other --ide option, but that might be less of an issue since many IDEs are now moving towards compdb (via LSP).
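A minimal sketch of such a script follows. It assumes, without having checked the exact schema, that the JSON file has a top-level `targets` object whose values may carry `sources`, `cflags`, `defines` and `include_dirs` lists, and that the script receives the path to that file as its first argument. The compiler name `clang++` and the `build_dir` value are placeholders, and source-absolute paths (`//foo/bar.cc`) are passed through untranslated.

```python
import json
import sys

SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx")


def project_to_compdb(project, build_dir, compiler="clang++"):
    """Turn an --ide=json style dict into compilation database entries.

    The field names read here are assumptions about the schema.
    """
    entries = []
    for label in sorted(project.get("targets", {})):
        target = project["targets"][label]
        flags = list(target.get("cflags", []))
        flags += ["-D" + d for d in target.get("defines", [])]
        flags += ["-I" + i for i in target.get("include_dirs", [])]
        for source in target.get("sources", []):
            if not source.endswith(SOURCE_SUFFIXES):
                continue  # skip headers and other non-compiled inputs
            entries.append({
                "directory": build_dir,
                "arguments": [compiler, *flags, "-c", source],
                "file": source,
            })
    return entries


def main(argv):
    """Entry point for a real script: argv[1] is the path to the JSON file."""
    with open(argv[1]) as f:
        project = json.load(f)
    json.dump(project_to_compdb(project, build_dir="."), sys.stdout, indent=2)
```

A real hook would call `main(sys.argv)` and would also have to pick the right compiler and flags per toolchain, which this sketch ignores.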

--
You received this message because you are subscribed to the Google Groups "gn-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gn-dev+un...@chromium.org.

Petr Hosek

May 21, 2018, 6:36:19 PM
to p...@chromium.org, gn-dev
I was planning on writing the compdb directly. There's one issue with invoking Ninja: we don't know what ninja executable to use when invoking gn gen since there's no such thing as CMAKE_MAKE_PROGRAM.

Dirk Pranke

May 21, 2018, 8:02:47 PM
to Petr Hosek, Peter Collingbourne, gn-dev
The JSON data contains things not in the compdb, so I don't think we can get rid of it.

The suggestion to use --json-ide-script was simply to give you the hook that would get called when ninja re-invokes GN. I agree it's a bit crufty compared to GN generating the compdb directly, but on the other hand I'm not sure how useful this feature really is, so it makes sense to me to prototype it with the hook, at least at first, and then reassess.

-- Dirk

Roland McGrath

May 22, 2018, 6:20:16 PM
to Dirk Pranke, Petr Hosek, p...@chromium.org, gn-dev
In practical terms I think the main disadvantage of ninja -t compdb is that it needs to be fed a list of rule names that for GN has to be derived from a subset of the list of GN toolchains, which only GN knows in any general way.  The issue of wiring up the action to regenerate the compdb when build.ninja is regenerated is more of a wart that can be figured out some way or another.   OTOH, in the abstract it really ought to be gn gen that emits compdb since it's what decides all that information and ninja is just a "dumb" runner of commands at its behest.

I think there's no question that having reliable compdb generation is a very useful feature.  Important consumers of compdb already exist and being able to integrate some of them directly into a GN build is a worthwhile goal.  
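To make the rule-name problem concrete, here is a sketch that recovers candidate rule names by scanning generated Ninja files for `rule` declarations. It assumes that compile rules end in `cc` or `cxx`, possibly with a toolchain-specific prefix; that naming pattern and the sample names in the comments are assumptions rather than documented GN behavior.

```python
import re

# A Ninja rule declaration is a line of the form "rule <name>".
RULE_RE = re.compile(r"^rule[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def compile_rule_names(ninja_text, suffixes=("cc", "cxx")):
    """Return rule names in ninja_text that look like C/C++ compile rules."""
    names = []
    for name in RULE_RE.findall(ninja_text):
        # Accept the bare suffix ("cc") or a prefixed form ("arm64_cc").
        if any(name == s or name.endswith("_" + s) for s in suffixes):
            names.append(name)
    return names
```

The resulting list could be appended to `ninja -t compdb`, but guessing from suffixes is exactly the kind of heuristic that GN, which knows its own toolchains, would not need.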

Dirk Pranke

May 22, 2018, 6:51:07 PM
to Roland McGrath, Petr Hosek, Peter Collingbourne, gn-dev
On Tue, May 22, 2018 at 3:20 PM, Roland McGrath <mcgr...@chromium.org> wrote:
> In practical terms I think the main disadvantage of ninja -t compdb is that it needs to be fed a list of rule names that for GN has to be derived from a subset of the list of GN toolchains, which only GN knows in any general way.  The issue of wiring up the action to regenerate the compdb when build.ninja is regenerated is more of a wart that can be figured out some way or another.   OTOH, in the abstract it really ought to be gn gen that emits compdb since it's what decides all that information and ninja is just a "dumb" runner of commands at its behest.
>
> I think there's no question that having reliable compdb generation is a very useful feature.  Important consumers of compdb already exist and being able to integrate some of them directly into a GN build is a worthwhile goal.

While I am aware that the compdb format is relatively standardized, I think one could argue about what the definition of "important consumers" is ... perhaps it would be helpful to actually spell out what some of those integrations are?

In addition, one could make an argument for every IDE and tool integration, and that's why we added the IDE hooks, so that we wouldn't have to actually bake them into the core binary. It would be good to know if the existing hooks are not sufficient for this use case; perhaps we need a slightly different command line interface, or to provide a stock script for compdb generation (and other IDEs)?

Petr Hosek

May 22, 2018, 7:00:03 PM
to Dirk Pranke, Roland McGrath, p...@chromium.org, gn-dev
I can provide my experience: I use YCM with Vim (and more recently I started experimenting with clangd) so running ninja -t compdb regularly is my daily bread and butter. CMake handling of compilation database is significantly nicer in that respect. Recently we've also started implementing Fuchsia-specific Clang-Tidy checks and we would like developers to use these which means they need compilation database to begin with.

We have plenty of developers on the team using Atom and VSCode who are also using clangd or cquery for indexing, completion, etc. so they need compilation database as well and I think these are pretty popular choices among C/C++ developers these days.

I agree that this could be implemented with the IDE hook. The only downside to me is that it makes invoking GN less pleasant because I need to remember to also pass --ide=json --json-ide-script=path/to/script every time I run gn gen. We could wrap GN in a script to hide that detail, but we spent 2 years working to eliminate all the GN wrappers we had in Fuchsia and I'd like to avoid going back.

Dirk Pranke

May 22, 2018, 7:15:39 PM
to Petr Hosek, Roland McGrath, Peter Collingbourne, gn-dev
I wonder if it would make sense to add a setting to the dotfile for a project so that you could specify a default IDE script for this (or similar things).

I'd guess that we might not want to just generate the compilation database all the time, because it would slow GN down, but someone could run some benchmarks to see what the real impact would be.

-- Dirk

Brett Wilson

May 22, 2018, 7:40:13 PM
to Dirk Pranke, Petr Hosek, Roland McGrath, p...@chromium.org, gn-dev
On Tue, May 22, 2018 at 4:15 PM Dirk Pranke <dpr...@chromium.org> wrote:
> I wonder if it would make sense to add a setting to the dotfile for a project so that you could specify a default IDE script for this (or similar things).
>
> I'd guess that we might not want to just generate the compilation database all the time, because it would slow GN down, but someone could run some benchmarks to see what the real impact would be.

I agree we should not be generating this by default. That includes putting it in the dotfile (which means by default for everybody on the project) unless some important part of the build is using it.

Are there examples of other things that might need this that aren't random per-user configuration?

Brett 

Petr Hosek

Jul 23, 2018, 8:45:12 PM
to Brett Wilson, Dirk Pranke, Roland McGrath, p...@chromium.org, gn-dev, Julie Hockett
This is a follow up to the thread I started a while back. Julie Hockett went ahead and tried both approaches discussed before, i.e. generating the compilation database directly from GN vs using the --ide=json --json-ide-script=. The latter quickly turned out to be both more complicated and slower due to the amount of data that's being produced. She later submitted the former as a change to GN in https://gn-review.googlesource.com/c/gn/+/2040 which eventually got merged.

There's at least one point that was raised in the review for that change that remains unresolved which is how to control this option. Brett suggested using a dedicated flag --export-compile-commands which is what is currently implemented but Roland argued that this hampers the usability of this option since for Fuchsia we would like people to use "gn gen ..." without requiring any other flags for baseline normal behavior, and we want everybody to always generate the compilation database so we can have scripts and such rely on it. So for Fuchsia we would like to enable this in .gn or BUILDCONFIG.gn rather than on the command line. Would that be an acceptable change?

Once we agree on what the behavior should be, we'll also add the documentation for this option which is currently missing as pointed out by Dirk on that change.

Dirk Pranke

Jul 23, 2018, 9:31:18 PM
to Petr Hosek, Brett Wilson, Roland McGrath, Peter Collingbourne, gn-dev, Julie Hockett
As long as whatever we implement doesn't introduce a dependency on things that aren't explicit and reproducible (like environment variables), I'd be happy for GN to support project-wide defaults for things that might otherwise be passed on the command line (like --export-compile-commands, --ide and the various IDE-specific flags to `gn gen`). It would be nice if we had some design for doing this that didn't create a bunch of globally-scoped names.

It might be nice to have some way to bypass or override said project-wide defaults (along the lines of a --no-export-compile-commands), but I'm not actually sure we need that badly enough and often enough that you couldn't just ask the user to comment out the defaults in the dotfile.

-- Dirk

Brett Wilson

Jul 23, 2018, 11:55:38 PM
to Petr Hosek, Dirk Pranke, Roland McGrath, p...@chromium.org, gn-dev, Julie Hockett
Can you talk about why we need to force a compilation database on for all users of a project? I'm quite unhappy with how long GN takes to run and how Fuchsia keeps adding build steps without regard to runtime. On my slowish Macbook it takes 18 seconds to run! This is roughly half the time it took GYP when we decided it was completely unreasonable.

I think if people need to run scripts that require a comp DB, it's perfectly reasonable for them to be required to run something that generates it (I'm quite supportive of adding a new command that generates it one-off). If people always want a compilation DB for an IDE or whatnot, the current command line flag should work well (I'm also supportive of making command-line flags more sticky across running "gen" explicitly).

Brett

Roland McGrath

Jul 24, 2018, 10:56:10 PM
to Brett Wilson, Petr Hosek, Dirk Pranke, p...@chromium.org, gn-dev, Julie Hockett
For the compilation DB in particular, I think it would be fine to have this be a separate step as long as it's one that we can cleanly integrate into the build (if we want to).  That is, if it's possible (and clean) to write an action target that can run some new gn subcommand to emit the DB, with proper dependencies so that it is regenerated when necessary and not otherwise.  In practice I think you always need to do a build before actually using the compilation DB to do source analysis or whatever anyway, since it may describe generated sources that the analysis needs to see.  So there's no strong reason that `gn gen` has to be the place to do it.

The other thing that is now a switch to `gn gen` and that I'm interested in a project-wide way to make a default for simple `gn gen DIR` invocations is `--check`.  But that too is something that doesn't necessarily have to be part of `gn gen`.  In fact, it might be a better developer workflow for us if the `gn check` step was done some other way that's part of the build.  `gn gen` only needs to be repeated if you've changed GN files, but `gn check` might produce new errors from source file changes that don't involve any GN file changes.  So here too I think I'd be happy with a solution that made it easy to run as needed via a GN action.  That would entail a way for that action to say it depends on every source file in the build, and probably needs `gn check` to produce a depfile so that header changes trigger re-running the action.

For both of these cases, there is a need to specify the deps/inputs to the action in a way that's not currently possible (or clean).
  • For compdb, it's "when re-running gn gen"; that could kludgily be done by depending on the top build.ninja file or something like that, but something cleaner would be better.
  • For check, it's "all sources check would look at", which is just special magic GN would have to support somehow I think.
Both also need a way to find the gn binary to run in lieu of an action script.  GN already figures that out for use in the re-gen rule.  But there would need to be a way to get it into an action command somehow.
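The kludge from the compdb bullet above amounts to a timestamp comparison against the top build.ninja file. A sketch of that check in Python, assuming both files live directly in the build directory; the function name `compdb_is_stale` is hypothetical.

```python
import os


def compdb_is_stale(build_dir):
    """Return True if compile_commands.json is missing or older than build.ninja."""
    build_ninja = os.path.join(build_dir, "build.ninja")
    compdb = os.path.join(build_dir, "compile_commands.json")
    if not os.path.exists(compdb):
        return True
    # gn gen rewrites build.ninja, so a newer build.ninja means the
    # database may no longer match the current GN files.
    return os.path.getmtime(compdb) < os.path.getmtime(build_ninja)
```

An action expressed this way only approximates "when re-running gn gen", which is why something cleaner than a file dependency would be preferable.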

So I started out wanting ways to default-enable things like compdb and check that are currently command-line switches to `gn gen`.
But I don't actually have any use cases for default switches per se if we have different approaches that work for the compdb and check cases.

Julie Hockett

Jul 24, 2018, 11:41:06 PM
to Roland McGrath, Brett Wilson, Petr Hosek, Dirk Pranke, p...@chromium.org, gn-dev
Just want to chime in and clarify briefly: you don't actually need to do any sort of build to use the compdb in the normal case. The point of it is to allow for running different types of static analyses on a single file without doing a full build. Generated files and anything produced during the build won't appear in the database, and it doesn't contain anything produced past the `gn gen` stage.

It's also not quite like `--check` in that the only time the compdb would change would be if a GN file changes. That said, I'm happy to implement it in whatever way(s) we determine to be useful. 

Julie

Brett Wilson

Jul 25, 2018, 2:01:23 PM
to Roland McGrath, Petr Hosek, Dirk Pranke, p...@chromium.org, gn-dev, Julie Hockett
It would indeed be nice to enable running check and other stuff at build-time. On several occasions people have wanted to run "gn desc" as part of various build steps (leaving aside whether that's a good idea).

The main problem currently is that GN has to do a full run to do this stuff, and a full run has side effects. The build can do things like exec_script and write_file that change parts of the build which are definitely a no-no in the middle of the build process itself. We could make write_file have no effects in this mode, but exec_script is not really possible to control and I think will be a recipe for problems.

In an ideal world GN might serialize the build graph and use that for these operations. This would also make interactive "desc" commands run faster which would be nice. But this is a pretty big project that I don't think is worth the effort.

Brett

Dirk Pranke

Jul 25, 2018, 2:34:52 PM
to Brett Wilson, Roland McGrath, Petr Hosek, Peter Collingbourne, gn-dev, Julie Hockett
A different approach that I've often thought about would be to split GN's work into two phases: the first would compute the graph and call exec_script(), but not write anything. The second would write all the files, and would not run for things like `desc`.

That would still rely on exec_script() being read-only (and hopefully idempotent, at least for a given version of the checkout), and that you didn't have weird interactions between write_runtime_deps and exec_script(), but I think it's a much smaller change than serializing the graph. I'm not sure what the perf impact would be but hopefully it wouldn't be too bad.
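The split can be pictured with a toy model in Python. This is only an illustration of the idea, not GN's internals: the `Plan` type, the function names and the output file naming are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Result of phase one: the resolved graph plus files to be written."""
    graph: dict = field(default_factory=dict)
    pending_writes: dict = field(default_factory=dict)


def compute(build_files):
    """Phase one: resolve targets and record writes without touching disk."""
    plan = Plan()
    for label, deps in build_files.items():
        plan.graph[label] = list(deps)
        # Queue the output instead of writing it immediately.
        out_name = label.strip("/").replace(":", "_") + ".ninja"
        plan.pending_writes[out_name] = "# deps: " + " ".join(deps) + "\n"
    return plan


def commit(plan, write):
    """Phase two: perform the queued writes via the supplied callback."""
    for path, contents in sorted(plan.pending_writes.items()):
        write(path, contents)


def desc(plan, label):
    """A query such as `desc` only needs the result of phase one."""
    return plan.graph[label]
```

In this model `gen` would call `compute` and then `commit`, while `desc` stops after `compute`, so a query never modifies the build directory.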

A third approach would be to add a bazel-like daemon mode, where GN stuck around after running. In that case, subsequent reloads and queries would be far faster.

It's unclear which of these changes might be valuable enough, but it would vary with the project. In the chromium try server builds, we call GN twice (3 times if you retry without the patch), with check enabled each time (though it only needs to be enabled once), and that ends up being a half minute of a build. Not terrible, but not insignificant either. Best case, you'd build the graph and write the ninja files once, query the graph once (for analyze), and run check once.

-- Dirk

Petr Hosek

Jul 25, 2018, 3:01:09 PM
to Dirk Pranke, Brett Wilson, Roland McGrath, p...@chromium.org, gn-dev, Julie Hockett
I agree with Brett's complaint about GN's slowness in Fuchsia. AFAIK that's mostly due to excessive use of exec_script to do all kinds of things. These generally fall into a few categories:
  1. Generating GN files from Zircon's Make build in https://fuchsia.googlesource.com/build/+/master/gn/BUILD.gn#78. This is probably the most expensive step. The only solution is to not do that, which requires changes to the Zircon build that are under way.
  2. Generating the set of targets to build via the "package system" in https://fuchsia.googlesource.com/build/+/master/gn/packages.gni; each package (a JSON file with list of GN labels) describes GN targets that will be built and combined into a final system image, which is the default target that needs to depend on all these targets that aren't known ahead of time. I believe the tagging/metadata feature should allow replacing that part to large extent. 
  3. Basic string manipulation, which is heavily used in the integration of things like Rust, Dart or FIDL (e.g. https://fuchsia.googlesource.com/build/+/master/cpp/fidl_cpp_stem_from_library.py, https://fuchsia.googlesource.com/build/+/master/dart/fidl_package_from_library.py) to do things like replace('-', '_') and replace('.', '_'). One way to avoid that would be to provide a few basic string manipulation functions (e.g. replace) directly in GN.
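For a sense of how small the work in category 3 is, here is a Python sketch of the kind of transformation involved. It is not the content of the linked scripts; the function name and the sample library name are hypothetical.

```python
def library_to_identifier(library_name):
    """Replace '-' and '.' with '_' to form an identifier-safe name."""
    return library_name.replace("-", "_").replace(".", "_")


# Hypothetical library name, for illustration only.
print(library_to_identifier("fuchsia.ui-gfx"))
```

Each such call made through exec_script starts a separate Python process during gn gen, which is why a built-in replace function would help.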
Going back to the original point of the compilation database, I'm personally fine having an option that enables generation of compdb as long as that flag is preserved across gn gen reruns. The problem is that it makes the gn gen invocation more complicated, which means that many people will use some kind of wrapper to avoid having to remember to always pass the option.

One possible alternative I could think of would be support for a dotfile, so developers could create e.g. a .gn file in the root of their project and put extra arguments they'd like to pass to each gn gen invocation there. We would put .gn into .gitignore to make sure people don't accidentally check it into the repository.

Nico Weber

Jul 25, 2018, 3:05:38 PM
to Petr Hosek, Dirk Pranke, Brett Wilson, Roland McGrath, Peter Collingbourne, gn-dev, julieh...@google.com
On Wed, Jul 25, 2018 at 3:01 PM Petr Hosek <pho...@chromium.org> wrote:
> I agree with Brett's complaint about GN's slowness in Fuchsia. AFAIK that's mostly due to excessive use of exec_script to do all kinds of things. These generally fall into a few categories:
>   1. Generating GN files from Zircon's Make build in https://fuchsia.googlesource.com/build/+/master/gn/BUILD.gn#78. This is probably the most expensive step. The only solution is to not do that, which requires changes to the Zircon build that are under way.
>   2. Generating the set of targets to build via the "package system" in https://fuchsia.googlesource.com/build/+/master/gn/packages.gni; each package (a JSON file with list of GN labels) describes GN targets that will be built and combined into a final system image, which is the default target that needs to depend on all these targets that aren't known ahead of time. I believe the tagging/metadata feature should allow replacing that part to large extent.
>   3. Basic string manipulation, which is heavily used in the integration of things like Rust, Dart or FIDL (e.g. https://fuchsia.googlesource.com/build/+/master/cpp/fidl_cpp_stem_from_library.py, https://fuchsia.googlesource.com/build/+/master/dart/fidl_package_from_library.py) to do things like replace('-', '_') and replace('.', '_'). One way to avoid that would be to provide a few basic string manipulation functions (e.g. replace) directly in GN.
> Going back to the original point of the compilation database, I'm personally fine having an option that enables generation of compdb as long as that flag is preserved across gn gen reruns. The problem is that it makes the gn gen invocation more complicated, which means that many people will use some kind of wrapper to avoid having to remember to always pass the option.
>
> One possible alternative I could think of would be support for a dotfile, so developers could create e.g. a .gn file in the root of their project and put extra arguments they'd like to pass to each gn gen invocation there. We would put .gn into .gitignore to make sure people don't accidentally check it into the repository.

cmake implicitly remembers all flags and settings passed to it. This is imho in practice overly magical and confusing. gn already has args.gn, maybe that could grow some feature for passing flags instead? Then the gen state would remain in that one file.

(I don't have any opinion on anything in this thread. This proposal sounded a bit like the cmake behavior though, which I do have experience with, so I figured I'd chime in. Feel free to ignore :-) )

Roland McGrath

Jul 25, 2018, 3:08:09 PM
to Petr Hosek, Dirk Pranke, Brett Wilson, p...@chromium.org, gn-dev, Julie Hockett
This thread (and even this list) are not really the right place to discuss the details of Fuchsia's build.
When we on the Fuchsia team conclude that we can improve things by adding new GN features, we'll bring the feature proposals here to discuss.
The metadata/tags thread is exactly that happening as part of our team's plans to address #1 and #2.
For #3 I don't think we have good conclusions yet about how best we'd like to handle those languages in our build ideally, so we're not really ready to propose GN changes.

Brett Wilson

Jul 25, 2018, 3:20:56 PM
to Dirk Pranke, Roland McGrath, Petr Hosek, p...@chromium.org, gn-dev, Julie Hockett
On Wed, Jul 25, 2018 at 11:34 AM Dirk Pranke <dpr...@chromium.org> wrote:
> A different approach that I've often thought about would be to split GN's work into two phases: the first would compute the graph and call exec_script(), but not write anything. The second would write all the files, and would not run for things like `desc`.
>
> That would still rely on exec_script() being read-only (and hopefully idempotent, at least for a given version of the checkout), and that you didn't have weird interactions between write_runtime_deps and exec_script(), but I think it's a much smaller change than serializing the graph. I'm not sure what the perf impact would be but hopefully it wouldn't be too bad.

There are some uses of exec_script that just compute stuff and return it to the BUILD.gn file. Other uses configure the build itself, like setup_toolchain.py in the Chrome Windows build, which copies some libraries around for the build to use *and* returns a bunch of state. Allowing exec_script means guaranteeing these scripts don't have side effects, which I don't think is practical.

Brett

Dirk Pranke

Jul 25, 2018, 3:53:17 PM
to Brett Wilson, Roland McGrath, Petr Hosek, Peter Collingbourne, gn-dev, Julie Hockett
You're right, but I would argue that setup_toolchain.py having side effects is bad style, for exactly this reason. I don't think most scripts (at least in the chromium build) have side effects. I can't speak to other projects. Whether or not it's practical to eliminate them, I can't say. 

-- dirk