Exposing C++ headers to the Ninja build plan


David Turner

Dec 1, 2025, 6:52:17 PM
to gn-dev
Hello gn-dev,

I'd like to propose a new .gn flag to ensure that GN writes C++ headers as implicit inputs in the Ninja build plan. And I'd like to ask:

- Any reason why this would be a bad idea (e.g. for incremental correctness or build performance)?

- My prototype uses the name "headers_as_ninja_inputs" for the flag; any better suggestions?

Now for the details:

I recently discovered that GN treats headers listed in C++ targets (e.g. through the "public" and "sources" attributes) in a surprising way:
  • It never writes the header paths as inputs in the Ninja build plan (it relies on compiler-generated depfiles to list them for incremental correctness).

  • It never checks that the path is correct, i.e. that the corresponding file exists.
    And GN include checks cannot detect the problem if that path is never included.
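
For illustration, here is roughly what the difference would look like in the generated Ninja file (paths and rule names are hypothetical, not taken from a real build):

```
# Today: headers are discovered only through the compiler-generated depfile.
build obj/foo/foo.o: cxx ../../foo/foo.cc

# With the proposed flag: declared headers also appear as implicit inputs
# (after the '|'), so Ninja knows about them before the first build.
build obj/foo/foo.o: cxx ../../foo/foo.cc | ../../foo/foo.h
```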
For our work on the Fuchsia build, we need to perform queries over the Ninja graph to find all possible inputs of a tree of dependencies [1]. This must happen *before* the build, so we cannot rely on depfile content; hence this is important for us.

I implemented a prototype CL which shows that our Ninja build plan grows by about 7% when this is enabled, which for us is completely acceptable.

This also allowed us to catch many incorrect header definitions: simple typos, missing "include/" path prefixes, or, more frequently, definitions in //third_party/foo/BUILD.gn that had drifted significantly from the upstream headers under //third_party/foo/src/... and gone unnoticed so far.

Making this optional should avoid breaking other builds (I suspect the Chromium build may suffer from the same issues, but I have not tried to build it with the feature enabled).

We also have an alternative solution that works by adding metadata to each C++ target type (through BUILDCONFIG.gn wrappers) then doing collection at `gn gen` time. However, the result ends up being slightly slower, and much larger, and requires lengthy special processing on top of our Ninja query.

Thanks in advance for your feedback,

- Digit

PS: Yes, I realize this doesn't deal with some cases of generated headers that must remain dynamic / depfile inputs, but we have other strategies to deal with these.

[1] In particular, to wrap Ninja sub-build invocations as Bazel actions. Since Bazel does not and cannot support depfiles, it must know all possible inputs before the build. We perform a Ninja query in a repository rule to generate a BUILD.bazel containing a filegroup() listing the right files.
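
To make [1] a bit more concrete, here is a toy Python sketch of the idea (not our actual implementation; we really run a Ninja query inside a repository rule, and all names below are made up). It pulls the implicit inputs out of Ninja build statements and renders them as a Bazel filegroup():

```python
def implicit_inputs(ninja_text):
    """Collect implicit inputs (the paths after '|' but before '||')
    from 'build' statements in Ninja file text.

    Toy parser: ignores escapes, variables, and multi-line statements.
    """
    found = set()
    for line in ninja_text.splitlines():
        line = line.strip()
        if not line.startswith("build "):
            continue
        # Everything after the first ': ' is
        # "<rule> <explicit deps> | <implicit deps> || <order-only deps>".
        _, _, deps = line.partition(": ")
        deps = deps.split(" || ")[0]        # drop order-only deps
        parts = deps.split(" | ")
        if len(parts) > 1:
            found.update(parts[1].split())  # implicit deps
    return sorted(found)


def to_filegroup(name, paths):
    """Render the collected paths as a Bazel filegroup() stanza."""
    srcs = "".join('        "%s",\n' % p for p in paths)
    return 'filegroup(\n    name = "%s",\n    srcs = [\n%s    ],\n)\n' % (name, srcs)


if __name__ == "__main__":
    sample = (
        "build obj/foo.o: cxx ../../foo.cc | ../../foo.h ../../bar.h || phony_dep\n"
        "build obj/baz.o: cxx ../../baz.cc\n"
    )
    print(to_filegroup("cxx_headers", implicit_inputs(sample)))
```

Real tooling would use something like `ninja -t inputs` or a proper Ninja parser instead of line splitting, but the shape of the pipeline (query the graph, emit a filegroup) is the same.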

Andrew Grieve

Dec 1, 2025, 8:35:44 PM
to David Turner, gn-dev
I'm a selfish +1 to this.

I recently worked on using ninja inputs to figure out if non-open-source code is being used, but it lacks the ability to detect internal-only .h files. This proposal would fix that :)

Dirk Pranke

Dec 1, 2025, 11:00:54 PM
to Andrew Grieve, David Turner, gn-dev
Am I missing something, or (a) wouldn't these inputs be fairly incorrect unless you actually ran `gn check` as part of the generation, and (b) even then, still be kinda incorrect, since GN doesn't implement a full C preprocessor (and can't handle conditionals, etc. correctly)? It seems like it would be easy to both specify dependencies that weren't there (more commonly) and miss dependencies that were (especially for system headers, or dependencies where gn check was disabled)?

I believe the fact that GN doesn't check if the file exists is intentional. IIRC, in some cases, it can't (e.g., when you #include a file that is generated but hasn't been generated yet), and in some cases when you depend on system headers it might not know how to find them. In addition, I'm guessing there would be a reasonable hit to performance if it tried to do so, since you'd have to search the include path for every header (though perhaps you could cache results and it wouldn't be so bad ...).

I suppose if it's optional those concerns might be okay (i.e., caveat emptor), but I have misgivings about implementing a feature that by design can't be correct.

-- Dirk 

Andrew Grieve

Dec 2, 2025, 9:50:29 AM
to Dirk Pranke, Andrew Grieve, David Turner, gn-dev
Wouldn't that argument mean we shouldn't list .h files as sources at all?

David Turner

Dec 2, 2025, 12:48:39 PM
to Dirk Pranke, Andrew Grieve, gn-dev
On Mon, Dec 1, 2025 at 8:00 PM Dirk Pranke <dpr...@chromium.org> wrote:
Am I missing something, or (a) wouldn't these inputs be fairly incorrect unless you actually ran `gn check` as part of the generation, and (b) even then, still be kinda incorrect, since GN doesn't implement a full C preprocessor (and can't handle conditionals, etc. correctly)? It seems like it would be easy to both specify dependencies that weren't there (more commonly) and miss dependencies that were (especially for system headers, or dependencies where gn check was disabled)?

I am not sure I understand what you mean by "fairly incorrect" and "kinda incorrect". Could you provide examples?

The main issue is that with a definition that contains a typo like:

source_set("foo") {
  sources = [ "foo.cc", "fooo.h" ]
}
 
Where foo.cc includes "foo.h", and not "fooo.h", the following happens:

- GN include checks see the #include "foo.h" and then ignore it, because that file path is not listed in any target definition. The check passes.
- GN ignores "fooo.h" entirely: it does not check that the file exists (where Bazel would immediately error), and does not write it to the Ninja build plan, so the typo is silently ignored.

If the new feature flag is enabled, then fooo.h is written as an implicit input in the Ninja build plan, and Ninja will error immediately, telling you that this file does not exist.

This only changes GN's behavior for declared headers (from "public" and "sources").
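
For concreteness, here is roughly what the generated edge and the resulting failure would look like (paths hypothetical):

```
# With the flag enabled, the typo'd header becomes an implicit input:
build obj/foo/foo.o: cxx ../../foo/foo.cc | ../../foo/fooo.h

# Since fooo.h does not exist and no rule produces it, Ninja fails fast
# with an error along the lines of:
#   ninja: error: '../../foo/fooo.h', needed by 'obj/foo/foo.o',
#   missing and no known rule to make it
```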

Dirk Pranke

Dec 2, 2025, 3:09:16 PM
to Andrew Grieve, David Turner, gn-dev
I'm distinguishing between the Ninja build graph and the GN build graph, and I'll distinguish between "incomplete" and "incorrect": "incomplete" means that not every build dependency is specified, and "correct" means that every dependency that *is* specified should be there and that no essential dependency is missing.

The GN build graph is both incomplete and incorrect because of the C preprocessor limitations discussed earlier, although there are no missing essential dependencies. 

The generated Ninja build graph is initially incomplete in that the header dependencies are missing, but correct in that those missing dependencies don't matter: the initial build edges will be out of date regardless, and the build will work correctly. Once the depfiles are generated, the build is then complete as well as correct (+/- subsequent changes to the source).
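
As a sketch of that mechanism (rule and file names hypothetical): Ninja's depfile support lets the compiler report the true header set after the first compile, which is what closes the gap:

```
# The compiler emits a Makefile-style depfile per object, which Ninja
# ingests into its deps log for subsequent incremental builds.
rule cxx
  command = g++ -MD -MF $out.d -c $in -o $out
  depfile = $out.d
  deps = gcc

build obj/foo.o: cxx ../../foo.cc

# After the first build, obj/foo.o.d records the headers actually
# included, e.g.:
#   obj/foo.o: ../../foo.cc ../../foo.h /usr/include/stdio.h
```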

We've historically considered the level of incompleteness and incorrectness in the GN graph to be acceptable enough, but we have not (as far as I can recall) considered any incorrectness in the Ninja graph to be acceptable, and we expect it to be eventually complete as well.

If I understand the proposal correctly, it (optionally) changes that last statement.

Does that make sense?

-- Dirk

Dirk Pranke

Dec 2, 2025, 3:36:17 PM
to David Turner, Andrew Grieve, gn-dev
On Tue, Dec 2, 2025 at 9:48 AM David Turner <di...@google.com> wrote:


On Mon, Dec 1, 2025 at 8:00 PM Dirk Pranke <dpr...@chromium.org> wrote:
Am I missing something, or (a) wouldn't these inputs be fairly incorrect unless you actually ran `gn check` as part of the generation, and (b) even then, still be kinda incorrect, since GN doesn't implement a full C preprocessor (and can't handle conditionals, etc. correctly)? It seems like it would be easy to both specify dependencies that weren't there (more commonly) and miss dependencies that were (especially for system headers, or dependencies where gn check was disabled)?

I am not sure I understand what you mean by "fairly incorrect" and "kinda incorrect". Could you provide examples?

By "fairly incorrect" (in the sense I used above, not in the sense of my other reply to Andrew just now), I mean that without running `check` and actually scanning each individual C++ file, you will have no idea which header files are actually used by which source files in a given target. By "kinda incorrect" I mean that even if you scan each file, you won't know the true set of dependencies for each file: conditional defines and `// nogncheck` will obscure the true dependency graph.


The main issue is that with a definition that contains a typo like:

source_set("foo") {
  sources = [ "foo.cc", "fooo.h" ]
}
 
Where foo.cc includes "foo.h", and not "fooo.h", the following happens:

- GN include checks see the #include "foo.h" and then ignore it, because that file path is not listed in any target definition. The check passes.
- GN ignores "fooo.h" entirely: it does not check that the file exists (where Bazel would immediately error), and does not write it to the Ninja build plan, so the typo is silently ignored.

If the new feature flag is enabled, then fooo.h is written as an implicit input in the Ninja build plan, and Ninja will error immediately, telling you that this file does not exist.

This only changes GN's behavior for declared headers (from "public" and "sources").

Yes, understood. And, I believe that both issues are unfortunately necessary in some situations for some builds.

For example (IIRC, I haven't tested this recently to be sure), in a Chromium build, C/C++ files on Windows will commonly include windows.h (either directly or indirectly), but that file is never specified in a target in GN (and you probably wouldn't want to have to specify *every* system header), and GN wouldn't know where to look for it (because GN doesn't know how to parse the right compiler flags to determine the directory). So you can't complain about header files that are #included but unknown to GN. You also can't complain that fooo.h is missing (in the file system) because of the generated-file issue I mentioned earlier, and you can't complain about `#include <fooo.h>` because you don't know the full header search path and you don't have a perfect C preprocessor.

It's possible that there are builds that are not affected by the limitations of GN's preprocessing, and it's possible that there are builds where every header file (including system headers) is known to GN, and perhaps Fuchsia is one such build, and so perhaps these limitations are not relevant to you.

It might be possible to fix or work around some of these issues by treating user includes separately from system includes (so that you would require GN to know how to find #include "foo.h" but not #include <foo.h>), and by being able to propagate knowledge of which files are expected to be generated. As far as I recall, GN doesn't currently do either of these things. I suspect fixing the latter is doable but we just don't do it now. I'm not sure fixing the former is doable without at least some form of opt-in, since code is often inconsistent about using "" vs <> in includes, and I'm not sure whether either or both of these changes would fix all of the issues.

-- Dirk

Matt Stark

Dec 2, 2025, 11:29:21 PM
to Dirk Pranke, David Turner, Andrew Grieve, gn-dev, chrome-build-team
Speaking for the chrome build team, we'd love to see this feature. Remote execution requires both complete and correct inputs to the build. We currently achieve this via a bunch of rules, heuristics, and include scanning. Unlike GN, we do have access to a C preprocessor, but (I think, not 100% sure) we mostly don't use it (for performance reasons).

What I would like to see out of this feature is:
  • GN outputs a set of allowed inputs to the ninja file for each action, but does not need to validate their existence (or perhaps only validates them if they're not in the generated directory)
  • Siso would validate that all of those files exist (as you said, some header files may not yet exist in GN). This fixes the "incorrect" builds such as a typo in a header file.
  • Siso would optionally perform some sort of minification process where we determine that not all inputs are actually required, to minimize what we need to send to our remote execution workers
  • Siso would send that build to a remote worker. Siso would be disallowed to send headers that do not exist in the ninja graph. This, in combination with `gn check` (which also runs in our presubmits), would fix the completeness issue. Siso could have an allowlist of exceptions to this rule.
    • Some might be permanent, like "always allow third_party/libc++/* even if they're not specified"
    • Some might be temporary until we fix the build to be truly complete



--
Thanks, Matt.

Dirk Pranke

Dec 3, 2025, 12:39:11 PM
to Matt Stark, David Turner, Andrew Grieve, gn-dev, chrome-build-team
Just to check my understanding and make sure we're talking about the same things ...

IIRC, remote execution itself does not require "correct" inputs to the build in my sense of the word. While you have to be able to determine every file that is actually needed, it's actually okay (if undesirable) to send files that aren't needed; they will just be ignored. The actual decision whether to build a target (i.e., whether any of the inputs are out of date) is still done client side by siso, and the list of files sent to the remote endpoint does not directly affect that decision. 

Put differently, if a file that siso thinks needs to be sent remotely is out of date but is not actually needed to build the target (as determined by the ninja files and the deps files), that won't cause a rebuild. Is that right?

From what I remember, Siso's preprocessor is used at "ninja" time (to actually help figure out which files to send to the remote endpoint), but it is not used at "gn" time, because GN would have to call out to Siso to do the preprocessing, and that would be so expensive that it wouldn't be worth it (and, of course, GN currently doesn't have the code to do anything like that anyway).

I do agree that something like what David is proposing would probably help siso figure out which files don't need to be actually sent over, but that is a different thing from proposing that we actually change the build graph itself. You could achieve something like what you're looking for without requiring that you change the build graph to do so, i.e., what you want is not necessarily David's proposed feature, it could probably be something different but fairly similar. 

-- Dirk

Matt Stark

Dec 3, 2025, 5:36:24 PM
to Dirk Pranke, Fumitoshi Ukai, David Turner, Andrew Grieve, gn-dev, chrome-build-team
IIRC, remote execution itself does not require "correct" inputs to the build in my sense of the word. While you have to be able to determine every file that is actually needed, it's actually okay (if undesirable) to send files that aren't needed; they will just be ignored.

Ah, so what you meant by "correct" is what I would describe as "minimal" then? It appears I misunderstood what you meant by correct. Remote execution requires inputs to be complete, but not minimal.
 
The actual decision whether to build a target (i.e., whether any of the inputs are out of date) is still done client side by siso, and the list of files sent to the remote endpoint does not directly affect that decision. 

The decision is made by siso (sort of). I'm not 100% familiar with siso's implementation, but I suspect the list of files sent to the remote endpoint does affect that decision, as I suspect that's what calculates the input hash, which is used for caching.

Put differently, if a file that siso thinks needs to be sent remotely is out of date but is not actually needed to build the target (as determined by the ninja files and the deps files), that won't cause a rebuild. Is that right?

I suspect so, @Fumitoshi Ukai can correct me if I'm wrong. Inputs not sent to the remote executor probably don't contribute to the action digest (read: cache key).

From what i remember, Siso's preprocessor is used at "ninja" time (to actually help figure out which files to send to the remote endpoint), but it is not used at "gn" time, because GN would have to call out to Siso to do the preprocessing, and that would be so expensive that it wouldn't be worth it (and, of course, GN currently doesn't have the code to anything like that anyway).

That's correct, yes (and we have no intent to change this)
 
I do agree that something like what David is proposing would probably help siso figure out which files don't need to be actually sent over, but that is a different thing from proposing that we actually change the build graph itself. You could achieve something like what you're looking for without requiring that you change the build graph to do so, i.e., what you want is not necessarily David's proposed feature, it could probably be something different but fairly similar. 

Ah, I think I may understand what you're concerned about now. Is your concern that, with large targets containing many .cc files, every .cc file would have a dependency on every header file any of them depends on, thus triggering a rebuild of all of them when only one actually depends on the changed header? If that's your concern, I hadn't previously considered it, but it's a valid concern and I can brainstorm some solutions with my team.
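
For concreteness, the scenario would be (hypothetical files):

```
# Target "foo" declares sources a.cc, b.cc and headers a.h, b.h.
# With the flag, each object lists *every* declared header as an
# implicit input:
build obj/foo/a.o: cxx ../../foo/a.cc | ../../foo/a.h ../../foo/b.h
build obj/foo/b.o: cxx ../../foo/b.cc | ../../foo/a.h ../../foo/b.h

# Touching b.h now dirties a.o as well, even if a.cc never includes b.h.
```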


--
Thanks, Matt.

Dirk Pranke

Dec 3, 2025, 8:12:44 PM
to Matt Stark, Fumitoshi Ukai, David Turner, Andrew Grieve, gn-dev, chrome-build-team
On Wed, Dec 3, 2025 at 2:36 PM Matt Stark <ms...@google.com> wrote:
IIRC, remote execution itself does not require "correct" inputs to the build in my sense of the word. While you have to be able to determine every file that is actually needed, it's actually okay (if undesirable) to send files that aren't needed; they will just be ignored.

Ah, so what you meant by "correct" is what I would describe as "minimal" then? It appears I misunderstood what you meant by correct. Remote execution requires inputs to be complete, but not minimal.

Yes.
 
 
The actual decision whether to build a target (i.e., whether any of the inputs are out of date) is still done client side by siso, and the list of files sent to the remote endpoint does not directly affect that decision. 

The decision is made by siso (sort of). I'm not 100% familiar with siso's implementation, but I suspect the list of files sent to the remote endpoint does affect that decision, as I suspect that's what calculates the input hash, which is used for caching.

Put differently, if a file that siso thinks needs to be sent remotely is out of date but is not actually needed to build the target (as determined by the ninja files and the deps files), that won't cause a rebuild. Is that right?

I suspect so, @Fumitoshi Ukai can correct me if I'm wrong. Inputs not sent to the remote executor probably don't contribute to the action digest (read: cache key).

From what i remember, Siso's preprocessor is used at "ninja" time (to actually help figure out which files to send to the remote endpoint), but it is not used at "gn" time, because GN would have to call out to Siso to do the preprocessing, and that would be so expensive that it wouldn't be worth it (and, of course, GN currently doesn't have the code to anything like that anyway).

That's correct, yes (and we have no intent to change this)
 
I do agree that something like what David is proposing would probably help siso figure out which files don't need to be actually sent over, but that is a different thing from proposing that we actually change the build graph itself. You could achieve something like what you're looking for without requiring that you change the build graph to do so, i.e., what you want is not necessarily David's proposed feature, it could probably be something different but fairly similar. 

Ah, I think I may understand what you're concerned about now. Is your concern that, with large targets containing many .cc files, every .cc file would have a dependency on every header file any of them depends on, thus triggering a rebuild of all of them when only one actually depends on the changed header? If that's your concern, I hadn't previously considered it, but it's a valid concern and I can brainstorm some solutions with my team.

Yes.

-- Dirk