Strategies for checking in generated files?

Charles Nicholson

Oct 19, 2024, 4:21:04 PM
to gn-dev
Heya GN folks-

Sometimes I want to check in built artifacts. For example, our GN build sets up Python virtual environments, does third-party package installations, etc., and for reproducibility it would be very helpful to have a GN target run a tool that "freezes" a package version manifest into the source tree. We'd check that manifest in so that when we later revisit that particular tag, we can build with the exact third-party package versions from that time.
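
For illustration, here is a minimal sketch of what such a "freeze" tool might look like as the Python script behind a GN `action()`. It assumes the script runs under the virtual environment's own interpreter and that GN passes a single argument, the output path under $target_gen_dir; it uses the standard library's `importlib.metadata` rather than shelling out to `pip freeze`. The script and its argument convention are hypothetical.

```python
"""Hypothetical GN action script that writes a "frozen" package manifest.

Sketch only: assumes it runs under the build's virtualenv interpreter and
that GN passes one argument, the output path under $target_gen_dir.
"""
import sys
from importlib.metadata import distributions


def frozen_requirements() -> list[str]:
    """Sorted, de-duplicated 'name==version' pins for installed distributions."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            pins.add(f"{name}=={dist.version}")
    # Case-insensitive order, exact string as tie-breaker, so output is stable.
    return sorted(pins, key=lambda pin: (pin.lower(), pin))


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} OUTPUT_MANIFEST", file=sys.stderr)
        return 2
    with open(argv[1], "w", encoding="utf-8") as out:
        out.writelines(pin + "\n" for pin in frozen_requirements())
    return 0


# As a real script, finish with:
#   if __name__ == "__main__": raise SystemExit(main(sys.argv))
```

Sorting the pins deterministically keeps the manifest byte-for-byte stable between runs, which matters once it is diffed or checked in.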

I'm curious what patterns people use to accomplish this. Of course, GN only allows outputs under the build directory, not in the source tree. There are at least these options:

1. Go behind GN's back and have my tool "tee" the file both to $target_gen_dir and also to the in-source location. Obviously not great: it hurts determinism, gets missed by ninja -t clean, is a build-system antipattern, etc.

2. Wrap ninja in a tool that does the manual copy fix-up (really option 1, but without "cheating" inside GN). This is cumbersome and adds overall complexity; it's simplest to just use the native "gn" and "ninja" tools. That said, we already do this to copy compile_commands.json from whichever configuration subdirectory was most recently built into a known location for IDE / LSP support.

3. Require users to invoke a "copy this manifest from $target_gen_dir to the source tree" script manually when they want to update the manifest. This is the most explicit + simplest solution, but it requires that human diligence replace robot tooling, which never scales. Also, we always want the most recent + updated manifest; there's no reason not to update it.

4. Don't burden users at all and instead run option 3 from CI, in the clean-room CI build. This is nice and "free" but slows our CI down (each job clone takes ~2-3 minutes) and costs more money on builds.

I should note that this file isn't used in the build after generation; it's either used from a clean build via a GN arg (rebuilding a historical SHA) or it's an output (generated + checked-in). It's never both in one build.

If any GN users out there do anything like this, I'd love to hear your thoughts. We strive to not lie to GN, but option #1 is looking like the simplest solution for this case, so I figured I'd ask.

Best,
Charles

Nico Weber

Oct 20, 2024, 1:22:05 PM
to Charles Nicholson, gn-dev
I usually write a script that does 3 (both the build invocation, and the copying), and document how to run that script.
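
A sketch of such a script, with every path and name a placeholder: it rebuilds just the one target with ninja, then copies the result into the source tree only if it differs. The ninja target name assumes GN's usual `dir:name` phony target for a label in the default toolchain.

```python
"""Hypothetical helper for option 3: rebuild one target, then copy its
output into the source tree. Every path and target name is a placeholder."""
import subprocess
from pathlib import Path

BUILD_DIR = "out/default"
# Assumed ninja phony name for the GN label //build/python:frozen_manifest.
NINJA_TARGET = "build/python:frozen_manifest"
GENERATED = Path("out/default/gen/build/python/requirements.lock")
CHECKED_IN = Path("build/python/requirements.lock")


def ninja_command(build_dir: str, target: str) -> list[str]:
    """Rebuild only `target` (and whatever it depends on)."""
    return ["ninja", "-C", build_dir, target]


def copy_to_source(generated: Path, checked_in: Path) -> bool:
    """Copy generated -> checked_in; return True if the checked-in file changed."""
    new = generated.read_bytes()
    if checked_in.exists() and checked_in.read_bytes() == new:
        return False
    checked_in.parent.mkdir(parents=True, exist_ok=True)
    checked_in.write_bytes(new)
    return True


def main() -> int:
    subprocess.run(ninja_command(BUILD_DIR, NINJA_TARGET), check=True)
    changed = copy_to_source(GENERATED, CHECKED_IN)
    print(f"{CHECKED_IN}: {'updated' if changed else 'already up to date'}")
    return 0


# As a real script, finish with:
#   if __name__ == "__main__": raise SystemExit(main())
```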

Ben Boeckel

Oct 21, 2024, 6:20:51 PM
to Charles Nicholson, gn-dev
On Sat, Oct 19, 2024 at 16:20:49 -0400, Charles Nicholson wrote:
> 3. Require users to invoke a "copy this manifest from $target_gen_dir to
> the source tree" script manually when they want to update the manifest.
> This is the most explicit + simplest solution, but it requires that human
> diligence replace robot tooling, which never scales. Also, we always want
> the most recent + updated manifest; there's no reason not to update it.

Having this but also some "generate from scratch and diff against source
tree state" might be a good compromise. Assuming you can make a target
that depends on the "make generated sources" step that doesn't drag in
too much of the build, it can even be a dedicated CI step separate from
builds in general.

No idea how to do this from GN though.
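
Outside GN, that "diff against source tree state" idea could be a standalone CI step. A rough sketch, assuming CI has already built only the generator targets and that the hypothetical `PAIRS` table says which generated file should match which checked-in file; a non-zero exit status lets the step gate a change.

```python
"""Hypothetical CI step: after building only the generator target(s), check
that each checked-in copy matches its freshly generated twin.
PAIRS is a placeholder table; paths are relative to the repository root."""
import filecmp

PAIRS = [
    # (generated file under the build dir, checked-in copy in the source tree)
    ("out/ci/gen/build/python/requirements.lock", "build/python/requirements.lock"),
]


def stale_files(pairs) -> list[str]:
    """Checked-in paths that differ from, or cannot be compared with, their twin."""
    stale = []
    for generated, checked_in in pairs:
        try:
            # shallow=False compares contents, not just size and mtime.
            same = filecmp.cmp(generated, checked_in, shallow=False)
        except FileNotFoundError:
            same = False  # either side missing counts as stale
        if not same:
            stale.append(checked_in)
    return stale


def main() -> int:
    stale = stale_files(PAIRS)
    for path in stale:
        print(f"out of date (regenerate and commit): {path}")
    return 1 if stale else 0


# As a real script, finish with:
#   if __name__ == "__main__": raise SystemExit(main())
```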

--Ben

Roland McGrath

Oct 22, 2024, 3:23:22 PM
to Charles Nicholson, gn-dev
https://cs.opensource.google/fuchsia/fuchsia/+/main:build/testing/golden_files.gni and nearby files demonstrate a generalized approach for this (it also has some extra features you may not need to bother with). The basic model is that there is a checked-in file that should match the results of some generation step in the build (i.e., the output of a GN target). The build action around that generation step (the `golden_files()` target) compares the checked-in file with the just-generated file. It has two modes: copy the new version back into the source file so you can check it in; or fail the build if the two don't match. By default the build will generate and check. If there's a mismatch, the failing step prints the diff and instructions for a `cp` command to update the source manually. So a developer can just copy and paste that command and restart the incremental build. Or a developer can set `update_goldens=true` in `args.gn` and just expect that `git status` (or appropriate equivalent) will show a modified source file after the build.
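
The core of that model can be sketched as a small Python script a GN `action()` could run. This is not Fuchsia's implementation: the flag names (`--generated`, `--golden`, `--stamp`, `--update`) are hypothetical, with `--update` standing in for whatever a GN arg like `update_goldens` would forward. In check mode it prints a unified diff plus a ready-to-paste `cp` command and fails; in update mode it copies the new output over the checked-in file.

```python
"""Hypothetical golden-file check for a GN action (not Fuchsia's code).

Flag names are illustrative; --update stands in for something like a GN
arg update_goldens=true being forwarded to the action.
"""
import argparse
import difflib
import shlex
import shutil
import sys
from pathlib import Path


def compare_golden(generated: Path, golden: Path, update: bool) -> tuple[bool, str]:
    """Return (ok, message). Update mode copies the new output over the golden."""
    new = generated.read_text(encoding="utf-8")
    old = golden.read_text(encoding="utf-8") if golden.exists() else ""
    if new == old:
        return True, ""
    if update:
        shutil.copyfile(generated, golden)
        return True, f"Updated {golden}; remember to commit it."
    diff = "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True),
        fromfile=str(golden), tofile=str(generated)))
    # Absolute paths so the command works when pasted from any directory.
    fix = f"cp {shlex.quote(str(generated.resolve()))} {shlex.quote(str(golden.resolve()))}"
    return False, f"{diff}\n{golden} is out of date. To update it, run:\n  {fix}\n"


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--generated", type=Path, required=True)
    parser.add_argument("--golden", type=Path, required=True)
    parser.add_argument("--stamp", type=Path, required=True)  # the action's declared output
    parser.add_argument("--update", action="store_true")
    args = parser.parse_args()
    ok, message = compare_golden(args.generated, args.golden, args.update)
    if message:
        print(message, file=sys.stderr)
    if ok:
        args.stamp.touch()
    return 0 if ok else 1


# As a real script, finish with:
#   if __name__ == "__main__": raise SystemExit(main())
```

The `--stamp` file is there because a GN action has to declare at least one output under the build directory.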

In the Fuchsia build, we use this both for generated things that are actually inputs to the build and for things that the build itself doesn't use at all (like your example). In either case, the reason is that we want a checkout (or live browsing directly from the repository state) to have the generated files already available without any local build step. Sometimes that's just because the generation is too slow or otherwise ungainly to have every developer and CI build do every time. Sometimes it's because we want the generated files to be available as browsable content in the source repository (e.g. documentation extracted from source code).

Note that when taking this approach, it's important to be sure that your CI jobs run with dependencies wired up for the `golden_files()` targets, preferably in a check that prevents failing changes from landing at all.  It's very easy for developers to either fix or ignore (whether manually or automatically with `update_goldens=true`) the generated-file updates and omit them from their commits that change the sources of truth, and accidentally wind up with CI and other developers using something different from the developer who tested their change.

The extra features I mentioned in our implementation include running a normalizing step before comparing.  That is especially useful for things like checked-in source code or other text files, where otherwise it's difficult to coordinate changes to the formatting conventions generated code should meet with other changes to the generation steps or their inputs.  (For example, we have C++ source generation code where we don't bother to get its whitespace formatting quite right and just put it through clang-format as we generate each source file.  But before the normalization feature, that meant that checking in a new version of clang-format or a change to its configuration would have to be done in lock-step with updating all the checked-in generated C++ source files.)
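
A normalizing comparison could look roughly like this sketch: both sides go through the same normalizer before comparing, so a formatting-only change does not register as a mismatch. The default here only trims trailing whitespace; `command_normalizer` is a hypothetical helper showing how a stdin-to-stdout formatter could be plugged in instead (clang-format, for instance, reads stdin and writes stdout when given no file arguments).

```python
"""Hypothetical normalizing comparison: both sides pass through the same
normalizer before being compared, so formatting-only drift is ignored."""
import subprocess
from typing import Callable

Normalizer = Callable[[str], str]


def strip_trailing_whitespace(text: str) -> str:
    """Default normalizer: trim line-end whitespace and trailing blank lines."""
    lines = [line.rstrip() for line in text.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


def command_normalizer(argv: list[str]) -> Normalizer:
    """Normalizer that pipes text through a stdin-to-stdout formatter command."""
    def run(text: str) -> str:
        result = subprocess.run(argv, input=text, capture_output=True,
                                text=True, check=True)
        return result.stdout
    return run


def equivalent(generated: str, golden: str,
               normalize: Normalizer = strip_trailing_whitespace) -> bool:
    """True if the two texts match after normalization."""
    return normalize(generated) == normalize(golden)
```

For example, `equivalent(new_text, golden_text, command_normalizer(["clang-format"]))` would compare both files as formatted by whichever clang-format is currently installed, so updating the formatter no longer forces a lock-step update of every checked-in file.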

However, I'll also point out that we do have a separate category as well.  That is stuff that is checked in, but is not required to be up to date by the build.  These are cases where we just have a CI job that does a checkout, performs a generation step, and then does an automatic commit if the generated files changed.  This is generally the least-preferred solution.  It's only an option for things like generated documentation where it's acceptable to have perhaps several hours of lag between a source-of-truth commit and the whole checkout becoming fully self-consistent again.  It's chosen only for cases where either the generation step is impractical to require developers to do locally for some reason or there's an explicit intent to avoid the churn of the generated thing changing as often as its sources of truth change (e.g. one big file whose changes require lots of incremental rebuilding but where each change to its source of truth isn't immediately useful or important to most developers).

Charles Nicholson

Oct 24, 2024, 2:48:13 PM
to Roland McGrath, gn-dev
Thanks for all the replies, everyone!

I think the simplest way for me to address this issue is to hook it up to a weekly CI job that runs a custom script to copy the artifacts from the $target_gen_dir location into the tree, and then opens a pull request if git reports any diffs. The end result is Dependabot-ish, and lets us review the changes as if a human had performed them.
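
A minimal sketch of that weekly job's core step, assuming it runs from the repository root after the build and that the hypothetical `ARTIFACTS` table maps generated files to their checked-in locations. It copies the files, asks `git status --porcelain` which of them changed, and exits 1 when there are changes (mirroring the `git diff --exit-code` convention), leaving the actual pull-request creation to the CI system.

```python
"""Hypothetical weekly-CI step: copy generated artifacts into the tree and
report which ones git considers changed. Paths are placeholders; run from the
repository root after the build has finished."""
import shutil
import subprocess

ARTIFACTS = {
    # generated file (under $target_gen_dir) -> checked-in location
    "out/ci/gen/build/python/requirements.lock": "build/python/requirements.lock",
}


def parse_porcelain(output: str) -> list[str]:
    """Paths from `git status --porcelain` (v1) output, one 'XY path' per line.

    Simplified: renames ('old -> new') and quoted paths are not unpacked;
    use `-z` output for a robust parser.
    """
    return [line[3:] for line in output.splitlines() if len(line) > 3]


def changed_paths(paths: list[str]) -> list[str]:
    result = subprocess.run(["git", "status", "--porcelain", "--", *paths],
                            capture_output=True, text=True, check=True)
    return parse_porcelain(result.stdout)


def main() -> int:
    for generated, checked_in in ARTIFACTS.items():
        shutil.copyfile(generated, checked_in)
    changed = changed_paths(list(ARTIFACTS.values()))
    for path in changed:
        print(f"changed: {path}")
    # Like `git diff --exit-code`: 1 means "there are changes", so a later
    # CI step can decide to open a pull request.
    return 1 if changed else 0


# As a real script, finish with:
#   if __name__ == "__main__": raise SystemExit(main())
```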

Best,
Charles