get_metadata() vs phase ordering and dependency graphs

Roland McGrath

Nov 6, 2018, 10:26:53 PM
to Brett Wilson, Julie Hockett, gn-...@chromium.org
Brett's explanation about how GN's implementation works helped me a great deal in refining my thinking about this.

I'd say that in GN today from the perspective of the user-visible semantics the two "big" phases I described in the doc do exist in the abstract.  That is, the hermetic nature of phase 1 means that in the semantics, phase 1 happens independently for each BUILD.gn and thus "before" anything else (modulo print() and such side-effects, which I'm saying do not exist in the abstract semantics).  Then phase 2 is "instantaneous" because no ordering is observable in the semantics.  The implementation optimizes the reality behind that "observable semantics", but the user doesn't have to think about that except to appreciate that `gn gen` goes fast.  get_metadata() now would complicate that.

I think there is another way to formulate the phase ordering semantics that's closer to how the implementation actually works but that also supports the generality I want in get_metadata().

We think about dependencies in GN as being arcs between labels (targets, configs, toolchains).  But for structuring the work of `gn gen` the most important dependencies are the arcs between build files.  Each build file has a set of label-level dependencies from *deps, *configs, and the toolchain labels inside any of those.  (Here I'm considering //a(tc1) and //a(tc2) to be two independent build files.)  That yields a set of other build files on which this one depends.
  • Each file starts independently with its own phase 1, running the GN expression language program in that one build file.
  • When phase 1 finishes, the build file knows which other build files it depends on.  It kicks them off to independently start their own phase 1 if not already done or in progress.
  • When all the build files that are direct dependencies have finished their phase 1, this build file is ready for "propagation operations".
  • When propagation is resolved, this build file can finally notify its own dependents that its phase 1 is complete and queue its phase 2 (G'ing the N).
  • Currently propagation happens synchronously on the main thread when phase 1 workers finish, but it could happen in parallel with other tasks.
    • Existing propagation operations are per-target, not per-build-file.
      • Collecting switches from configs
      • Collecting link inputs from deps
    • Each target's propagation could theoretically be a separate parallel task.
  • When the root build file reaches phase 2, there is nothing left to do but wait for all the parallel phase 2 tasks to complete.
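The file-centric ordering above can be sketched as a small scheduler. This is a hypothetical model, not GN's actual implementation: each build file runs its phase 1, discovers the build files it depends on, waits for them to resolve, and only then resolves itself and unblocks its dependents (queuing its phase 2).

```python
from collections import deque

def gen(root, load_deps):
    """Model of the file-centric phase ordering.

    `load_deps` maps a build-file name to the build files it depends
    on, discovered when that file's phase 1 runs (a hypothetical
    stand-in for evaluating the GN program in that file).
    Returns the order in which files become ready for phase 2.
    """
    phase1_done = set()   # files whose phase 1 program has run
    resolved = set()      # files whose propagation has finished
    deps = {}             # file -> direct build-file dependencies
    dependents = {}       # reverse arcs, to notify when resolved
    order = []            # observable completion order (phase 2 queue)

    # Phase 1: run each file's program; each file kicks off the files
    # it depends on if they are not already done or in progress.
    queue = deque([root])
    while queue:
        f = queue.popleft()
        if f in phase1_done:
            continue
        phase1_done.add(f)
        deps[f] = set(load_deps(f))
        for d in deps[f]:
            dependents.setdefault(d, set()).add(f)
            queue.append(d)

    # Propagation: a file resolves once all its direct dependencies
    # have resolved; resolving a file may unblock its dependents.
    ready = deque(f for f in phase1_done if not deps[f])
    while ready:
        f = ready.popleft()
        if f in resolved:
            continue
        resolved.add(f)
        order.append(f)  # this file's phase 2 can now be queued
        for d in dependents.get(f, ()):
            if deps[d] <= resolved:
                ready.append(d)
    return order
```

In this model the per-target propagation work is collapsed into a single per-file step; making each target's propagation its own parallel task would refine the "resolved" transition without changing the observable ordering.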
While we let that ruminate, I'll say some things and then come back around to how it all ties in.

When get_metadata() is used in the inputs/sources of an action() target, this is an exception to the usual constraint that inputs anywhere under $root_build_dir must be outputs of a direct deps dependency of the action() target.  That is a synthetic constraint that doesn't matter to the eventual ninja behavior--all ninja dependencies are at file granularity, and the action's output files depend on all its input files.  AIUI this constraint is imposed by GN as a policy just to make it easier to understand the relationships in the build, not because there is any underlying material constraint.

One of the key features of the get_metadata() proposal is that an invocation of get_metadata() is not tied to any particular target.  It gives its own list of labels to start the walk, and the "walk keys" in the metadata can direct the walk distinctly from the deps/data_deps of each node in the walk.  But another key aspect is that when a metaresult is actually used, its use is context-specific.  So while a get_metadata() call and the metaresult object that represents it are not related to a specific target, most *uses* of a metaresult will be within a particular target.
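A hypothetical model of the proposed walk (names like `walk_key` and `data_key` are illustrative, not a settled API): at each node the walk collects the metadata under the data key, and the walk key, when present, redirects the walk independently of the node's deps.

```python
def get_metadata(graph, start, walk_key, data_key):
    """Hypothetical model of the proposed metadata walk.

    `graph` maps a label to a node dict with optional "metadata" and
    "deps" entries.  At each node we collect metadata[data_key]; the
    walk continues to the labels listed in metadata[walk_key] when
    present, otherwise it falls back to the node's deps.
    """
    collected, seen, stack = [], set(), list(start)
    while stack:
        label = stack.pop()
        if label in seen:
            continue
        seen.add(label)
        node = graph[label]
        md = node.get("metadata", {})
        collected.extend(md.get(data_key, []))
        # The walk key directs the walk distinctly from deps.
        stack.extend(md.get(walk_key, node.get("deps", [])))
    return collected
```

Note that the call is not tied to any target: the start list and the walk keys alone determine which nodes contribute.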

When a write_file() appears inside a target definition, it's reasonable to construe that as being related to that target.  When a write_file() appears outside any target definition, that's not related to any particular target.

An important use of write_file() for us is not related to any particular target.  We think of the build itself as being an API.  That API is used by consumers of the build, which include infra recipes and other scripts and tools as well as humans.  The `gn gen` step acts as a "factory method" that takes GN build arguments as its parameters and yields an API object we call "the build".  The build can be interrogated by examining JSON files emitted by `gn gen` into $root_build_dir.  These tell the consumer what things the build can do, and what ninja target requests each thing.
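A minimal sketch of what consuming such a build-API file might look like; the file name `tests.json` and its schema are hypothetical, not something any build actually emits:

```python
import json
import os

def list_build_targets(build_dir):
    """Read a hypothetical build-API file emitted by `gn gen` into
    $root_build_dir and return a mapping from user-facing names to
    the ninja target that builds each one."""
    with open(os.path.join(build_dir, "tests.json")) as f:
        entries = json.load(f)
    return {e["name"]: e["ninja_target"] for e in entries}
```

An infra recipe would call `gn gen`, read such a file, and then invoke `ninja` on the targets it names, without parsing any BUILD.gn files itself.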

I think write_file() is generally desirable.  GN is a source language written by humans, and humans want to maintain the "source of truth" for various things in one place; for some things that place is in GN.  So `gn gen` producing some sort of machine-readable output from the truths humans wrote in .gn files as an "end product" is the natural thing.  It's not at all natural to me to force such things into actions, which for such cases are just make-work at the ninja phase that never needed to happen after the `gn gen` phase.

In fact, I am somewhat bewildered by the preference for response files over write_file().  This is work that is fully decided at `gn gen` time and never changes thereafter.  Yet we have ninja write response files for the things it runs to consume (and then remove them).  `gn gen` should be done much less often than `ninja`.  Work done at `gn gen` is amortized over all the `ninja` runs that follow.  Why isn't it better to do that work only once per gen and then reuse it in repeated ninja runs?  From the perspective of overall efficiency, the preference should be to have `gn gen` write all those response files once and then repeated `ninja` runs just use them.

I'd like to consider visibility vs get_metadata().  The direct list of labels in a get_metadata() call should have their visibility lists checked against the context in which the metaresult is used.  When a metaresult is used inside a target (in its sources, inputs, args or *flags, or script), then that target is the context.  If it's not admitted by the visibility constraints of each directly-listed target label, that should be an error.  When a metaresult is used outside any target, i.e. in a write_file() outside any target, then I think it makes sense to enforce visibility as if this were a reference from some arbitrarily-named target in the same file.  That is, if the reference would be allowed from //a:* then it's OK for a metaresult used in write_file() at top-level in //a/BUILD.gn.  Then, during the walk, at each target visited, the labels listed in the value for a "walk key" should have their visibility lists checked against that target.  If you're not allowed to list a label in your deps, you're not allowed to list it in your metadata.walk_key_foo list either.
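The visibility rule described above can be sketched as follows. This is a hypothetical model: only the "*" and "//dir:*" pattern forms are handled, and real GN label patterns are richer.

```python
def visible_from(visibility, context):
    """Does `context` (a label like "//a:t") match any pattern in
    `visibility`?  Only "*" and "//dir:*" forms are modeled here."""
    for pattern in visibility:
        if pattern == "*" or (
            pattern.endswith(":*") and context.startswith(pattern[:-1])
        ):
            return True
    return False

def checked_visit(graph, label, context):
    """Visit `label` on behalf of `context`, raising if visibility
    would not admit a deps-style reference from `context`.  For a
    metaresult used in a top-level write_file() in //a/BUILD.gn, the
    context would be a synthetic label in //a, e.g. "//a:_toplevel"."""
    node = graph[label]
    if not visible_from(node.get("visibility", ["*"]), context):
        raise ValueError(f"{label} not visible from {context}")
    return node
```

During the walk, the same check would be applied to each label listed under a walk key, with the target doing the listing as the context.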

Now, to come back around to the original issue.  While only some uses of a metaresult are tied to a target, every use is of course tied to the build file it's in.  I think it's an entirely reasonable constraint to say that get_metadata() calls in a build file cannot add to the set of other build files on which that one depends.  Likewise, the metadata.walk_key_foo lists in targets in some build file cannot add to the set of other build files on which that one depends.  With those constraints, I think the file-centric phase ordering I described above still works for resolving metaresults.


agr...@chromium.org

Nov 27, 2018, 9:51:03 AM
to gn-dev, bre...@chromium.org, julieh...@google.com
Great to hear about your progress here!

Your comment about response files vs write_file() resonated with me. I always run ninja with "-d keeprsp" so that I can reproduce steps that fail.

Nico Weber

Nov 27, 2018, 10:02:31 AM
to Roland McGrath, Brett Wilson, Julie Hockett, gn-...@chromium.org
On Tue, Nov 6, 2018 at 10:26 PM Roland McGrath <mcgr...@chromium.org> wrote:
In fact, I am somewhat bewildered by the preference for response files over write_file().  This is work that is fully decided at `gn gen` time and never changes thereafter.  Yet we have ninja write response files for the things it runs to consume (and then remove them).  `gn gen` should be done much less often than `ninja`.  Work done at `gn gen` is amortized over all the `ninja` runs that follow.  Why isn't it better to do that work only once per gen and then reuse it in repeated ninja runs?  From the perspective of overall efficiency, the preference should be to have `gn gen` write all those response files once and then repeated `ninja` runs just use them.

To chime in with the ninja perspective here: Response files exist because of ninja's $in and $out variables. Generators don't know the value of $in and $out, so they can't write response files containing references to $in and $out at generator time. If a response file doesn't contain references to built-in ninja variables, I agree that it's possible to write it at generator time. (It means that the gn-written response file will likely be cold for the disk cache, though, so there might be some build-time hit in addition to the build-time speedup gained by not having to write these files synchronously on the main loop.)