Proposal for user-defined GN functions

David Turner

May 13, 2020, 8:14:36 AM

to gn-dev

As far as I know, in both the Chromium and Fuchsia GN builds, there are a number of places where certain computations need to be exactly replicated between different target definitions.

The most obvious example is the location of output files determined by tool() instances, which need to be carefully replicated in other templates like compiled_action(), but I remember other examples in the Chromium for Android build.

This is generally performed by copy-pasting GN fragments, which is fragile over time, and not always obvious when trying to maintain some critical build rules.

I would like to propose a way to define a GN user-defined "function" with the following properties:

User functions are, like templates, a new "item type" (as defined in the GN source code), i.e. they are not values (so no real lambdas, which is fine for me), and should be defined in .gni files. One could declare one with a syntax like:

function("my_function_name") {
...
}

Any other keyword (e.g. "user_function") would work, I don't have any preference.

User functions are purely functional: i.e. they take a scope as input (available as "invoker" within the function's body), and return a scope computed from it. The latter is to make it easy to compose function calls, and simplify function bodies (see below).

They should not be able to modify the caller's scope, or access global state.

This last point implies that forward_variables_from(invoker, [ "fields" ]) should only look up in the invoker scope, not any parent scope from the caller. I believe this can be implemented easily by ensuring that the gn::Scope value passed at runtime as "invoker" doesn't have a parent scope value, unlike what happens for template invocation.

This probably implies that many GN functions should _not_ be allowed inside user function bodies (e.g. action(), static_library(), read_file() / write_file()), though we would still want string_split(), get_target_outputs(), get_label_info(), etc...

I think making this distinction properly in the implementation (and documenting it) would be the most complicated part of implementing this feature.

Calling a function could be performed with a new built-in function, e.g. call(), which returns a scope value, as in:

result = call("my_function_name", <input-scope>)

This allows composing function calls, as in this fictitious example:

# Computes the squared Euclidean length of a 2D vector
#
# Parameters
#   x: [integer] Horizontal coordinate
#   y: [integer] Vertical coordinate
#
# Result
#   value: [integer] The value of x*x + y*y
#
function("length_squared") {
  value = call("sum",
               call("square", {
                 x = invoker.x
               }),
               call("square", {
                 x = invoker.y
               }))
}

For returning results from a function body, I can see two different possibilities:

A) Any non-private variable defined in the body is returned in the result scope, as in:

# Returns a scope with two keys |path| and |rebased_path|
function("my_function") {
  _private_var = ....
  path = ....
  rebased_path = ....
}

B) Use of a hard-coded result variable that must be a scope, as in:

# Returns a scope with two keys |path| and |rebased_path|
function("my_function") {
  internal_var = ...
  # The function's result is hard-coded to 'result' which must be defined in the body.
  result = {
    path = ...
    rebased_path = ...
  }
}

Does all this sound reasonable?

- Digit

Nico Weber

May 13, 2020, 9:18:06 AM

to David Turner, gn-dev

As background on the write-up below, I've written a GN build for the LLVM project from scratch, and I've worked on Chrome's build a lot – both writing lots of GN code, and reviewing changes to the build and answering questions about it.

Based on my experience, most people don't complain that GN isn't expressive enough. On the other hand, GN is already somewhat complicated and not super easy to learn. (A small handful of people very familiar with the build would use this. This will lead to situations where some GN files are written by experts and are inscrutable to everyone else, similar to some template metaprogramming header files.)

IMHO, if you need functions it's a sign you're doing too much stuff at GN time. (I do think the Chrome/Android build is at least close to the "too much GN" bar; the other platforms are better.)

From what I understand, it has historically been a non-goal of GN to be a "real" programming language. The idea was that if you need to do things you need a real programming language for, you should do that work in a real programming language that you call from GN.

I'd put issues I see with this proposal (and the spirit behind it) in 3 categories:

1. Understandability. People need to learn GN, and the more features GN has the harder this is.

(Also, for configuration there's this general tradeoff between wanting small and factored configuration descriptions so that they can be written well (at the cost of being "non-obvious"), and wanting regular, explicit, repetitive configurations so that things can be easily read and modified by tools.)

2. Expressivity. Build config languages usually try to be less expressive in a computational sense. For example, in your proposal you allow recursion? Then GN is suddenly Turing-complete. (exec_script() can also run arbitrary code, but it can be put behind a whitelist.)

3. Implementation complexity. Next steps in this direction are a bytecode interpreter, a JIT, maybe some garbage collector. This seems complex and like the wrong direction for GN to me.

So I'd personally prefer to not add this, or things like it. (Then again, I feel the same about some other things that were added.)

Nico

Mirko Bonadei

May 13, 2020, 9:59:27 AM

to gn-dev, di...@google.com

I'll add my two cents on this: I would also personally prefer not to add this feature.

It is understandable to find duplication painful, but in a build system I think it is more acceptable (you can write tools to deal with it if you are aware of it), and it might also have its own value. For a build system, being easy to understand is a feature (since it is considered a commodity). Adding user-defined functions may just result in less local duplication but more ways of doing the same thing in a big codebase, which in turn requires contributors to go one step further to understand the function, while some duplication is more straightforward.

Alfred Zien

May 13, 2020, 10:53:57 AM

to Nico Weber, David Turner, gn-dev

The concept of a function is widely understood among programmers, and imo, GN is somewhat unique in that sense, as you need to show some imagination and creativity to express constructs that are trivial in a "real" programming language. Don't get me wrong, it's actually fun sometimes and forces you to look at a simple problem from different angles :) However, I highly doubt that those constructs are easier to understand than more common concepts such as functions, while loops, etc.

About moving logic to a "real" language: as hard as I have tried to move every bit of logic to build time instead of gen time, it often fails, and as I understand it, it is not only me: Chromium code also has very complex templates (and those were written by people way smarter than me). exec_script() is a no-go for templates that are called hundreds of times.

Given that there are already techniques to make function-like constructs in GN (by continuation-passing style or just by writing result to file and reading it immediately), I will argue that adding a clearer solution for such a basic thing will actually improve understandability, and won't introduce a lot of complexity to implementation.
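For readers who have not seen the write-then-read workaround, here is a minimal sketch of what it could look like. The template name host_binary_path, the label //tools/codegen:codegen and the output file name are made up for illustration; only get_label_info(), write_file(), read_file() and print() are real GN built-ins, and I am assuming the file written by write_file() is on disk by the time read_file() runs in the same "gn gen" pass.

```gn
# Hypothetical helper template, for illustration only. It "returns" a value
# by writing a scope to a JSON file that the caller reads back at gen time.
template("host_binary_path") {
  _result = {
    # Same reconstruction convention as discussed in this thread: assumes the
    # tool's output name matches its target name.
    path = get_label_info(invoker.tool, "root_out_dir") + "/" +
           get_label_info(invoker.tool, "name")
  }
  write_file("$target_gen_dir/$target_name.json", _result, "json")
}

# "Call" the pseudo-function, then read its result back.
host_binary_path("codegen_path") {
  tool = "//tools/codegen:codegen"  # hypothetical label
}
_info = read_file("$target_gen_dir/codegen_path.json", "json")
print(_info.path)
```

Every such "call" costs a file write and a file read during gen, and the result file name has to be agreed on by convention between the template and its caller.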

-- Alfred

Brett Wilson

May 13, 2020, 11:24:05 AM

to David Turner, gn-dev

On Wed, May 13, 2020 at 5:14 AM 'David Turner' via gn-dev <gn-...@chromium.org> wrote:

> As far as I know, in both the Chromium and Fuchsia GN builds, there are a number of places where certain computations need to be exactly replicated between different target definitions.
>
> The most obvious example is the location of output files determined by tool() instances, which need to be carefully replicated in other templates like compiled_action(), but I remember other examples in the Chromium for Android build.

Can you go into more detail about this problem? I was not aware of a particular issue around this.

Brett

Brett Wilson

May 13, 2020, 12:15:12 PM

to David Turner, gn-dev

I agree that the Chrome Android and Fuchsia builds are not so good. But I don't think the answer is to keep adding features to GN to allow ever more complex concepts to be expressed more easily. Some features are certainly useful and good, but GN was designed to be as simple as possible. An explicit design goal was to make it impossible to write interesting build files. We've clearly failed at this, but it's still a goal to keep in mind.

I think if you want a real scripting language and all of the associated complexity and features, you fundamentally want a different build system and should move to cmake or something. GN's design does not scale well to these features.

One hint that this design exceeds the GN language's capabilities is the hoops you had to jump through: the new "call" function, the magic "result" variable, and how the function parameters are packed into an "invoker" variable. Nobody would design a programming language like this and we shouldn't allow a slow addition of features to accidentally back us into something like that.

If we want to fix the Chrome Android build, I think the way to do this is to stand back and think how it should actually work from a high level and how it should fit in GN's model. We would not come up with the current design of 10 levels of templates that write files that allow the build graph to be reconstructed at build time by some scripts (or whatever it does). But I don't have enough knowledge about the Android toolchain to propose what it should do.

Brett

Petr Hosek

May 13, 2020, 1:59:31 PM

to Brett Wilson, David Turner, gn-dev

I believe what David is referring to is reconstructing the name of the output. There's get_target_outputs() but that can only be used for action() and copy() targets (and only within the same BUILD.gn file). To work around this limitation, many build systems would manually reconstruct the output filename, and this logic is often duplicated in many places. This is fairly common when passing the build outputs to external scripts, e.g. to build distribution packages, see for example: https://fuchsia.googlesource.com/fuchsia/+/4f08642bf450dc2cef73b58517a72f97ec85d676/build/config/BUILDCONFIG.gn#348

While I agree that extracting this logic into a function would reduce the duplication, I don't think it addresses the underlying issue. The problem is that you're still trying to replicate the internal logic for output name construction which is dependent on things like target platform, output type, etc. So there's always going to be duplication between GN and your function/custom logic.

I believe the best solution in this case would be to avoid this duplication altogether. We cannot fix get_target_outputs(): that function is fundamentally incompatible with GN's execution model because during its execution, targets and toolchains haven't yet been fully processed.

A few months ago, I tried a different approach which is to extend metadata processing to allow substitutions. The prototype implementation is here: https://gn-review.googlesource.com/c/gn/+/8420. You can see the usage of this feature here: https://fuchsia-review.googlesource.com/c/fuchsia/+/389558. I think that the reduction of complexity is quite convincing, and what's more important, it avoids the duplication of logic between the build and GN itself. I haven't seen any regression in terms of performance in my experiments, but it still adds extra complexity to GN like any new feature, so that's something you have to consider. The GN patch needs a bit more work, but I'd be happy to spend more time on it if you think that this would be a useful addition.

Brett Wilson

May 13, 2020, 2:09:47 PM

to Petr Hosek, David Turner, gn-dev

Another option would be to require that the toolchain be defined before loading any targets. This would mean that we would add an extra synchronous phase before loading the root BUILD.gn file where we load the toolchain's BUILD.gn file. Then we can synchronously resolve the toolchain object for each target as it's loaded.

It seems intuitive that GN should be able to tell you this information, and since it's coming up in multiple places, I'm not opposed to it. I'm mostly just concerned about implementation complexity since the code that will need changing is already one of the trickier parts of the program.

Brett

Roland McGrath

May 13, 2020, 2:27:36 PM

to Brett Wilson, Petr Hosek, David Turner, gn-dev

It's already a sharp edge the way that toolchains are defined in the default toolchain context, which is also itself special when used for other purposes. In the Zircon build, I found it the best way to make things consistent to use the default toolchain basically only for defining toolchains and have every real target in a non-default toolchain, but that also has complications. I suspect that making the way toolchains are defined be more explicitly and cleanly separate from the environment where any target dependent on the toolchain (i.e. anything but action, copy, or group) is evaluated would clear up a lot of these issues. It also seems like it might be necessary for the synchronous definition semantics, though maybe I'm wrong about that. I'm not sure how backward-compatible the semantics can easily be made, though.

David Turner

May 13, 2020, 2:46:45 PM

to Brett Wilson, gn-dev

Yes, the simplest case is the Chrome compiled_action() template, which needs to locate a built host executable to pass its path to //build/gn_run_binary.py. Look at how the "host_executable" variable is computed:

    # Get the path to the executable. Currently, this assumes that the tool
    # does not specify output_name so that the target name is the name to use.
    # If that's not the case, we'll need another argument to the script to
    # specify this, since we can't know what the output name is (it might be in
    # another file not processed yet).
    host_executable =
        get_label_info(host_tool, "root_out_dir") + "/" +
        get_label_info(host_tool, "name") + _host_executable_suffix

Another example of this logic duplication in Chromium is for finding the host protoc compiler.

The Fuchsia build version of compiled_action() is slightly different, essentially implementing the recommendation in the comment above.

While the Zircon build version is very different: it provides host_tool_action() instead, which relies on metadata attached to the host executable target.
This is done by ensuring that tools provided to this template must come from host_tool() (on line 1642 here) or prebuilt_host_tool() targets.
The former replicates the output path computations based on the toolchain type (which is different from the Chromium and Fuchsia ones).

There are also other instances in the Fuchsia and Zircon builds where these output paths need to be collected later in metadata, and Petr wrote about some of them.

Hope this helps,

David Turner

May 13, 2020, 2:51:10 PM

to Petr Hosek, Brett Wilson, gn-dev

On Wed, May 13, 2020 at 19:59, Petr Hosek <pho...@google.com> wrote:

> I believe what David is referring to is reconstructing the name of the output. There's get_target_outputs() but that can only be used for action() and copy() targets (and only within the same BUILD.gn file). To work around this limitation, many build systems would manually reconstruct the output filename, and this logic is often duplicated in many places. This is fairly common when passing the build outputs to external scripts, e.g. to build distribution packages, see for example: https://fuchsia.googlesource.com/fuchsia/+/4f08642bf450dc2cef73b58517a72f97ec85d676/build/config/BUILDCONFIG.gn#348

Exactly.

> While I agree that extracting this logic into a function would reduce the duplication, I don't think it addresses the underlying issue. The problem is that you're still trying to replicate the internal logic for output name construction which is dependent on things like target platform, output type, etc. So there's always going to be duplication between GN and your function/custom logic.

Actually, I was thinking we could create a function that would take target_type/platform/whatever as inputs, and return the appropriate info. Callers would have to pass the right inputs, but at least there would be a single source of truth.
Note that I only want to use user-defined functions to reduce this kind of code duplication across the build system, i.e. to simplify it. If there is another way to get what we need, I'd be happy with it.
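To make the "single source of truth" idea concrete, here is a rough sketch. It is NOT valid GN today: function() and call() are the syntax proposed at the start of this thread, using return option A (non-private variables form the result). The function name host_tool_output and its parameter names are invented for illustration; get_label_info() and host_os are real GN built-ins, and host_tool is assumed to be a label variable defined by the caller.

```gn
# NOT valid GN today: function() and call() are the hypothetical syntax
# proposed in this thread. Names and parameters are illustrative only.
function("host_tool_output") {
  _suffix = ""
  if (invoker.os == "win") {
    _suffix = ".exe"
  }

  # Returned to the caller (option A: every non-private variable is a result).
  path = invoker.root_out_dir + "/" + invoker.name + _suffix
}

# Every template that needs the tool's path would go through the same logic.
_tool = call("host_tool_output",
             {
               root_out_dir = get_label_info(host_tool, "root_out_dir")
               name = get_label_info(host_tool, "name")
               os = host_os
             })
host_executable = _tool.path
```

Callers still have to pass the right inputs, and the function still mirrors GN's internal naming rules, which is the remaining duplication Petr points out.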

> I believe the best solution in this case would be to avoid this duplication altogether. We cannot fix get_target_outputs(): that function is fundamentally incompatible with GN's execution model because during its execution, targets and toolchains haven't yet been fully processed.
>
> A few months ago, I tried a different approach which is to extend metadata processing to allow substitutions. The prototype implementation is here: https://gn-review.googlesource.com/c/gn/+/8420. You can see the usage of this feature here: https://fuchsia-review.googlesource.com/c/fuchsia/+/389558. I think that the reduction of complexity is quite convincing, and what's more important, it avoids the duplication of logic between the build and GN itself. I haven't seen any regression in terms of performance in my experiments, but it still adds extra complexity to GN like any new feature, so that's something you have to consider. The GN patch needs a bit more work, but I'd be happy to spend more time on it if you think that this would be a useful addition.

Interesting, I was not aware of this. That may be an acceptable route too.

In all cases, I find this conversation about GN's expressiveness and extensibility fascinating and I'm very happy we can discuss this topic.

re:fi.64

May 13, 2020, 10:15:08 PM

to David Turner, Brett Wilson, gn-dev

I'm not a Google employee at all, but just for the sake of discussion: I've definitely had this come up quite a few times in a relatively small project.

Roland McGrath

May 13, 2020, 10:42:35 PM

to re:fi.64, David Turner, Brett Wilson, gn-dev

Petr's substitution proposal addresses the problem cleanly for the metadata case. But using metadata for things like host_tool_action() is rather painful and requires a lot of holistic discipline in the build. So just making it cleaner and easier to define metadata with toolchain-dependent contents is a boon to existing uses of metadata that can get simpler and cleaner. But using metadata at all to solve problems like this is not a small thing, and certainly not easy and obvious to replicate. Making get_target_outputs able to deliver toolchain-dependent results and do so consistently is much easier to use and to understand. However, note that what's been discussed (and probably anything that would be acceptable) will not make get_target_outputs just simply work in all cases, because all targets outside the same BUILD.gn file are still unavailable because you need per-target details such as outputs for copy/action or output_name for toolchain-dependent targets (as well as the per-toolchain details we're talking about making available). So the compiled_action/host_tool_action case is not actually solved since generally the tool's target is not in the same BUILD.gn file with all its users via compiled_action. You still depend on adhering to a convention like output_name must match target_name so that outside users can deduce the actual output file. And it seems wrong to make get_target_outputs give you a might-be-a-lie rather than refuse to work in that case. So you'd be reduced to something like get_label_info("...($toolchain)", "default_output_dir") and such and reconstructing. That's certainly easier to use and understand than metadata solutions, but it's neither fully general like metadata solutions nor actually easy to use like get_target_outputs.

Shai Barack

May 15, 2020, 12:54:39 AM

to gn-dev, mcgr...@chromium.org, David Turner, Brett Wilson, gn-dev, rym...@gmail.com

I'm really interested in Petr's proposed change here:

https://gn-review.googlesource.com/c/gn/+/8420

The ability to use expansions that are currently only available for tool variables in other templates would offer elegant solutions in areas where we currently struggle with usability.

Brett Wilson

May 15, 2020, 12:20:51 PM

to Shai Barack, gn-dev, mcgr...@chromium.org, David Turner, rym...@gmail.com

On Thu, May 14, 2020 at 9:54 PM Shai Barack <sha...@google.com> wrote:

> I'm really interested in Petr's proposed change here:
> https://gn-review.googlesource.com/c/gn/+/8420
>
> The ability to use expansions that are currently only available for tool variables in other templates would offer elegant solutions in areas where we currently struggle with usability.

I don't think this would solve David Turner's problem though, would it? I'm hesitant to say that's the approach the Chrome Android build should take.

Brett

David Turner

May 18, 2020, 4:21:12 PM

to Brett Wilson, Shai Barack, gn-dev, mcgr...@chromium.org, rym...@gmail.com

It would solve one annoying problem we're facing, so it has some value, but it's true that other issues abound.

I'd like to take a step back and clarify what practical concerns we have in the Fuchsia team with GN as it currently exists, and why suggestions like "do complicated computations outside of GN" do not work well in practice (and actually make things more complicated). Maybe GN isn't the tool we need after all, or maybe we can find ways to extend it in ways that keep the elegance and simplicity of the original design mostly there. Hopefully, this may apply to the Chrome/Android build too.

As background for the following, I'll mention that I worked several years on Chrome, where I hacked the Android-specific parts of the build significantly. I am now working on Fuchsia, which has two distinct build systems based on GN (one for Fuchsia, the platform, and one for Zircon, the kernel). More precisely, I'm working on build unification, which means I have to understand and modify both of them.

Anyway, let's assume you're heading a small software development company of about 50 people, whose main product is a desktop application that runs on Windows, OS X and Linux. It's mostly written in C++, built with a custom build system built around Makefiles, or even CMake, but its limitations are showing and are preventing your team from making good progress. Also nobody understands how the build system works anymore, apart from one or two experts that are always too busy when stuff breaks; oh, and one of them just left for a startup. All the while, your project keeps growing and you're adding more developers.

Based on input from your lead developers, you decide to try to switch to GN instead for your build system. After some surprisingly short time to adapt, you have a new build system that works beautifully. Things are not completely perfect, of course [1], but your developers spend much less time fighting the build system, writing new rules is considerably simpler, especially for non-trivial targets, and GN provides useful commands like "analyze", "paths", "refs" which give you tons of information. You fall in love with Ninja, if you were not already using it. So far, everything's super great.

Note that the reason why GN works so well at this point is because it has full knowledge of the build graph, and understands how to process dependencies. More exactly, one could describe GN execution in the following steps:

1) Parse all GN build rules to generate a global build graph.
2) Perform computations over the full build graph, in order to perform sanity checks or prepare the build commands for the matching Ninja targets.

Two simple examples of such computations, both hard-coded in the GN source code:

- Verifying that a target with "testonly = true" is never depended upon by another target that does not have the same flag set.
  This simple check prevents test targets from reaching production images, which is quite important for your release process.
  Also GN will give you a very nice error message explaining the problem when this happens.

  Note that generally speaking GN tries very hard to give useful messages in case of errors. It can do that because it has the full context to do it properly.
  And that's yet another reason why we love this tool.

- When building an executable, walking over all transitive dependencies to collect the outputs of static_library(), source_set(), shared_library() and loadable_module() targets, and stopping the walk at the latter two types.
  This is required to compute the final link command for the executable.
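As an illustration of the first check, a minimal BUILD.gn fragment along these lines is rejected at "gn gen" time. Target and file names are made up; testonly, source_set() and executable() are standard GN.

```gn
# A test-only helper library.
source_set("test_support") {
  testonly = true
  sources = [ "test_support.cc" ]
}

# A production binary. It does not set testonly, so "gn gen" reports an error
# for the dependency below instead of letting test code leak into the product.
executable("app") {
  sources = [ "main.cc" ]
  deps = [ ":test_support" ]
}
```

Setting testonly = true on "app" (or dropping the dependency) makes the error go away, and GN can say exactly which edge is at fault because it sees the whole graph.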

So you keep on going with GN, very satisfied, your builds are blazing, the tests are running, and everybody's happy.

And then one day, management tells you they need an Android version of the application.

You take a look at the problem: code wise, you determine you can keep 80% of your native code, but you'll have to write Java for the UI, and deal with a completely different way to build software and run tests.
In technical terms, the issue you face is that you need GN to understand different kinds of targets (easy), and perform new forms of computations over their dependencies (what?), for example:

- Java targets can come in 3 flavors: 'java', 'android' and 'desktop', where the 'java' one uses J2SE APIs that are common to both Android and Desktop Java.
  It would be very useful to check that no 'android' target depends, even transitively, on a 'desktop' one, as well as the opposite.
- When creating an Android package, all android_resource() targets that are reachable transitively from the top-level manifest must be collected and sent to the `aapt` Android SDK tool for processing.
  Also modifying a resource doesn't mean its dependencies need to be rebuilt (except in certain cases), so sometimes they must be "deps", sometimes they would be "data_deps" except they should be part of the dependency walk.

But in reality, there are dozens more new build dependency rules to consider, and unfortunately, GN has no support for any of these new types of computations.

So you ask the GN developers for advice, and they kindly tell you "if you need to do something complicated, do it outside of GN instead".

So you ask one of your developers to write a prototype to just try that.

The initial approach is to push the new dependency computations on top of GN, i.e. you implement a meta-meta-build system: something that reads a custom DSL of your choosing, describing Android and Java targets, and generates corresponding GN files after performing various checks and computations. The result being sent to "gn gen" as usual.

You quickly realize some noticeable drawbacks to this approach:

- You have two DSLs instead of one, where each one has a different way to name your targets and, more importantly, a different view of what the build graph looks like, because they do not describe dependencies in the same way.
- Inspecting the build graph becomes really confusing. Going from Ninja target to GN target was already a little hard, but adding one more level is an adventure.
- In many cases, GN cannot give you meaningful error messages when something goes wrong.
- Auto-generated Java sources, generated from other Java programs, are common. Translating this from META to GN is tricky, and when things do not work, they are really hard to debug.
- Sometimes a Java library depends on a C++ library (due to native functions), which makes your META DSL a bit more complicated than you initially planned.
- Sometimes a C++ library depends on a Java library (due to JNI, Java source annotations, or whatever). This makes writing GN rules that reference them painful. It also means you can't just run "gn gen" directly, even if you don't work on Java/Android code.

After a quick try, your developers tell you they hate this, and that they would strongly favor one DSL over two. You don't want to change your build system yet again, so you try the opposite approach, i.e. doing the new dependency computations at build time!

You realize that this is what the Chrome/Android GN-based build actually did, so you study its implementation, and you see that:

- Every Android-related target will generate a file containing a full description of its GN target. It's a JSON dictionary with dozens of different keys per target, if not more.
  This is required because if one needs to compute dependency relationships at build time, the build graph needs to be exported somewhere accessible from the action scripts.
- Every Android-related action script needs to be able to access the content of these files, and will generate Ninja dependency files directly from them.
  Since GN doesn't know about the result of these computations, it cannot generate anything useful for Ninja, hence the action scripts must do it.

Compared to the meta-meta-build situation, you have a single DSL / source of truth for your targets (GN build rules), which is considerably better, however:

- GN doesn't know the full build graph anymore, which drastically limits the usability of its inspection commands, and makes the build graph and process difficult to understand.
- GN cannot give you informative error messages anymore: it lacks proper context.
- The action script also lacks context and will not give you much information when something's wrong.
- When something doesn't work, it becomes extremely difficult to understand why. And your developers spend far more time looking at build.ninja and .d files directly to understand what's going on, something they never needed to do before.
- You now need experts who understand the weird relationships between your Android/Java related GN templates and the dozens of action scripts they interact with.

Having worked heavily on this system, I don't think it will be trivial to fix the situation by modifying GN to understand a dozen new Java and Android related target types and dependency computations. And that's because the Android tooling is constantly evolving, introducing new tools and associated workflow changes every year to support new features (e.g. App Bundles, that I implemented in the Chromium build system a few years ago).

For Fuchsia and Zircon, both build systems have tried hard to avoid relying on action scripts to support the features they need. This requires performing various checks and computations at "gn gen" time, typically through dynamic lookups over various lists of stuff (since we can't use scopes as general dictionaries / sets). They also use far more than two toolchains in a single build (the Zircon build defines more than a hundred of them).

They rely heavily on metadata collection, which is unfortunately only half of the battle (the result still needs to be processed by action scripts, which still generate dependency files for Ninja that GN doesn't know about). Also, metadata collection is powerful but really error-prone: a simple typo will go unnoticed since there are no checks on keys and values, the place where keys are defined is completely remote from where they are used, and there is no way to constrain visibility of the keys to specific templates. And in case of error, oh my, things get weird.

We also support Go packages and Rust crates, which have specific dependency requirements. At least for Rust there are some changes in GN to adjust to it, but I'm unsure of their state (having not worked on this). For Go, we know that most Go-level source changes are invisible to GN, which makes life difficult sometimes. Fortunately, we're mostly using it for host tools that do not change very frequently compared to the rest of the platform. Also, our requirements with regard to dependency computations tend to change a lot over time.
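To make the metadata typo hazard mentioned above concrete, here is a small made-up example. The target names and the "package_files" key are invented; metadata, generated_file(), data_keys and output_conversion are standard GN.

```gn
# Producer: intends to publish its files under the "package_files" key,
# but the key is misspelled.
group("package_a") {
  metadata = {
    package_fles = [ "a.txt" ]  # typo: should be package_files
  }
}

# Consumer: walks the deps and collects every "package_files" entry.
generated_file("package_manifest") {
  outputs = [ "$target_gen_dir/package_manifest.json" ]
  data_keys = [ "package_files" ]
  output_conversion = "json"
  deps = [ ":package_a" ]
}
```

As far as I can tell, "gn gen" accepts this without complaint: nothing ties the misspelled key to the collector, so the manifest is simply written without a.txt and the problem only shows up much later, in whatever script consumes the file.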

I don't have a proper solution to this situation, but I hope this explains why we're here, and why "do complex computations outside of GN" doesn't work well in practice, at least for real projects that need more than generating programs and libraries from C++ and action scripts. I would love it if we could describe all our targets with simple GN rules that the tool would understand natively, but we're very far from there.

I would like to add that I love GN, it's one of the best build system tools I've used, and working on its code base has been a pleasure. I completely understand the elegance of the original design. I just think that it doesn't fit the needs of real projects anymore, because building software is a messy business, after all. I hope we find a way to extend it in ways that preserve this elegance as far as possible.

- Digit

[1] Just like with the previous build system, sometimes you have unexplainable build breaks after updating your workspace, but your team learns to do a clean build to fix the situation, or even implement clobber landmines like the Chrome team does to deal with this.

Brett

K. Moon

May 18, 2020, 4:56:53 PM

to David Turner, Brett Wilson, Shai Barack, gn-dev, mcgr...@chromium.org, rym...@gmail.com

Thanks for the interesting read! I'm wondering what your thoughts are on something like Bazel, which embeds an entire programming language (Starlark), compared to GN, which seems to intentionally take a more stripped-down approach.

Harley Li

May 18, 2020, 5:08:01 PM

to K. Moon, David Turner, Brett Wilson, Shai Barack, gn-dev, mcgr...@chromium.org, rym...@gmail.com

How about supporting external extensions to GN, by exposing APIs for them so that users can extend it according to their needs? I haven't worked on GN so I'm not sure it is feasible, but wanted to float the idea anyway.

Dirk Pranke

May 18, 2020, 5:35:23 PM

to Harley Li, K. Moon, David Turner, Brett Wilson, Shai Barack, gn-dev, mcgr...@chromium.org, rym...@gmail.com

I've done the mental exercise of going through everything that David describes, and I mostly agree with his thinking.

I cannot say if David's proposal or Petr's will solve these problems, but it does feel to me like the Chromium build and particularly the Fuchsia build have probably pushed GN out of its comfort zone. I think GN is close enough to what Chromium needs that I would try to do incremental improvements to address its problems. I'm not very familiar with the Fuchsia build so I can't say whether they should more seriously evaluate something else.

It's unclear what adding external extensions to GN would mean, since GN is a static C++ binary. You could define a plugin API, but C++ plugin APIs are a nightmare to maintain. You could add some sort of IPC / IDL approach, I suppose. I wouldn't have high hopes for such an approach, though.

I suspect you'd be better off with an approach closer to Bazel and Starlark. And, indeed, I've wondered myself whether trying to push GN in a direction closer to Bazel and Starlark would be a good idea, although I wouldn't call it GN at that point, it'd be a new tool.

-- Dirk

Brett Wilson

May 18, 2020, 6:46:53 PM

to David Turner, Shai Barack, gn-dev, mcgr...@chromium.org, rym...@gmail.com

Some points:

I never liked the way the Chrome Android build was done. I think we should have built in some Android build fundamentals rather than have everything done via scripts. But I did not (and still don't) know enough about Android building to suggest what should be done instead.

I think it will be better to build in some Fuchsia packaging concepts than to use metadata, which has the drawbacks you mention. We have some built-in support for iOS bundles, which is relatively simple in GN's C++ code and lets things be expressed cleanly.

I think custom verification for things like your android desktop dependency example should be done via external scripts that read the build graph. This patch https://gn-review.googlesource.com/c/gn/+/8400 is a good example of how this type of tool could be written.
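As a rough illustration of such an external check (not the linked patch, whose contents are not reproduced here), the Python sketch below walks a build graph in the shape printed by `gn desc <out_dir> "*" --format=json`, i.e. a map from target label to a description with a "deps" list. The labels, the `find_forbidden_paths` helper and the "desktop must not reach //android" rule are hypothetical.

```python
import json
from collections import deque


def find_forbidden_paths(graph, roots, is_forbidden):
    """Return one dependency chain per forbidden target reachable from roots.

    graph: dict label -> {"deps": [labels...]} (other fields are ignored).
    """
    paths = []
    parent = {root: None for root in roots}
    queue = deque(roots)
    while queue:
        label = queue.popleft()
        if is_forbidden(label):
            # Rebuild the chain root -> ... -> label for the error message.
            chain = []
            node = label
            while node is not None:
                chain.append(node)
                node = parent[node]
            paths.append(list(reversed(chain)))
            continue  # do not walk past a violation
        for dep in graph.get(label, {}).get("deps", []):
            if dep not in parent:
                parent[dep] = label
                queue.append(dep)
    return paths


# Hypothetical graph; a real script would json.load() the saved gn desc output.
EXAMPLE_GRAPH = json.loads("""
{
  "//app:desktop": {"type": "executable", "deps": ["//ui:views", "//base:base"]},
  "//ui:views": {"type": "source_set", "deps": ["//android:jni_helpers"]},
  "//android:jni_helpers": {"type": "source_set", "deps": ["//base:base"]},
  "//base:base": {"type": "static_library", "deps": []}
}
""")

violations = find_forbidden_paths(
    EXAMPLE_GRAPH,
    roots=["//app:desktop"],
    is_forbidden=lambda label: label.startswith("//android"),
)
for path in violations:
    print(" -> ".join(path))
```

Because the check runs on the resolved graph rather than inside the .gn files, it can report the full offending chain without adding any logic to the build definitions themselves.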

I think the core problem is that nobody has expressed a vision for the way the Fuchsia build should work. As you pointed out, having a hundred toolchains in Zircon is not reasonable. But I think what's being proposed here isn't a solution to that problem. I see a lot of feature requests as, at best, expressing the existing complexity more easily.

Expressing the current complexity more easily is at best a local maximum not much above where we are now, and at worst will allow ever more complexity to be added at ever increasing rates, making the build actually worse.

It's as if the response to having 100 toolchains is to make it possible to express 100 toolchains in half the code. We should instead have a plan to remove the 100 toolchains. I realize this isn't what you're proposing, but I hope it expresses why I am so skeptical of these incremental features, even when they can reduce some complexity in certain places.

Brett

Carlos Pizano

May 18, 2020, 7:07:36 PM

to gn-dev, Dirk Pranke, km...@chromium.org, David Turner, Brett Wilson, Shai Barack, gn-dev, mcgr...@chromium.org, rym...@gmail.com, Harley Li

Older Chrome devs will remember the previous build system (GYP), which allowed most of Python's processing power. After just a couple of years it was no longer possible to review a build change with any certainty.

Because of that experience I am dubious about that direction in the long term, or even the medium term. We should be concerned about any increase in generic expressive power.

Looking at https://gn.googlesource.com/gn/+/master/docs/reference.md, the path GN seems to be taking is toward targeted features for Rust, iOS, etc.

Shai Barack

May 18, 2020, 7:49:43 PM

to Carlos Pizano, gn-dev, Dirk Pranke, km...@chromium.org, David Turner, Brett Wilson, mcgr...@chromium.org, rym...@gmail.com, Harley Li

This is why Starlark is a dialect of Python but isn't Python. The differences are carefully selected so as to keep Starlark parallelizable and to guarantee that loops terminate, both good properties for what you'd do in a build system at analysis time.

An overpowered extension point such as full-blown Python in a build system can be a double-edged sword, as Carlos points out. Already we have this problem in Fuchsia's use of GN, because a lot of our logic is done by shelling out to Python scripts, where GN itself can't deal with the complexity of the problem at hand. So in a manner of speaking we have the worst of both worlds - an underpowered build DSL, and an overpowered escape hatch.

Soong is extensible with full-blown Go, Gradle is extensible with full-blown Groovy, and they both have the same problem that Carlos described in GYP.

Dirk Pranke

May 18, 2020, 8:38:31 PM

to Shai Barack, Carlos Pizano, gn-dev, km...@chromium.org, David Turner, Brett Wilson, mcgr...@chromium.org, rym...@gmail.com, Harley Li

I think the iOS target types in GN have a bit too much magic in them; they do allow you to do things relatively cleanly, but it's not clear what's going on under the covers. But, you could certainly try to address that with better documentation, and we haven't really done that.

I expect that would be true for whatever other new target types as well, and I think this is a different way of stating what Brett said: define how you want {an Android build, a Fuchsia build, etc} to work; we should really have that anyway because users need it.

Once you do that, it's not clear if implementing in C++ plus adding documentation as needed is much worse than implementing in terms of a more powerful DSL (or two DSLs) + new primitives (and it might be better), but it's almost certainly a lot less work.

-- Dirk

Sylvain Defresne

May 19, 2020, 6:09:48 AM

to Dirk Pranke, Shai Barack, Carlos Pizano, gn-dev, km...@chromium.org, David Turner, Brett Wilson, mcgr...@chromium.org, rym...@gmail.com, Harley Li

I think I may be guilty of the under-documentation of the iOS target types. But I would say that they fulfill exactly the purpose they were designed for, which is building some types of artifacts that are really specific to iOS/macOS builds. If there are any specific points that are unclear about how they work or should be used, I'd be happy to contribute some documentation to explain them.

I don't know enough about how Fuchsia is built, and only a little about how Android works, but I agree with Brett and Dirk on the subject. I think someone needs to become an expert in the build and its requirements, and think of a strategic way of defining how it should be done.

The iOS build could probably also have used actions and mostly worked for some time, but I instead designed the two iOS/macOS specific targets that were needed to make expressing the build simpler and remove the need for more and more hacks over time.

-- Sylvain

Andrew Grieve

May 19, 2020, 10:12:50 AM

to Sylvain Defresne, Dirk Pranke, Shai Barack, Carlos Pizano, gn-dev, km...@chromium.org, David Turner, Brett Wilson, mcgr...@chromium.org, rym...@gmail.com, Harley Li

I agree with the sentiments about Chromium's Android templates. They are quite complex, but have changed significantly over time, to the point where I'm not convinced building their logic right into GN would have been advantageous. Maybe it's time now to convert them to C++, but I'd definitely prefer exploring some sort of middle ground first. As mentioned already, Bazel used to have their Android rules in .java, and have deprecated them in favour of describing them in Starlark.

Here's a strawman for something that (I think) would address most of the Chromium Android pain points without building them directly into GN.

Context:

* Java targets need to produce a .dex file, but only if there is an android_apk target that depends on it.

* Java targets need to produce a .jar, but only if there is a java_binary or __dex target that depends on it.

* Java targets need to produce a header_jar, but only if there is a target that uses it in its classpath when compiling.

Proposal:

template("java_library") {
  action("${target_name}__header") {
    metadata_deps = [
      {
        name = "direct_header_deps"
        labels = invoker.deps
        data_keys = [ "header_jar_path" ]
        walk_keys = [ "header_jar_barrier" ]
      },
    ]
    metadata = {
      header_jar_path = [ outputs[0] ]
      header_jar_barrier = []
    }
    ...
  }

  action("${target_name}__compile") {
    optional_target = true
    deps = invoker.deps
    metadata_deps = [
      {
        name = "direct_header_deps"
        labels = invoker.deps
        data_keys = [ "header_jar_path" ]
        walk_keys = [ "header_jar_barrier" ]
      },
    ]
    metadata = {
      library_jar_path = [ outputs[0] ]
    }
  }

  action("${target_name}__dex") {
    optional_target = true

    # A non-metadata dep causes the target to be created (if this target is created).
    deps = [ ":${target_name}__compile" ]
    metadata = {
      library_dex_path = [ outputs[0] ]
    }
  }

  group(target_name) {
    # These would never be built when building target_name. They just allow metadata to pass through.
    metadata_deps = [
      ":${target_name}__header",
      ":${target_name}__compile",
      ":${target_name}__dex",
    ]
  }
}

# Would live inside an APK template (hence "invoker").
action("apk_target") {
  # Collect metadata so that it can be used by args.
  # Causes direct deps to be formed between this target and metadata providers.
  # Causes targets with optional_target = true to exist.
  metadata_deps = [
    {
      name = "transitive_dex"
      labels = invoker.deps
      data_keys = [ "library_dex_path" ]
    },
  ]

  # Allow using metadata directly in the command line without needing to write to a file:
  args = [ "--dex-files={{transitive_dex}}" ]
}

* Would be useful for mojom template, where they define a "${target_name}__is_mojom" group to ensure all deps passed to them are of type "mojom".

* Would eliminate most needs of depfiles, since it would allow apk targets to say "I directly depend on all transitive targets that produce .dex files".

* The "optional target" idea kind-of already exists in GN through secondary toolchains. E.g. targets don't get defined in every toolchain, only in those where there is a dep on them.
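The secondary-toolchain behaviour mentioned in the last bullet can be pictured as a reachability walk. The Python below is a toy model under stated assumptions, not GN's code: each dep either names a toolchain or inherits the dependent's, and every label and toolchain name is made up for illustration.

```python
def instantiated_targets(deps, roots):
    """Return the set of (label, toolchain) pairs reachable from roots.

    deps: dict label -> list of (dep_label, dep_toolchain), where a
          dep_toolchain of None means "same toolchain as the dependent".
    """
    seen = set()
    stack = list(roots)
    while stack:
        label, toolchain = stack.pop()
        if (label, toolchain) in seen:
            continue
        seen.add((label, toolchain))
        for dep_label, dep_toolchain in deps.get(label, []):
            stack.append((dep_label, dep_toolchain or toolchain))
    return seen


# Hypothetical graph: only the code generator is requested in the host toolchain.
DEPS = {
    "//app:apk": [("//app:java", None), ("//tools:codegen", "//toolchain:host")],
    "//app:java": [("//base:base", None)],
    "//tools:codegen": [("//base:base", None)],
}

reached = instantiated_targets(DEPS, [("//app:apk", "//toolchain:android")])
host_targets = {label for label, tc in reached if tc == "//toolchain:host"}
print(sorted(host_targets))
```

In this model "//app:java" never appears in the host toolchain because nothing there depends on it, which is the same "exists only if something needs it" property the optional_target idea asks for within a single toolchain.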
