It would solve one annoying problem we're facing, so it has some value, but it's true that other issues abound.
I'd like to take a step back and clarify what practical concerns we have in the Fuchsia team with GN as it currently exists, and why suggestions like "do complicated computations outside of GN" do not work well in practice (and actually make things more complicated). Maybe GN isn't the tool we need after all, or maybe we can find ways to extend it that preserve most of the elegance and simplicity of the original design. Hopefully, this also applies to the Chrome/Android build.
As background for the following, I'll mention that I worked for several years on Chrome, where I significantly hacked the Android-specific parts of the build. I am now working on Fuchsia, which has two distinct GN-based build systems (one for Fuchsia, the platform, and one for Zircon, the kernel). More precisely, I'm working on build unification, which means I have to understand and modify both of them.
Anyway, let's assume you're heading a small software development company of about 50 people, whose main product is a desktop application that runs on Windows, OS X and Linux. It's mostly written in C++ and built with a custom build system based on Makefiles, or even CMake, but its limitations are showing and are preventing your team from making good progress. Also, nobody understands how the build system works anymore, apart from one or two experts who are always too busy when stuff breaks; oh, and one of them just left for a startup. All the while, your project keeps growing and you're adding more developers.
Based on input from your lead developers, you decide to switch your build system to GN. After a surprisingly short adaptation period, you have a new build system that works beautifully. Things are not completely perfect, of course [1], but your developers spend much less time fighting the build system, writing new rules is considerably simpler, especially for non-trivial targets, and GN provides useful commands like "analyze", "path" and "refs" which give you tons of information. You fall in love with Ninja, if you were not already using it. So far, everything's super great.
Note that GN works so well at this point because it has full knowledge of the build graph, and understands how to process dependencies. More precisely, one could describe GN execution in the following steps:
1) Parse all GN build rules to generate a global build graph.
2) Perform computations over the full build graph, to run sanity checks or prepare the build commands for the matching Ninja targets.
Two simple examples of such computations, which are hard-coded in the GN source code:
- Verifying that a target with "testonly = true" is never depended upon by another target that does not have the same flag set.
This simple check prevents test targets from reaching production images, which is quite important for your release process.
Also, GN gives you a very clear error message explaining the problem when this happens.
Generally speaking, GN tries very hard to give useful error messages. It can do so because it has the full context needed to do it properly.
And that's yet another reason why we love this tool.
- When building an executable, walk over all its transitive dependencies to collect the outputs of static_library(), source_set(), shared_library() and loadable_module() targets, stopping the walk at the latter two types.
This is required to compute the final link command for the executable.
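To make these two computations concrete, here is a minimal Python sketch over a toy in-memory graph. It is only an illustration under simplifying assumptions: the `Target` class, the labels and the helper names are hypothetical, and GN's real C++ implementation handles many more cases (public deps, toolchains, and so on).

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    label: str
    kind: str                                  # "executable", "static_library", "source_set", ...
    testonly: bool = False
    deps: list = field(default_factory=list)   # labels of direct dependencies


def check_testonly(graph):
    """Return one error per non-testonly target that directly depends on a testonly one."""
    errors = []
    for target in graph.values():
        if target.testonly:
            continue
        for dep in target.deps:
            if graph[dep].testonly:
                errors.append(f"{target.label} is not testonly but depends on testonly target {dep}")
    return errors


LINKABLE = {"static_library", "source_set", "shared_library", "loadable_module"}
WALK_STOPS = {"shared_library", "loadable_module"}  # the walk collects these but does not go past them


def link_inputs(graph, executable):
    """Collect linkable targets reachable from `executable`, without walking past WALK_STOPS."""
    inputs, seen = [], set()
    stack = list(reversed(graph[executable].deps))  # reversed so deps are visited in declaration order
    while stack:
        label = stack.pop()
        if label in seen:
            continue
        seen.add(label)
        dep = graph[label]
        if dep.kind in LINKABLE:
            inputs.append(label)
        if dep.kind not in WALK_STOPS:
            stack.extend(reversed(dep.deps))
    return inputs


graph = {t.label: t for t in [
    Target("//app:app", "executable", deps=["//base:base", "//ui:ui"]),
    Target("//base:base", "static_library", deps=["//base:core"]),
    Target("//base:core", "source_set"),
    Target("//ui:ui", "shared_library", deps=["//ui:widgets"]),
    Target("//ui:widgets", "source_set"),
    Target("//app:tools", "executable", deps=["//testing:fakes"]),
    Target("//testing:fakes", "source_set", testonly=True),
]}

print(link_inputs(graph, "//app:app"))  # //ui:widgets sits behind the shared library, so it is skipped
print(check_testonly(graph))
```

Both functions need the whole graph at hand, which is exactly what GN has after step 1.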
So you keep on going with GN, very satisfied, your builds are blazing, the tests are running, and everybody's happy.
And then one day, management tells you they need an Android version of the application.
You take a look at the problem: code-wise, you determine you can keep 80% of your native code, but you'll have to write Java for the UI, and deal with a completely different way to build software and run tests.
In technical terms, the issue you face is that you need GN to understand new kinds of targets (easy), and to perform new kinds of computations over their dependencies (what?), for example:
- Java targets can come in 3 flavors: 'java', 'android' and 'desktop', where 'java' targets only use J2SE APIs that are common to both Android and desktop Java.
It would be very useful to check that no 'android' target depends, even transitively, on a 'desktop' one, and vice versa.
- When creating an Android package, all android_resource() targets that are reachable transitively from the top-level manifest must be collected and sent to the `aapt` Android SDK tool for processing.
Also, modifying a resource doesn't necessarily mean that the targets depending on it need to be rebuilt (except in certain cases), so sometimes they must be "deps", and sometimes they would be "data_deps", except that they must still be part of the dependency walk.
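Here is a similar Python sketch of these two computations, again over a toy in-memory graph. The `Node` class, the `kind` strings and the labels are hypothetical; the sketch ignores the resource ordering that a real aapt invocation cares about, and does not distinguish deps from data_deps.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    kind: str                                  # "java_library", "android_resource", "android_apk", ...
    flavor: str = "java"                       # "java", "android" or "desktop"
    deps: list = field(default_factory=list)


def reachable(graph, label):
    """Return the set of labels transitively reachable from `label` (excluding itself)."""
    seen, stack = set(), list(graph[label].deps)
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep].deps)
    return seen


FORBIDDEN = {"android": "desktop", "desktop": "android"}


def check_flavors(graph):
    """Report 'android' targets that reach 'desktop' ones, even transitively, and vice versa."""
    errors = []
    for node in graph.values():
        bad = FORBIDDEN.get(node.flavor)
        if bad is None:
            continue  # 'java' targets are not checked in this sketch
        for dep in sorted(reachable(graph, node.label)):
            if graph[dep].flavor == bad:
                errors.append(f"{node.label} ({node.flavor}) depends on {dep} ({bad})")
    return errors


def collect_resources(graph, apk):
    """List every android_resource() reachable from the package, e.g. to hand to aapt."""
    return sorted(d for d in reachable(graph, apk) if graph[d].kind == "android_resource")


graph = {n.label: n for n in [
    Node("//app:apk", "android_apk", "android", deps=["//app:ui", "//app:res"]),
    Node("//app:ui", "java_library", "android", deps=["//base:java", "//ui:charts"]),
    Node("//app:res", "android_resource", "android"),
    Node("//ui:charts", "java_library", "android", deps=["//ui:res", "//tools:swing_charts"]),
    Node("//ui:res", "android_resource", "android"),
    Node("//base:java", "java_library", "java"),
    Node("//tools:swing_charts", "java_library", "desktop"),
]}

for error in check_flavors(graph):
    print(error)
print(collect_resources(graph, "//app:apk"))
```

Recomputing reachability for every target is quadratic; a real implementation would memoize it. The point is that neither check fits into GN's hard-coded set of graph computations.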
But in reality, there are dozens more new build dependency rules to consider, and unfortunately, GN has no support for any of these new types of computations.
So you ask the GN developers for advice, and they kindly tell you "if you need to do something complicated, do it outside of GN instead".
So you ask one of your developers to write a prototype to try just that.
The initial approach is to push the new dependency computations on top of GN, i.e. you implement a meta-meta-build system: something that reads a custom DSL of your choosing, describing Android and Java targets, and generates corresponding GN files after performing various checks and computations. The result is then passed to "gn gen" as usual.
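A minimal Python sketch of such a META-to-GN generator. The "META" format here is just a Python dictionary standing in for a custom DSL, and the `java_library` / `android_library` GN templates it emits are assumed to be defined in your own .gni files. Note how META names (`app_java`) and GN labels (`":app_java"`) already diverge.

```python
# Hypothetical META description: a plain Python dict standing in for a custom DSL.
META = {
    "app_java": {"type": "android_library", "srcs": ["App.java"], "deps": ["base_java"]},
    "base_java": {"type": "java_library", "srcs": ["Base.java"], "deps": []},
}


def check(meta):
    """Example META-level check: every dependency must name a known META target."""
    for name, spec in meta.items():
        for dep in spec["deps"]:
            if dep not in meta:
                raise ValueError(f"{name}: unknown dependency '{dep}'")


def to_gn(meta):
    """Emit one GN target per META entry; the templates are assumed to exist in a .gni file."""
    lines = []
    for name in sorted(meta):
        spec = meta[name]
        lines.append(f'{spec["type"]}("{name}") {{')
        lines.append("  sources = [ " + ", ".join(f'"{s}"' for s in spec["srcs"]) + " ]")
        if spec["deps"]:
            # META names become GN labels relative to the generated BUILD.gn.
            lines.append("  deps = [ " + ", ".join(f'":{d}"' for d in spec["deps"]) + " ]")
        lines.append("}")
    return "\n".join(lines) + "\n"


check(META)
print(to_gn(META), end="")  # in practice, written to a BUILD.gn file before running "gn gen"
```

Any error found by `check()` is reported in META terms, while anything GN complains about later is reported in GN terms, which is where the problems below start.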
You quickly notice some significant drawbacks to this approach:
- You have two DSLs instead of one, each with a different way to name your targets and, more importantly, a different view of what the build graph looks like, because they do not describe dependencies in the same way.
- Inspecting the build graph becomes really confusing. Going from a Ninja target to a GN target was already a little hard, but adding one more level is an adventure.
- In many cases, GN cannot give you meaningful error messages anymore.
- Java sources auto-generated by other Java programs are common. Translating this from META to GN is tricky, and when things do not work, they are really hard to debug.
- Sometimes a Java library depends on a C++ library (due to native functions), which makes your META DSL a bit more complicated than you initially planned.
- Sometimes a C++ library depends on a Java library (due to JNI, Java source annotations, or whatever). This makes writing GN rules that reference them painful. It also means you can no longer just run "gn gen", even if you don't work on Java/Android code: the META step must run first, just in case.
After a quick try, your developers tell you they hate this, and that they would strongly favor one DSL over two. You don't want to change your build system yet again, so you try the opposite approach, i.e. doing the new dependency computations at build time!
You realize that this is what the Chrome/Android GN-based build actually did, so you study its implementation, and you see that:
- Every Android-related target will generate a file containing a full description of its GN target. It's a JSON dictionary with dozens of different keys per target, if not more.
This is required because if one needs to compute dependency relationships at build time, the build graph needs to be exported somewhere accessible from the action scripts.
- Every Android-related action script needs to be able to access the content of these files, and will generate Ninja dependency files directly from them.
Since GN doesn't know about the result of these computations, it cannot generate anything useful for Ninja, hence the action scripts must do it.
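A rough Python sketch of what such an action script does. The JSON keys (`resource_files`, `deps_configs`) and the command-line layout are hypothetical, not the actual format used by Chromium's Android build; the point is that the script re-walks the dependency graph itself, then reports every file it read to Ninja through a depfile that GN never sees.

```python
import json
import sys


def collect_resources(root_config):
    """Walk the per-target JSON files reachable from root_config, collecting resource files.

    Returns (resource_files, config_files_read), both sorted.
    """
    seen, stack, resources = set(), [root_config], []
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        with open(path) as f:
            config = json.load(f)
        resources.extend(config.get("resource_files", []))  # hypothetical key
        stack.extend(config.get("deps_configs", []))        # hypothetical key
    return sorted(set(resources)), sorted(seen)


def write_depfile(depfile, output, inputs):
    """Write a Makefile-style depfile listing every input this action read.

    Paths containing spaces would need escaping; this sketch ignores that.
    """
    with open(depfile, "w") as f:
        f.write(f"{output}: {' '.join(inputs)}\n")


def main(argv):
    # Hypothetical command line: <root .json> <output> <depfile>
    root_config, output, depfile = argv[1:4]
    resources, configs = collect_resources(root_config)
    with open(output, "w") as f:
        f.write("\n".join(resources) + "\n")  # stand-in for actually invoking aapt
    # Ninja learns about these inputs only at build time; GN never sees them.
    write_depfile(depfile, output, configs + resources)
    return 0


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

Every Android-related action script ends up duplicating some variant of this graph walk, each with its own assumptions about the JSON keys.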
Compared to the meta-meta-build situation, you have a single DSL / source of truth for your targets (GN build rules), which is considerably better. However:
- GN doesn't know the full build graph anymore, which drastically limits the usability of its inspection commands, and makes the build graph and process difficult to understand.
- GN cannot give you informative error messages anymore: it lacks proper context.
- The action script also lacks context and will not give you much information when something's wrong.
- When something doesn't work, it becomes extremely difficult to understand why. And your developers spend far more time looking at build.ninja and .d files directly to understand what's going on, something they never needed to do before.
- You now need experts who understand the weird relationships between your Android/Java related GN templates and the dozens of action scripts they interact with.
Having worked heavily on this system, I don't think it will be trivial to fix the situation by modifying GN to understand a dozen new Java- and Android-related target types and dependency computations. That's because the Android tooling is constantly evolving, introducing new tools and associated workflow changes every year to support new features (e.g. App Bundles, which I implemented in the Chromium build system a few years ago).
For Fuchsia and Zircon, both build systems have tried hard to avoid relying on action scripts to support the features they need. This requires performing various checks and computations at "gn gen" time, typically through dynamic lookups over various lists of stuff (since we can't use scopes as general dictionaries / sets). They also use far more than two toolchains in a single build (the Zircon build defines more than a hundred of them).
They rely heavily on metadata collection, which is unfortunately only half of the battle: the result still needs to be processed by action scripts, which still generate dependency files for Ninja that GN doesn't know about. Also, metadata collection is powerful but really error-prone: a simple typo goes unnoticed since there are no checks on keys and values, the place where keys are defined is completely remote from where they are used, and there is no way to restrict the visibility of the keys to specific templates. And in case of error, oh my, things get weird.
We also support Go packages and Rust crates, which have specific dependency requirements. For Rust, at least, some changes were made in GN to support it, but I'm unsure of their state (not having worked on this). For Go, we know that most Go-level source changes are invisible to GN, which makes life difficult sometimes. Fortunately, we mostly use it for host tools that do not change very frequently compared to the rest of the platform. Also, our requirements with regard to dependency computations tend to change a lot over time.
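To illustrate the metadata typo problem, here is a deliberately simplified Python model of a metadata walk: it collects the values stored under the requested data keys from every reachable target. It ignores walk keys / barriers, ordering details and path rebasing, so it is not GN's exact algorithm; the target labels and the `package_files` key are hypothetical. The `unknown_keys()` helper is a hypothetical example of the kind of key check that GN does not perform.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    label: str
    metadata: dict = field(default_factory=dict)  # key -> list of values
    deps: list = field(default_factory=list)


def collect_metadata(graph, roots, data_keys):
    """Collect the values stored under `data_keys` from every target reachable from `roots`.

    Simplified model: no walk keys / barriers, no path rebasing, plain depth-first order.
    """
    results, seen, stack = [], set(), list(reversed(roots))
    while stack:
        label = stack.pop()
        if label in seen:
            continue
        seen.add(label)
        target = graph[label]
        for key in data_keys:
            results.extend(target.metadata.get(key, []))  # unknown keys are silently ignored
        stack.extend(reversed(target.deps))
    return results


def unknown_keys(graph, known_keys):
    """Hypothetical schema check: report (label, key) pairs for undeclared metadata keys."""
    return sorted((t.label, key) for t in graph.values() for key in t.metadata
                  if key not in known_keys)


graph = {t.label: t for t in [
    Target("//pkg:pkg", deps=["//pkg:bin", "//pkg:config"]),
    Target("//pkg:bin", metadata={"package_files": ["bin/app"]}),
    # Typo: "package_file" instead of "package_files", defined far from where it is read.
    Target("//pkg:config", metadata={"package_file": ["data/app.cfg"]}),
]}

print(collect_metadata(graph, ["//pkg:pkg"], ["package_files"]))  # ['bin/app']: the config is silently missing
print(unknown_keys(graph, {"package_files"}))  # [('//pkg:config', 'package_file')]
```

Nothing fails in the first call; the missing file only shows up much later, typically as a runtime error in whatever consumes the collected list.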
I don't have a proper solution to this situation, but I hope this explains why we're here, and why "do complex computations outside of GN" doesn't work well in practice, at least for real projects that need more than generating programs and libraries from C++ and action scripts. I would love it if we could describe all our targets with simple GN rules that the tool understands natively, but we're very far from that.
I would like to add that I love GN: it's one of the best build system tools I've used, and working on its code base has been a pleasure. I completely understand the elegance of the original design. I just think that it doesn't fit the needs of real projects anymore, because building software is a messy business, after all. I hope we find a way to extend it that preserves this elegance as much as possible.
- Digit
[1] Just like with the previous build system, sometimes you have unexplainable build breaks after updating your workspace, but your team learns to do a clean build to fix the situation, or even to implement clobber landmines like the Chrome team does to deal with this.