"Trusting Trust" builds with Bazel or "how to build your build tools"

Markus Teufelberger

Oct 2, 2017, 10:44:12 AM
to bazel-discuss
Hi all,

As far as I understand it, Bazel solves the problem of building software at a very high level by letting one define (using the BUILD language) a set of inputs, a set of programs that transform these inputs, and a set of expected outputs.
Inputs and outputs are usually defined via code + BUILD files and (for external code) via references in the WORKSPACE file.
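The inputs/tools/outputs model above can be sketched as a minimal genrule (the target and file names here are hypothetical, not from any real project):

```python
# BUILD (sketch) - inputs, a transforming tool, and declared outputs.
genrule(
    name = "gen_config",
    srcs = ["config.tmpl"],              # the set of inputs
    tools = ["//tools:render_template"], # the program doing the transformation
    outs = ["config.json"],              # the set of expected outputs
    cmd = "$(location //tools:render_template) $(SRCS) > $@",
)
```

Bazel only promises hermeticity for what is declared here; anything the tool reads outside of srcs/tools is invisible to the build graph.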

What I'm not so clear about is the following:
I'd like to also specify exactly which tools are used to transform the inputs into the outputs (e.g. which compiler is being used), and furthermore I'd like to make sure that this is done deterministically. If I'm building something, ideally all that should be necessary is a small tool that interprets the gRPC call for the current action and can interact with the build cache etc., some resources (CPU, RAM, disk space, network) and a running OS kernel. Something like a Docker container "FROM scratch" with only https://github.com/bazelbuild/bazel-buildfarm added in.

Currently it seems to me as if Bazel still "only" sandboxes the code, not the compiler(s). Some rule sets already go in this direction - the rules for golang, for example, allow the user to select a compiler version (https://github.com/bazelbuild/rules_go/blob/master/go/private/repositories.bzl) rather than just using whatever is installed locally.

BUT:

There is a reason why the golang binaries are downloaded as binary packages instead of (optionally) referring to a commit on https://go.googlesource.com/go or similar and building from source:
At the moment it is not possible to build binaries before they are referenced in a WORKSPACE file. This leads to situations like https://github.com/GoogleCloudPlatform/distroless/blob/master/package_manager/package_manager.bzl, where a tool used as a helper for some custom WORKSPACE rules has to be fetched as a binary from a remote source, even though the full source code for the tool and a BUILD file to build it are available in the same folder.

I'd like to propose yet another file ("BOOTSTRAP"?) which contains a build process for any tool that WORKSPACE and BUILD files might later require. In that file, every binary that is explicitly or implicitly used during a bazel build/test/whatever command must be referred to. The base case would simply state, for every binary, that the locally installed one should be used. In cases like the helper script for distroless, it would contain a WORKSPACE-rule-like syntax to allow building the helper tool and making it available to the actual WORKSPACE that runs afterwards. In the golang case it could contain a reference to either the binary distribution of Go, or a way to build the whole thing from source (if someone wants to check against a bleeding-edge version, for example).
Having such a file would also make it easier to know which binaries actually need to be supplied when setting up a build server/container and which ones could be left out. As far as I understand, CROSSTOOL files already go in that direction; maybe they could be extended to contain references to tools that still need to be built?
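To make the proposal concrete, such a file could look roughly like this. To be clear: this file format does not exist in Bazel; every rule name and attribute below is invented purely for illustration:

```python
# BOOTSTRAP (hypothetical sketch - none of these rules exist)

# Base case: use whatever is installed locally.
local_tool(name = "git", path = "/usr/bin/git")

# distroless-style case: build a helper tool from source
# before any WORKSPACE rule runs.
bootstrap_binary(
    name = "dpkg_parser",
    target = "//package_manager:dpkg_parser",
)

# golang-style case: either a binary distribution...
tool_archive(
    name = "go_sdk",
    urls = ["https://example.com/go-sdk.tar.gz"],  # placeholder URL
)
# ...or built from source at a pinned commit:
# bootstrap_from_source(
#     name = "go_sdk",
#     remote = "https://go.googlesource.com/go",
#     commit = "...",
# )
```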

Depending on the implementation of recursive WORKSPACEs, this might not even need a separate file or a modification of CROSSTOOL: first building a "tools" WORKSPACE and only then the "myprogram" WORKSPACE could already be a pattern that achieves this. It seems that recursive WORKSPACEs are on hold at the moment, though.

The advantages of also locally "building the builders" would include the ability to quickly change compiler versions, making it easier to reproduce the exact same binary later ("what was the compiler version we used on our build server(s) 5 months ago?"), and shipping fewer dependencies on build nodes (as I wrote above: ideally only a running OS kernel and a program to handle the gRPC call should be enough). It would also make patterns like the golang one or the one in distroless easier, since there is no awkward "let someone else build and upload a binary somewhere, then reference that" step. The extreme case would be a source-only repository/toolchain: after the initial bootstrap build, which still needs some locally installed tools, and as long as the cache is up, no worker node will ever need any binary besides the gRPC handler installed to build anything inside Bazel.

As far as I understand the environment at Google, this seems to be only a small problem there, as they have the ability to quickly update build worker nodes to a single current/new compiler version. It is probably enough there to just assume that the local one is always the best one to use. Rebuilding the toolchain (or even just asking the cache for the current toolchain binaries) probably has more overhead than just supplying build nodes with a slightly larger base image.

The "Trusting Trust" part refers to Ken Thompson's 1984 paper "Reflections on Trusting Trust": a malicious compiler binary might insert unwanted code into its output. One could trust the output of a compiler binary more if it is built in a 3-stage way: First, compile it using one or several different local compilers to produce the binary c0 (different compilers will produce different versions of c0). Then compile the compiler again using c0, producing c1 (these must now be bit-identical, if no local compiler inserted code into c0 that would modify c0's behavior). To be extra sure, you can even compile the compiler again using c1 to produce c2, which must then be identical to c1.
As the initial toolchain compilation would occur relatively rarely anyway (as soon as you know that c1 is good, you can compile further compilers with c1), it might be an interesting exercise to publish such a build process for as many locally installed compilers as possible. This is a one-time cost, but it would help every Bazel user have more confidence in the tools they are using.
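The 3-stage check can be modeled with a toy mock compiler (all names here are invented for illustration; a real check would compare actual compiler binaries byte for byte):

```python
import hashlib

def mock_compile(compiler, source):
    # Toy stand-in for "run `compiler` on `source`": a stage-0 build's bits
    # depend on which local compiler produced it, but a compiler built from
    # `source` compiling `source` is a fixed point (the honest case).
    payload = compiler + "|" + source if compiler.startswith("local-") else source
    return hashlib.sha256(payload.encode()).hexdigest()

src = "compiler-source-v1"
c0_gcc = mock_compile("local-gcc", src)    # stage 0: differs per host compiler
c0_clang = mock_compile("local-clang", src)
assert c0_gcc != c0_clang                  # different bits, same (assumed) behavior

c1 = mock_compile(c0_gcc, src)             # stage 1: self-compiled
c1_alt = mock_compile(c0_clang, src)
assert c1 == c1_alt                        # all clean paths converge on one binary

c2 = mock_compile(c1, src)                 # stage 2: must reproduce stage 1 exactly
assert c2 == c1
```

If a local compiler were malicious, its c0 would behave differently, the c1 binaries built via different c0s would diverge, and the comparison would fail.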

Unfortunately I won't be at the Bazel Summit, but maybe this is something that others find interesting too and want to discuss here or there. Of course it might also be the case that I'm just misunderstanding something and referring to tools' source code instead of binaries in the WORKSPACE is already possible. In that case it is definitely not obvious, at least to me, and I'd love to see (and even write) some better documentation for this.

Looking forward to your ideas on the subject,
Markus

Tl;dr:
It would be nice to have a "build the build tools" step before WORKSPACE rules are evaluated - either via a separate "BOOTSTRAP" file or via nested/recursive WORKSPACEs.
This would enable even better determinism of the generated binaries and also allow for very minimal build worker nodes.

Marcel Hlopko

Oct 9, 2017, 8:35:19 AM
to Markus Teufelberger, bazel-discuss, dmar...@google.com, ulf...@google.com, lbe...@google.com

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/e42d118d-1b7b-43a5-93d7-397a673b4579%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
-- 
Marcel Hlopko | Software Engineer | hlo...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33 | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Damien Martin-Guillerez

Oct 10, 2017, 5:08:48 AM
to Marcel Hlopko, Markus Teufelberger, bazel-discuss, ulf...@google.com, lbe...@google.com
Hi Markus,

IIUC, the main problem you are facing is that repository rules cannot depend on artifacts produced by Bazel?

FWIW, at Google we check in binary versions of some compilers / build tools, but most of the time we just add them as a dependency. We do not fetch anything from outside, so we do not have the problem of workspace rules needing some tool. There is, however, a very small set of tools installed on the workers, and that list has become shorter and shorter over time (by moving them into the repo).

Also, you might be interested in https://bootstrappable.org. It is outside of Bazel, but they try to address the same problem.

Ulf Adams

Oct 10, 2017, 5:42:37 AM
to Damien Martin-Guillerez, Marcel Hlopko, Markus Teufelberger, bazel-discuss, lbe...@google.com
On Tue, Oct 10, 2017 at 11:08 AM, Damien Martin-Guillerez <dmar...@google.com> wrote:
Hi Markus,

IIUC, the main problem you are facing is that repository rules cannot depend on artifacts produced by Bazel?

FWIW, at Google we check in binary versions of some compilers / build tools, but most of the time we just add them as a dependency. We do not fetch anything from outside, so we do not have the problem of workspace rules needing some tool. There is, however, a very small set of tools installed on the workers, and that list has become shorter and shorter over time (by moving them into the repo).

Also, you might be interested in https://bootstrappable.org. It is outside of Bazel, but they try to address the same problem.

That link doesn't work. This link does: http://bootstrappable.org/
 

On Mon, Oct 9, 2017 at 2:35 PM Marcel Hlopko <hlo...@google.com> wrote:

On Mon, Oct 2, 2017 at 4:44 PM Markus Teufelberger <markusteu...@gmail.com> wrote:
Hi all,

As far as I understand it, Bazel solves the problem of building software at a very high level by letting one define (using the BUILD language) a set of inputs, a set of programs that transform these inputs, and a set of expected outputs.
Inputs and outputs are usually defined via code + BUILD files and (for external code) via references in the WORKSPACE file.

What I'm not so clear about is the following:
I'd like to also specify exactly which tools are used to transform the inputs into the outputs (e.g. which compiler is being used), and furthermore I'd like to make sure that this is done deterministically. If I'm building something, ideally all that should be necessary is a small tool that interprets the gRPC call for the current action and can interact with the build cache etc., some resources (CPU, RAM, disk space, network) and a running OS kernel. Something like a Docker container "FROM scratch" with only https://github.com/bazelbuild/bazel-buildfarm added in.

Currently it seems to me as if Bazel still "only" sandboxes the code, not the compiler(s). Some rule sets already go in this direction - the rules for golang, for example, allow the user to select a compiler version (https://github.com/bazelbuild/rules_go/blob/master/go/private/repositories.bzl) rather than just using whatever is installed locally.

BUT:

There is a reason why the golang binaries are downloaded as binary packages instead of (optionally) referring to a commit on https://go.googlesource.com/go or similar and building from source:
At the moment it is not possible to build binaries before they are referenced in a WORKSPACE file. This leads to situations like https://github.com/GoogleCloudPlatform/distroless/blob/master/package_manager/package_manager.bzl, where a tool used as a helper for some custom WORKSPACE rules has to be fetched as a binary from a remote source, even though the full source code for the tool and a BUILD file to build it are available in the same folder.

I'd like to propose yet another file ("BOOTSTRAP"?) which contains a build process for any tool that WORKSPACE and BUILD files might later require. In that file, every binary that is explicitly or implicitly used during a bazel build/test/whatever command must be referred to. The base case would simply state, for every binary, that the locally installed one should be used. In cases like the helper script for distroless, it would contain a WORKSPACE-rule-like syntax to allow building the helper tool and making it available to the actual WORKSPACE that runs afterwards. In the golang case it could contain a reference to either the binary distribution of Go, or a way to build the whole thing from source (if someone wants to check against a bleeding-edge version, for example).
Having such a file would also make it easier to know which binaries actually need to be supplied when setting up a build server/container and which ones could be left out. As far as I understand, CROSSTOOL files already go in that direction; maybe they could be extended to contain references to tools that still need to be built?

Depending on the implementation of recursive WORKSPACEs, this might not even need a separate file or a modification of CROSSTOOL: first building a "tools" WORKSPACE and only then the "myprogram" WORKSPACE could already be a pattern that achieves this. It seems that recursive WORKSPACEs are on hold at the moment, though.

The advantages of also locally "building the builders" would include the ability to quickly change compiler versions, making it easier to reproduce the exact same binary later ("what was the compiler version we used on our build server(s) 5 months ago?"), and shipping fewer dependencies on build nodes (as I wrote above: ideally only a running OS kernel and a program to handle the gRPC call should be enough). It would also make patterns like the golang one or the one in distroless easier, since there is no awkward "let someone else build and upload a binary somewhere, then reference that" step. The extreme case would be a source-only repository/toolchain: after the initial bootstrap build, which still needs some locally installed tools, and as long as the cache is up, no worker node will ever need any binary besides the gRPC handler installed to build anything inside Bazel.

As far as I understand the environment at Google, this seems to be only a small problem there, as they have the ability to quickly update build worker nodes to a single current/new compiler version. It is probably enough there to just assume that the local one is always the best one to use. Rebuilding the toolchain (or even just asking the cache for the current toolchain binaries) probably has more overhead than just supplying build nodes with a slightly larger base image.

We do not update the build workers; we upload all the necessary files instead. However, you cannot build gcc without a C compiler (or clang without a C++ compiler), so you do need existing binaries at some point. Also, most Linux tools require installation to fixed absolute paths, which we can't guarantee at this time (and Bazel cannot represent such a requirement right now); they read a bunch of (undeclared) files, and have dynamic library dependencies (also at fixed absolute paths). For the tools we use at Google, we have manually patched all of them to support execution from relative paths, and manually declared all the necessary dependencies.

As far as Bazel is concerned, we want to be able to represent 'toolchains' in Bazel as an explicit first-class concept, making it possible to select the right toolchain for the host / target platform combination (supporting cross-compilation), as well as allowing toolchains to be bootstrapped from source, used from a local machine, or from a Docker image. That'll include setting best practices for rule developers. Note that the C++ and Java rules in Bazel already mostly follow this approach, so you can switch between different toolchains, and even bootstrap toolchains to some extent.
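As a sketch of what that first-class toolchain concept looks like in Bazel's toolchain framework (the target names below are placeholders; only the toolchain() rule and its attributes are real):

```python
# BUILD (sketch): declare a toolchain and the platforms it supports, so
# Bazel can select it for a given execution/target platform combination.
toolchain(
    name = "my_linux_x86_64_toolchain",
    exec_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
    target_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
    toolchain = ":my_toolchain_impl",            # placeholder implementation
    toolchain_type = "//my_tools:toolchain_type",  # placeholder type
)
```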
 

The "Trusting Trust" part refers to Ken Thompson's 1984 paper "Reflections on Trusting Trust": a malicious compiler binary might insert unwanted code into its output. One could trust the output of a compiler binary more if it is built in a 3-stage way: First, compile it using one or several different local compilers to produce the binary c0 (different compilers will produce different versions of c0). Then compile the compiler again using c0, producing c1 (these must now be bit-identical, if no local compiler inserted code into c0 that would modify c0's behavior). To be extra sure, you can even compile the compiler again using c1 to produce c2, which must then be identical to c1.
As the initial toolchain compilation would occur relatively rarely anyway (as soon as you know that c1 is good, you can compile further compilers with c1), it might be an interesting exercise to publish such a build process for as many locally installed compilers as possible. This is a one-time cost, but it would help every Bazel user have more confidence in the tools they are using.

I think most people don't care, so this isn't a large benefit. Sure, nice to have, but not something I'd invest time in at this time.
 

Unfortunately I won't be at the Bazel Summit, but maybe this is something that others find interesting too and want to discuss here or there. Of course it might also be the case that I'm just misunderstanding something and referring to tools' source code instead of binaries in the WORKSPACE is already possible. In that case it is definitely not obvious, at least to me, and I'd love to see (and even write) some better documentation for this.

Looking forward to your ideas on the subject,
Markus

Tl;dr:
It would be nice to have a "build the build tools" step before WORKSPACE rules are evaluated - either via a separate "BOOTSTRAP" file or via nested/recursive WORKSPACEs.
This would enable even better determinism of the generated binaries and also allow for very minimal build worker nodes.


Markus Teufelberger

Oct 10, 2017, 7:14:11 AM
to bazel-discuss
On Tuesday, October 10, 2017 at 11:08:48 AM UTC+2, Damien Martin-Guillerez wrote:
Hi Markus,

IIUC, the main problem you are facing is that repository rules cannot depend on artifacts produced by Bazel?

Yes. Let's say I want to build something using the result of a previous build step (e.g. I want to build my web app using NodeJS, which requires building the V8 engine with GCC, and I want to build my own GCC before that). It could work by having an "install" step between different layers of tooling ("From now on, don't use the system tool, but the one supplied by BUILD rule X").

https://github.com/bazelbuild/bazel/blob/master/tools/build_defs/repo/git.bzl for example assumes that some hopefully recent version of "git" (amongst other things) is available on the local $PATH. I'd like to be able to select which exact release from https://github.com/git/git/releases is used as the tool in this rule, and if it is not available, just build it from source.

On Tuesday, October 10, 2017 at 11:42:37 AM UTC+2, Ulf Adams wrote:
That link doesn't work. This link does: http://bootstrappable.org/

Thanks, the "for distros" part on http://bootstrappable.org/best-practises.html is what I mean:
"It should be clear where the binary came from and how it was produced."
"Users can reproduce the binary to verify that it has not been tampered with."

This is currently true for the binaries that Bazel produces, but not so much for the ones being used in the build process.

As far as I understand the environment at Google, this seems to be only a small problem there, as they have the ability to quickly update build worker nodes to a single current/new compiler version. It is probably enough there to just assume that the local one is always the best one to use. Rebuilding the toolchain (or even just asking the cache for the current toolchain binaries) probably has more overhead than just supplying build nodes with a slightly larger base image.

We do not update the build workers; we upload all the necessary files instead. However, you cannot build gcc without a C compiler (or clang without a C++ compiler), so you do need existing binaries at some point. Also, most Linux tools require installation to fixed absolute paths, which we can't guarantee at this time (and Bazel cannot represent such a requirement right now); they read a bunch of (undeclared) files, and have dynamic library dependencies (also at fixed absolute paths). For the tools we use at Google, we have manually patched all of them to support execution from relative paths, and manually declared all the necessary dependencies.

I would expect binaries used in toolchains to be statically linked, maybe even as multi-tool binaries like busybox. In Docker terms, a "FROM scratch" container + build worker binary should be enough. It would mean larger binaries and some downsides of static linking, but on the other hand Bazel would know exactly which binaries are necessary anyway, so there would be no need to, for example, upload "git" unless it is needed for a build step.

As far as Bazel is concerned, we want to be able to represent 'toolchains' in Bazel as an explicit first-class concept, making it possible to select the right toolchain for the host / target platform combination (supporting cross-compilation), as well as allowing toolchains to be bootstrapped from source, used from a local machine, or from a Docker image. That'll include setting best practices for rule developers. Note that the C++ and Java rules in Bazel already mostly follow this approach, so you can switch between different toolchains, and even bootstrap toolchains to some extent.

Where can I find out more about the "allowing toolchains to be bootstrapped from source" part? Where are the BUILD files to compile GCC and Clang, or at least descriptions of how I would go about bootstrapping? Currently it seems to me that I might first need to e.g. bootstrap a compiler like GCC and drop it into a container, then inside that container build the next layer (e.g. build Python from the CPython project), and so on (using the output from the Python compilation to create some py_binaries). Is this how nested WORKSPACEs are supposed to work?

Thanks for the answers,
Markus

Damien Martin-Guillerez

Oct 10, 2017, 7:23:42 AM
to Markus Teufelberger, bazel-discuss
On Tue, Oct 10, 2017 at 1:14 PM Markus Teufelberger <markusteu...@gmail.com> wrote:
On Tuesday, October 10, 2017 at 11:08:48 AM UTC+2, Damien Martin-Guillerez wrote:
Hi Markus,

IIUC, the main problem you are facing is that repository rules cannot depend on artifacts produced by Bazel?

Yes. Let's say I want to build something using the result of a previous build step (e.g. I want to build my web app using NodeJS, which requires building the V8 engine with GCC, and I want to build my own GCC before that). It could work by having an "install" step between different layers of tooling ("From now on, don't use the system tool, but the one supplied by BUILD rule X").

https://github.com/bazelbuild/bazel/blob/master/tools/build_defs/repo/git.bzl for example assumes that some hopefully recent version of "git" (amongst other things) is available on the local $PATH. I'd like to be able to select which exact release from https://github.com/git/git/releases is used as the tool in this rule, and if it is not available, just build it from source.

Unfortunately this is not yet possible, and we don't see it happening before 1.0 because it is a hard problem (having a load-phase function depend on an execution-phase function). It should be possible, but it will have a lot of hairy parts.
 

On Tuesday, October 10, 2017 at 11:42:37 AM UTC+2, Ulf Adams wrote:
That link doesn't work. This link does: http://bootstrappable.org/

Thanks, the "for distros" part on http://bootstrappable.org/best-practises.html is what I mean:
"It should be clear where the binary came from and how it was produced."
"Users can reproduce the binary to verify that it has not been tampered with."

This is currently true for the binaries that Bazel produces, but not so much for the ones being used in the build process.

In the build process you can depend on other built binaries. The problem is in the fetch phase, where you get dependencies.
 

As far as I understand the environment at Google, this seems to be only a small problem there, as they have the ability to quickly update build worker nodes to a single current/new compiler version. It is probably enough there to just assume that the local one is always the best one to use. Rebuilding the toolchain (or even just asking the cache for the current toolchain binaries) probably has more overhead than just supplying build nodes with a slightly larger base image.

We do not update the build workers; we upload all the necessary files instead. However, you cannot build gcc without a C compiler (or clang without a C++ compiler), so you do need existing binaries at some point. Also, most Linux tools require installation to fixed absolute paths, which we can't guarantee at this time (and Bazel cannot represent such a requirement right now); they read a bunch of (undeclared) files, and have dynamic library dependencies (also at fixed absolute paths). For the tools we use at Google, we have manually patched all of them to support execution from relative paths, and manually declared all the necessary dependencies.

I would expect binaries used in toolchains to be statically linked, maybe even as multi-tool binaries like busybox. In Docker terms, a "FROM scratch" container + build worker binary should be enough. It would mean larger binaries and some downsides of static linking, but on the other hand Bazel would know exactly which binaries are necessary anyway, so there would be no need to, for example, upload "git" unless it is needed for a build step.

As far as Bazel is concerned, we want to be able to represent 'toolchains' in Bazel as an explicit first-class concept, making it possible to select the right toolchain for the host / target platform combination (supporting cross-compilation), as well as allowing toolchains to be bootstrapped from source, used from a local machine, or from a Docker image. That'll include setting best practices for rule developers. Note that the C++ and Java rules in Bazel already mostly follow this approach, so you can switch between different toolchains, and even bootstrap toolchains to some extent.

Where can I find out more about the "allowing toolchains to be bootstrapped from source" part? Where are the BUILD files to compile GCC and Clang, or at least descriptions of how I would go about bootstrapping?

I think Ulf is talking about a vision following our toolchain plan. Basically, we specify which toolchain / platform you need to build and to execute a binary, and with remote execution we can then execute the command in the right place.

 
Currently it seems to me that I might first need to e.g. bootstrap a compiler like GCC and drop it into a container, then inside that container build the next layer (e.g. build Python from the CPython project), and so on (using the output from the Python compilation to create some py_binaries). Is this how nested WORKSPACEs are supposed to work?

Nope, nested workspaces still cannot depend on the execution phase of the build.
 

Thanks for the answers,
Markus


Markus Teufelberger

Oct 14, 2017, 12:01:26 PM
to bazel-discuss


On Tuesday, October 10, 2017 at 1:23:42 PM UTC+2, Damien Martin-Guillerez wrote:


On Tue, Oct 10, 2017 at 1:14 PM Markus Teufelberger <markusteu...@gmail.com> wrote:
On Tuesday, October 10, 2017 at 11:08:48 AM UTC+2, Damien Martin-Guillerez wrote:
Hi Markus,

IIUC, the main problem you are facing is that repository rules cannot depend on artifacts produced by Bazel?

Yes. Let's say I want to build something using the result of a previous build step (e.g. I want to build my web app using NodeJS, which requires building the V8 engine with GCC, and I want to build my own GCC before that). It could work by having an "install" step between different layers of tooling ("From now on, don't use the system tool, but the one supplied by BUILD rule X").

https://github.com/bazelbuild/bazel/blob/master/tools/build_defs/repo/git.bzl for example assumes that some hopefully recent version of "git" (amongst other things) is available on the local $PATH. I'd like to be able to select which exact release from https://github.com/git/git/releases is used as the tool in this rule, and if it is not available, just build it from source.

Unfortunately this is not yet possible, and we don't see it happening before 1.0 because it is a hard problem (having a load-phase function depend on an execution-phase function). It should be possible, but it will have a lot of hairy parts.

I agree, it is definitely not an easy problem; I was just wondering if it is possible already, or at least under consideration.
 
On Tuesday, October 10, 2017 at 11:42:37 AM UTC+2, Ulf Adams wrote:
That link doesn't work. This link does: http://bootstrappable.org/

Thanks, the "for distros" part on http://bootstrappable.org/best-practises.html is what I mean:
"It should be clear where the binary came from and how it was produced."
"Users can reproduce the binary to verify that it has not been tampered with."

This is currently true for the binaries that Bazel produces, but not so much for the ones being used in the build process.

In the build process you can depend on other built binaries. The problem is in the fetch phase, where you get dependencies.

Another way to circumvent this might be implementing "convergence builds" (name made up by me):

Let's say you have several "layers" of tools: The system compiler builds a GCC binary which builds a cPython binary which creates a py_binary output.

Instead of just running "bazel build ..." once, you run "bazel converge ..." which does the following:

* It looks for an existing outputs.manifest mapping of build targets and their hash values.
* Then it runs "bazel build ..." and compares the build targets with their predefined hashes.
* If they match up, you're done.
* If they don't (or if the mapping didn't exist yet), create a closed environment (e.g. a Docker container), put all build targets in there (you should also make sure that you have defined build targets for Bazel itself in that case, btw.), add an outputs.manifest file, and run "bazel converge ..." in there.

If implementing this in practice, it would probably also be useful to add a maximum recursion depth, since there could be cycles due to side effects (using compiler X, interpreter Y produces script Z, which in turn causes compiler X to be built differently, etc.).
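The converge loop above can be sketched with a toy mock build function standing in for "bazel build ..." (all names invented; real outputs.manifest handling and containerization are elided):

```python
import hashlib

def mock_build(targets, tool):
    # Stand-in for `bazel build ...`: with the "system" toolchain the outputs
    # carry host-specific bits; once the toolchain is itself a build output,
    # results depend only on the sources (the hermetic case).
    key = "system" if tool == "system" else "hermetic"
    return {t: hashlib.sha256(f"{key}:{t}".encode()).hexdigest() for t in targets}

def converge(targets, tool="system", max_depth=10):
    # `bazel converge ...` sketch: rebuild, using the previous round's
    # outputs as the toolchain, until the output manifest stops changing.
    manifest = None
    for depth in range(max_depth):
        outputs = mock_build(targets, tool)
        if outputs == manifest:
            return outputs, depth  # fixed point reached
        manifest = outputs
        tool = hashlib.sha256("".join(sorted(outputs.values())).encode()).hexdigest()
    raise RuntimeError("no fixed point within max_depth (possible cycle)")

outputs, depth = converge(["//compiler:gcc", "//app:bin"])
assert depth == 2  # one impure round, one hermetic round, one confirming round
```

In this toy model the manifest stabilizes after the first rebuild with a self-built toolchain; the max_depth cap is exactly the recursion limit suggested above.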

Maybe it would even be enough to replace any file on the build host that has the same file name as a build target with that build target's contents, but then there might be issues with running services etc. The downside of starting a completely new environment is the startup cost for Bazel; the advantage would be that any tool that is not a build target stays available and unspecified, whereas full containerization from the start would require a LOT of work to build basic stuff like bash, coreutils, a JVM... - in general, all the tools that your Bazel rules implicitly or explicitly require during a build.

That way a change in the uppermost layer of your toolchain would still cause only a single actual build (the second call to all build actions would already be cached), but changing your base compiler would only succeed if all your upper layers also build properly. Otherwise you run the risk that in one commit you change your compiler, some time later it actually gets used to compile your interpreter, and later still the output of your interpreter is messed up - because of the compiler change, not because of whatever unrelated code it is now processing.
I wonder how Google prevents this from happening internally: If you patch clang/gcc/whatever, how do you make sure that this doesn't break a unit test in a JavaScript library because NodeJS now builds differently, or how (fast) do you find out that this happened?
 
Currently it seems to me that I might first need to e.g. bootstrap a compiler like GCC and drop it into a container, then inside that container build the next layer (e.g. build Python from the CPython project), and so on (using the output from the Python compilation to create some py_binaries). Is this how nested WORKSPACEs are supposed to work?

Nope, nested workspaces still cannot depend on the execution phase of the build.

:-(
Makes sense though, if load - execute - load is not possible.