--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/e42d118d-1b7b-43a5-93d7-397a673b4579%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi Markus,

IIUC, the main problem you are facing is that repository rules cannot depend on artifacts built by Bazel?

FWIW, at Google we check in binary versions of some compilers / build tools, but most of the time we just add them as dependencies. We do not fetch anything from outside, so we do not have the problem of workspace rules needing some tool. There is, however, a very small set of tools installed on the workers, and that list has become shorter and shorter over time (as they are moved into the repo).

Also, you might be interested in https://bootstrappable.org. It is outside of Bazel, but they try to address the same problem.
On Mon, Oct 9, 2017 at 2:35 PM Marcel Hlopko <hlo...@google.com> wrote:
On Mon, Oct 2, 2017 at 4:44 PM Markus Teufelberger <markusteu...@gmail.com> wrote:

Hi all,

As far as I understand it, Bazel solves the problem of building software at a very high level by letting one define (using the BUILD language) a set of inputs, a set of programs that perform transformations on those inputs, and a set of expected outputs. The definition of inputs and outputs is usually done via code + BUILD files and (for external code) via references in the WORKSPACE file.

What I'm not so clear about is the following: I'd like to also specify exactly which tools are used to transform the inputs into the outputs (e.g. which compiler is used), and furthermore I'd like to make sure that this is done deterministically. If I'm building something, ideally the only things that should be necessary are a small tool that interprets the gRPC call for the current action and can interact with the build cache, some resources (CPU, RAM, disk space, network), and a running OS kernel. Something like a Docker container "FROM scratch" with only https://github.com/bazelbuild/bazel-buildfarm added in there.

Currently it seems to me as if Bazel still "only" sandboxes the code, not the compiler(s). Some rule sets already go in this direction - the rules for Go allow the user to select a compiler version (https://github.com/bazelbuild/rules_go/blob/master/go/private/repositories.bzl), for example, and don't just use whatever is installed locally.

BUT: There is a reason why the Go binaries are downloaded as binary packages instead of (optionally) referring to a commit on https://go.googlesource.com/go or something similar and building from source: at the moment it is not possible to build binaries before they are referenced in a WORKSPACE file.
This leads to situations like https://github.com/GoogleCloudPlatform/distroless/blob/master/package_manager/package_manager.bzl, where a tool used as a helper for some custom WORKSPACE rules has to be fetched as a binary from a remote source, even though the full source code for the tool and a BUILD file to build it are available in the same folder.

I'd like to propose yet another file ("BOOTSTRAP"?) which contains a build process for any tool that WORKSPACE and BUILD files might later require. In that file, every binary that is explicitly or implicitly used during a bazel build/test/whatever command must be referenced. The base case would be to simply state, for every binary, that the one already installed should be used. In cases like the helper script for distroless, it would contain a WORKSPACE-rule-like syntax to build the helper tool and make it available to the actual WORKSPACE that runs afterwards. In the Go case, it could contain a reference to either the binary distribution of Go or a way to build the whole thing from source (if someone wants to test against a bleeding-edge version, for example).

Having such a file would also make it easier to know which binaries actually need to be supplied when setting up a build server/container and which ones can be left out. As far as I understand, CROSSTOOL files already go in that direction; maybe they could be extended to contain references to tools that still need to be built? Depending on the implementation of recursive WORKSPACEs, this might not even need a separate file or a modification of CROSSTOOL - first building a "tools" WORKSPACE and then, after that, the "myprogram" WORKSPACE might already be a pattern that achieves this.
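To make the proposal a bit more concrete, a BOOTSTRAP file along these lines might look as follows. This is purely hypothetical syntax - none of these rules (system_tool, built_tool, downloaded_tool) exist in Bazel; the names and attributes are invented here only to illustrate the three cases described above:

```python
# Hypothetical BOOTSTRAP file -- none of these rules exist in Bazel;
# the names are invented purely to illustrate the proposal.

# Base case: declare that the locally installed binary should be used.
system_tool(
    name = "git",
    path = "/usr/bin/git",
)

# distroless-style case: build the WORKSPACE helper from the sources
# that already live next to it, instead of fetching a prebuilt binary.
built_tool(
    name = "package_manager",
    target = "//package_manager:dpkg_parser",
)

# Go-style case: either reference the binary distribution...
downloaded_tool(
    name = "go_sdk",
    url = "https://storage.googleapis.com/golang/go1.9.1.linux-amd64.tar.gz",
)

# ...or build the toolchain itself from a pinned source revision.
built_tool(
    name = "go_from_source",
    repository = "https://go.googlesource.com/go",
    tag = "go1.9.1",
)
```

Every tool referenced by a later WORKSPACE or BUILD file would have to resolve to exactly one of these declarations.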
It seems that at the moment recursive WORKSPACEs are on hold, though.

The advantages of also locally "building the builders" would include the ability to quickly change compiler versions, making it easier to reproduce the exact same binary later ("what was the compiler version we used on our build server(s) 5 months ago?"), and shipping fewer dependencies on build nodes (as I wrote above: ideally only a running OS kernel and a program to handle the gRPC call should be enough). It would also make patterns like the Go one or the one in distroless easier, since there is no awkward "let someone else build and upload a binary somewhere, then reference that" step. The extreme case would be a source-only repository/toolchain: after the initial bootstrap build, which still needs some locally installed tools, as long as the cache is up, no worker node will ever need any binary installed aside from the gRPC handler to build anything inside Bazel.

As far as I understand the environment at Google, this seems to be only a small problem there, as they have the ability to quickly update build worker nodes to a single current/new compiler version. It is probably enough there to just assume that the local one is always the best one to use. Rebuilding the toolchain (or even just asking the cache for the current toolchain binaries) probably has more overhead than just supplying build nodes with a slightly larger base image.
The "Trusting Trust" part refers to Ken Thompson's 1984 observation that a malicious compiler binary might insert unwanted code into its output, and to the idea that one can trust a compiler binary more if it is built in a 3-stage way: first, compile the compiler using one or several different local compilers to produce the binary c0 (different compilers will produce different versions of c0). Then compile the compiler again using c0, producing c1 (these must now be bit-identical, provided no local compiler inserted code into c0 that would modify c0's behavior). To be extra sure, you can even compile the compiler again using c1 to produce c2, which must then be identical to c1.

As the initial toolchain compilation would occur relatively rarely anyway (as soon as you know that c1 is good, you can compile further compilers with c1), it might be an interesting exercise to publish such a build process for as many locally installed compilers as possible. This is a one-time cost, but it would help every Bazel user have more confidence in the tools they are using.
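The three-stage check can be sketched abstractly. The toy Python model below only demonstrates the comparison protocol, not real compilation: compile_with is a stand-in whose output depends solely on the source, which is exactly the fixpoint property a trustworthy self-hosted compiler should converge to. A real check would invoke actual compilers and compare the resulting binaries bit for bit.

```python
import hashlib

COMPILER_SOURCE = "source code of the compiler being bootstrapped"

def sha(text):
    return hashlib.sha256(text.encode()).hexdigest()

def host_compile(host_name, source):
    # Different host compilers emit different (but behaviorally
    # equivalent) binaries, modeled by mixing the host's name in.
    return sha(host_name + source)

def compile_with(compiler_binary, source):
    # A clean self-hosted compiler's output depends only on the
    # source, not on which binary of the compiler ran the build.
    return sha(source)

# Stage 0: build the compiler with two different host compilers.
c0_gcc = host_compile("host-gcc", COMPILER_SOURCE)
c0_clang = host_compile("host-clang", COMPILER_SOURCE)
assert c0_gcc != c0_clang  # stage-0 binaries differ, as expected

# Stage 1: rebuild the compiler with each stage-0 binary. If neither
# host compiler inserted behavior-changing code, these must be
# bit-identical.
c1_a = compile_with(c0_gcc, COMPILER_SOURCE)
c1_b = compile_with(c0_clang, COMPILER_SOURCE)
assert c1_a == c1_b

# Stage 2: one more round must reproduce stage 1 exactly.
c2 = compile_with(c1_a, COMPILER_SOURCE)
assert c2 == c1_a
```

If either stage-1 comparison or the stage-2 fixpoint check fails, one of the compilers involved changed the output in a way the source does not explain.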
Unfortunately I won't be at the Bazel Summit, but maybe this is something that others find interesting too and want to discuss here or there. Of course, it might also be the case that I'm just misunderstanding something and referring to tools' source code instead of binaries in the WORKSPACE is already possible. In that case it is definitely not obvious, at least to me, and I'd love to see (and even write) some better documentation for this.

Looking forward to your ideas on the subject,
Markus

Tl;dr: It would be nice to have a "build the build tools" step before WORKSPACE rules are called - either via a separate "BOOTSTRAP" file or via nested/recursive WORKSPACEs. This would enable even better determinism of the generated binaries and also allow for very minimal build worker nodes.
Hi Markus,

IIUC, the main problem you are facing is that repository rules cannot depend on artifacts built by Bazel?
That link doesn't work. This link does: http://bootstrappable.org/
As far as I understand the environment at Google, this seems to be only a small problem there, as they have the ability to quickly update build worker nodes to a single current/new compiler version. It is probably enough there to just assume that the local one is always the best one to use. Rebuilding the toolchain (or even just asking the cache for the current toolchain binaries) probably has more overhead than just supplying build nodes with a slightly larger base image.

We do not update the build workers, but upload all the necessary files instead. However, you cannot build gcc without a C compiler (or clang without a C++ compiler), so you do need existing binaries at some point. Also, most Linux tools require installation to fixed absolute paths, which we can't guarantee at this time (and Bazel cannot represent such a requirement right now); they read a bunch of (undeclared) files, and they have dynamic library dependencies (also at fixed absolute paths). For the tools we use at Google, we have manually patched all of them to support execution from relative paths and manually declared all the necessary dependencies.
As far as Bazel is concerned, we want to be able to represent 'toolchains' as an explicit first-class concept, making it possible to select the right toolchain for the host/target platform combination (supporting cross-compilation), as well as allowing toolchains to be bootstrapped from source, used from a local machine, or taken from a Docker image. That'll include setting best practices for rule developers. Note that the C++ and Java rules in Bazel already mostly follow this approach, so you can switch between different toolchains and even bootstrap toolchains to some extent.
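For reference, the direction described here is visible in Bazel's (still evolving, as of this thread) platforms/toolchains work. A sketch of registering a toolchain is below; the "my_compiler" names are placeholders, and the constraint labels may differ between Bazel versions:

```python
# BUILD file sketch of a first-class toolchain. Names like
# "my_compiler" are placeholders; constraint labels may differ
# between Bazel versions.

toolchain_type(name = "my_compiler_toolchain_type")

toolchain(
    name = "my_compiler_linux",
    toolchain_type = ":my_compiler_toolchain_type",
    exec_compatible_with = ["@bazel_tools//platforms:linux"],
    target_compatible_with = ["@bazel_tools//platforms:linux"],
    # The actual tool implementation: this label could point to a
    # checked-in binary or to a target built from source.
    toolchain = ":my_compiler_impl",
)
```

A `register_toolchains("//toolchains:my_compiler_linux")` call in the WORKSPACE file would then make it available for resolution against the host/target platforms.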
On Tuesday, October 10, 2017 at 11:08:48 UTC+2, Damien Martin-Guillerez wrote:
Hi Markus,
IIUC, the main problem you are facing is that repository rules cannot depend on artifacts built by Bazel?

Yes, let's say I want to build something using the result of a previous build step (e.g. I want to build my web app using NodeJS, which requires building the V8 engine with GCC, and I want to build my own GCC myself before that). It could work by having an "install" step between different layers of tooling ("from now on, don't use the system tool, but the one supplied by BUILD rule X").

https://github.com/bazelbuild/bazel/blob/master/tools/build_defs/repo/git.bzl, for example, assumes that some hopefully recent version of "git" (amongst other things) is available in the local $PATH. I'd like to be able to select exactly which release from https://github.com/git/git/releases is used as the tool in this rule, and if it is not available, just build it from source.
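What is being asked for might look roughly like the following - note that this is hypothetical: git_repository has no such attribute, and the repository name and label below are invented for illustration:

```python
# Hypothetical -- git_repository has no attribute for selecting the
# git binary today; this only sketches what pinning an exact git
# release for the rule could look like.
git_repository(
    name = "my_dep",
    remote = "https://github.com/example/my_dep.git",
    tag = "v1.2.3",
    # Invented attribute: the git binary to run, built from a pinned
    # source release instead of taken from the local $PATH.
    git_tool = "@git_v2_14_2//:git",
)
```

This is exactly the "repository rule depending on a built artifact" problem: `@git_v2_14_2//:git` would have to be built before the fetch of `@my_dep` can run.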
On Tuesday, October 10, 2017 at 11:42:37 UTC+2, Ulf Adams wrote:
That link doesn't work. This link does: http://bootstrappable.org/

Thanks, the "for distros" part on http://bootstrappable.org/best-practises.html is what I mean:

"It should be clear where the binary came from and how it was produced."
"Users can reproduce the binary to verify that it has not been tampered with."

This is currently true for the binaries that Bazel produces, but not so much for the ones that are used in the build process.
As far as I understand the environment at Google, this seems to be only a small problem there, as they have the ability to quickly update build worker nodes to a single current/new compiler version. It is probably enough there to just assume that the local one is always the best one to use. Rebuilding the toolchain (or even just asking the cache for the current toolchain binaries) probably has more overhead than just supplying build nodes with a slightly larger base image.

We do not update the build workers, but upload all the necessary files instead. However, you cannot build gcc without a C compiler (or clang without a C++ compiler), so you do need existing binaries at some point. Also, most Linux tools require installation to fixed absolute paths, which we can't guarantee at this time (and Bazel cannot represent such a requirement right now); they read a bunch of (undeclared) files, and they have dynamic library dependencies (also at fixed absolute paths). For the tools we use at Google, we have manually patched all of them to support execution from relative paths and manually declared all the necessary dependencies.

I would expect binaries used in toolchains to be statically linked, maybe even combined into multi-tool binaries like busybox. In Docker terms, a "FROM scratch" container + build worker binary should be enough. It would mean larger binaries and some downsides of static linking, but on the other hand Bazel would know exactly which binaries are necessary anyway, so there would be no need to, for example, upload "git" unless it is needed for a build step.

As far as Bazel is concerned, we want to be able to represent 'toolchains' as an explicit first-class concept, making it possible to select the right toolchain for the host/target platform combination (supporting cross-compilation), as well as allowing toolchains to be bootstrapped from source, used from a local machine, or taken from a Docker image.
That'll include setting best practices for rule developers. Note that the C++ and Java rules in Bazel already mostly follow this approach, so you can switch between different toolchains and even bootstrap toolchains to some extent.

Where can I find out more about the "allowing toolchains to be bootstrapped from source" part? Where are the BUILD files to compile GCC and Clang, or at least descriptions of how I would go about bootstrapping?
Currently it seems to me that I might first need to, e.g., bootstrap a compiler like GCC and drop it into a container, then inside that container build the next layer (e.g. build Python from the CPython project), and so on (using the output from the Python compilation to create some py_binaries). Is this how nested WORKSPACEs are supposed to work?
Thanks for the answers,
Markus
On Tue, Oct 10, 2017 at 1:14 PM Markus Teufelberger <markusteu...@gmail.com> wrote:
On Tuesday, October 10, 2017 at 11:08:48 UTC+2, Damien Martin-Guillerez wrote:
Hi Markus,
IIUC, the main problem you are facing is that repository rules cannot depend on artifacts built by Bazel?

Yes, let's say I want to build something using the result of a previous build step (e.g. I want to build my web app using NodeJS, which requires building the V8 engine with GCC, and I want to build my own GCC myself before that). It could work by having an "install" step between different layers of tooling ("from now on, don't use the system tool, but the one supplied by BUILD rule X"). https://github.com/bazelbuild/bazel/blob/master/tools/build_defs/repo/git.bzl, for example, assumes that some hopefully recent version of "git" (amongst other things) is available in the local $PATH. I'd like to be able to select exactly which release from https://github.com/git/git/releases is used as the tool in this rule, and if it is not available, just build it from source.

Unfortunately this is not yet possible, and we don't see it happening before 1.0, because it is a hard problem (having a loading-phase function depend on an execution-phase function). It should be possible, but it will have a lot of hairy parts.
On Tuesday, October 10, 2017 at 11:42:37 UTC+2, Ulf Adams wrote:
That link doesn't work. This link does: http://bootstrappable.org/

Thanks, the "for distros" part on http://bootstrappable.org/best-practises.html is what I mean:
"It should be clear where the binary came from and how it was produced."
"Users can reproduce the binary to verify that it has not been tampered with."
This is currently true for the binaries that Bazel produces, but not so much for the ones that are used in the build process.

During the build itself you can depend on other built binaries. The problem is in the fetch phase, where you get the dependencies.
Currently it seems to me that I might first need to, e.g., bootstrap a compiler like GCC and drop it into a container, then inside that container build the next layer (e.g. build Python from the CPython project), and so on (using the output from the Python compilation to create some py_binaries). Is this how nested WORKSPACEs are supposed to work?

Nope, nested workspaces still cannot depend on the execution phase of the build.