Hello everyone,

TL;DR: I want to add a new protocol to the remote-apis repository that Bazel can use to talk to a FUSE helper daemon. Any feedback, objections, ...?

As some of us have experienced, Bazel remote execution may generate lots of network traffic. The "Builds without the Bytes" effort has addressed this, but comes with the downside that outputs of builds can no longer be accessed. In fact, I don't think you even get insight into which outputs the build would have yielded. For typical software development workflows, this is impractical: you often want to access build artifacts without really knowing up front which ones you're going to access.

Failed attempts

To see whether we can get the best of both worlds (fast builds, while still getting access to all outputs), I've been experimenting with letting bazel-out/ be backed by a virtual file system (FUSE). Below are three solutions I've worked on over the last couple of months that I think we should NOT pursue:

1. A FUSE file system that's directly integrated into Bazel. I eventually abandoned this approach for a couple of reasons. First and foremost, integrating FUSE into Bazel makes it a lot harder to run Bazel in unprivileged environments (Docker/Kubernetes containers); lots of people do this, e.g. for CI. Second, I remember reading a "Vision on Bazel storage" document about a year ago that stated that adding a direct dependency on FUSE was undesirable. Furthermore, FUSE is not cross-platform; for systems like macOS it may be smarter to run a user-space NFS/9p server, as that doesn't require kernel extensions.
2. A separate FUSE daemon that exposes the entire CAS under a single directory, where files can be accessed under the name "${hash}-${sizeBytes}". Bazel would then no longer download files from the CAS, but simply emit symbolic links pointing into the FUSE mount. PRs: #1 #2. This approach eventually worked okayish from the Bazel side, but tends to confuse build actions that call realpath() on input files, which effectively breaks dynamic linkage with rpath, Python module loading, etc. (see the small illustration right after this list).
3. A FUSE daemon like the one above that, in addition to a CAS directory, also offers a tmpfs-like scratch space directory. By storing the bazel-out/ directory inside the scratch space directory, Bazel can emit hard links, which keeps build actions happy. The downside of this approach is that it's relatively slow due to the high number of FUSE operations: operations that need to modify many files, such as runfiles link creation and 'bazel clean', take ages.
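To make the realpath() problem with approach 2 a bit more concrete, here is a tiny, self-contained illustration in Go. The paths and the digest are made up for the example; this is not code from any of the prototypes. The point is simply that once an output is nothing but a symlink into the CAS mount, resolving it yields a path inside the CAS directory rather than a path next to the other outputs:

// Hypothetical illustration (made-up paths, placeholder digest) of why
// approach 2 confuses build actions that call realpath() on their inputs.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	dir, _ := os.MkdirTemp("", "realpath-demo")
	defer os.RemoveAll(dir)

	// Stand-ins for the CAS mount and for an output emitted by Bazel:
	// bazel-out/.../bin/liba.so is just a symlink to "${hash}-${sizeBytes}".
	casFile := filepath.Join(dir, "cas", "0123abcd-1234")
	output := filepath.Join(dir, "bazel-out", "k8-fastbuild", "bin", "liba.so")
	os.MkdirAll(filepath.Dir(casFile), 0o755)
	os.MkdirAll(filepath.Dir(output), 0o755)
	os.WriteFile(casFile, []byte("ELF..."), 0o644)
	os.Symlink(casFile, output)

	// realpath(3) equivalent: the resolved path points into the CAS
	// directory, not next to the other outputs, so $ORIGIN-relative
	// rpaths and sibling Python modules can no longer be located.
	resolved, _ := filepath.EvalSymlinks(output)
	fmt.Println(resolved)
}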
Successful attempt

After three unsuccessful attempts, I've now ended up with a solution that I think works well. Namely, I have created a daemon that runs both a FUSE file system and a gRPC server. Initially the FUSE mount is empty and immutable. Every time Bazel starts a new build, it notifies the FUSE daemon over gRPC to request a new build directory; basically an mkdir(), except that all sorts of additional metadata is exchanged. Bazel then uses this FUSE-backed directory to store all of its outputs. Every time Bazel needs to do something that is inefficient to do over FUSE, it calls into the daemon over gRPC instead. Examples of such operations include:

- batch-creating hundreds of symlinks when creating .runfiles directories,
- creating lazy-loading CAS-backed files and directories, based on digests stored in ActionResult messages,
- computing digests of files (the daemon can return these instantly for lazy-loading CAS-backed files),
- ...

Once the build is finished, Bazel performs a final gRPC call against the daemon to finalize the results. My implementation doesn't do anything fancy with that right now, but it could use that occasion to snapshot/archive the output directory. That would allow a user to time-travel between bazel-out/ directories and compare their results. (A rough sketch of what this client/daemon interaction could look like is included at the end of this message.)

The changes I made to Bazel can be found in a branch in my fork on GitHub. The most notable change is the addition of GrpcRemoteOutputService.

The gRPC protocol schema

The schema that my copy of Bazel uses to communicate with my FUSE daemon can, for the time being, be found in my fork of Bazel. As you can see, it's a relatively simple protocol. It depends on REv2, but not on anything specific to Bazel. My suspicion is that this protocol could also be used by other build clients, such as Pants; I'd love to hear from the maintainers of such tools whether they agree. Because of that, I would like to see if we could upstream this into the remote-apis repository, as opposed to keeping it in the Bazel tree. Thoughts on this?

During the last remote execution monthly meeting, Ed Baunton and/or Sander Striker asked how this protocol differs from BuildGrid's local_cas protocol. The answer to that is simple: there doesn't seem to be any overlap whatsoever. The local_cas protocol is oblivious of builds; there is no scoping/context. Furthermore, it can only be used to stage trees based on Directory objects stored in the CAS, while we need to stage files, symlinks and Tree objects (contained in OutputFile, OutputSymlink and OutputDirectory messages). In summary, the local_cas protocol seems to be designed for use on workers, while the remote output service protocol that I've designed is for use on clients.

My plan for now is to continue implementing this. Eventually I will send out PRs against Bazel to add support for this protocol. Furthermore, I will release the source code of my FUSE daemon (which is largely built on top of Buildbarn's frameworks) at github.com/buildbarn/bb-clientd.
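To give a feel for the kind of operations involved, here is a rough, hand-written sketch of what the service surface could look like from a client's point of view. To be clear: all of the names below are hypothetical stand-ins I made up for this message; the actual schema is the .proto file in my fork, which uses proper request/response messages and REv2 digests rather than a Go interface.

// A hypothetical, hand-written stand-in for the daemon's gRPC surface, just to
// sketch the kind of operations described above. None of these names are taken
// from the actual .proto file in my fork.
package remoteoutputservice

import "context"

// Digest mirrors the REv2 notion of a blob digest (hash plus size in bytes).
type Digest struct {
	Hash      string
	SizeBytes int64
}

// OutputService is what Bazel (or another build client) would talk to over gRPC.
type OutputService interface {
	// StartBuild asks the daemon for a fresh, writable build directory inside
	// the otherwise empty and immutable FUSE mount: "an mkdir(), except that
	// all sorts of additional metadata is exchanged".
	StartBuild(ctx context.Context, buildID string) (outputPath string, err error)

	// BatchCreateSymlinks creates many symlinks in a single round trip, e.g.
	// when populating .runfiles directories, instead of issuing one FUSE
	// symlink() call per file.
	BatchCreateSymlinks(ctx context.Context, targetsByPath map[string]string) error

	// StageCASBackedFile places a lazy-loading, CAS-backed file at the given
	// path, based on a digest taken from an ActionResult. Its contents are
	// only fetched from the CAS when something actually reads the file.
	StageCASBackedFile(ctx context.Context, path string, digest Digest) error

	// GetFileDigest returns the digest of a file in the output directory. For
	// CAS-backed files the daemon already knows the answer and can reply
	// without hashing any data.
	GetFileDigest(ctx context.Context, path string) (Digest, error)

	// FinalizeBuild tells the daemon that the build has completed, giving it a
	// chance to e.g. snapshot or archive the output directory.
	FinalizeBuild(ctx context.Context, buildID string) error
}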
--
Interesting! For now my plan was indeed to stick to the same layout as
Bazel, even though hardlinks would be better in my case (more compact
to store). Still, it's worth investigating how runfiles in general may
be optimized going forward. Right now I observe that runfiles links
creation basically causes a quadratic explosion (number of files in a
common dependency * number of tests). Maybe we can eventually figure
out ways to bring this closer to linear...
> - What happens if you do a clean build and then want to do an incremental build a few days later, or want to access some remote outputs a few days later? Will the remote outputs still be there to lazily download? Do you need an API to tell the remote cache to persist files / keep files alive while the local workspace exists?
I haven't implemented this part yet, but my idea here is to let
BatchStat() do FindMissingBlobs() calls under the hood (and
temporarily memoize the results). This means that, from Bazel's point
of view, the file system will automatically start to hide files whose
backing blobs have gone absent remotely.
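Concretely, the logic inside the daemon could look roughly like the
sketch below. The REv2 FindMissingBlobs() call is real, but the helper
type, its name and the TTL-based memoization policy are just
assumptions I'm making for illustration; a real implementation would
also batch many digests into a single FindMissingBlobs() call rather
than checking them one by one.

// Sketch of memoized blob-existence checks that BatchStat() could rely on.
// The helper type and its policy are hypothetical; only the REv2 calls are real.
package existence

import (
	"context"
	"fmt"
	"sync"
	"time"

	remoteexecution "github.com/bazelbuild/remote-apis/build/bazel/remote/execution/v2"
)

// cacheEntry remembers whether a blob was reported missing and when we asked.
type cacheEntry struct {
	missing bool
	checked time.Time
}

// ExistenceChecker memoizes FindMissingBlobs() results for a limited time, so
// that BatchStat() can cheaply decide whether a lazy-loading file should still
// be visible to Bazel.
type ExistenceChecker struct {
	cas          remoteexecution.ContentAddressableStorageClient
	instanceName string
	ttl          time.Duration

	mu    sync.Mutex
	cache map[string]cacheEntry // keyed by "<hash>-<sizeBytes>"
}

// BlobStillExists reports whether the CAS still holds the blob backing a
// lazy-loading file. Stale or absent cache entries trigger a fresh
// FindMissingBlobs() call; the outcome is remembered for the configured TTL.
func (c *ExistenceChecker) BlobStillExists(ctx context.Context, digest *remoteexecution.Digest) (bool, error) {
	key := fmt.Sprintf("%s-%d", digest.Hash, digest.SizeBytes)

	c.mu.Lock()
	if e, ok := c.cache[key]; ok && time.Since(e.checked) < c.ttl {
		c.mu.Unlock()
		return !e.missing, nil
	}
	c.mu.Unlock()

	resp, err := c.cas.FindMissingBlobs(ctx, &remoteexecution.FindMissingBlobsRequest{
		InstanceName: c.instanceName,
		BlobDigests:  []*remoteexecution.Digest{digest},
	})
	if err != nil {
		return false, err
	}
	missing := len(resp.MissingBlobDigests) > 0

	c.mu.Lock()
	if c.cache == nil {
		c.cache = map[string]cacheEntry{}
	}
	c.cache[key] = cacheEntry{missing: missing, checked: time.Now()}
	c.mu.Unlock()

	return !missing, nil
}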
--
Ed Schouten <e...@nuxi.nl>