Remote Asset API intention and capabilities


George Gensure

Dec 18, 2020, 6:24:18 PM
to remote-exe...@googlegroups.com
The Remote Asset API has existed for some time now, without a clear client consumer that I'm aware of to point at and say "make it work to this thing's specification."

Sander has spoken to me briefly (BazelCon 2019) about far-reaching goals, and I thought we could use some well-aired discussion around what this system might be capable of, a candidate client for demonstrating its features, and where we want to go with the API eventually.

To kick things off, the Bazel client implementation mostly uses this as a convenient repository cache, which is the only use (that I've found) within its code. It uses the FetchBlob method exclusively from the Fetch service, allowing an implementing service (Buildfarm has a PR out for this now) to provide read-through caching of repository content.
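As a sketch of what that read-through path might look like (illustrative only: a real Fetch service uses the generated remote_asset.proto gRPC stubs, and the dict shapes below merely mirror the FetchBlobRequest/FetchBlobResponse messages):

```python
import hashlib

CAS = {}           # digest hash -> blob bytes (stand-in for the CAS)
ASSET_INDEX = {}   # (uris, qualifiers) -> digest hash (stand-in for the asset index)

def fetch_blob(request, download):
    """Return a FetchBlobResponse-shaped dict, downloading only on a cache miss.

    `download` is a callable(uri) -> bytes, standing in for the HTTP fetch
    the service performs; a real service would try the listed URIs in order.
    """
    key = (tuple(request["uris"]),
           tuple(sorted(request.get("qualifiers", {}).items())))
    if key not in ASSET_INDEX:
        content = download(request["uris"][0])
        digest_hash = hashlib.sha256(content).hexdigest()
        CAS[digest_hash] = content          # retain the blob in CAS
        ASSET_INDEX[key] = digest_hash      # remember the association
    digest_hash = ASSET_INDEX[key]
    return {
        "uri": request["uris"][0],
        "blob_digest": {"hash": digest_hash, "size_bytes": len(CAS[digest_hash])},
    }
```

A second request for the same URIs and qualifiers is then served from the index without re-downloading, which is the "read-through" property the repository cache relies on.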

What else should the implementor be doing with this service? Should we attempt to extract archives and retain things in CAS? Are there any real implementations of the TTL behavior? What else should Bazel or other clients be using this API for?
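On the TTL question specifically, one plausible reading of the API's freshness fields (`expires_at` on the response, `oldest_content_accepted` on the request) can be sketched as follows; the cache-entry class and its fields are my own illustration, not part of any real implementation:

```python
import time

class CachedAsset:
    """A cached fetch result with service-assigned freshness metadata."""
    def __init__(self, digest, fetched_at, ttl_seconds):
        self.digest = digest
        self.fetched_at = fetched_at
        self.expires_at = fetched_at + ttl_seconds  # maps to expires_at

def is_usable(entry, oldest_content_accepted=None, now=None):
    """Decide whether a cached asset may be served without re-fetching."""
    now = time.time() if now is None else now
    if now >= entry.expires_at:
        return False  # past the service-assigned TTL
    if oldest_content_accepted is not None and entry.fetched_at < oldest_content_accepted:
        return False  # client demands content fetched more recently
    return True
```

Under this reading, a stale or too-old entry simply triggers a re-fetch through the normal download path rather than an error.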

Hopefully some of the discussion here can result in actionable tasks on the API, for documentation's sake.

Happy Holidays to all,
-George Gensure

John Millikin

Dec 18, 2020, 6:35:40 PM
to George Gensure, remote-exe...@googlegroups.com
A bit of background context: the original proposal for this API was called Remote Repository Cache, and its only method was FetchBlob. This is also the only functionality I implemented within Bazel -- it is unclear whether the remaining methods should (or can) be used from Bazel.

At Stripe we use a remote downloader to enforce security properties of downloaded resources, for example requiring them to specify a checksum and redirecting fetches through an Artifactory instance.
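For illustration, checksum enforcement of this kind usually means validating the downloaded bytes against a Subresource Integrity (SRI) string, which is the format the API's `checksum.sri` qualifier carries; a minimal sketch (the function name is my own):

```python
import base64
import hashlib

def verify_sri(content: bytes, expected_sri: str) -> bool:
    """Check downloaded bytes against an SRI string like 'sha256-<base64>'."""
    algorithm, _, expected_b64 = expected_sri.partition("-")
    digest = hashlib.new(algorithm, content).digest()  # sha256/sha384/sha512
    return base64.b64encode(digest).decode("ascii") == expected_b64
```

A downloader enforcing this policy would reject any fetch request that arrives without a checksum qualifier, and fail any download whose bytes do not verify.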

Peter Ebden

Dec 21, 2020, 6:25:27 AM
to John Millikin, George Gensure, Remote Execution APIs Working Group
We use it just for downloading remote files (so FetchBlob is the only RPC we use as well). That is mainly to avoid trying to turn them into curl invocations or similar, but it also has the advantage of letting us enforce checksums (and we've thought about whitelisting permitted destinations).
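A destination whitelist of the sort mentioned here can be as simple as a host check before the fetch is attempted; a hypothetical sketch (the hosts listed are examples only):

```python
from urllib.parse import urlparse

# Example allowlist; a real deployment would load this from configuration.
ALLOWED_HOSTS = {"github.com", "proxy.golang.org", "artifactory.internal"}

def is_allowed(uri: str) -> bool:
    """Permit a fetch only if the URI's host is on (or under) the allowlist."""
    host = urlparse(uri).hostname
    return host is not None and (
        host in ALLOWED_HOSTS
        or any(host.endswith("." + h) for h in ALLOWED_HOSTS)
    )
```

The same check naturally extends to every mirror URI in a FetchBlob request, not just the first.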

We have a couple of other theoretical use cases (e.g. retrieving part or all of a Git repo) but it seems unlikely we'd be able to extract the specific invocations from the higher-level command that's being run (`go get` etc).


--
You received this message because you are subscribed to the Google Groups "Remote Execution APIs Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to remote-execution...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/remote-execution-apis/CAJmdR_pjEuCxf3GM8Ewub13LZ9uD%3D8WGN0QsBGe0eVtHLYx6Ag%40mail.gmail.com.

Sander Striker

Dec 22, 2020, 8:59:25 AM
to George Gensure, Remote Execution APIs Working Group
Hi,


On Sat, Dec 19, 2020 at 12:24 AM George Gensure <wer...@gmail.com> wrote:
The Remote Asset API has existed for some time now, without a clear client consumer that I'm aware of to point at and say "make it work to this thing's specification."

Sander has spoken to me briefly (BazelCon 2019) about far-reaching goals, and I thought we could use some well-aired discussion around what this system might be capable of, a candidate client for demonstrating its features, and where we want to go with the API eventually.

John Millikin pointed out Remote Repository Cache in this thread.  Relatedly, Eric Burnett wrote down his Build tool “Storage backend” vision, and presented One Minute Presubmits at the London Build Meetup in October 2019.  This led to iteration on the Remote Asset API proposal, which lists example client uses.
 
To kick things off, the Bazel client implementation mostly uses this as a convenient repository cache, which is the only use (that I've found) within its code. It uses the FetchBlob method exclusively from the Fetch service, allowing an implementing service (Buildfarm has a PR out for this now) to provide read-through caching of repository content.

What else should the implementor be doing with this service? Should we attempt to extract archives and retain things in CAS? Are there any real implementations of the TTL behavior? What else should Bazel or other clients be using this API for?

I'm happy to respond in a more expanded fashion, should the example client use above require more explanation.
 
Hopefully some of the discussion here can result in actionable tasks on the API, for documentation's sake.

Cheers,

Sander
 
Happy Holidays to all,
-George Gensure


Eric Burnett

Jan 5, 2021, 9:57:55 AM
to Sander Striker, George Gensure, Remote Execution APIs Working Group
Just catching up post-break (Happy New Year everyone!) -

Sander already linked docs summarizing my vision (tl;dr: getting all heavy lifting moved off-host and cacheable), so I won't repeat it here for brevity, but am happy to answer questions or discuss further.

The other thing I'll add is that forward progress is subject to it being someone's most important problem to work on. Speaking for RBE, in the last few quarters we've been more focused on migrating workloads to Bazel, or bringing remote execution to their existing tooling, and less on advancing the state of performance for full adopters. Our long-term vision hasn't really changed; we've just not yet reached the point where intermediate downloads are the most pressing bottleneck to prioritize for our customers, so we haven't done much with this API yet. (In a similar state are "relaxing requirements for canonical Merkle trees" and "further virtualizing the input/output filesystems", though happily, Ed is driving progress on the latter.)

In a similar vein, do you have a problem in mind motivating you? Or is this just a concern that this API doesn't have one?

mario...@gmail.com

Jan 15, 2021, 4:50:08 PM
to Remote Execution APIs Working Group
Hello, and happy new year! I'm also catching up with this thread.

Thanks for starting this discussion, George.

Adding one of our use cases to the list: providing tools with an interface to pick up the latest base-image digests used for our hermetic builds.

Currently, our base images are tarballs (similar to Docker images), as we use buildbox-run-userchroot for execution.
We wrote scripts to extract and upload the contents of these tarballs to CAS and record the root digest of each image in a config file.
Clients receive the config file containing the image's root digest, and a wrapper script that invokes recc reads it and injects the build-image root digest into the Action as a platform property.
The build image is of course directly related to the outcome of the Action, and has to be deterministic both for cache hits and for re-builds when a new image is used.

Using the Asset API, we will have the ability to resolve "linux-build-image:latest" to the relevant digest in a way that is more standardized and native to the tools, so we won't need all the bespoke scripts we use today.
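As a sketch of what that resolution could look like end-to-end (all names, the digest, and the platform-property key below are illustrative; in practice the lookup would be a Fetch RPC against the Asset service rather than a local table):

```python
RESOLVED = {
    # tag -> Merkle-tree root digest ("hash/size") of the extracted image in CAS.
    # The digest below is a placeholder, not a real image.
    "linux-build-image:latest":
        "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef/4096",
}

def platform_for_image(tag: str) -> dict:
    """Build the platform properties a recc wrapper would attach to the Action."""
    root_digest = RESOLVED[tag]  # in practice: a Fetch RPC resolving the tag
    return {"properties": [{"name": "build-image-root-digest",
                            "value": root_digest}]}
```

Because the Action still embeds the resolved digest rather than the mutable tag, cache correctness is preserved: a new image produces a new digest, and therefore new cache keys.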

This will also save us from having to ship an updated config for the new images to be picked up by users.

We can also use the same functionality on our worker hosts to prime their local cache with some common images, as opposed to only downloading the image when it is used.

A further benefit is that we could also use the API itself to extract the tarballs and upload our images directly, providing a frictionless user experience there as well.